Assume we’re operating in a business that is delivering products from terminals to stations. *(These products, of course, are then sold at stations but that’s not the topic of this post.)* We have a large dataset showing how much of each product was delivered to each station from each terminal. We are interested in showing, for each station, how much of each product, in absolute terms and percentagewise, was delivered from a station’s top two terminals. We are not interested in smaller terminals (third, fourth etc.), so we want to accurately show the summary statistics for the secondary terminal and reallocate the remaining volume to the primary terminal.

I’ll illustrate the approach on the following dataset in an Excel table called **Data**:

Station |
Product |
Terminal |
Volume |

1 | A | T1 | 110102 |

1 | A | T2 | 97448 |

1 | A | T3 | 107668 |

1 | B | T1 | 97631 |

1 | C | T1 | 96481 |

2 | A | T4 | 100970 |

2 | B | T3 | 104833 |

2 | B | T4 | 111438 |

2 | B | T5 | 101601 |

3 | A | T2 | 96478 |

3 | A | T4 | 98585 |

3 | B | T1 | 106350 |

3 | C | T3 | 102795 |

3 | C | T5 | 102129 |

4 | A | T1 | 101276 |

4 | B | T1 | 99078 |

4 | C | T1 | 96343 |

We have four stations (1, 2, 3 and 4) where we deliver three products (A, B and C) from five terminals (T1 through T5).

The resulting table should look in the following way:

Station |
Product |
Terminal 1 |
Terminal 1 Vol |
Terminal 1 % |
Terminal 2 |
Terminal 2 Vol |
Terminal 2 % |

1 | A | ||||||

1 | B | ||||||

1 | C | ||||||

2 | A | ||||||

2 | B | ||||||

3 | A | ||||||

3 | B | ||||||

3 | C | ||||||

4 | A | ||||||

4 | B | ||||||

4 | C |

Terminal 1 is the primary terminal for a given combination of station and product and Terminal 2 is the secondary terminal.

There are, of course, several ways to tackle this problem. I’ll illustrate an approach that does not create any intermediate tables. But for it to work, we’ll have to add a few columns to the data table. These are:

**TotalVol**column: shows the total volume of a product delivered to a station from all terminals; the formula is**=SUMIFS(**[Volume]**,**[Station]**,**[@Station]**,**[Product]**,**[@Product]**)****MaxVol**column: shows the maximum volume of a product delivered to a station across terminals (this is the quantity delivered from the primary terminal before adjustments); the formula is**{=MAX(IF((**[Station]**=**[@Station]**) * (**[Product]**=**[@Product]**) = 1,**[Volume]**, “”))}**. Note that this is an array formula that applies an advanced on-the-fly filter to calculate the maximum value from a correct subset of station–product combinations**IsPrimary**column: determines whether a given row contains data about the primary terminal of a given station–product combination; the formula is**=(**[@Volume]**=**[@MaxVol]**)****SMaxVol**column: shows the second maximum volume for a given station–product combination across terminals (this is the quantity delivered from the secondary terminal); the formula is**{=MAX(IF((**[Station]**=**[@Station]**) * (**[Product]**=**[@Product]**) ***([Volume]**<**[@MaxVol]**) = 1,**[Volume]**, “”))}**. This is the same formula as for**MaxVol**column but an additional filtering condition is that we’re only taking terminals where volume delivered is less than maximum volume delivered for a station–product combination. In other words, we’re computing the maximum volume across all non-primary terminals**IsSecondary**column: determines whether a given row contains data about the secondary terminal of a given station–product combination; the formula is**=(**[@Volume]**=**[@SMaxVol]**)****UID**column: a column that will uniquely identify a combination of station and product, which we’ll refer to in the results table; the formula is**=**[@Station]**& “-” &**[@Product]**UID-Primary**column: a column that will uniquely identify a row containing the primary terminal for a given station–product combination; the formula is**=**[@UID]**& “-” &**[@IsPrimary]**UID-Secondary**column: a column that will uniquely identify a row containing the secondary terminal for a given station–product combination; the formula is**=**[@UID]**& “-” &**[@IsSecondary]

After these manipulations, the **Data** table becomes:

Then the formulae in the results table are quite straightforward. Just two remarks are in place. Firstly, recall that we have to adjust the volumes from the primary terminals when deliveries are made from more than two terminals. Secondly, we have to wrap the formulae for the secondary terminal in the **IFERROR **function, because not everywhere we have a secondary terminal, and without wrapping, we’ll have **#ERROR!** in our results table. Here the formulae are:

**Terminal 1**column:**=INDEX(**Data[Terminal],**MATCH(**[@Station]**& “-” &**[@Product]**& “-TRUE”,**Data[UID-Primary]**, 0))****Terminal 1 Vol**column:**=INDEX(**Data[TotalVol],**MATCH(**[@Station]**& “-” &**[@Product]**& “-TRUE”,**Data[UID-Primary]**, 0)) – INDEX(**Data[SMaxVol],**MATCH(**[@Station]**& “-” &**[@Product]**& “-TRUE”,**Data[UID-Primary], 0))**Terminal 1 %**column:**=**[@[Terminal 1 Vol]]**/ INDEX(**Data[TotalVol],**MATCH(**[@Station]**& “-” &**[@Product]**& “-TRUE”,**Data[UID-Primary], 0))**Terminal 2**column:**=IFERROR(INDEX(**Data[Terminal],**MATCH(**[@Station]**& “-” &**[@Product]**& “-TRUE”,**Data[UID-Secondary]**, 0)), “”)****Terminal 2 Vol**column:**=IFERROR(INDEX(**Data[SMaxVol],**MATCH(**[@Station]**& “-” &**[@Product]**& “-TRUE”,**Data[UID-Secondary]**, 0)), “”)****Terminal 2%**column:**=IFERROR(**[@[Terminal 2 Vol]]**/ INDEX(**Data[TotalVol],**MATCH(**[@Station]**& “-” &**[@Product]**& “-TRUE”,**Data[UID-Secondary]**, 0)), “”)**

Here’s how the results table looks like: