We’re all frequently asked to compute statistics of micro-level data at group level. If you have to base statistics on micro-level data then there’s not much of a problem. But what if you have to base some statistics on micro-level data and some at subgroup-level data with too many subgroups? How to do that without creating additional tables?

You have a large table called **Data** with shipment-level data. It shows which shipment was delivered from which terminal (A or B) on which date:

Shipment ID |
Terminal |
Date |

S1 | B | 22-Nov-16 |

S2 | B | 18-Nov-16 |

S3 | A | 8-Nov-16 |

S4 | A | 4-Nov-16 |

S5 | A | 11-Nov-16 |

S6 | B | 21-Nov-16 |

S7 | A | 1-Nov-16 |

S8 | B | 30-Nov-16 |

S9 | A | 19-Nov-16 |

S10 | A | 21-Nov-16 |

S11 | A | 30-Nov-16 |

S12 | A | 9-Nov-16 |

S13 | B | 26-Nov-16 |

S14 | A | 3-Nov-16 |

S15 | B | 29-Nov-16 |

S16 | B | 4-Nov-16 |

S17 | A | 21-Nov-16 |

S18 | A | 15-Nov-16 |

S19 | A | 28-Nov-16 |

S20 | A | 18-Nov-16 |

S21 | A | 23-Nov-16 |

S22 | B | 14-Nov-16 |

S23 | A | 14-Nov-16 |

S24 | B | 12-Nov-16 |

S25 | A | 13-Nov-16 |

S26 | A | 6-Nov-16 |

S27 | A | 2-Nov-16 |

S28 | A | 27-Nov-16 |

S29 | B | 10-Nov-16 |

S30 | A | 1-Nov-16 |

S31 | A | 22-Nov-16 |

S32 | A | 19-Nov-16 |

S33 | B | 12-Nov-16 |

S34 | A | 12-Nov-16 |

S35 | A | 11-Nov-16 |

S36 | A | 27-Nov-16 |

S37 | A | 20-Nov-16 |

S38 | B | 24-Nov-16 |

S39 | B | 17-Nov-16 |

S40 | A | 5-Nov-16 |

S41 | A | 10-Nov-16 |

S42 | B | 15-Nov-16 |

S43 | A | 27-Nov-16 |

S44 | A | 24-Nov-16 |

S45 | B | 16-Nov-16 |

S46 | A | 9-Nov-16 |

S47 | A | 17-Nov-16 |

S48 | A | 20-Nov-16 |

S49 | B | 16-Nov-16 |

S50 | A | 9-Nov-16 |

S51 | B | 11-Nov-16 |

S52 | B | 21-Nov-16 |

S53 | B | 29-Nov-16 |

S54 | A | 15-Nov-16 |

S55 | B | 12-Nov-16 |

S56 | A | 12-Nov-16 |

S57 | B | 5-Nov-16 |

S58 | B | 10-Nov-16 |

S59 | A | 29-Nov-16 |

S60 | A | 26-Nov-16 |

S61 | A | 30-Nov-16 |

S62 | A | 3-Nov-16 |

S63 | A | 20-Nov-16 |

S64 | A | 30-Nov-16 |

S65 | A | 14-Nov-16 |

S66 | A | 11-Nov-16 |

S67 | A | 7-Nov-16 |

S68 | A | 12-Nov-16 |

S69 | A | 5-Nov-16 |

S70 | B | 17-Nov-16 |

S71 | A | 5-Nov-16 |

S72 | A | 27-Nov-16 |

S73 | B | 12-Nov-16 |

S74 | A | 21-Nov-16 |

S75 | A | 6-Nov-16 |

S76 | B | 13-Nov-16 |

S77 | A | 22-Nov-16 |

S78 | A | 5-Nov-16 |

S79 | B | 21-Nov-16 |

S80 | A | 15-Nov-16 |

S81 | A | 16-Nov-16 |

S82 | A | 6-Nov-16 |

S83 | A | 18-Nov-16 |

S84 | A | 21-Nov-16 |

S85 | A | 3-Nov-16 |

S86 | B | 22-Nov-16 |

S87 | A | 27-Nov-16 |

S88 | A | 14-Nov-16 |

S89 | B | 1-Nov-16 |

S90 | B | 9-Nov-16 |

S91 | A | 27-Nov-16 |

S92 | B | 22-Nov-16 |

S93 | A | 7-Nov-16 |

S94 | A | 18-Nov-16 |

S95 | A | 27-Nov-16 |

S96 | A | 2-Nov-16 |

S97 | B | 12-Nov-16 |

S98 | A | 30-Nov-16 |

S99 | A | 15-Nov-16 |

S100 | A | 4-Nov-16 |

You’re asked to compute the total number of shipments as well as minimum, maximum and average number of shipments grouped by terminal and weekday. In other words, you have to fill the following table named **Summary**:

Terminal |
Weekday |
Total Shps |
Min Shps |
Max Shps |
Avg Shps |

A | Monday | ||||

A | Tuesday | ||||

A | Wednesday | ||||

A | Thursday | ||||

A | Friday | ||||

A | Saturday | ||||

A | Sunday | ||||

B | Monday | ||||

B | Tuesday | ||||

B | Wednesday | ||||

B | Thursday | ||||

B | Friday | ||||

B | Saturday | ||||

B | Sunday |

You also have to create a chart showing average number of shipments per weekday per terminal.

There’s absolutely no problem with filling the **Total Shps** column. Of course, you start by adding a column **Weekday** to the **Data** table that computes the name of the weekday from the **Date** column:

=SWITCH(WEEKDAY([@Date], 2), 1, “Monday”, 2, “Tuesday”, 3, “Wednesday”, 4, “Thursday”, 5, “Friday”, 6, “Saturday”, 7, “Sunday”)

The **SWITCH** function is available since Excel 2016, so if you have an earlier version, you’ll have to create a separate table containing the weekdays and link the **Weekday** column to it.

Then the **Total Shps** column of the **Summary** table contains a simple **COUNTIFS **formula:

=COUNTIFS(Data[Terminal], [@Terminal], Data[Weekday], [@Weekday])

It is somewhat more demanding to fill in the **Min Shps **and **Max Shps** columns, because first we have to compute the number of shipments on every date (given the terminal) and then look on those dates that occur on a given weekday and compute minimum and maximum. It is not that difficult to do, however, and does not require an additional table at all. For minimum and maximum, we just have to add a **Day Shp Cnt** column to the **Data** table that gives the number of shipments on a given date from a given terminal:

=COUNTIFS([Terminal], [@Terminal], [Date], [@Date])

and then just base the summary columns on it (the formula for **Max Shps** is equivalent):

=MINIFS(Data[Day Shp Cnt], Data[Terminal], [@Terminal], Data[Weekday], [@Weekday])

Again, I’m using the **MINIFS**/**MAXIFS** functions available in Excel 2016. In earlier versions, you have to resort to the standard **MIN**/**MAX** and use advanced on-the-fly filtering.

The most interesting case is with the last column, **Avg Shps**. You can’t just compute an average of **Day Shp Cnt** column of the **Data** table. Why? Well, if you have three shipments on a given date from a given terminal, you have three rows for those shipments, and every row contains 3 in that column. Effectively, you’ll be averaging weighted data with weights proportional to the values you’re trying to average. Average of 1 and 3 is 4 / 2 = 2, and average of 1, 3, 3, 3 is 10 / 4 = 2.5 — feel the difference.

There are two solutions here. The first one is to create a technical table just to aggregate the number of shipments over date–terminal pairs. I don’t like it, because it requires to create another long table (and create it again if the data in the **Data** table change). So I’ll focus on the second one.

Recall that the key problem is the repetition of numbers. We can solve the problem by taking only one of them for each terminal–date combination. The easiest is, of course, to take the first one. So the idea is to mark the first occurrence and then add it to the filter.

The implementation is somewhat similar to that I used in the post about ranking offers with ties. I’ll also count the number of occurrences similar to a given one, but I’ll have to use advanced on-the-fly filtering, because I’ll be comparing a formula depending on a value to another formula depending on another value, not two values themselves.

To be more specific, I’ll check whether in the **Data** table, there are some rows with the same date and terminal but *above a given row*. The given row is determined by the **ROW** function, and the complete formula in the new **Day First Shp** column is:

{=(SUM(IF(([Terminal] = [@Terminal]) * ([Date] = [@Date]) * (ROW([Date]) <= ROW([@Date])) = 1, 1, 0)) = 1)}

The **SUM** formula here returns 1 if there are no rows above a given row with the same terminal and date, 2 if there is one row and so on. If the **SUM** formula returns 1, the whole formula returns **TRUE**; otherwise, it returns **FALSE**.

Then in the **Avg Shps** column of the **Summary **table, write the following:

=AVERAGEIFS(Data[Day Shp Cnt], Data[Terminal], [@Terminal], Data[Weekday], [@Weekday], Data[Day First Shp], TRUE)

The last filtering condition ensures that we take only one value for each terminal–date pair and the result is, thus, correct.

As a result of all these manipulations, the top of the **Data** table looks like:

Note that we have 4 shipments from terminal A on 21-Nov-16, shipment S10 is marked as the first shipment but shipment S17 as not the first (**Day First Shp** is **FALSE**).

The **Summary** table looks like:

It remains to create a PivotTable based on the **Summary** table and add a PivotChart based on the PivotTable. After doing some design work, the chart looks like: