Aggregating Shipment Statistics Into Day Groups

We’re all frequently asked to compute statistics of micro-level data at group level. If you have to base statistics on micro-level data then there’s not much of a problem. But what if you have to base some statistics on micro-level data and some at subgroup-level data with too many subgroups? How to do that without creating additional tables?

You have a large table called Data with shipment-level data. It shows which shipment was delivered from which terminal (A or B) on which date:

Shipment ID Terminal Date
S1 B 22-Nov-16
S2 B 18-Nov-16
S3 A 8-Nov-16
S4 A 4-Nov-16
S5 A 11-Nov-16
S6 B 21-Nov-16
S7 A 1-Nov-16
S8 B 30-Nov-16
S9 A 19-Nov-16
S10 A 21-Nov-16
S11 A 30-Nov-16
S12 A 9-Nov-16
S13 B 26-Nov-16
S14 A 3-Nov-16
S15 B 29-Nov-16
S16 B 4-Nov-16
S17 A 21-Nov-16
S18 A 15-Nov-16
S19 A 28-Nov-16
S20 A 18-Nov-16
S21 A 23-Nov-16
S22 B 14-Nov-16
S23 A 14-Nov-16
S24 B 12-Nov-16
S25 A 13-Nov-16
S26 A 6-Nov-16
S27 A 2-Nov-16
S28 A 27-Nov-16
S29 B 10-Nov-16
S30 A 1-Nov-16
S31 A 22-Nov-16
S32 A 19-Nov-16
S33 B 12-Nov-16
S34 A 12-Nov-16
S35 A 11-Nov-16
S36 A 27-Nov-16
S37 A 20-Nov-16
S38 B 24-Nov-16
S39 B 17-Nov-16
S40 A 5-Nov-16
S41 A 10-Nov-16
S42 B 15-Nov-16
S43 A 27-Nov-16
S44 A 24-Nov-16
S45 B 16-Nov-16
S46 A 9-Nov-16
S47 A 17-Nov-16
S48 A 20-Nov-16
S49 B 16-Nov-16
S50 A 9-Nov-16
S51 B 11-Nov-16
S52 B 21-Nov-16
S53 B 29-Nov-16
S54 A 15-Nov-16
S55 B 12-Nov-16
S56 A 12-Nov-16
S57 B 5-Nov-16
S58 B 10-Nov-16
S59 A 29-Nov-16
S60 A 26-Nov-16
S61 A 30-Nov-16
S62 A 3-Nov-16
S63 A 20-Nov-16
S64 A 30-Nov-16
S65 A 14-Nov-16
S66 A 11-Nov-16
S67 A 7-Nov-16
S68 A 12-Nov-16
S69 A 5-Nov-16
S70 B 17-Nov-16
S71 A 5-Nov-16
S72 A 27-Nov-16
S73 B 12-Nov-16
S74 A 21-Nov-16
S75 A 6-Nov-16
S76 B 13-Nov-16
S77 A 22-Nov-16
S78 A 5-Nov-16
S79 B 21-Nov-16
S80 A 15-Nov-16
S81 A 16-Nov-16
S82 A 6-Nov-16
S83 A 18-Nov-16
S84 A 21-Nov-16
S85 A 3-Nov-16
S86 B 22-Nov-16
S87 A 27-Nov-16
S88 A 14-Nov-16
S89 B 1-Nov-16
S90 B 9-Nov-16
S91 A 27-Nov-16
S92 B 22-Nov-16
S93 A 7-Nov-16
S94 A 18-Nov-16
S95 A 27-Nov-16
S96 A 2-Nov-16
S97 B 12-Nov-16
S98 A 30-Nov-16
S99 A 15-Nov-16
S100 A 4-Nov-16

You’re asked to compute the total number of shipments as well as minimum, maximum and average number of shipments grouped by terminal and weekday. In other words, you have to fill the following table named Summary:

Terminal Weekday Total Shps Min Shps Max Shps Avg Shps
A Monday
A Tuesday
A Wednesday
A Thursday
A Friday
A Saturday
A Sunday
B Monday
B Tuesday
B Wednesday
B Thursday
B Friday
B Saturday
B Sunday

You also have to create a chart showing average number of shipments per weekday per terminal.

There’s absolutely no problem with filling the Total Shps column. Of course, you start by adding a column Weekday to the Data table that computes the name of the weekday from the Date column:

=SWITCH(WEEKDAY([@Date], 2), 1, “Monday”, 2, “Tuesday”, 3, “Wednesday”, 4, “Thursday”, 5, “Friday”, 6, “Saturday”, 7, “Sunday”)

The SWITCH function is available since Excel 2016, so if you have an earlier version, you’ll have to create a separate table containing the weekdays and link the Weekday column to it.

Then the Total Shps column of the Summary table contains a simple COUNTIFS formula:

=COUNTIFS(Data[Terminal], [@Terminal], Data[Weekday], [@Weekday])

It is somewhat more demanding to fill in the Min Shps and Max Shps columns, because first we have to compute the number of shipments on every date (given the terminal) and then look on those dates that occur on a given weekday and compute minimum and maximum. It is not that difficult to do, however, and does not require an additional table at all. For minimum and maximum, we just have to add a Day Shp Cnt column to the Data table that gives the number of shipments on a given date from a given terminal:

=COUNTIFS([Terminal], [@Terminal], [Date], [@Date])

and then just base the summary columns on it (the formula for Max Shps is equivalent):

=MINIFS(Data[Day Shp Cnt], Data[Terminal], [@Terminal], Data[Weekday], [@Weekday])

Again, I’m using the MINIFS/MAXIFS functions available in Excel 2016. In earlier versions, you have to resort to the standard MIN/MAX and use advanced on-the-fly filtering.

The most interesting case is with the last column, Avg Shps. You can’t just compute an average of Day Shp Cnt column of the Data table. Why? Well, if you have three shipments on a given date from a given terminal, you have three rows for those shipments, and every row contains 3 in that column. Effectively, you’ll be averaging weighted data with weights proportional to the values you’re trying to average. Average of 1 and 3 is 4 / 2 = 2, and average of 1, 3, 3, 3 is 10 / 4 = 2.5 — feel the difference.

There are two solutions here. The first one is to create a technical table just to aggregate the number of shipments over date–terminal pairs. I don’t like it, because it requires to create another long table (and create it again if the data in the Data table change). So I’ll focus on the second one.

Recall that the key problem is the repetition of numbers. We can solve the problem by taking only one of them for each terminal–date combination. The easiest is, of course, to take the first one. So the idea is to mark the first occurrence and then add it to the filter.

The implementation is somewhat similar to that I used in the post about ranking offers with ties. I’ll also count the number of occurrences similar to a given one, but I’ll have to use advanced on-the-fly filtering, because I’ll be comparing a formula depending on a value to another formula depending on another value, not two values themselves.

To be more specific, I’ll check whether in the Data table, there are some rows with the same date and terminal but above a given row. The given row is determined by the ROW function, and the complete formula in the new Day First Shp column is:

{=(SUM(IF(([Terminal] = [@Terminal]) * ([Date] = [@Date]) * (ROW([Date]) <= ROW([@Date])) = 1, 1, 0)) = 1)}

The SUM formula here returns 1 if there are no rows above a given row with the same terminal and date, 2 if there is one row and so on. If the SUM formula returns 1, the whole formula returns TRUE; otherwise, it returns FALSE.

Then in the Avg Shps column of the Summary table, write the following:

=AVERAGEIFS(Data[Day Shp Cnt], Data[Terminal], [@Terminal], Data[Weekday], [@Weekday], Data[Day First Shp], TRUE)

The last filtering condition ensures that we take only one value for each terminal–date pair and the result is, thus, correct.

As a result of all these manipulations, the top of the Data table looks like:

Aggregating Shipment Statistics Into Day Groups - Data Final.png

Note that we have 4 shipments from terminal A on 21-Nov-16, shipment S10 is marked as the first shipment but shipment S17 as not the first (Day First Shp is FALSE).

The Summary table looks like:

Aggregating Shipment Statistics Into Day Groups - Summary.PNG

It remains to create a PivotTable based on the Summary table and add a PivotChart based on the PivotTable. After doing some design work, the chart looks like:

Aggregating Shipment Statistics Into Day Groups - PivotChart.PNG



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s