Looking Up Data by Several Columns

There are many examples when we have to look up data in an Excel table.

Typically, it’s done by a family of LOOKUP functions. I’m not a fan of them, because they are completely not transparent — unless you see the table against which the LOOKUP function is applied, you can’t understand what exactly it does.

An alternative is a combination of INDEX and MATCH functions, which I call an INDEX–MATCH approach. The easiest and simplest case is when we have to find the value of a column in a row where another column has a specific value:

=INDEX(Data[resulting column], MATCH(value to find, Data[filtering column], 0))

Note how readable this formula is.

Another alternative is to use the SUMIFS function. There are two drawbacks of this approach:

  • If, due to any error, there are several rows where the filtering column has the value we’re searching for, the function will return the sum of the resulting column over these rows, while the INDEXMATCH approach will return the value in the first such row
  • When there are no such rows, the SUMIFS function will return zero, while the INDEXMATCH approach will result in an error. If the resulting column does not contain zeroes, this is not a problem, but otherwise, you’ll be unable to determine whether zero is a legitimate sum or no rows were found

When there are several filtering columns, the SUMIFS function can be used straightforwardly, as it was built to be scalable. That’s not the case with the INDEX–MATCH approach, however, as the MATCH function can search only in one column. In database parlance, the MATCH function can work only with simple keys but we want it to make it work also with compound keys.

However, nothing precludes us from generating a new column in the data that uniquely identifies rows for our purposes. I typically call this column UID for unique identifier. Let’s take a look at an example:

Product Month Sales
A Jan 49752
A Feb 50259
A Mar 49573
B Jan 54834
B Feb 50356
B Mar 52300
C Jan 54579
C Feb 54361
C Mar 53924
D Jan 51562
D Feb 47391
D Mar 54883

Here we have monthly sales of products in a table called Data. We want to reshape it into the following table:

Sales of \ in Jan Feb Mar
A
B
C
D

Because we’ll look up values in the Data table by product and month, we generate the UID column in the Data table to uniquely identify each row using the following formula:

=[@Product] & “-” & [@Month]

Here’s what you get as a result:

looking-up-data-by-several-columns-data-modified

Then in the results table, we just have to apply the INDEX–MATCH approach. To create a nicer formula, first name the top row of the results table containing month names as Months and the first column containing product names as Products. Then the formula that should fill the empty cells in the results table is:

=INDEX(Data[Sales], MATCH(Products & “-” & Months, Data[UID], 0))

Note that in this example, the value the MATCH function will try to find is determined by an implicit intersection of two ranges (Products and Months).

Here is the results table:

looking-up-data-by-several-columns-result

Advertisements

3 thoughts on “Looking Up Data by Several Columns

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s