Latest Snapshot in Tableau and Tableau Prep (Maestro)
Often, you’ll have data that looks something like this:
Every day, there are new records for projects that give the latest status and count of hours used. Often a project spans multiple days and the status can change and count of hours might go up. But if I want to get the most recent status of all projects, I cannot simply filter to the latest date – because some projects don’t get records every day (especially once they are complete).
So, what if I need to do analysis on the latest snapshot of each project? That is, you need to have the most recent record for each project.
Latest Snapshot in Tableau
Those of you who have used Tableau for a while probably think of a few ways to solve this kind of problem. In the broadest sense there are two basic approaches:
- Table Calculation
- Level of Detail (LoD) expression
A table calculation, such as Last(), can be used to find the last record fairly easily. Although not necessary for the table calculation to work, conceptually we might prefer to reorder the view to better see how it might work:
Here, we’ve computed Last() along Date and Status, which makes Project the partition. Where Last() = 0, we’ve found the most recent record for the project – and we could use the field as a filter if desired.
There are some downsides to a table calculation approach. Table calculations require all the necessary data to be present in the cache (that is not filtered with a non-table calc filter). If you or the end user filter out any of the dates or statuses, you might not get the correct results. Also, if the data set is large, you might experience some performance issues. And finally, you’ve got a level of complexity that might be compounded by additional analysis.
So, you might turn to the second option: Level of Detail.
A simple FIXED LoD calculation like this:
finds the maximum date (most recent date) per project. And then, at a record level compares it to the Date for that record. Where the maximum date and the record level date match, you have the most recent snapshot of data. You can use this field to filter the data or aid in other analysis.
The benefit here is that the solution is not brittle. The LoD won’t break with additional filtering – although you have to be aware the LoD calculations are context sensitive, so there may be cases where you want a filter to change the results and you’ll need to add it to context.
Latest Snapshot in Tableau Prep
Project Maestro is in public beta! Tableau Prep is here! And I’m excited! In fact, when I recently tweeted a screenshot of a Tableau Prep Maestro flow that had me pumped, Isaac Kunen took a look and immediately recognized an LOD calc in the Maestro flow:
Awesome! I see you’ve figured out how to get a LOD calc in Maestro. https://t.co/WjfUBANxFO
— Isaac Kunen (@isaackunen) January 26, 2018
So what gives away an LOD calc? Something like this:
What is this? It’s an aggregation joined to the previous step in the flow. In other words, a calculation done at another level of detail that is brought back to the original record level of detail. Let’s break it down:
Here’s what the Projects step looks like:
It’s just our original data set (with a nice view of the domain of all the fields!)
The Max Date aggregation step groups by Project and gets the MAX date for each one:
Then we drag the connector from Max Date back to Projects to create the join that might look like this:
By using an inner join on Project and Date, we end up with the records that are the latest snapshot:
What if we need the other records, but just want to easily identify the latest snapshot? Then I can just change the join to right or left (depending on which table became right or left based on your drag/drop of the connection) and you get a data set where the join found matches on the latest snapshot records, but you still have the other records too:
And you can add a calculation like this:
And do some additional cleanup to have a nice data set, ready for easy analysis in Tableau!
I’d love to hear if you have any thoughts, additional solutions, or ways that you’ve captured latest snapshots in your data.
Also, what do you think of having Tableau Prep
Maestro posts on this blog? I’m thinking about having one or two a month in addition to one or two a month focused solely on Tableau. I’d love to hear your feedback!
Keep posting the Maestro info.
[…] already covered how to get the latest snapshot of records elsewhere, but it fits nicely here because it is a case of duplicate data that needs to be deduped. […]