• Latest Snapshot in Tableau and Tableau Prep (Maestro)

    Often, you’ll have data that looks something like this:

    The Data

     

    Every day, there are new records for projects that give the latest status and count of hours used.  Often a project spans multiple days and the status can change and count of hours might go up.  But if I want to get the most recent status of all projects, I cannot simply filter to the latest date – because some projects don’t get records every day (especially once they are complete).

    So, what if I need to do analysis on the latest snapshot of each project?  That is, you need to have the most recent record for each project.

    Latest Snapshot in Tableau

    Those of you who have used Tableau for a while probably think of a few ways to solve this kind of problem.  In the broadest sense there are two basic approaches:

    1. Table Calculation
    2. Level of Detail (LoD) expression

     

    A table calculation, such as Last(), can be used to find the last record fairly easily.  Although not necessary for the table calculation to work, conceptually we might prefer to reorder the view to better see how it might work:

    Table Calculation for Latest Snapshot

    Here, we’ve computed Last() along Date and Status, which makes Project the partition.  Where Last() = 0, we’ve found the most recent record for the project – and we could use the field as a filter if desired.

    There are some downsides to a table calculation approach.  Table calculations require all the necessary data to be present in the cache (that is not filtered with a non-table calc filter). If you or the end user filter out any of the dates or statuses, you might not get the correct results.  Also, if the data set is large, you might experience some performance issues.  And finally, you’ve got a level of complexity that might be compounded by additional analysis.

     

    So, you might turn to the second option: Level of Detail.

    A simple FIXED LoD calculation like this:

    Level of Detail Calculation for Latest Snapshot

    finds the maximum date (most recent date) per project.  And then, at a record level compares it to the Date for that record.  Where the maximum date and the record level date match, you have the most recent snapshot of data.  You can use this field to filter the data or aid in other analysis.

    The benefit here is that the solution is not brittle.  The LoD won’t break with additional filtering – although you have to be aware the LoD calculations are context sensitive, so there may be cases where you want a filter to change the results and you’ll need to add it to context.

    Latest Snapshot in Tableau Prep

    Project Maestro is in public beta! Tableau Prep is here!  And I’m excited!  In fact, when I recently tweeted a screenshot of a Tableau Prep Maestro flow that had me pumped, Isaac Kunen took a look and immediately recognized an LOD calc in the Maestro flow:

     

    So what gives away an LOD calc?  Something like this:

    LOD calc in Maestro

    What is this?  It’s an aggregation joined to the previous step in the flow. In other words, a calculation done at another level of detail that is brought back to the original record level of detail.  Let’s break it down:

    Here’s what the Projects step looks like:
    Projects Step in Tableau Prep
    It’s just our original data set (with a nice view of the domain of all the fields!)

     

    The Max Date aggregation step groups by Project and gets the MAX date for each one:

    Grouped Fields in Maestro

    Then we drag the connector from Max Date back to Projects to create the join that might look like this:

    Join in Maestro

    By using an inner join on Project and Date, we end up with the records that are the latest snapshot:

    Latest Records in Tableau Prep

    What if we need the other records, but just want to easily identify the latest snapshot?  Then I can just change the join to right or left (depending on which table became right or left based on your drag/drop of the connection) and you get a data set where the join found matches on the latest snapshot records, but you still have the other records too:

    Latest Snapshot records and others

    And you can add a calculation like this:

    Calculated Field in Tableau Prep

    And do some additional cleanup to have a nice data set, ready for easy analysis in Tableau!

    Final recordset

    I’d love to hear if you have any thoughts, additional solutions, or ways that you’ve captured latest snapshots in your data.

    Also, what do you think of having Tableau Prep Maestro posts on this blog?  I’m thinking about having one or two a month in addition to one or two a month focused solely on Tableau.  I’d love to hear your feedback!

3 Responsesso far.

  1. Chris Drost says:

    Keep posting the Maestro info.

  2. […] already covered how to get the latest snapshot of records elsewhere, but it fits nicely here because it is a case of duplicate data that needs to be deduped.  […]

Leave a Reply

Your email address will not be published. Required fields are marked *