• Periodic Snapshots with Tableau Incremental Refresh

    2017: The Year of Data

    Happy New Year!

    Even as 2016 was winding down, I found myself with a challenge dealing with time.  And I thought I’d share some of my thoughts about a possible approach.  Here’s the challenge:

    • You have a database with transactional data, that is data where the rows are inserted, updated and deleted based on events.  One generator goes offline and two others are brought up, the operating status for those records is updated.  Money is allocated to different departments this month, you get updates to rows for the departments.  An employee is promoted, the position value changes in the database.
    • You need to be able to see that state of the data at certain historical points in time and even see trends.  The data as it exists in the transactional system is always current: how many generators are online right now, how much money is allocated right now, how many vice presidents do you have right now.  You cannot see how many generators were online last month or track trends in the money allocation.
    • The database is locked down.  You can read the data, but you cannot create new tables or structures in the database or schedule some kind of automated process in the database.

    Periodic Snapshots

    If you could build a data warehouse, one possible structure for the data that you would consider would be a periodic snapshot.  That is, you’d capture the current state of the data and you’d do so periodically (hourly, daily, weekly, etc…).  Each snapshot would be stored (as fact table records) with a time-stamp, allowing you to tell when the snapshot was taken.

    Periodic Snapshots of Data

    It’s kind of like capturing snapshots of your kids: “And look, here’s when little Generator 3 came online for the first time!”

    Point Tableau at your fact/dimension tables, and analyzing and visualizing the data is a piece of cake!  And you’ll have a rich history for trends, forecasting, and seeing the changes over time.

    …But what if you don’t have the ability to build a data warehouse?

    First of all, it’s 2017! Building data warehouses is easier, faster, and cheaper than ever… and there are other tools out there for blending data and building data structures… but I get it… there are times when it isn’t an option (at least in the short-run).

    But guess what? It’s 2017!  And Tableau continues to add data prep and data integration features that make life easier.  And in fact, what I’m about to share isn’t even a new feature — it’s been there since version 6.  But when you pair it with some newer features (such as Level of Detail calcs, cross database joins, unions) you start to get the ability to create your own data warehouse right in Tableau! (okay, there are some limitations and even <scary> dangers </scary>, but we’ll gloss over those look at those at the end of the post.

    Tableau Incremental Refresh

    Incremental refresh is an option when you create a data extract:

    Incremental Refresh

     

    What the incremental refresh does is use the field you specify and looks for rows of data having values for that field that are greater than the values already existing in the extract.  There is no requirement that these values be unique (for example, you could have 100 rows all having a 1 value today and then a few more new rows with a 2 that would be added on the next incremental refresh — or you could have 1,2…100 and then new rows with 101,102… that would be incrementally added) So, as long as there are rows where the value in the field are greater than what you had previously, those rows would get included in an incremental refresh.

     Be aware that any new rows that have the same or lesser values as those which already exist in the extract will not be added in an incremental refresh.  Also, incremental refresh is only good for adding rows, so it won’t catch anything that was deleted or changed at the source.

    Building the Periodic Snapshot in Tableau

    But for the purposes of creating a periodic snapshot data structure, it’s okay if rows change or are deleted, because we’re going to capture all available data every period — so we get a snapshot of it’s current state now and then a snapshot of the current state the next period and then the next and so on.  So, we’re building a history as we go.  If records change, are deleted, or we have new ones — no problem we’ll just get what’s available each time.

    The main key is to create a field  that we can use as the field that identifies the new rows.  We can’t create a new table, but there’s nothing that prevents us from using a custom SQL statement to create a new field in the SELECT.  Consider a custom SQL statement like:

    SELECT 
    [Generator], 
    [Status],
    GETDATE() as [Snapshot_Date] 
    FROM Generators

    What we’ll get is a snapshot of the data that includes a time-stamp of when the snapshot was taken.  Here a three such snapshots:

    Periodic Snapshots - Tables

    I’ve simplified the time-stamp to be just the date, but based on the SQL above, it would include the time too.

    And if you use that Snapshot_Date field as your incremental refresh field, then each time you run the incremental refresh you’ll get all the rows appended to the extract with the most recent time-stamp (which of course is greater than any values already in the extract).

    So, you end up with a data structure looking like this:

    Period Snapshots Table

     

    Now you’ve got historical data and can do all kinds of fun analysis!  Want to see how many generators were online/offline over time?  Your boss can only see the current state from the transactional database.  But you’ll impress her with the ability to quickly and seamlessly go through the entire history (at least since you’ve been capturing the snapshots)!

     

    Tips and Tricks (and Dangers!)

    <scary class=’very’> After your initial extract, don’t ever, ever, ever run a full extract again.  If you do, you’ll overwrite all your historical snapshots with the current snapshot of the data.  Read that again and let your blood run cold as you realize that everything you built will collapse like a house of cards with one false step. </scary>  Only run incremental refreshes!

    To mitigate that risk, consider some options:

    • Don’t use the incremental refresh to build periodic snapshots in Tableau (if you have other options). I know, I just wrote a whole blog post on how and why. But if you have the option for building a data warehouse or even just exporting a periodic snapshot of the data as a series of flat-files to union together in Tableau, you’ll avoid the risk altogether.  I know those options aren’t always available which is why I detailed the incremental refresh option. And I hit an occasion recently where this was really the only way. So, if you do use this technique, do the following:
    • Backup your extracts periodically.  If you’re scheduling them on Tableau Server, you should be running a backup anyway.  But always keep a backup of your extract for the last few periods in case you need to restore it.
    • Create a view that allows you to see how many snapshots you have.  Subscribe to this view (use conditional alerts in 10.1) so you find out fast if your snapshots get wiped out.

    Beyond that, consider these tips:

    • When you write your custom SQL statement, you’ll want to truncate the date to the level of the frequency you are capturing.  Don’t capture minutes and seconds if you’re capturing hourly.  Don’t capture days if you’re capturing monthly.  It makes the SQL a bit more complex, but it’ll save you from some headache if you accidentally run more than one incremental refresh per period.
    • Use filters, Level of Detail expressions (LOD), or carefully determined aggregations to roll up to time periods at a higher level than your periodic snapshots.  For example, if you’re capturing daily but want to report monthly, you won’t be able to fully aggregate measures unless you determine how monthly reporting should work.
    • Want to show the status of things as the 1st of the month?  Then a filter based on a calculated field like: DAY([Snapshot_date]) == 1 should work.
    • Want the last day of data available for each month? {FIXED [Generator], DATETRUNC(‘month’,[Snapshot_Date]) : MAX([Snapshot_Date])} == [Snapshot_Date]

     

    Hope that helps!
    Joshua

Leave a Reply

Your email address will not be published. Required fields are marked *