Happy New Year!
Even as 2016 was winding down, I found myself with a challenge dealing with time. And I thought I’d share some of my thoughts about a possible approach. Here’s the challenge:
If you could build a data warehouse, one structure you would consider for the data is a periodic snapshot. That is, you’d capture the current state of the data, and you’d do so periodically (hourly, daily, weekly, etc.). Each snapshot would be stored (as fact table records) with a time-stamp, allowing you to tell when the snapshot was taken.
Point Tableau at your fact/dimension tables, and analyzing and visualizing the data is a piece of cake! And you’ll have a rich history for trends, forecasting, and seeing the changes over time.
…But what if you don’t have the ability to build a data warehouse?
First of all, it’s 2017! Building data warehouses is easier, faster, and cheaper than ever… and there are other tools out there for blending data and building data structures… but I get it… there are times when it isn’t an option (at least in the short run).
But guess what? It’s 2017! And Tableau continues to add data prep and data integration features that make life easier. In fact, what I’m about to share isn’t even a new feature — it’s been there since version 6. But when you pair it with some newer features (such as Level of Detail calcs, cross-database joins, and unions), you start to get the ability to create your own data warehouse right in Tableau! (Okay, there are some limitations and even <scary> dangers </scary>, but we’ll look at those at the end of the post.)
Incremental refresh is an option when you create a data extract:
What an incremental refresh does is use the field you specify: it looks for rows of data whose value in that field is greater than the maximum value already in the extract. There is no requirement that these values be unique. For example, you could have 100 rows that all have a value of 1 today and then a few new rows with a value of 2 that would be added on the next incremental refresh; or you could have values 1, 2, … 100 and then new rows with 101, 102, … that would be incrementally added. So, as long as there are rows where the value in that field is greater than anything you had previously, those rows will be included in an incremental refresh.
Be aware that any new rows with values equal to or less than those already in the extract will not be added by an incremental refresh. Also, an incremental refresh can only add rows, so it won’t catch anything that was deleted or changed at the source.
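To make the rule concrete, here’s a minimal sketch of that high-water-mark logic in Python (the function and field names are my own, purely for illustration; Tableau performs this internally):

```python
def incremental_refresh(extract, source, field):
    """Append source rows whose `field` value is greater than the
    current maximum of `field` in the extract."""
    if extract:
        high_water_mark = max(row[field] for row in extract)
        new_rows = [row for row in source if row[field] > high_water_mark]
    else:
        new_rows = list(source)  # first refresh: take everything
    extract.extend(new_rows)
    return extract

# Day 1: 100 rows, all stamped with a value of 1
extract = incremental_refresh(
    [], [{"snapshot": 1, "id": i} for i in range(100)], "snapshot")

# Day 2: the source still has the old rows plus 3 new rows stamped 2;
# only the rows stamped 2 clear the high-water mark and get appended
source = ([{"snapshot": 1, "id": i} for i in range(100)]
          + [{"snapshot": 2, "id": i} for i in range(3)])
extract = incremental_refresh(extract, source, "snapshot")
print(len(extract))  # 103: the 100 original rows plus the 3 new ones
```

Note that rows stamped 1 in the day-2 source are ignored even if they changed, which is exactly the behavior described above.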
But for the purposes of creating a periodic snapshot data structure, it’s okay if rows change or are deleted, because we’re going to capture all available data every period: a snapshot of its current state now, then a snapshot of the current state next period, and so on. We’re building a history as we go. If records change, are deleted, or new ones appear, no problem; we’ll simply capture whatever is available each time.
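The accumulation described above can be sketched the same way. In this hypothetical example (the generator names and statuses are made up), a row that is later deleted from the source still survives in the earlier snapshot:

```python
from copy import deepcopy

history = []  # the growing snapshot "fact table"

def take_snapshot(source_rows, snapshot_date):
    """Append a full copy of the current source, stamped with the period."""
    for row in source_rows:
        stamped = deepcopy(row)
        stamped["Snapshot_Date"] = snapshot_date
        history.append(stamped)

# Period 1: two generators, both online
generators = [{"Generator": "G1", "Status": "online"},
              {"Generator": "G2", "Status": "online"}]
take_snapshot(generators, "2017-01-01")

# Period 2: G2 goes offline and G1 is deleted from the source
generators = [{"Generator": "G2", "Status": "offline"}]
take_snapshot(generators, "2017-01-08")

# The history still shows G1 online on 2017-01-01, even though it is gone now
print(len(history))  # 3 rows: two from period 1, one from period 2
```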
The key is to create a field we can use to identify the new rows. We can’t create a new table, but there’s nothing stopping us from using a custom SQL statement to create a new field in the SELECT. Consider a custom SQL statement like:
SELECT [Generator], [Status], GETDATE() as [Snapshot_Date] FROM Generators
(GETDATE() is SQL Server’s current-timestamp function; on other databases, use the equivalent, such as CURRENT_TIMESTAMP in PostgreSQL or NOW() in MySQL.)
What we’ll get is a snapshot of the data that includes a time-stamp of when the snapshot was taken. Here are three such snapshots:
And if you use that Snapshot_Date field as your incremental refresh field, then each time you run an incremental refresh, all the rows bearing the newest time-stamp (which, of course, is greater than any value already in the extract) will be appended to the extract.
So, you end up with a data structure looking like this:
Now you’ve got historical data and can do all kinds of fun analysis! Want to see how many generators were online/offline over time? Your boss can only see the current state from the transactional database. But you’ll impress her with the ability to quickly and seamlessly go through the entire history (at least since you’ve been capturing the snapshots)!
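For instance, the online/offline trend falls out of a simple count by snapshot date and status. Here’s a sketch in Python (in Tableau you’d just drag Snapshot_Date and Status into the view; the rows below are invented):

```python
from collections import Counter

# A few snapshot rows, shaped like the fact table built above
snapshots = [
    {"Generator": "G1", "Status": "online",  "Snapshot_Date": "2017-01-01"},
    {"Generator": "G2", "Status": "online",  "Snapshot_Date": "2017-01-01"},
    {"Generator": "G1", "Status": "offline", "Snapshot_Date": "2017-01-08"},
    {"Generator": "G2", "Status": "online",  "Snapshot_Date": "2017-01-08"},
]

# Count generators by status for each snapshot date
trend = Counter((row["Snapshot_Date"], row["Status"]) for row in snapshots)
for (date, status), count in sorted(trend.items()):
    print(date, status, count)
# 2017-01-01 online 2
# 2017-01-08 offline 1
# 2017-01-08 online 1
```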
<scary class='very'> After your initial extract, don’t ever, ever, ever run a full refresh again. If you do, you’ll overwrite all your historical snapshots with the current snapshot of the data. Read that again and let your blood run cold as you realize that everything you built will collapse like a house of cards with one false step. </scary> Only run incremental refreshes!
To mitigate that risk, consider some options:
Beyond that, consider these tips:
Hope that helps!