I’ve been participating the beta testing of Tableau 9.3 and I’m loving the innovation the Tableau developers have been bringing to recent releases of Tableau – especially in the area of integration! Unions are an incredible new feature in Tableau 9.3 that will make life much easier. So, as we eagerly wait the release of the new version, here are a few tips and tricks for using unions in Tableau 9.3:
Tips and Tricks for the Unions in Tableau 9.3
- Start with one file, then drag all others into connection: You could drag each file one-by-one into the connection editor, but start with one and then select the others and drag them all at once to create unions quickly:
- If needed, use the Customize Union option to see your union and add/remove files. The option is available on the drop down menu of the union in the connection editor. You can remove individual files and you can drag files from the left sidebar into the list and they will appear at the bottom:
- You can union a file to itself. Just drag the file same file into the connection editor multiple times. Why would you do this? Well, there are times that you might need to “reshape” the data a little and it’s helpful to have two rows instead of the original single row. For example, if you have an origin city and destination city in the same row, but want to draw a path between the two, you’re stuck. But union the file to itself, work a little calculation magic, and you now have a row for source and a row for destination and you can draw the line between them. (Of course, for certain situations there are alternatives!)
- Make note of the Path and Table Name fields that Tableau automatically adds to your unions. These fields give you the filenames and table names and associate each row of data with the source file/table. As we’ll see, this can be quite useful.
- Use the Path and Table Name fields to your advantage! Here’s a scenario: you get a new data dump file each month. Since the file is named based on the month (yyyy-mm.csv), the guy who provides the files (we’ll call him “Fred”) has never felt it necessary to include the date in the contents of the file. But Fred never thought about the fact that when you union these together in a single source, you actually want to know the dates. So, you, being quite clever, use the [Path] or [Table Name] field to work-around Fred’s lapse in judgment:
- Check the Data Integrity! (part 1) That is, make sure the files in your union contain what you think they contain. Start simple – check to see if each file has the number of records you expect. Look for anything that falls outside the norm:
- Check the Data Integrity! (part 2) The first check was at a very high level, now we’ll go deep and check the fields.
With a view like this, I can immediately tell what issues exist in the union:
- We didn’t start tracking Returns until January 2015
- The field “Product ID” appears to have changed names to “ID” in January 2015
- The field “Cost” appears to have been split into two separate fields, “Cost (internal)” and “Cost (external)” in July 2015.
- The last time the field “Margin” appeared was in March 2015.
But how do you create the view to check your union? I’m glad you asked…
Creating the Field Level View to check the State of the Union
Creating the above view for checking unions is not difficult, but requires a bit of setup. Here’s how you can do it:
- Start by creating a view with Path on Rows, Measure Names on Columns, and Measure Values on Size:
- Switch the aggregations for all Measure Values from SUM to COUNT. Do this because you want to know whether a field exists in a given file in the union – not the total value. By the way, you can do this quickly by selecting all fields on the Measure Values shelf (click one, hold Shift, and click another to get everything in between) and changing at once using the right-click context menu:
- Now, edit the Size (double click the Size legend or use the drop down menu on the legend to edit) and set the End value of the Range to 1. This way, if there is at least one value for the field in a file (NULLs aren’t counted), then we’ll get a large square versus a little tiny square.
- Now, you can see all the fields that were classified by Tableau as Measures. What about the dimensions? Simply add them to the Measure Values shelf as COUNT aggregations. You can accomplish this relatively quickly by dragging and dropping them while holding down the right mouse button (or Option key on Mac) to get a selection menu:
And now you can see the issues in the data!
But how do you fix them? Well, some are going to require a fix of the source. You’ll have to ask Jason to produce files with Returns for 2014. But others are a little easier.
Bonus Tip #1: Use the Merge feature in the Data Source editor!
Since I can see that Product ID changed names to ID in 2015 files,
then I might want to merge the two fields into one. It’s as simple as selecting the two fields in the Data Source editor and using the drop down to select Merge.
The resulting field (named Coalesce by default) contains the value of either Product ID or ID (whichever was not null). The field can be renamed and then used in any view:
Bonus Tip #2: Use calculations to fix other union issues
What about the case where Cost got split into 2 fields? That’s not as simple as using Merge because in some cases there are 2 non-null values.
But we can use a simple calculation to get the Total. Notice, the use of the ZN() function (zero null – that is, replace any nulls with zero) to make sure the addition works. And now you have a field you can use consistently throughout the data source:
And there you have it – 9 tips on using the Union feature in Tableau 9.3. So what cool things will you accomplish with unions? How will unions make your life easier? Let me know in the comments below!