Cross database joins have been available in Tableau for a while now – and I love the ability to join data at a row-level across various databases and files! And union functionality is available, but for now, only if you are dealing with tables in the same database (or files or Excel tabs). But what if you want to union tables that are in different databases? There is a way…
Jonathan Drummey has an excellent post covering how to use a cross database join to effectively union together almost any data sources you want. My goal is not to replicate his post here, but to provide a similar example and then springboard to a discussion of the “other” max() and min() functions in Tableau that you may never have used!
So let’s get there with an example. Let’s say you have three tables containing data that needs to be unioned together, but it’s not currently possible to do so because one is a SQL Server table, one is a text file, and one lives in the cloud.
Here are the three tables:
Without the option to accomplish as cross database union, you can create a union helper data source to simulate the union using a cross database join! Here’s the helper table, whipped up in Excel or as a text file:
It’s a single field with a record for each table I want to include in the union. Now, I’ll just use that as the left table in the Tableau connection screen and join the Table field three times, once each to the three tables. And I can use a Join Calculation that matches for each table:
Now, I’ve got all the rows from each table concatenated together. It’s almost a clean union, but the one drawback is that the fields are not merged together. With a normal union, I’d end up with 3 columns (Name, Occupation, and Bank Account Balance) with all the values from all the unioned tables. But here, I get separate fields with the values sliced up between them.
Fortunately, it’s all pretty easy to put back together…
Normally, you use MIN and MAX as aggregations in Tableau. If you aggregate Sales as a MAX, you’re finding out what the highest sales amount was given the level of detail in the view (e.g. highest sales value per region or shipping mode). You might even use it in a calculation like MIN([Order Size]) >= [Low Threshold] to see if you met a goal or not.
But it turns out, you can use Min() and Max() as row level functions too! It looks like this:
And when you write the function like that, with two arguments, it will return the lowest (for MIN) or highest (for MAX) of the two values. If Value1 = 100 and Value2 = 200 then the statement above would return 100 and the statement MAX([Value1], [Value2]) would return 200.
Now, there’s all kinds of use for this! Let’s say you had 6 different fields, F1 through F6, and for each row of data, you wanted to know which of the fields was the highest. In fact, you wanted a label for each row that would tell you the field name of the highest value (e.g. “F1” or “F3”).
Imagine writing that with an IF/THEN statement:
IF F1 > F2 AND F1 > F3 AND F1 > F4 AND F1 > F5 AND F1 > F6 THEN "F1" ELSEIF F2 > F1 AND F2 > F3 AND F2 > F4 AND F2 > F5 AND F2 > F6 THEN "F2" ...
Maybe not too bad… for 6 fields. But what if there were 9? And of course someone’s going to add a few more after that…
In that case, we can use the “other” MAX function to find the largest value and then compare it to the values of the fields. Unfortunately, MAX() doesn’t take more than 2 arguments. Fortunately, we can nest MAX functions like this:
CASE MAX(MAX(MAX(F1, F2), MAX(F3, F4)), MAX(F5, F6)) WHEN F1 THEN "F1" WHEN F2 THEN "F2" WHEN F3 THEN "F3" WHEN F4 THEN "F4" WHEN F5 THEN "F5" WHEN F6 THEN "F6" END
That’s not too bad at all! Now, you’ll have to decide what to do with ties, but the pattern is far less complex than a messy IF/THEN/ELSEIF statement.
And now, we circle back around to the simulated union…
When I use the Union feature in Tableau, I have the option to Merge mismatched fields with a few clicks. But, with a simulated union that’s really a join, Tableau doesn’t allow this (although Tableau’s cousin Maestro does!) In SQL, I’d use a COALESCE function that would take a list of values and return the first non-null value. Tableau does have an IFNULL statement it would work really well. In fact, it’s almost identical to my use of MIN or MAX here. So it really comes down to preference.
But if I want to merge the [Name], [Name (Prequels.txt)] and [Name (Sequels.txt)] fields from the join above, here’s the function I would write:
MAX([Name], MAX([Name (Prequels.txt)], [Name (Srequels.txt)]))
Since MIN or MAX will eliminate the NULL values, then only the non-null value for whichever field is applicable to the given row will be returned. I just have to do that for each field that got split up.
And who knows, this technique may give you all kinds of unnatural abilities.