• Cross Database Union & the other MAX/MIN functions in Tableau

    Cross database joins have been available in Tableau for a while now – and I love the ability to join data at a row-level across various databases and files!  And union functionality is available, but for now, only if you are dealing with tables in the same database (or files or Excel tabs).  But what if you want to union tables that are in different databases?  There is a way…

    Rey Kylo and Cross Database Joins

    Simulating a cross database union with a join

    Jonathan Drummey has an excellent post covering how to use a cross database join to effectively union together almost any data sources you want.  My goal is not to replicate his post here, but to provide a similar example and then springboard to a discussion of the “other” max() and min() functions in Tableau that you may never have used!

    So let’s get there with an example. Let’s say you have three tables containing data that needs to be unioned together, but it’s not currently possible to do so because one is a SQL Server table, one is a text file, and one lives in the cloud.

    R2D2, you know better than to trust a strange computer... especially in the cloud!

    Here are the three tables:

    tables to union in Tableau

    Without the option to accomplish as cross database union, you can create a union helper data source to simulate the union using a cross database join!  Here’s the helper table, whipped up in Excel or as a text file:

    Union Helper

    It’s a single field with a record for each table I want to include in the union.  Now, I’ll just use that as the left table in the Tableau connection screen and join the Table field three times, once each to the three tables.  And I can use a Join Calculation that matches for each table:

    A Star Wars Union

     

    Now, I’ve got all the rows from each table concatenated together.  It’s almost a clean union, but the one drawback is that the fields are not merged together.  With a normal union, I’d end up with 3 columns (Name, Occupation, and Bank Account Balance) with all the values from all the unioned tables.  But here, I get separate fields with the values sliced up between them.

    Chopped Up

    Fortunately, it’s all pretty easy to put back together…

    Min and Max in Tableau

    Normally, you use MIN and MAX as aggregations in Tableau.  If you aggregate Sales as a MAX, you’re finding out what the highest sales amount was given the level of detail in the view (e.g. highest sales value per region or shipping mode).  You might even use it in a calculation like MIN([Order Size]) >= [Low Threshold] to see if you met a goal or not.

    But it turns out, you can use Min() and Max() as row level functions too!  It looks like this:

    MIN([Value1], [Value2])

    And when you write the function like that, with two arguments, it will return the lowest (for MIN) or highest (for MAX) of the two values.  If Value1 = 100 and Value2 = 200 then the statement above would return 100 and the statement MAX([Value1], [Value2]) would return 200.

    Now, there’s all kinds of use for this!  Let’s say you had 6 different fields, F1 through F6, and for each row of data, you wanted to know which of the fields was the highest.  In fact, you wanted a label for each row that would tell you the field name of the highest value (e.g. “F1” or “F3”).

    Imagine writing that with an IF/THEN statement:

    IF F1 > F2 AND F1 > F3 AND F1 > F4 AND F1 > F5 AND F1 > F6
    THEN "F1"
    ELSEIF F2 > F1 AND F2 > F3 AND F2 > F4 AND F2 > F5 AND F2 > F6
    THEN "F2"
    ...
    
    

    Maybe not too bad… for 6 fields.  But what if there were 9?  And of course someone’s going to add a few more after that…

    Star Wars Final Movie

    In that case, we can use the “other” MAX function to find the largest value and then compare it to the values of the fields.  Unfortunately, MAX() doesn’t take more than 2 arguments.  Fortunately, we can nest MAX functions like this:

    CASE MAX(MAX(MAX(F1, F2), MAX(F3, F4)), MAX(F5, F6))
    WHEN F1 THEN "F1"
    WHEN F2 THEN "F2"
    WHEN F3 THEN "F3"
    WHEN F4 THEN "F4"
    WHEN F5 THEN "F5"
    WHEN F6 THEN "F6"
    END

    That’s not too bad at all!  Now, you’ll have to decide what to do with ties, but the pattern is far less complex than a messy IF/THEN/ELSEIF statement.

    And now, we circle back around to the simulated union…

    Using MIN or MAX to Merge mismatched fields

    When I use the Union feature in Tableau, I have the option to Merge mismatched fields with a few clicks.  But, with a simulated union that’s really a join, Tableau doesn’t allow this (although Tableau’s cousin Maestro does!)  In SQL, I’d use a COALESCE function that would take a list of values and return the first non-null value.  Tableau does have an IFNULL statement it would work really well.  In fact, it’s almost identical to my use of MIN or MAX here.  So it really comes down to preference.

    But if I want to merge the [Name], [Name (Prequels.txt)] and  [Name (Sequels.txt)] fields from the join above, here’s the function I would write:

    MAX([Name], MAX([Name (Prequels.txt)],  [Name (Srequels.txt)]))

    Since MIN or MAX will eliminate the NULL values, then only the non-null value for whichever field is applicable to the given row will be returned.  I just have to do that for each field that got split up.

    And who knows, this technique may give you all kinds of unnatural abilities.

    Darth Plagueis

     

3 Responsesso far.

  1. Ryan Calovich says:

    excellent post, great stuff!

  2. Jake says:

    Thanks for the post! I gave this a shot using the exact same data as your example (I created these tables in an excel workbook), replicated your schema and used your MAX expression for each field. When I bring the MAX calc fields into the view… I get NULL values! Not sure what is going on here, but I followed your exact steps. I was however able to return the correct values by altering the formula and replacing MAX with IFNULL– inspired by Drummey’s coalesce technique.

    • Joshua Milligan says:

      Hi Jake,

      IFNULL() is a great alternative that would work in much the same way. I’m not sure why the MAX() function wouldn’t work for you, unless both values were null. But that doesn’t sound like it’s the case.

Leave a Reply

Your email address will not be published. Required fields are marked *