• State of the Unions in Tableau Prep: MINUS, EXCEPT and INTERSECT

    Recently I gave a “State of the Unions” address for Tableau’s Think Data Thursday (the recording is here) in which I discussed new features for unions in Tableau Prep and Tableau Desktop.  One thing I didn’t cover, but has subsequently come up is the question “What about INTERSECT and MINUS or EXCEPT Unions”?  Can we achieve these types of unions in Tableau Prep?  Can we do anything other than a UNION ALL?  Yes!  Yes, we can!State of the Unions in Tableau Prep

    Unions in Tableau Prep

    In Tableau Prep, both the Union Step itself and also the Input Step using a wildcard union are going to function as a UNION ALL.  That is, all records from all tables will be included in the results. However, many databases support logic beyond this kind of union:

    • UNION keeps only one copy of each unique record.  That is, if records in the resulting set are exactly identical, only one copy is kept.
    • INTERSECT unions return records that exist in both tables
    • MINUS unions (sometimes also called EXCEPT) keep records that are only present in the first table.  That is, if a record exists in both tables or only in the second, it is not retained in the resulting set.

    Now, if your data source is a relational database such as Oracle, SQL Server, MySQL, or Snowflake that supports a Custom SQL Input step, then you could write your own SQL to perform UNION, INTERSECT or MINUS / EXCEPT unions in Tableau Prep.

    But what if you are using a data source that doesn’t support the logic?  Or you need to implement it somewhere in the flow other than the input?

    Don’t worry!  Tableau Prep has you covered!  It has all the foundational tools to allow you to implement all kinds of data shaping logic!

    UNION in Tableau Prep

    If you want to achieve a UNION (as opposed to UNION ALL) that eliminates duplicate records, then you can simply use a Union Step and then something like what’s described in this post.

    But what if you want to achieve and INTERSECT or MINUS/EXCEPT union? We’ll walk through some possibilities using this sample data:

    Products in Inventory

    Products in Inventory

     

    Products On Order

     

     

    The Inventory table contains all products in the inventory while the On Order table contains all the products that have been ordered.  It is possible for there to be overlap at times, such as is highlighted above.  This may be legitimate or possibly due to a system synchronization issue.  It may be necessary to evaluate all products that are inventory but not on order or all products that are both on order and in inventory.

    INTERSECT unions in Tableau Prep

    If you want to get an INTERSECT (rows that exactly match between the tables), then you’ve got some options.  You could simply join the rows together with a join condition that matches all fields (or at least all fields that you consider a unique record).  The danger is that if you end up with a many to many match you could duplicate records without meaning to and maybe even without realizing.

    So instead, you could use a flow like this:

    Flow for Intersect Unions in Tableau Prep

    Tableau Prep generates the Table Names field from the Union and you can do a distinct count on that, grouped by all other fields, to determine how many tables that exact row came from.  You can see the desired results in the highlighting of the profile pane.

    In the next step, you can eliminate everything that only came from one table:

    Finishing the Intersect Union in Tableau Prep

    Intersect unions in Tableau Prep? No problem!

    MINUS unions in Tableau Prep (EXCEPT Union)

    For a MINUS Union (where rows only from the inventory table that are not contained in the on order table), you could do something similar:

     

     

    After unioning together the two tables, we’ve added a step to create a copy of the Table Names field.  That’s because we’ll want to aggregate it in two different ways (As a side-note, I’d love to see Tableau Prep allow me to use a field more than once in an aggregate step, so duplicating it in a previous step wasn’t necessary.  If you agree, consider voting for this idea!)

    In the aggregate step, we’ll get the minimum and maximum table names per all values of all fields in the record and then only kept records where the min and max table is Inventory (i.e. those records were not in the On Order table).

     

    I’ve attached both flows so you can walk through them in more detail as you explore unions in Tableau Prep.  Here they are! INTERSECT and MINUS Unions for Tableau Prep

     

    Happy New Year!
    Joshua

Leave a Reply

Your email address will not be published. Required fields are marked *