Excel and Text File Landmines to Avoid in Tableau
Excel spreadsheets and flat text files can be incredible data sources for Tableau. They are ubiquitous, easy to use, easy to create, portable. But, if you use Excel or text files as Tableau data sources, then you may not be aware of some landmines that can undermine your analysis. Step on one of these and your credibility as a data analyst or data visualization expert might be blown to smithereens:
Joining on Blank values What makes this landmine particularly dangerous is that Tableau 9.3 changed the way Tableau handles joins on blank (NULL) values in Excel and text files. Tableau now follows the ANSI standard which states that NULL values will not match any value (even other NULLs). I’m glad to see Tableau following convention! (And I’m sure it was necessary to standardize for cross-database joins in Tableau 10.0) But, earlier versions of Tableau would allow a join to find matches when the two values were blank. So if you were used to that behavior, you might not even know you are in the minefield. And many Excel users just think “blank” and not “null”, again unaware of the danger that lurks just beneath the surface. Bottom line: Make sure fields you use in a join don’t have blanks. If necessary, change blank values in the source.
Real Bottom Line: I was wrong! Read more here and stay tuned to see what Tableau 10.0 has in store…
2. Unioning columns of different data types
This landmine is relatively new as unions were just introduced in Tableau 9.3. (By the way, I’m presenting a Think Data Thursday webinar on Unions in Tableau this week and another webinar on new features in Tableau 9.3 next week!)
What happens with Income when you union these three files?
It turns out that it all depends on which file you use to start the union.
Start with Originals.txt and the total income is $58,502.23:
Start with Prequels.txt and the total income is just $11,500!
So what happened? In the first case, Tableau understood the data type to be a decimal value (because the first values in the first file were decimal) and so all values came through. In the second case, Tableau understood the value to be whole (integer) and any subsequent decimal values came through as NULL!
Bottom Line #1: To avoid this landmine, double check your data. You can simply change the data type (even in the data connection screen, using the icon – the hash sign for number – for the data type) to tell Tableau specifically how to read values. Setting it to Number (Decimal) fixes the problem:
Bottom Line #2: This conclusively settles the debate. Start with the originals, then the prequels (if you must), then the sequels.
3. Fields where almost every value is a whole number
This landmine can remain hidden and blow up when you least expect it. It’s very similar to the above issue, in that it occurs when most values are whole numbers, but you have a few decimals. In this case, it can occur with a single file data source because Tableau will sample the top n rows of data to determine the data type. If all it sees are whole numbers, decimals later on will come in as NULL (similar to above).
Bottom Line: Double check your values. Change data type manually if needed.
I’m certain there are other landmines out there. Which ones have you encountered? Share your war stories in the comments below!
hi josh — great post and I also watched your you tube state of the unions webinar on unions which was very helpful.
Question regarding union of files like Peter’s monthly sales in your webinar where the data is formatted the same but varies by date or time period.
I want to union these files so I can now have one data viz showing changes in sales by month (ie a bar chart with Jan, Feb, Mar, etc) as x axis labels.
Do I need to add a column to each of my original files with the time period (ie for the data from March 2015 add a column called Date and each field would be March 2015?)
Or can I somehow use the new fields that tableau creates during the union process?
Thanks! and rew @ agxmarketing . com
Thanks! You can indeed use those fields! For example, if the the files were named something like 2016-01.csv, then you could use something like DATEPARSE(‘yyyy-MM’, [Table Name]) to get the date. Hope that helps!
Hi! Sometimes when most of values are Integer and some values String – Tableau shows GENERAL as values when String is chosen as column type… Or when most values are string and some of them are integer and you choose Integer… I could not find this GENERAL value descriptions in forums. Could you? Thank you, Den
I have encountered number 3. I had a column with some decimals and whole numbers. Tableau 10.4 was rounding up the values to 1. I made a csv version of the file and imported that instead. There were no issues with the CSV file. I would recommend using csv files instead of excel files because there seems to be issues with importing Excel files.