If you are using Tableau for data analysis, there is one question you really should answer for any data source before you can do any analysis and be 100% certain you are getting the results you think you are getting. Here it is:
What is this number?
Or, more specifically, what is one record out of the 9,426 records in the data? When you connect to a new data source in Tableau, before you do anything else, build the view above (simply drag and drop Number of Records – Tableau’s internally generated field that gives you the number of underlying records – onto the view) and say to yourself: “I have 9,426 records. I have one record for every _____” and fill in the blank.
I have one record for every:
Customer for every month
Web page view
Order of a product
Line item for every order of a product
If you can’t answer the question:
Ask the team who developed the data model
Look for documentation of the source
Use Tableau to do some data discovery and figure it out for yourself
Why you need to answer the question:
So you will know if you have the right data to answer the questions you’d like to ask or if you need to look for or create other sources, re-structure the data , roll-up the data via extract, etc…
So you will know which aggregations, calculations, and filters give the right answers:
Have a record for every uniquecustomer and want to know the overall number of unique customers? It’s Number of Records or COUNT(Customer ID)
Have a record for every uniquecustomer per month and want to know the overall number of unique customers? It’s COUNTD(Customer ID)
Have a record for every uniquecustomer per month per store where the same customer might have different records (and different IDs) for different stores and want to know the overall number of unique customers? It’s COUNTD(Customer SSN) – or whatever uniquely identifies an individual.
Have a record for every apartment available to rent and want to know square feet per city? It’s SUM(SqFt) sliced by city.
Have a record for every apartment available to rent for each month and want to know square feet per city? It’s AVG(SqFt) sliced by city or MIN(SqFt) sliced by city or MAX(SqFt) sliced by city, or SUM(SqFt) sliced by city and filtered to the current month, or use a different data source – it all depends on the exact question.
So you will understand when data blending is possible/necessary, when it isn’t, and how it will work with your data sources.
So you will know at what level row-level calculations are operating and that will keep you from unnecessarily complicating things by jumping to aggregate or table calculations.
That’s it. That’s the first question. It’s answered when you know the level of detail for your data source!You’ll never be sorry to take a moment to answer it.