Data Prep and the New Tableau 9 Data Interpreter

TwitterFacebookGoogle+Share

The Tableau 9 Data Interpreter and other new data prep options, such as pivot and split, open up new possibilities for connecting to data.  No longer will you have to spend all your time fixing data structure and cleaning up headers, footers, and merged cells in Excel.  Tableau 9.0 makes that (and more) so much easier so you can get to asking and answering questions and visualizing the data!

The video below is first in a series I will be doing for Teknion Data Solutions. In this video I’ll demonstrate the new data preparation features of Tableau 9, including the new data interpreter.

<shameless plug> I have been working as a consultant with Teknion Data Solutionsfor over 10 years and love it! If you enjoy Tableau (or any aspect of BI) and love to help others, consider joining Teknion’s team of consultants. </shameless plug>

I had a great time presenting this feature and others at DFW TUG! What a turn out!  Thanks to everyone who came and for those who made it come together.  If you were there (or even not) and have any additional questions leave a comment on this post and I’ll gladly try to answer.  I also presented on other new Tableau 9.0 features, including:

  • New Data Connection Options (Pivot, Split, Data Interpreter)
  • Level of Detail (LOD) calculations
  • Regular Expressions
  • New Calculation Editor & Ad  Hoc Calculations
  • Drag and Drop Analytics
  • Fun and Useful Enhancements (color picker, thumbnail preview, map search, new selection options)
  • Tableau Server Updates

And, now, the video:

(Subscribe to Teknion’s YouTube channel to get updates as new videos are posted)

One Trick to Hide Titles in Tableau

Sometimes you want to customize titles or even hide titles in Tableau.   When shown on a dashboard, titles will always appear, even when you filter out all the data in the view:

The Title is not hidden, but stays when data is filtered.

Notice how the  title stays, even when all the data is filtered and the view isn’t drawn.   This can be unsightly, especially in dashboards where I want to guide the user through some exploration.  Who wants to see titles of things that aren’t meant to be seen until you take an action?

 

One way to hide titles in Tableau

1. Create a calculated field with hard-coded string:

Customizing a Title with a CalculationI used the aggregation of MIN here to force it to be an aggregation to avoid it interfering with table calcs. (I also suspect that this is would be more performant than a row-level calculation in most cases, but I’m not sure. If you know, please leave a comment…)

2. Place the resulting calculated field on Columns to give you a custom “title” at the top of the view.

3. On the dashboard, you can hide the real title so that everything disappears when there is no data:

Custom Title that will be hidden

That’s not really a “title” it’s just a column header with some formatting.  Here are the steps I took to get it to look like a title:

  • Right-click the column header and select Format
  • Change the font size
  • Switch to Borders on the format window and adjust the level of Column Headers.

And now, the title is hidden when the view has no data!Title is hidden!

That’s it for this Tableau tip.  There’ll be more soon.  Subscribe to keep up-to-date!

Tableau Mapping Tip: Geographic Drill Down

Tableau Tip Tuesday

Previously,  we saw how to capture and use Tableau’s automatically generated latitude and longitude for custom geocoding.  Now, we’ll extend that technique to give us the ability to drill up and down on various levels of geographic detail in a single visualization:

Drilling down and up again on geographic levels of detail.
Drilling down and up again on geographic levels of detail.

 

The data is fairly standard.  We have Regions which contain 2 or 3 Areas which contain multiple Zip Codes:

The data

Only Zip Code is recognized as a standard geographic field.  Tableau is nice enough to supply the fields Latitude (generated) and Longitude (generated) based on the Zip code.  However, we can’t use those fields in calculations or blending.  So, we’ll capture them in a separate data source (along with the associated Region, Area, and Zip Code) just like we saw in the previous tip.

Data Blending to Control the Level of Detail

Now we can blend between the two sources on any of the fields they share in common:

We can blend on Area, Region, and/or Zip Code to control the level of detail.  The AVG Lat and Lon from this data source will give us the geographic center of the level of detail in the blend.
We can blend on Area, Region, and/or Zip Code to control the level of detail. The AVG Lat and Lon from this data source will give us the geographic center of the level of detail in the blend.

 

But why set the level of detail of the blend ahead of time?  This gets much more exciting if we create a parameter and a couple of calculated fields to allow the end-user to control the level of detail of the blend.

Here the parameter:

Parameter
We’ll let the user select Region, Area or Zip code. Typically, I’ll use an integer parameter with the Display As text. It’s more efficiently calculated and if I want to change the display, I don’t have to go change a bunch of calculated fields.

Then, I’ll create a calculated field that returns the desired dimension based on the parameter selection:

Calculation

We’ll create this calculation in both the primary and secondary data sources.  Why?  So we can blend on that field instead of any of the others:

Blending with Parameter

 

And now, the level of detail of the blend changes with the user selection.  Instant geographic drill down (and up again!)

Here it is in action:

Poor Man’s Custom Geocoding in Tableau (Part 2)

Tableau Tip Tuesday

Last time we looked at a simple way to implement custom geocoding in Tableau using the ability to assign latitude and longitude to unknown locations.  That’s a fairly good option if you only have a few locaitons and only need to plot marks on a map or two in a single workbook.  But what happens if you have a lot of locations?  Or what if you want to reuse the solution in multiple workbooks?

There is a way!  We’ll use the tried and true Superstore sample data to illustrate.

This data set includes some fields with recognized geographic roles such as City, State, and Postal Code (Zip), but it also includes a Region field that does not have a geographic role.  Now, this is purely an example and this data set only has 4 regions — so it wouldn’t be too difficult to lookup latitudes and longitudes and plug them into Tableau like we saw last week.

(by the way, Shawn Wallwork got me to thinking and I realized you don’t need to leave Tableau and go to Google to find out the lat/lon for a coordinate on a map.  Just right-click the map and add a point annotation then Include latitude and longitude to see where that point is.)

But what if you had ten regions?  Or 15 territories?  And what if you want to reuse the results in more than one workbook?

Well, the good thing is, Tableau is ready to help!  We just have to coax it a little…

Get the Data Set with Latitudes and Longitudes

I’ll start by creating a geographic view that includes Region but also a few other geographic fields so Tableau will use the Latitude and Longitude fields it automatically generates:

Region with Zip Codes

Here I have a mark for each Postal Code (Zip code) color coded by Region.  You’ll notice that Regions contain multiple zip codes and a single zip code is contained in only one region.  This data is really clean — but this technique will work even if the data isn’t quite so nice.   Just have some kind of geographic relationship.

Here’s the trick — we’re going to capture Tableau’s values for latitude and longitude — along with the associated Region and then create another data source with that data.

I press Ctrl + A to select all the marks on the map (or Area select them) and then right-click a mark and select “View Data”.  And there it is, the data set I want!

Lat and Lon for all Zip Codes and Regions
Lat and Lon for all Zip Codes and Regions

Now I have every zip code within a region along with the associated Latitude and Longitude.  Now, simply press Ctrl + A to select all rows, then Copy, then close the dialog box, then Ctrl + V to paste back into Tableau as a new data source!

New Data Source In Tableau

And now, the magic:

Use Data Blending to use the Latitudes and Longitudes

We’ll start a new sheet with our original data set and put Region on the Marks card to define the level of detail.  Then, we’ll switch to our new data source and bring out Latitude and Longitude to Rows and Columns.  We’ll use an aggregation of AVG for both.  Guess what? When we blend on Region, that’s the geographic center of all the zip codes within that Region!

Finished

And there you have it!  You now have an easy way to plot marks for any number of regions or territories or unknown geographic field based on the geographic center of a known geographic fields.

Like it?  Let me know in the comments and also check back next week for one more step that makes this even more exciting!

Tableau Tip Tuesday: Poor Man’s Custom Geocoding (Part 1)

Tableau Tip

Tableau has outstanding built-in geocoding capabilities.  If you have countries, states, zip codes, congressional districts, statistical areas, etc… in your data then Tableau knows where to plot marks on a map.  But there are times when you have data that doesn’t match up with standard geographic locations.  There are great tools and techniques out there for generating geocoded data or custom polygons.  But over the next few weeks, I’m going to look at various simple techniques that work when you don’t have the time or resources to get too fancy or you simply just need to get the job done.

For example, let’s say you are working with an NGO in Papua New Guinea and want to track shipments of supplies to three different regions in the country.  Your data looks like this:

Data

No states or counties, etc… only Regions.  And those won’t be recognized as anything geographic.  Indeed, Region comes through as a text field and there are no automatically generated Latitude or Longitude fields:

Data Window

But set the geographic role of the field (right-click Region and select Geographic Role > Area Code).  We’ll use area code because there’s no possibility that our regions are valid area codes.  Now we see Latitude (generated) and Longitude (generated) as new fields in the data window and double clicking Region gives us a Map!

Map

Of course, Tableau lets us know via an Indicator that three of the area codes are “unknown.”  Well, of course they are!  But we can fix it…  Just click the Indicator and then Edit Locations…

Then, just enter the latitude and longitude for your regions:

Edit Locations

It’s a good thing you’ve memorized latitudes and longitudes for various locations in PNG!

 

Wait… you haven’t?

 

Oh.  Well, you could always use Google Maps (or Bing).  Just find a location that represents where you’d like your Region to be, right click and ask Google, “What’s Here?”

What's Here?

And Google will tell you (and give you the latitude and longitude, which you can select and copy/paste)

Repeat that a couple more times, plug the results into Tableau, and you can now plot your data exactly where you want it.

 

 

Touch it up with a little transparency, some borders, and notice the dual axis that let me plot text in the center of the circles.  And there you have it!  A simple technique to get things mapped where you want.

Finished Map

 

Now, you’ll ask, “What if I have a lot of things to plot and I don’t want to look up all the latitude/longitude values?” or “What if I want to use this in multiple workbooks and I don’t want to have to edit locations each time?”

 

Great questions… you’ll just have to check back next Tuesday for the next tip…

Other Favorite New Tableau 9.0 Features

I recently blogged about my favorite Tableau 9.0 feature: LOD Calculations.  This feature really blows open the possibilities for analysis and answering complex questions.  But there are dozens of little features, enhancements and tweaks that might initially go un-noticed.  And they make a huge difference in the look, feel, and overall experience.  These are some of my favorites:

You know you want to sing it...

Go ahead and sing it (to the tune of “Raindrops on roses and whiskers on kittens…”) The guy in the next cubical won’t mind…

 

  • Responsive Tooltips: I remember seeing this feature proposed by Tim Wahl (http://community.tableausoftware.com/ideas/2643) and thinking it would be impressive. Well, the Tableau developers implemented the idea and it’s even more impressive in action.  And they kept the option for using the old wait for hoverI love it when developers don’t assume new ways are always best in all circumstances.  The Tableau engineers are the best!

Tooltips

 

  • Speed: Everything seems faster. The UI is more responsive, the startup time is shorter, parallel query processing dramatically speed up dashboards, and quite a few behind the scenes engine enhancements make for an overall pleasant experience.  It feels like I’ve upgraded my computer hardware just by upgrading to Tableau 9.0.

 

  • Speed: Not only speed of the platform, but the speed of design is faster too. Need a reference line?  You used to have to right-click an axis, select Add Reference Line, and then select your options.  Now: switch to the Analytics Tab and you have drag and drop analytics!  Need a quick calculation?  You used to have to right click in the data window and select Create Calculation, write some code, click OK, find the new calculated field in the data window and then drop it into the view.  Now: double click a blank space in Rows, Columns, or Marks and type some code to create an Ad-hoc calculation.

ad-hoc calculations

The new 9.0 calculation editor, which can stay open while you edit the view is an incredible feature too!

 

  • Home Screen: It looks sharp and polished.  It is also slick and responsive.  Thumbnails change sizes and you see more or less depending on the size of the window.  It’s easy to quickly connect to data or open workbooks.

home screen

 

 

  • Data Connection: The new data connection menu is polished and sports a new search box (important when new types of connections are added with each new release! I remember when the Google Analytics connection was new…)

 

  • Data Preparation: We’ll soon be saying, “Remember when you had to use the Excel ad-in to reshape your data? (Oh, and sorry, Mac didn’t have it.)” No longer.  Now the ability to pivot and reshape data from Excel is built-in.  Throw in the new Data Interpreter which figures out excessive headers, footers, and merged cells, along with the ability to split fields and data structure issues become a lot more manageable.

 

  • Color Picker: Love the new design and especially the ability to pick a color on screen.  Ever want to easily and exactly match your website colors for embedding a view?  Now you can!

Color Picker

  • Sheet Preview: Ever forget whether you wanted to insert Sheet 18 or Sheet 19 in the dashboard?  Hover over a sheet tab at the bottom or on the left side-bar when designing a dashboard or story and see a thumbnail preview.

 

  • Tableau Server: This one deserves its own blog post. I work with Desktop day by day it’s easy to overlook Tableau Server except when I want to share my work.  But the developers have shown Server the respect it deserves.  The interface is completely redesigned and the experience is very pleasant.  You’ll especially love the update if you have to administer users, groups and permissions.

 

 

What new features are your favorites?  Tell me in the comments below.

Tableau Tip Tuesday: Always Manually Backup Tableau Server Before Upgrading

Tableau Tip

Last night I upgraded a Tableau Server for one of our clients1.  They weren’t ready to go to 8.3 yet (they’re holding out for 9.0,), but we had agreed to upgrade from 8.2.1 to 8.2.7. Since an upgrade requires an uninstall of the existing Tableau Server, we also agreed to do it after hours.

When you uninstall Tableau Server it makes a backup of the internal database (including workbooks, extracts, users, and settings) and when you install a newer version it restores everything so the new version is ready to go with everything the old version had.  Usually…

Backup
(You probably see where this is going…)

 

Last night, I downloaded the installation package, ran a manual backup of Tableau Server, uninstalled the server, and ran the installation package.  But about half-way through the installation, I realized I’d made a mistake.  I had downloaded the latest installation package, which was 8.3 instead of the desired 8.2.7.

I rolled back the installation, got the right package and re-installed.  Everything seemed to go smoothly… until the install finished and I logged into the server – there were no workbooks, no extracts, and no users (except me).  It was a clean install.  Rolling back the installation had lost the automatic backup.

I sure was glad to have the backup of the server I had just run manually.  With that backup, it took only one command to restore the workbooks, extracts, users, and settings.  If I hadn’t made the backup, I would still be picking up the pieces this morning2.

It may be tempting to rely on the fact that Tableau Server backs up on uninstall and restores on re-install – but don’t!  Always make a manual backup prior to an upgrade.
The command is simple:

tabadmin backup backupfile.tsbak

 

And if you need to restore:

tabadmin restore backupfile.tsbak

 

You can learn more about backup and restore here.


1 I have been a consultant for Teknion Data Solutions for over 10 years!  It’s a great company and we’re hiring!

2 Of course, I have regular backups scheduled too, so it might not have been too bad.  ( that’s another tip :) )


 

Check back each Tuesday for another Tableau Tip…

Tableau Tip Tuesday: Don’t Use Table Calcs

Tableau Tip

No, not really!  I love table calculations! They are powerful and very useful.  They are one of the best things in Tableau.  Don’t be afraid of them and don’t avoid them, except…

  • Avoid table calculations when a simpler aggregate calculation will work.
  • Avoid aggregate calculations when a simpler row level calculation will work.

Why?  Because you lose control, flexibility, and efficiency.  When I first started to get the hang of table calcs, I found myself using them to solve every problem.  But that’s not the best approach.  The bottom line for this tip: keep it simple!

“Your table calc complexity will only grow.  There is beauty in it, but also great danger.”
“Your table calc complexity will only grow. There is beauty in it, but also great danger.”

Row Level Instead of Aggregate

If you use an aggregate calculation, when you could have used a row level, then you will never* be able to use row level logic on that calculation.  Additionally, aggregate calculations must be computed at run-time.  Row level calculations may be materialized in an extract, meaning that they are only computed once.

*the only exception is when you use a Set or fixed Level of Detail calc

Example:
You want the cost of goods sold when you have sales and profit.

Don’t do this:

Cost of Goods Sold as an Aggregate Calculation

You won’t be able to change it to an average cost of goods sold or compare a minimum and a maximum in the same view.

Instead, do this:

Cost of Goods Sold as an Row Level Calculation

As a row-level calculation, you’ll be able to specify the aggregation in the view.  It might also get materialized (computed once and stored) in an extract which means better efficiency.

Exception: When you specifically want to limit a designer using a published connection or web editing to a specific level of aggregation (e.g. you always want the designer to use average Cost of Goods Sold.)

Aggregate (or Row Level) Instead of Table Calculation

If you use a table calculation when you could have used a simple aggregate, then you introduced a level of complexity that was not needed and made it harder for others to understand.  Furthermore, table calculations are calculated based on field arrangement in a view. So any changes to the level of detail or layout of the view can fundamentally change the way they work.  Simple aggregates are done at the level of detail defined by the dimensions in the view, but the fundamental calculation remains unchanged when you rearrange, add, or remove dimensions.

Example:
You want to add a reference line for the SUM of Sales for Appliances across the table to give a visual benchmark.

Sales by Category

 

Don’t do this:

Table Calc to get SUM(Sales) for Appliances (assuming it's the first category)

Even if you can make the assumption that Appliances will always be first in terms of Category, how can you be certain that it will always be first based on the scope and direction of the First() table calculation when other dimensions are introduced?

Instead do this:

Aggregate to get SUM(Sales) for Appliances

As a simple aggregate, the calculation is less brittle and can be performed at the data source level.

Or, do this:

Row Level Calculation to get Sales for Appliances

Each row will keep the value for sales if it was an “Appliances” row (and NULL by default otherwise).  As a row level calculation, the results could potentially be materialized in an extract and is the most flexible.  You can choose to aggregate later.  And you can choose how.

A reference line at the value of Sales for Appliances

When creating calculated fields, always look for the simplest approach.  Check back next Tuesday for another Tableau Tip!

My Favorite Tableau 9.0 Feature

Tableau 9.0 has moved from alpha to beta and I’m loving it!  There are many new features to enjoy:

  • Improved calculation editor
  • Ad-hoc calculations
  • New functions (like regular expressions)
  • Data interpreter and cleanup tools like pivot and split
  • Drag and drop analytics
  • Map Search
  • New selection options (radial, lasso)
  • Responsive tool-tips
  • Formatting for Story Points
  • And more!

Many of these are incredibly cool and make Tableau that much more smooth and easy to work with.  I’m sure I’ll blog on many of these in the coming weeks.

But there is one new feature that is a true game changer

(I don’t use the term lightly)

Level of Detail Calculations

Previously, in Tableau, every aggregation was done at the level of detail based on dimensions in the view.  You could roll things up using totals and subtotals.  You could even get a little fancy using sets or table calcs, but these all had their limits.

Here, SUM(Sales) is calculated at the level of detail of Department / Category
Here, SUM(Sales) is calculated at the level of detail of Department / Category

No longer!

Now you can use Level of Detail aggregations.  Here are some basic examples:


 

Fixed: calculates the aggregation at the level of detail specified by the list of regardless of any dimensions in the view.

{FIXED [Department] : AVG([Sales])}

The above code calculates the average sales per Department, regardless what other dimensions are in the view.

Include: calculates the aggregation at the level of detail specified by the dimensions in the view and the dimensions included in the code.

{INCLUDE [Item] : AVG([Sales])}

The above code calculates the average sales at the level of detail defined in the view but includes the dimension Item, even if Item is not in the view.

Exclude: calculates the aggregation at the level of detail specified by the dimensions in the view, excluding any listed in the code.

{EXCLUDE [Category] : AVG([Sales])}

The above code calculates the average sales at the level of detail defined in the view but does not include Category the dimension as part of the level of detail, even if Category is in the view.


 

How about a real-life example?

Binning By Aggregate

You may remember that I had previously shared four possible ways of Slicing by Aggregate.  This is when you ask questions of “How many had how many?”

  • How many patients had how many readmissions? (e.g. I want to know that 500 patients were readmitted 1 time, 335 were readmitted 2 times, 23 were readmitted 3 times, etc…)
  • How many students had how many classes?
  • How many products had how many sales?

These types of questions were very difficult (almost impossible) to answer in Tableau without custom scripting or using some crazy options (non-Cartesian visualizations, table calculations, sets) because you have to aggregate the first answer and then use that to slice (or bin) another aggregate.

Let’s say I have a list of students and classes they are taking:

Students and Classes

I want to know how many students took how many classes (e.g. 9 students took 6 classes, 5 students took 5 classes, etc…)

In steps Level of Detail (specifically fixed) and solves my problem.  I simply create a calculation like this:

Classes per student

The field shows up as a Measure by default (but I can actually re-characterize fixed level of detail calcs as dimensions!)  Now I just create a bin (right–click the new field and select Create > Bins)

Create bins

And then I can build the view that answers my question:

Answer

This solution doesn’t have any of the drawbacks of the previous work-arounds.  It has nice UI, plays nice with action filters, and isn’t incredibly complex.

And this just barely scratches the surface of what Level of Detail calculations can do!  I’m sure we’ll be seeing a lot more of them in the future.

Tableau Tip Tuesday: The First Question to Ask

Tableau Tip

If you are using Tableau for data analysis, there is one question you really should answer for any data source before you can do any analysis and be 100% certain you are getting the results you think you are getting.  Here it is:

What is this number?
Number of Records

Or, more specifically, what is one record out of the 9,426 records in the data?  When you connect to a new data source in Tableau, before you do anything else, build the view above (simply drag and drop Number of Records – Tableau’s internally generated field that gives you the number of underlying records – onto the view) and say to yourself: “I have 9,426 records.  I have one record for every _____”  and fill in the blank.

I have one record for every:

  • Customer
  • Customer for every month
  • Website visit
  • Website visitor
  • Web page view
  • Product
  • Order of a product
  • Line item for every order of a product
  • Etc…

 

If you can’t answer the question:

  • Ask the team who developed the data model
  • Look for documentation of the source
  • Use Tableau to do some data discovery and figure it out for yourself

 

 

Why you need to answer the question:

  • So you will know if you have the right data to answer the questions you’d like to ask or if you need to look for or create other sources, re-structure the data , roll-up the data via extract, etc…
  • So you will know which aggregations, calculations, and filters give the right answers:
  1. Have a record for every unique customer and want to know the overall number of unique customers? It’s Number of Records or COUNT(Customer ID)
  2. Have a record for every unique customer per month and want to know the overall number of unique customers? It’s COUNTD(Customer ID)
  3. Have a record for every unique customer per month per store where the same customer might have different records (and different IDs) for different stores and want to know the overall number of unique customers? It’s COUNTD(Customer SSN) – or whatever uniquely identifies an individual.
  4. Have a record for every apartment available to rent and want to know square feet per city?  It’s SUM(SqFt) sliced by city.
  5. Have a record for every apartment available to rent for each month and want to know square feet per city? It’s AVG(SqFt) sliced by city or MIN(SqFt) sliced by city or MAX(SqFt) sliced by city, or SUM(SqFt) sliced by city and filtered to the current month, or use a different data source – it all depends on the exact question.
  • So you will understand when data blending is possible/necessary, when it isn’t, and how it will work with your data sources.
  • So you will know at what level row-level calculations are operating and that will keep you from unnecessarily complicating things by jumping to aggregate or table calculations.

 

That’s it.  That’s the first question.  It’s answered when you know the level of detail for your data source!  You’ll never be sorry to take a moment to answer it.

 

Check back each Tuesday for a new #Tableau Tip!

Tableau Tips and Tricks ● Story Telling ● Beautiful Data Visualizations