Python strikes again! More Python scripts for Tableau Prep

I wrote previously about some very simple, but useful, Python scripts for Tableau Prep (and start with that post to see how the scripts are called). There have been some others, such as Tom Christian, who have explored some other exciting use cases of Python scripts for Tableau Prep, such as using a python script as a data source for stock data. I’m not quite there yet (same disclaimer on being a Python scripting newbie and same request for corrections and advise in comments), but I had a few other useful Python scripts to share such as Title Case, filling down, and counting business days between two dates…

Python in Tableau Prep

Fill Down Missing Values

If you’ve ever had data like this:

Sparse Data requiring fill down
Wouldn’t it be nice to “Fill Down” the date from one row to the next?

or any case where a value appears on one row and then applies to subsequent rows, then you’ve wished for the ability to pull that value down into the following rows (often referred to as Fill Down). In Tableau, assuming you can order the rows and you only care about the final view, you might consider using a table calculation such as PreviousValue() or Lookup(). But table calculations aren’t available in Tableau Prep, yet (though I’m hoping!)

For now, Python scripts for Tableau Prep give us a possibility to fill in the data. In fact, it’s a very simple script because there is a fillna() method built-in to the dataframe object:

FillDown() function implementing the fillna() method

The function simply replaces the value of the Date field with the filled in values. You have various options for how to fill. Here I chose ffill (forward fill), but you could even work backwards if your data required.

A Python Script for Tableau Prep that converts a string to Title Case

Now we get a bit more complex in our Python scripting.

Title Case Python
Pythons sometimes exhibit Title Case characteristics. But that’s not larger letters in the middle of the snake…

Here’s a script to convert a string to Title Case in Tableau Prep. There are actually two functions. The first is ToTitleCase(), which is the Python function I will call from Tableau Prep:

def ToTitleCase(df):
    df['String'] = df.apply(TitleCase, axis=1)
    return df

This function sets the value of the field String to the results of applying the function TitleCase(). This makes use of a lambda function, allowing me to apply the following custom function:

def TitleCase(x):
     import re
     exceptions = ['a', 'an', 'the', 'at', 'by', 'for', 'in', 'of', 'on', 
'to', 'up', 'and', 'as', 'but', 'or', 'nor']

    if x.String is None:
        return None

    word_list = re.split(' ', x.String)
    capitalized_title = [word_list[0].capitalize()]

    for word in word_list[1:]:
        capitalized_title.append(word if word in exceptions else word.capitalize())
 
    return ' '.join(capitalized_title)

This piece of Python script imports the regular expressions module and uses that to step through each word, capitalizing it as long as it isn’t an exception. The results are what we want!



A Python Script for Tableau Prep that gets number of Business Days between two dates

One relatively common request I see on the Tableau forums is a way to emulate Excel’s NETWORKDAYS function to get the number of business days between two dates – excluding weekends and holidays. Tableau / Tableau Prep doesn’t have a built-in function, but you can use Python scripts!

Don't count Holidays in Python scripts for Tableau Prep
Pythons don’t often celebrate holidays in the wild, but it has been observed

The python script for determining holidays also consists of two functions. The first is GetBusinessDays(), is called from Tableau Prep (with fields named/renamed Date1, Date2, and BusinessDays in the flow prior to the script):

def GetBusinessDays(df):
     df['BusinessDays'] = df.apply(BusinessDays, axis=1)
     return df

This function sets the value of the field BusinessDays to the results of applying the function BusinessDays(). This also makes use of a lambda function, allowing me to apply the following custom function:

def BusinessDays(x):
     import numpy as np
     import datetime 

    date1 = datetime.datetime.strptime(x.Date1, '%Y-%m-%dZ')
    date2 = datetime.datetime.strptime(x.Date2, '%Y-%m-%dZ') 

    start = datetime.date(date1.year, date1.month, date1.day) 
    end =   datetime.date(date2.year, date2.month, date2.day) 

    days = np.busday_count(start, end) 
    return days 

This function uses the numpy module’s method busday_count to return the number of business days between Date1 and Date2.

Note: there’s a bit of extra fiddling with trying to cast the date and stripping out the time component that may not be necessary in the future. There is a bug in the current beta version where dates from Tableau prep come through as UTC but get converted to local time by the Python script. The code above should be even simpler when release comes!

I’ve barely scratched the surface of the power Python (and R) bring to Tableau.  Statistical models, predictive analytics, importing previously untouchable data… the possibilities are staggering!

Thoughts, questions, complaints? Leave a comment! I’d love to hear form you!

Leave a Reply

Your email address will not be published. Required fields are marked *