I’ve been hearing about DataRobot for a while now (in fact, Teknion Data Solutions, where I’ve been a consultant for nearly 15 years, recently became a DataRobot partner) but I hadn’t had the chance to be involved in actually using DataRobot…
…until now! And in conjunction with my colleague and fellow Tableau Zen Master, Bridget Cogley, I was able to leverage Tableau’s visual analytics and Tableau Prep’s visual data transformation to begin to understand a bit of how DataRobot generates predictions and how the data impacts those predictions. And I’ve started to peel back the layers of automated machine learning and artificial intelligence. It’s not a magic black box! There’s a science and an art that can be, and should be, seen and understood.
(And, if you want to see all of this come together, from building the predictive models in DataRobot to exploring the predictions, reasons, and features, to an actionable – and beautiful – Tableau dashboard, join the webinar on Wednesday at 1pm Eastern.)
Training Data, Scored Data, and Features
DataRobot takes two kinds of data sets: a training data set and a data set to score. The training data set is often historical data – where the outcome has occurred and is known. The data to score is usually current data where the outcome has not yet happened and we want to know the probability that it will. Both data sets are nearly identical in structure, except the data set to score does not contain any result. DataRobot allows you to then build and select predictive model(s) based on the training data set that will be used to score the “to-be-scored” data sets. The final output of feeding a “to-be-scored” data set through the model is a data set with a probability score, prediction, and various “Reasons” for why the probability was assigned.
It’s tempting to give the end-user of your Tableau dashboard a list of predictions and ask them to implicitly trust them without context or reason – but that’s neither fair nor realistic. Knowing this, I watched my first DataRobot demo specifically looking for insights that I would be able to communicate to end-users of Tableau dashboards. I wanted to be able to communicate some context and explain why the predictions were made: the reasons, indications, or even contraindications for a given prediction.
And so, when I saw the Features ranked in DataRobot, I instantly recognized something that might communicate well to end-users looking for explanations and context:
The Features are actually columns in the training data set, and their values add weight toward a prediction score. Based on the model you select in DataRobot, you can see the relative importance of each feature to the final prediction and probability score.
This is exactly the kind of context that would be useful to an end-user to understand what potentially drives predictions and to help them gain trust in the predictions. And with the ability of DataRobot to export the features to a .csv, I knew there was a lot of potential for enriching the data in Tableau!
Restructuring the DataRobot data with Tableau Prep
The first step was to bring together various aspects of the data and restructure it in a way where I could more deeply explore it in Tableau. While I could have used any number of tools (one of my colleagues used Alteryx, for example, to automate running the DataRobot model), I chose to use Tableau Prep! Here’s what my flow looks like:
My goal was to take both the training data and the scored data and bring them together into a single data set. I accomplished this with a Union. The resulting data is still a single record per patient; it is now a mix of historical (training) and predicted (scored) records.
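For readers who think in code rather than flow diagrams, here is a minimal Python sketch of roughly what a union like this does. It is not the Tableau Prep flow itself, and the column names (patient_id, number_inpatient, readmitted, probability, prediction, record_type) and values are hypothetical, made up for illustration. Rows from both data sets are stacked, each row is tagged with where it came from, and any column that exists on only one side is left empty on the other.

```python
# Hypothetical column names and values, for illustration only.
training_rows = [
    {"patient_id": 1, "number_inpatient": 0, "readmitted": True},
    {"patient_id": 2, "number_inpatient": 3, "readmitted": False},
]
scored_rows = [
    {"patient_id": 3, "number_inpatient": 2, "probability": 0.71, "prediction": True},
]


def union_rows(training, scored):
    """Stack both data sets, tag each row with its source, and fill
    columns that are missing from one side with None."""
    # Collect every column name, preserving first-seen order.
    columns = []
    for row in training + scored:
        for name in row:
            if name not in columns:
                columns.append(name)

    combined = []
    for source, rows in (("training", training), ("scored", scored)):
        for row in rows:
            record = {name: row.get(name) for name in columns}
            record["record_type"] = source  # historical vs. predicted
            combined.append(record)
    return combined


combined = union_rows(training_rows, scored_rows)
for record in combined:
    print(record)
```

The result still has one record per patient. Training records carry a known outcome and no probability, while scored records carry a probability and no known outcome.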
I cleaned up a few things (for example, the features are just the columns from the data sets – but these didn’t have very user-friendly names – so I fixed that) and then restructured the data a bit and created two resulting outputs:
- The Features output contains a record for each feature for each patient. Basically it’s just a pivot of all the columns that make up the feature list. This will allow me to compare the Feature Impact .csv export from DataRobot with the value of the actual data.
- The Reasons output contains one record for each Reason for each patient. Reasons in DataRobot are actually multiple fields (Reason 1, Reason 2, etc…) and each reason has various attributes such as a description and a strength – the positive or negative value indicating whether the reason weighed in favor of or against the prediction. Pivoting these will allow me to order them in Tableau based on strength and will allow me to potentially filter to records with a given reason more easily in a dashboard (as prior to the pivot, the reason might have existed in any one of multiple columns).
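The two pivots above can be sketched in plain Python. This is an illustration under assumptions, not the actual Prep steps: the field names (reason_1_feature, reason_1_strength, and so on) are hypothetical stand-ins for the Reason columns, and the patient row is invented.

```python
def pivot_features(row, feature_columns, id_column="patient_id"):
    """Return one record per feature for a single patient row."""
    return [
        {id_column: row[id_column], "feature": name, "value": row[name]}
        for name in feature_columns
    ]


def pivot_reasons(row, max_reasons, id_column="patient_id"):
    """Return one record per populated reason for a single patient row."""
    records = []
    for rank in range(1, max_reasons + 1):
        feature = row.get(f"reason_{rank}_feature")
        if feature is None:  # this patient has fewer reasons than max_reasons
            continue
        records.append({
            id_column: row[id_column],
            "reason_rank": rank,
            "reason_feature": feature,
            "reason_strength": row[f"reason_{rank}_strength"],
        })
    return records


# Hypothetical wide row: one patient, two features, two reasons.
patient = {
    "patient_id": 3,
    "number_inpatient": 2,
    "num_medications": 14,
    "reason_1_feature": "number_inpatient",
    "reason_1_strength": 0.42,
    "reason_2_feature": "num_medications",
    "reason_2_strength": -0.15,
}

feature_records = pivot_features(patient, ["number_inpatient", "num_medications"])
reason_records = pivot_reasons(patient, max_reasons=3)

# Once reasons are rows instead of columns, ordering by strength is a simple sort.
strongest_first = sorted(
    reason_records, key=lambda r: abs(r["reason_strength"]), reverse=True
)
```

With the reasons in long form, filtering to every patient with a given reason becomes a single condition on reason_feature instead of a check across several columns.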
With my new data structures, I was ready to tackle an exploration of the predictions in Tableau and start working towards an end-result.
Exploring the Predictive Analytics in Tableau
Leveraging the restructured data and some of the feature exports from DataRobot, I was able to create a series of visualizations and dashboards that gave me insight into the data and reasons for the predictions. It also gave me confidence in the predictive model. Being able to see it and slice it in various ways allowed me to understand why certain predictions were made – or not.
But beyond that, it helped me start to understand what elements might be useful to communicate to various audiences of the dashboards. For example, when I considered how many patients had historically been readmitted versus not readmitted for various values of Number of Inpatient Visits, I realized the potential importance of communicating the historical readmission rate as part of the reason why a certain patient might have a higher probability. A doctor or nurse in charge of discharging patients might need to know more than just a percentage value. They need to have insight into why – and what might be done to prevent a patient from returning to the hospital.
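As a rough sketch of the historical readmission rate mentioned above, the calculation comes down to counting readmitted patients per value of a feature and dividing by the total for that value. In Tableau this would be a calculated field or a simple aggregation; the Python below is only an illustration, and the rows and column names (number_inpatient, readmitted) are made up.

```python
from collections import defaultdict


def readmission_rate_by(rows, key):
    """Share of historical patients readmitted, for each value of `key`."""
    totals = defaultdict(int)
    readmitted = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["readmitted"]:
            readmitted[row[key]] += 1
    return {value: readmitted[value] / totals[value] for value in sorted(totals)}


# Hypothetical historical (training) records.
history = [
    {"number_inpatient": 0, "readmitted": False},
    {"number_inpatient": 0, "readmitted": False},
    {"number_inpatient": 0, "readmitted": True},
    {"number_inpatient": 0, "readmitted": False},
    {"number_inpatient": 2, "readmitted": True},
    {"number_inpatient": 2, "readmitted": False},
]

rates = readmission_rate_by(history, "number_inpatient")
print(rates)  # {0: 0.25, 2: 0.5}
```

Showing a rate like this next to a patient's own value gives the probability score some historical context.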
Communicating Actionable Insight to the End User
My data exploration moved towards some insight that would be important to share with end-users and I even created a very rough sketch of an actionable dashboard to aid in seeing which patients are likely to readmit and why.
But moving seamlessly from analysis and insight to beautiful and useful design is something Tableau Zen Master Bridget Cogley does incredibly well. See how she takes the end-result to the next level here:
- Just the dashboard (smaller size): https://public.tableau.com/profile/bridget#!/vizhome/DRTableauDashboardOnly/DataRobotTableau
- Storypoints (best downloaded & designed for large screen display): https://public.tableau.com/profile/bridget#!/vizhome/PredictiveandProactive/PredictiveandProactive