How to handle new input files in Tableau Prep

New Periodic Files in Tableau Prep

How to handle new input files in Tableau Prep

New Periodic Files in Tableau Prep

Let’s say you’ve got a situation where you get new input files for Tableau Prep every month (or, maybe in your case it’s every week or every day or every so often). These aren’t files to be added to your data source in addition to existing files.  They are replacements with a brand new, up-to-date data set that needs to be processed with the same logic and flow every time.  How do you handle this scenario?  Here are some options:

1. Stick with the same filename month after month.

That is, literally replace the old file with the new input file of the exact same name (archiving the old ones is probably a good practice).  The first month you have a file named data.txt.  The next month you replace it with the new file also named data.txt.

  • Pro: all you have to do is just open the Tableau Prep, run the flow it and it should work fine.  In my experience, you may have to click the refresh button in Tableau Prep
  • Pro: when scheduling is released in future versions, you won’t have to do anything other than replace the file and let the flow keep running on it’s schedule.
  • Con: it’s a bit harder to keep track of the files and you may find yourself asking, “Did I replace it already?”  “Is that last month’s file or this month’s?”

2. Add the new input files to the directory and remove the old files

(or move the old file to archive directory)

When you open Tableau Prep or click Refresh, Tableau Prep will detect that the source file is no longer available and prompt you to locate it.  Simply point Tableau Prep to the new file instead.

  • Pro: fairly easy
  • Pro: you know which file you just selected, so no second-guessing
  • Con: a bit more manual, scheduling of flows (when available) won’t automatically pick this up
  • Con: the input node retains it’s name (e.g. “January” even though you just selected “February”) so a bit of confusion might ensue.

3. Add the new input files to the directory and swap out the input node

Here, you’ll add the new connection and replace the input node with the new file.

Swapping Inputs in Tableau Prep

  • Pro: There’s no question in your mind what you’ve done and the input node keeps the name of the file so the flow is showing you exactly which month/file you’ve used.
  • Con: Definitely a bit more involved than other options.
  • Con: Won’t allow any flow schedules to continue without intervention.

4. Just keep adding new input files to the directory and use features of Tableau Prep to filter to latest

The idea is that you’ll use a wildcard union to pull data from all the files, past and present. Then you can filter the File Paths field to keep only the newest:

Filtering to keep latest file in Tableau Prep

 

  •  Pro: You can compare new data to previous months if needed because it’s all there in the flow and you may quickly spot errors or differences that you might miss otherwise
  • Con: Definitely manual, forget setting a schedule and letting it just go
  • Con: higher potential for human error (“oops, I forgot to filter”)
  • Con: potential performance issues depending on size and number of files

4a. Modify the previous approach to automate selecting the latest file

Let’s assume your filename or something in the file (a timestamp or date field) allows you to identify the latest file.  Now it becomes a latest snapshot type of situation:

Automatic Filtering to keep latest file in Tableau Prep

You use an aggregate step to identify the MAX(File Path) or MAX(Timestamp), then join it back to the previous step to keep records that match that maximum.  Now you have a flow that automatically picks up the latest file each time without ever having to come back to change the flow!

  • Pro: Fully automatic, just drop the file in and run the flow (okay, so you have to click the refresh button to get the wildcard union to pick up the new file- I hope this changes in later versions.  Please upvote Owen Price’s idea here)
  • Pro: Should work without having to modify the flow when scheduling is available.
  • Con: potential performance issues depending on size and number of files

 

Some final thoughts on swapping new input files in Tableau Prep:

Personally, I’ve used option 1 and it works really well in cases where it’s just me and I don’t care that it’s not fully automated.  Option 2 is also attractive.  When a scheduling feature is released, I’ll be evaluating cases where option 4 makes sense.

With any of these approaches the assumption would be that all the fields (columns) in each new file stay the same.  If they change, you could run into the following:

  • Newly Added column – no problem.  it will just show up in the flow
  • Changed name – could break parts of the flow downstream.  You’ll have to fix it (potentially by just changing it’s name back to the original in the input node)
  • Removed column – could break parts of the flow downstream.  You’ll have to decide how to handle this one.  Might need to add a calculated field of same name and type to fill in values for the flow

The order of the columns shouldn’t matter, unless you have specified that the first row does not contain column names and thus Prep is naming the columns automatically based on position.  In that case, the file would need to maintain the same order of columns month after month.

What do you think?  Have you used any of these options?  What’s worked and what hasn’t?  Let me know in the comments!

-Joshua

Related Posts

4 thoughts on “How to handle new input files in Tableau Prep

  1. A trick I have used when I may want to clean files (.csv) that have different content but the same columns, is create a “input folder” and point Tableau Prep to that folder. I create a file that only has the column headings and include it in the folder. I called this file “master.csv”. I then select the “master” file as my input and use the wildcard union for the other files. As long as I do not move the “master.csv” file from the input folder, I can swap in and out the other files and have Tableau Prep update the files. Again, this only works in the files being swapped in and out have the same columns.

  2. One more – slightly ugly but it works.
    My datasource was an Excel spreadsheet with the date in its filename.
    While Tableau Prep wasn’t running, I renamed the source file.
    Then I launched Tableau Prep and opened the flow.
    The flow opened with red exclamation points over the connection and all tables.
    Error… unable to connect to file…
    In the Connections column, select the spreadsheet and choose Edit.
    The program then displays a File Open dialog box which will allow you to select a different file.
    Since the new file has the same structure, everything downstream continued to work.
    (And going forward, with this new filename, I’ll be following method #1)

Leave a Reply

Your email address will not be published. Required fields are marked *