• Random Numbers (Even with Extracts)

     

    Ever wish you could use random numbers in Tableau?  Of course, when you are connected to a data source such as SQL Server or Oracle, you can use a RAWSQL function to pass through raw SQL statements, such as RAND(), to the underlying data source.  But what if you are connected to a data source that doesn’t include a random function or allow pass through SQL such as Excel, Access, or Salesforce?  And what if you need to extract your data?  Extracts could store a random number, but it will be a static random number and will only be “re-randomized” on the next extract refresh.

    There is a way!

    A number of approaches could be taken, but here, I’ve used the same method Microsoft uses for generating pseudo-random numbers in Visual Basic (See the full documentation here: https://support.microsoft.com/kb/231847)

    x1 = ( x0 * a + c ) MOD (2^24)

    where:
    x1 = new value
    x0 = previous value (an initial value of 327680 is used by Visual Basic)
    a = 1140671485
    c = 12820163

    The expression x1/(2^24) will then return the floating-point number between 0.0 and 1.0 that is returned by the RND function.

     

     

    When I re-write that as a table calculation it looks like this:

    ((PREVIOUS_VALUE(MIN([Seed])) * 1140671485 + 12820163) % (2^24))

    In the calculation above, [Seed] could be

    • A hard-coded value (such as 327680 used in VB) to give a consistent series of pseudo- random numbers
    • A parameterized value to allow the end-user to change the value or a URL to control the value
    • A randomized seed to give a different set of random numbers each time the data is refreshed.

    I’ve demonstrated a couple of the options in the above workbook.  The final step is to create an additional calculated field to return the desired range and precision.  I’ve chosen to return an integer from 1 to an Upper Limit parameter value:

    INT([Random Number] / (2^24) * [Random Upper Limit]) + 1

    Be sure to set the addressing of the of the calculated field to the level at which you want the randomization.

     

    Final Notes:

    1. I haven’t yet tested to make sure the Visual Basic algorithm is duplicated with precision.  It would be fairly easy to compare the results with output from VB for various Seed values.
    2. It would also be fair to test the distribution of random numbers to make sure it does indeed represent an equal distribution of values.  There is, of course, extensive statistical analysis available for the VB algorithm.  For now, I’m satisfied with a quick “visual check” using Tableau to graph the distribution of output random values.

3 Responsesso far.

  1. Tom says:

    Brilliant. With a little bit of tweaking this can be used for data sampling of an extract.

Leave a Reply

Your email address will not be published. Required fields are marked *