How to random dates

  • I have start date and end date fields in database where I want to scramble the dates. How to create random dates for start date and end date should be 1 day less than start date. For example start date be 10/2/1990 and end date will be 10/3/1990.

    Thanks in advance

  • One way would be to generate a random number, see this thread for one way, and then use start_date = DateAdd(day, random_number, getdate()) and end_date = DateAdd(day, random_number + 1, start_date)

  • I borrowed a bit of logic from Pinal Dave http://blog.sqlauthority.com/2007/04/29/sql-server-random-number-generator-script-sql-query/

    It will generate a random startdate and enddate between a range of dates. So in this example, setting @DateRangeStart to 1990, and @DateRangeEnd to now.

    There are plenty of ways to skin this cat. I have a feeling there might be a more efficient way, but if performance is not a huge concern, give it a whirl.

    DECLARE @DateRangeStart DATETIME

    DECLARE @DateRangeEnd DATETIME

    DECLARE @random INT

    DECLARE @Upper INT

    DECLARE @Lower INT

    SELECT @DateRangeStart = '1-1-1990'

    SELECT @DateRangeEnd = getdate ()

    ;WITH dates (startdate, enddate, row) as

    (SELECT CAST (@DateRangeStart as datetime), DATEADD (dd,1, CAST (@DateRangeStart as datetime)), 1

    UNION ALL

    SELECT DATEADD (dd,1,startdate), DATEADD (dd,1,enddate), row + 1 FROM dates

    WHERE startdate < @DateRangeEnd)

    SELECT Startdate, Enddate FROM dates

    WHERE row = ROUND((((select max(row) from dates)- 1 -1) * RAND() + 1), 0)

    OPTION (MAXRECURSION 10000)

    Todd Carrier
    MCITP - Database Administrator (SQL 2008)
    MCSE: Data Platform (SQL 2012)

  • Au4848 (5/13/2008)


    I have start date and end date fields in database where I want to scramble the dates. How to create random dates for start date and end date should be 1 day less than start date. For example start date be 10/2/1990 and end date will be 10/3/1990.

    Thanks in advance

    With varying times or just whole dates (midnight only)? And what do you want for a range of dates?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jeff just looking for whole dates !!!

  • Thanks Todd for your reply I will try it and let know how it goes

  • Million random date pairs in <30 seconds ok?

    SELECT SomeDate AS StartDate,

    SomeDate+1 AS EndDate

    FROM (

    SELECT TOP 1000000

    SomeDate = CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 )d

    You can use a similar formula in a straight update...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One other thing... why do you want to scramble the dates in such a fashion?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    You're the master of set based thinking. Thanks for the contributions and the recent article on the tally tables. It is starting to form in my head (as you can see above... almost but not quite). Your posts are helping us all use SQL in the most efficient way.

    One question... what's the psuedo-code logic that you use in

    SELECT CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)

    How does that work?... the modulo? why dividing by 3653? and the addition of 36524?

    I was attempting this yesterday, and I guess my main confusion point is that I can cast and integer as datetime. Although, in the process of asking the question, I think I get it... the number of days since day #1

    SELECT DATEDIFF (DD, '1-1-1900',GETDATE ())

    SELECT CAST (39580 AS DATETIME)

    OK, so I understand 36524 start at 2001... but dividing by 3653? Help my pea-brain!

    Todd Carrier
    MCITP - Database Administrator (SQL 2008)
    MCSE: Data Platform (SQL 2012)

  • Todd --

    The short answer is that %3653 will produce 10 years of days. That's 365*10 + 2 for leap years.

    The thing to remember is that the modulo returns the REMAINDER of the division. Don't think of the fraction we see after the decimal point, rather think of it in terms of long division.

    Now that you have that in your head, think what is the largest remainder you will ever have?

    Consider x % y. The result will always be somewhere between 0 and (y-1).

    Make sense?

    GL!

  • John Beggs (5/14/2008)


    Todd --

    The short answer is that %3653 will produce 10 years of days. That's 365*10 + 2 for leap years.

    The thing to remember is that the modulo returns the REMAINDER of the division. Don't think of the fraction we see after the decimal point, rather think of it in terms of long division.

    Now that you have that in your head, think what is the largest remainder you will ever have?

    Consider x % y. The result will always be somewhere between 0 and (y-1).

    Make sense?

    GL!

    ..and in the same way - the 36524 represents the 100 years it takes to move from 1/1/1900 to 1/1/2000. So now - you are getting random dates in a 10-year range between 1/1/2000 and 1/1/2010.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Well done John and Matt! Exactly correct on both parts!

    Todd, here's the long answer... Funny thing... I'm actually writing a series of articles on the generation of constrained random data for use as test data... and, I have Matt Miller to thank for showing me a quicker way to generate whole numbers than the way I was (which was still damned fast!). Here we go...

    SELECT NEWID() returns the only "seed" that's guaranteed to be random, even across servers. As you know, it returns a GUID that looks like the following...

    C8F64E22-843F-45A0-BE13-8B40FB989027

    You could do a conversion to VarBinary and use RAND() on that result, but that takes 4 times longer than the method I used. Since we really want WHOLE days, we really need an random INT... CHECKSUM of a random GUID returns a random INT with very few possibilities of a dupe. For our purposes, the 1 in a billion chance of a dupe won't kill us... so the following returns a random INT...

    SELECT CHECKSUM(NEWID())

    Notice that a "random INT" may be a random negative number... the ABS ("Absolute" function) will force all negative numbers to be positive... so we end up with this as a "random positive integer generator"...

    SELECT ABS(CHECKSUM(NEWID()))

    Let's just check and generate 10 "RPI's" (Random Positive Integer) using that code (using Master.dbo.SysColumns as a "tally" table of sorts to provide a source of rows without a loop)...

    SELECT TOP 10

    ABS(CHECKSUM(NEWID()))

    FROM Master.dbo.SysColumns

    ... and that will return some like the following (numbers will be different because it IS random)...

    1013928630

    644557237

    1296524116

    928604416

    1136642647

    264331209

    248166678

    596092447

    39105552

    163661140

    The "range" of the smallest number is more than enough for most of what we want to get for a "number of whole days". And that's the next thing... how many whole days do you want to constrain the random whole date generator to? I wanted 10 years from 2000-01-01 to 2009-12-31... a little date math helps me figure that out...

    SELECT DATEDIFF(dd,'2000-01-01','2009-12-31')+1

    RESULT: 3653

    ... and that's where the 3653 comes from. The +1 comes from the subtraction error... if I have ten days numbered from 1 to 10, 10-1 would only return 9... to include both end-points, you have to add 1 back in like we did.

    We use Modulo to "constrain" or convert the output of the RPI generator to all the whole numbers from 0 to 3652 which is a range of 3653 whole numbers if you include the "0". I call this the "Range" number

    The final number is easy... it's the date serial for the 2000-01-01 or start date. It's easily calculated using more simple date math...

    SELECT DATEDIFF(dd,0,'2000-01-01')

    RESULT: 36524

    I call that number the "StartNumber".

    If you add the minimum range number of "0" to the StartNumber, you end up with 36524 which is the date serial number for 2000-01-01.

    If you add the maximum range number of 3652 to the StartNumber, you end up with 40176 which is the date serial number for 2009-12-31.

    After that, you just need to convert the Constrained Random Positive Integer (CRPI) that represents a date serial to a date...

    SELECT TOP 10

    CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)

    FROM Master.dbo.SysColumns

    RESULTS:

    2002-06-21 00:00:00.000

    2007-02-05 00:00:00.000

    2006-12-25 00:00:00.000

    2007-03-22 00:00:00.000

    2006-06-25 00:00:00.000

    2004-11-05 00:00:00.000

    2008-08-23 00:00:00.000

    2009-03-06 00:00:00.000

    2009-06-06 00:00:00.000

    2005-03-01 00:00:00.000

    Next is to do it a million times... we use a cross-join on a table that has at least 4000 rows in it even for a brand new installation to provide the capability of at least 16 million rows... surely enough for our 1 million row requirement...

    SELECT TOP 1000000

    CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    ... heh... sorry if I don't post THAT result...

    Finally, we use that as a source of 1 million random "StartDate"s... the OP asked for "EndDate"s that were 1 whole day later. Natural choice is a Derived Table or a CTE and we end up with the following as I posted...

    SELECT SomeDate AS StartDate,

    SomeDate+1 AS EndDate --1 Day more than the random start date

    FROM (--==== Derived table "d" returns 1 million random whole day start dates

    -- in a range from 2000-01-01 to 2009-12-31. Dupes are, of course,

    -- quite possible as with any random number.

    SELECT TOP 1000000

    SomeDate = CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 )d

    The "generic" formula for generating random whole dates (dates with a midnight time) is...

    CAST(ABS(CHECKSUM(NEWID()))%(desired_number_of_days)+(start_date) AS DATETIME)

    You can, of course, substitute various expressions (calculations) for the desired number of days and the start date. I just hardcoded my example so it wouldn't "blur" what the code was actually doing for people.

    Heh... did I miss anything? 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Absolutely brilliant. Thanks again. I've got a new wrench in the toolbox!

    Todd Carrier
    MCITP - Database Administrator (SQL 2008)
    MCSE: Data Platform (SQL 2012)

  • Thanks for the compliment, Todd... :blush:

    Au4848... are you all set?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff

    I'm not sure if I can use this because it took so long to run. I mean, I can think of better things to do with my seven seconds it took to produce 1M rows of data.

    You know I'm kidding, like posted earlier, a new tool to add to the tool box.

    Thanks

    Marvin Dillard
    Senior Consultant
    Claraview Inc

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply