May 13, 2008 at 6:55 am
I have start date and end date fields in database where I want to scramble the dates. How to create random dates for start date and end date should be 1 day less than start date. For example start date be 10/2/1990 and end date will be 10/3/1990.
Thanks in advance
May 13, 2008 at 7:10 am
One way would be to generate a random number, see this thread for one way, and then use start_date = DateAdd(day, random_number, getdate()) and end_date = DateAdd(day, random_number + 1, start_date)
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
May 13, 2008 at 3:56 pm
I borrowed a bit of logic from Pinal Dave http://blog.sqlauthority.com/2007/04/29/sql-server-random-number-generator-script-sql-query/
It will generate a random startdate and enddate between a range of dates. So in this example, setting @DateRangeStart to 1990, and @DateRangeEnd to now.
There are plenty of ways to skin this cat. I have a feeling there might be a more efficient way, but if performance is not a huge concern, give it a whirl.
DECLARE @DateRangeStart DATETIME
DECLARE @DateRangeEnd DATETIME
DECLARE @random INT
DECLARE @Upper INT
DECLARE @Lower INT
SELECT @DateRangeStart = '1-1-1990'
SELECT @DateRangeEnd = getdate ()
;WITH dates (startdate, enddate, row) as
(SELECT CAST (@DateRangeStart as datetime), DATEADD (dd,1, CAST (@DateRangeStart as datetime)), 1
UNION ALL
SELECT DATEADD (dd,1,startdate), DATEADD (dd,1,enddate), row + 1 FROM dates
WHERE startdate < @DateRangeEnd)
SELECT Startdate, Enddate FROM dates
WHERE row = ROUND((((select max(row) from dates)- 1 -1) * RAND() + 1), 0)
OPTION (MAXRECURSION 10000)
Todd Carrier
MCITP - Database Administrator (SQL 2008)
MCSE: Data Platform (SQL 2012)
May 13, 2008 at 6:20 pm
Au4848 (5/13/2008)
I have start date and end date fields in database where I want to scramble the dates. How to create random dates for start date and end date should be 1 day less than start date. For example start date be 10/2/1990 and end date will be 10/3/1990.Thanks in advance
With varying times or just whole dates (midnight only)? And what do you want for a range of dates?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 10:11 pm
jeff just looking for whole dates !!!
May 13, 2008 at 10:12 pm
Thanks Todd for your reply I will try it and let know how it goes
May 13, 2008 at 10:41 pm
Million random date pairs in <30 seconds ok?
SELECT SomeDate AS StartDate,
SomeDate+1 AS EndDate
FROM (
SELECT TOP 1000000
SomeDate = CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 )d
You can use a similar formula in a straight update...
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 11:06 pm
One other thing... why do you want to scramble the dates in such a fashion?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2008 at 8:50 am
Jeff,
You're the master of set based thinking. Thanks for the contributions and the recent article on the tally tables. It is starting to form in my head (as you can see above... almost but not quite). Your posts are helping us all use SQL in the most efficient way.
One question... what's the psuedo-code logic that you use in
SELECT CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)
How does that work?... the modulo? why dividing by 3653? and the addition of 36524?
I was attempting this yesterday, and I guess my main confusion point is that I can cast and integer as datetime. Although, in the process of asking the question, I think I get it... the number of days since day #1
SELECT DATEDIFF (DD, '1-1-1900',GETDATE ())
SELECT CAST (39580 AS DATETIME)
OK, so I understand 36524 start at 2001... but dividing by 3653? Help my pea-brain!
Todd Carrier
MCITP - Database Administrator (SQL 2008)
MCSE: Data Platform (SQL 2012)
May 14, 2008 at 10:28 am
Todd --
The short answer is that %3653 will produce 10 years of days. That's 365*10 + 2 for leap years.
The thing to remember is that the modulo returns the REMAINDER of the division. Don't think of the fraction we see after the decimal point, rather think of it in terms of long division.
Now that you have that in your head, think what is the largest remainder you will ever have?
Consider x % y. The result will always be somewhere between 0 and (y-1).
Make sense?
GL!
May 14, 2008 at 10:35 am
John Beggs (5/14/2008)
Todd --The short answer is that %3653 will produce 10 years of days. That's 365*10 + 2 for leap years.
The thing to remember is that the modulo returns the REMAINDER of the division. Don't think of the fraction we see after the decimal point, rather think of it in terms of long division.
Now that you have that in your head, think what is the largest remainder you will ever have?
Consider x % y. The result will always be somewhere between 0 and (y-1).
Make sense?
GL!
..and in the same way - the 36524 represents the 100 years it takes to move from 1/1/1900 to 1/1/2000. So now - you are getting random dates in a 10-year range between 1/1/2000 and 1/1/2010.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 14, 2008 at 11:09 am
Well done John and Matt! Exactly correct on both parts!
Todd, here's the long answer... Funny thing... I'm actually writing a series of articles on the generation of constrained random data for use as test data... and, I have Matt Miller to thank for showing me a quicker way to generate whole numbers than the way I was (which was still damned fast!). Here we go...
SELECT NEWID() returns the only "seed" that's guaranteed to be random, even across servers. As you know, it returns a GUID that looks like the following...
C8F64E22-843F-45A0-BE13-8B40FB989027
You could do a conversion to VarBinary and use RAND() on that result, but that takes 4 times longer than the method I used. Since we really want WHOLE days, we really need an random INT... CHECKSUM of a random GUID returns a random INT with very few possibilities of a dupe. For our purposes, the 1 in a billion chance of a dupe won't kill us... so the following returns a random INT...
SELECT CHECKSUM(NEWID())
Notice that a "random INT" may be a random negative number... the ABS ("Absolute" function) will force all negative numbers to be positive... so we end up with this as a "random positive integer generator"...
SELECT ABS(CHECKSUM(NEWID()))
Let's just check and generate 10 "RPI's" (Random Positive Integer) using that code (using Master.dbo.SysColumns as a "tally" table of sorts to provide a source of rows without a loop)...
SELECT TOP 10
ABS(CHECKSUM(NEWID()))
FROM Master.dbo.SysColumns
... and that will return some like the following (numbers will be different because it IS random)...
1013928630
644557237
1296524116
928604416
1136642647
264331209
248166678
596092447
39105552
163661140
The "range" of the smallest number is more than enough for most of what we want to get for a "number of whole days". And that's the next thing... how many whole days do you want to constrain the random whole date generator to? I wanted 10 years from 2000-01-01 to 2009-12-31... a little date math helps me figure that out...
SELECT DATEDIFF(dd,'2000-01-01','2009-12-31')+1
RESULT: 3653
... and that's where the 3653 comes from. The +1 comes from the subtraction error... if I have ten days numbered from 1 to 10, 10-1 would only return 9... to include both end-points, you have to add 1 back in like we did.
We use Modulo to "constrain" or convert the output of the RPI generator to all the whole numbers from 0 to 3652 which is a range of 3653 whole numbers if you include the "0". I call this the "Range" number
The final number is easy... it's the date serial for the 2000-01-01 or start date. It's easily calculated using more simple date math...
SELECT DATEDIFF(dd,0,'2000-01-01')
RESULT: 36524
I call that number the "StartNumber".
If you add the minimum range number of "0" to the StartNumber, you end up with 36524 which is the date serial number for 2000-01-01.
If you add the maximum range number of 3652 to the StartNumber, you end up with 40176 which is the date serial number for 2009-12-31.
After that, you just need to convert the Constrained Random Positive Integer (CRPI) that represents a date serial to a date...
SELECT TOP 10
CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)
FROM Master.dbo.SysColumns
RESULTS:
2002-06-21 00:00:00.000
2007-02-05 00:00:00.000
2006-12-25 00:00:00.000
2007-03-22 00:00:00.000
2006-06-25 00:00:00.000
2004-11-05 00:00:00.000
2008-08-23 00:00:00.000
2009-03-06 00:00:00.000
2009-06-06 00:00:00.000
2005-03-01 00:00:00.000
Next is to do it a million times... we use a cross-join on a table that has at least 4000 rows in it even for a brand new installation to provide the capability of at least 16 million rows... surely enough for our 1 million row requirement...
SELECT TOP 1000000
CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
... heh... sorry if I don't post THAT result...
Finally, we use that as a source of 1 million random "StartDate"s... the OP asked for "EndDate"s that were 1 whole day later. Natural choice is a Derived Table or a CTE and we end up with the following as I posted...
SELECT SomeDate AS StartDate,
SomeDate+1 AS EndDate --1 Day more than the random start date
FROM (--==== Derived table "d" returns 1 million random whole day start dates
-- in a range from 2000-01-01 to 2009-12-31. Dupes are, of course,
-- quite possible as with any random number.
SELECT TOP 1000000
SomeDate = CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME)
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 )d
The "generic" formula for generating random whole dates (dates with a midnight time) is...
CAST(ABS(CHECKSUM(NEWID()))%(desired_number_of_days)+(start_date) AS DATETIME)
You can, of course, substitute various expressions (calculations) for the desired number of days and the start date. I just hardcoded my example so it wouldn't "blur" what the code was actually doing for people.
Heh... did I miss anything? 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2008 at 11:39 am
Absolutely brilliant. Thanks again. I've got a new wrench in the toolbox!
Todd Carrier
MCITP - Database Administrator (SQL 2008)
MCSE: Data Platform (SQL 2012)
May 14, 2008 at 11:54 am
Thanks for the compliment, Todd... :blush:
Au4848... are you all set?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2008 at 12:46 pm
Jeff
I'm not sure if I can use this because it took so long to run. I mean, I can think of better things to do with my seven seconds it took to produce 1M rows of data.
You know I'm kidding, like posted earlier, a new tool to add to the tool box.
Thanks
Marvin Dillard
Senior Consultant
Claraview Inc
Viewing 15 posts - 1 through 15 (of 31 total)
You must be logged in to reply to this topic. Login to reply