February 20, 2013 at 2:00 am
The following query gives me a random date between 1 and 28 days after the arrival date:
SELECT ArrivalDate, DATEADD(day, 1 + RAND(checksum(NEWID()))
* LengthOfStay.LengthofStay, ArrivalDate) AS DepartureDate
FROM Bookings, LengthOfStay
However, when I use the update query below, it only gives me between 1 and 2 days after the arrival date:
USE Occupancy
Update B
Set DepartureDate = DATEADD(day, 1 + RAND(checksum(NEWID()))*1.5 * L.LengthofStay, B.ArrivalDate)
FROM LengthOfStay L, Bookings B
Does anyone know why, and if so how do I change it?
Thanks
Wayne
February 20, 2013 at 3:38 am
If you run this multiple times:
SELECT 1 + RAND(checksum(NEWID()))
You should see that the number returned is always between 1 and 2.
If you want it to return a number between 1 and 28, use this:
SELECT 1 + ABS(checksum(NEWID())) % 28
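One edge case worth knowing about, offered as a side note rather than as part of the reply above: ABS() raises an arithmetic overflow error if CHECKSUM(NEWID()) ever returns -2147483648. A common workaround is to take the modulo first and the absolute value second. The sketch below draws 10,000 values that way and reports the smallest and largest; using sys.all_columns as a row source is just a convenience, and the column aliases are made up for the example.

```sql
-- Draw 10,000 values and report the range they cover.
-- Taking % 28 first keeps the intermediate value in -27..27, so ABS() cannot overflow.
SELECT MinNights = MIN(Nights),
       MaxNights = MAX(Nights),
       Draws     = COUNT(*)
FROM (
    SELECT TOP (10000) Nights = 1 + ABS(CHECKSUM(NEWID()) % 28)
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
) d;
```

With this many draws the minimum and maximum should come back as 1 and 28.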
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
[url url=http://www.sqlservercentral.com/articles/St
February 20, 2013 at 7:02 am
My boss has just told me I have done it completely wrong: the query above hasn't randomised anything in our data. I have got 48000 records with a 2 night stay, 48000 records with a 27 night stay, etc.
What I need is for 30% of the departure dates to give a 2 day stay, 10% a 3 day stay, and the rest to be randomised among 1 and 4 to 28 days.
Can anyone help?
February 20, 2013 at 7:09 am
Possibly. What you need is to generate random numbers based on a multinomial distribution. See the second article in my signature links (about random number generators in SQL) for a function that will do this.
February 20, 2013 at 7:20 am
Hi Dwain
Thanks for the link, but to be honest it's way above where I am in my training (I'm 4 weeks in) and it looks like a different language.
My boss said I should be able to find a CASE statement along these lines: generate a random number between 0 and 1; if it is between 0 and 0.3 it is a 2 day stay, if it is between 0.3 and 0.4 it is a 3 day stay, and anything from 0.4 to 1 is spread over all the other days between 1 and 28.
But I don't know how to write the code.
Can you help or point me in the right direction?
Ta
Wayne
February 20, 2013 at 8:11 am
OK. Give me a few minutes and I'll code up an example.
February 20, 2013 at 8:27 am
Hi Dwain
I have tried to break down what I need below; I hope this helps.
Thanks
Wayne
Step 1
Arrival Date (Already generated) – 1.35 Million Times
Step 2
Randomise a number between 0 and 1
Step 3
Use the Randomised number produced above to create the script below
UPDATE BOOKINGS
SET DepartureDate
CASE WHEN RAND() Result = Between 0 and 0.3 = Departure Date will be 2 Nights Later
CASE WHEN RAND() Result = Between 0.3 and 0.4 = Departure Date will be 3 Nights Later
CASE WHEN RAND ()Result >0.4 = Departure Date will be either 1,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 Nights Later
February 20, 2013 at 8:39 am
Here is how to generate a sample set of multinomially distributed random numbers. First, you need to create a TYPE and a FUNCTION by running this script:
CREATE TYPE Distribution AS TABLE (EventID INT, EventProb FLOAT, CumProb FLOAT)
GO
CREATE FUNCTION dbo.RN_MULTINOMIAL
    (@Multinomial Distribution READONLY, @URN FLOAT)
RETURNS INT --Cannot use WITH SCHEMABINDING
AS
BEGIN
    RETURN
        ISNULL(
            (   SELECT TOP 1 EventID
                FROM @Multinomial
                WHERE @URN < CumProb
                ORDER BY CumProb)
            -- Handle unlikely case where URN = exactly 1.0
            ,(  SELECT MAX(EventID)
                FROM @Multinomial))
END
Next, you need to set up your multinomial probability distribution table as follows:
DECLARE @MultinomialProbabilities Distribution
;WITH Tally (n) AS (
    SELECT TOP 28 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns)
INSERT INTO @MultinomialProbabilities
SELECT n
    ,CASE n WHEN 1 THEN .6/26. WHEN 2 THEN .3 WHEN 3 THEN .1 ELSE .6/26. END
    ,CASE n WHEN 1 THEN .6*1./26. WHEN 2 THEN .3+.6*1./26. WHEN 3 THEN .4+.6*1./26. ELSE .4+.6*(n-2)/26. END
FROM Tally
SELECT * FROM @MultinomialProbabilities
Note how the EventProb column shows .3 for event 2 and .1 for event 3. Each of the other 26 events gets an equal share of the remaining probability (.6/26, about .023). The last column is the cumulative probability up to and including that event, so the last row should show 1.
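As a standalone cross-check on those numbers (not a replacement for the script above), the cumulative column can also be built with a running SUM instead of a hand-written CASE. This is only a sketch under two assumptions: SQL Server 2012 or later, for SUM ... OVER with ORDER BY, and a throwaway table variable @P that is named here purely for illustration.

```sql
-- Assumes SQL Server 2012+ (windowed SUM with ORDER BY). @P is a hypothetical scratch table.
DECLARE @P TABLE (EventID INT PRIMARY KEY, EventProb FLOAT, CumProb FLOAT);

;WITH Tally (n) AS (
    SELECT TOP (28) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns
), Probs AS (
    -- 30% for event 2, 10% for event 3, the remaining 60% shared by the other 26 events.
    -- The E0 suffix makes the literals FLOAT rather than fixed-scale NUMERIC.
    SELECT n, EventProb = CASE n WHEN 2 THEN 0.3E0 WHEN 3 THEN 0.1E0 ELSE 0.6E0 / 26 END
    FROM Tally
)
INSERT INTO @P (EventID, EventProb, CumProb)
SELECT n, EventProb, SUM(EventProb) OVER (ORDER BY n ROWS UNBOUNDED PRECEDING)
FROM Probs;

-- Both values should be 1, give or take floating-point rounding.
SELECT TotalProb = SUM(EventProb), LastCumProb = MAX(CumProb)
FROM @P;
```

If either total drifts noticeably away from 1, the probabilities were entered wrongly. A shortfall that is only rounding noise is already covered by the MAX(EventID) fallback in the function.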
The hard part is now behind us.
Now, within the same SQL batch as the above, this test harness tests the generated random numbers so you can compare to the distribution's expected frequency.
DECLARE @TestNums INT = 1000
;WITH Tally (n) AS (
    SELECT TOP (@TestNums) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a CROSS JOIN sys.all_columns b)
SELECT MNRN, CountOfMNRNs=COUNT(MNRN), ActualProbability=COUNT(MNRN)/(1.*@TestNums)
FROM (
    SELECT MNRN=dbo.RN_MULTINOMIAL(@MultinomialProbabilities, URN)
    FROM Tally
    CROSS APPLY (SELECT URN=RAND(CHECKSUM(NEWID()))) a
    ) a
INNER JOIN @MultinomialProbabilities ON EventID=MNRN
GROUP BY MNRN
The key to generating a group of random numbers is the innermost SELECT, which draws one URN per Tally row through the CROSS APPLY and passes it to the function. This generates a sample set whose size is set by @TestNums. The rest of it just groups by EventID and calculates the actual probability. This should center around 0.023 (.6/26) for all events except 2 and 3, which should be close to .3 and .1. The more numbers you generate, the closer they should be to the actual distribution.
Hope this helps.
February 20, 2013 at 8:48 am
Perhaps one additional clarification.
To generate a single multinomial random number, you do it like this:
SELECT MNRN=dbo.RN_MULTINOMIAL(@MultinomialProbabilities, URN)
FROM (SELECT URN=RAND(CHECKSUM(NEWID()))) a
I've provided this because I wasn't sure whether, when you referred to your skill level, you meant in SQL or in statistics.
February 20, 2013 at 9:23 am
Thank you for that. I will try to get my head around it tomorrow.
February 20, 2013 at 5:30 pm
I'll help a little further. The script you posted (although syntactically incorrect) does exactly what the RN_MULTINOMIAL function does:
UPDATE BOOKINGS
SET DepartureDate
CASE WHEN RAND() Result = Between 0 and 0.3 = Departure Date will be 2 Nights Later
CASE WHEN RAND() Result = Between 0.3 and 0.4 = Departure Date will be 3 Nights Later
CASE WHEN RAND ()Result >0.4 = Departure Date will be either 1,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 Nights Later
To put this in a context that might be more familiar to you, what you'll want to do is something like this (after setting up the @MultinomialProbabilities Distribution table):
UPDATE b
SET DepartureDate = DATEADD(day,
dbo.RN_MULTINOMIAL(@MultinomialProbabilities, URN), Arrivaldate)
FROM BOOKINGS b
CROSS APPLY (SELECT URN=RAND(CHECKSUM(NEWID()))) a
Your boss should be happy because you used a CASE statement too (in the setup of the @MultinomialProbabilities Distribution table). 😛
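For comparison, the same 30% / 10% / 60% split can also be written as a plain CASE expression, provided every row is tested against a single stored random draw. The sketch below is one way to do it, not the method from the posts above: @Bookings is a hypothetical stand-in for the real table, and the URN and Pick columns are invented here to hold the draws so that the CASE never has to call NEWID() again.

```sql
-- Hypothetical stand-in for the real Bookings table; URN and Pick are made-up helper columns.
DECLARE @Bookings TABLE (
    BookingID     INT IDENTITY(1,1) PRIMARY KEY,
    ArrivalDate   DATE NOT NULL,
    URN           FLOAT NOT NULL,   -- uniform draw between 0 and 1
    Pick          INT NOT NULL,     -- uniform draw in 0..25
    DepartureDate DATE NULL
);

-- 10,000 sample bookings, each with its own stored random draws.
INSERT INTO @Bookings (ArrivalDate, URN, Pick)
SELECT TOP (10000) '20130301', RAND(CHECKSUM(NEWID())), ABS(CHECKSUM(NEWID()) % 26)
FROM sys.all_columns a CROSS JOIN sys.all_columns b;

-- 0 to 0.3 gives 2 nights, 0.3 to 0.4 gives 3 nights,
-- everything else maps Pick 0..25 onto 1, 4, 5, ..., 28.
UPDATE @Bookings
SET DepartureDate = DATEADD(day,
        CASE WHEN URN < 0.3 THEN 2
             WHEN URN < 0.4 THEN 3
             WHEN Pick = 0  THEN 1
             ELSE Pick + 3
        END, ArrivalDate);

-- Compare the observed shares with the targets (.3, .1, and about .023 each for the rest).
SELECT Nights   = DATEDIFF(day, ArrivalDate, DepartureDate),
       Bookings = COUNT(*),
       Share    = COUNT(*) / 10000.0
FROM @Bookings
GROUP BY DATEDIFF(day, ArrivalDate, DepartureDate)
ORDER BY Nights;
```

Storing the draw before testing it is a deliberate choice: every branch of the CASE then compares against the same number, so the thresholds mean what they say.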
February 21, 2013 at 3:05 am
I was given a simple way of doing this:
UPDATE BOOKINGS
SET DepartureDate =
DATEADD(day,
CASE WHEN Rand(CHECKSUM(NEWID())) BETWEEN 0 and 0.3 THEN 2 ELSE
CASE WHEN Rand(CHECKSUM(NEWID())) BETWEEN 0.3 and 0.5 THEN 3 ELSE
Round(Rand(CHECKSUM(NEWID())) * 28,0) END END,ArrivalDate)
Thanks for all your help.
February 21, 2013 at 3:38 am
I suggest you double check your math but good for you.
February 21, 2013 at 3:54 am
Hi Dwain,
I didn't mean to offend you; if I did, I am sorry. My boss helped me with that query, is it not right?
1.35 Million Records of which
30% should be 2 night stays
20% should be 3 night stays
and the other 50% should be randomised among 1 and 4 to 28 days.
I do have another question in the same vein, if you could help.
I now need to make 15% of them cancelled by inserting a random Cancelled Date. However, the cancelled date must be >= Booking Date and <= Arrival Date.
I have completed the random section, but I now need to know how to add the greater than and less than part to the query:
Can you help?
SELECT ArrivalDate,
DATEADD(day,
CASE WHEN Rand(CHECKSUM(NEWID())) BETWEEN 0 and 0.85 THEN NULL ELSE
CASE WHEN Rand(CHECKSUM(NEWID())) BETWEEN 0.85 and 0.88 THEN 0 ELSE
CASE WHEN Rand(CHECKSUM(NEWID())) BETWEEN 0.88 and 0.92 THEN -1 ELSE
CASE WHEN Rand(CHECKSUM(NEWID())) BETWEEN 0.92 and 0.97 THEN -7 ELSE
Round(Rand(CHECKSUM(NEWID())) * -90,0) END END END END,ArrivalDate) AS DaystoReduce
FROM Bookings
Thanks
Wayne
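One possible way to keep a random date inside those bounds is to pick a random offset no larger than the gap between the two dates and add it to the booking date. The sketch below is only an illustration under assumptions: the BookingDate and CancelledDate column names are guessed from the description, @Bookings is a made-up sample table, BookingDate is assumed never to be later than ArrivalDate, and the offsets are uniform rather than weighted like the query above.

```sql
-- Hypothetical sample table; column names are assumed from the description in the post.
DECLARE @Bookings TABLE (
    BookingID     INT IDENTITY(1,1) PRIMARY KEY,
    BookingDate   DATE NOT NULL,
    ArrivalDate   DATE NOT NULL,
    CancelledDate DATE NULL
);

-- 10,000 sample rows: each booking is made 0..90 days before arrival.
INSERT INTO @Bookings (BookingDate, ArrivalDate)
SELECT TOP (10000) DATEADD(day, -ABS(CHECKSUM(NEWID()) % 91), '20130601'), '20130601'
FROM sys.all_columns a CROSS JOIN sys.all_columns b;

-- Roughly 15% of rows get a date from BookingDate to ArrivalDate inclusive; the rest stay NULL.
-- The offset is 0..gap, where gap = DATEDIFF(day, BookingDate, ArrivalDate).
UPDATE @Bookings
SET CancelledDate =
    CASE WHEN RAND(CHECKSUM(NEWID())) < 0.15
         THEN DATEADD(day,
                  ABS(CHECKSUM(NEWID()) % (DATEDIFF(day, BookingDate, ArrivalDate) + 1)),
                  BookingDate)
    END;

-- OutOfRange should be 0; Cancelled should be near 15% of Total.
SELECT Cancelled  = COUNT(CancelledDate),
       Total      = COUNT(*),
       OutOfRange = SUM(CASE WHEN CancelledDate < BookingDate
                               OR CancelledDate > ArrivalDate THEN 1 ELSE 0 END)
FROM @Bookings;
```

Because each row is decided independently, the cancelled share will hover around 15% rather than hit it exactly.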
February 21, 2013 at 4:59 am
No offense taken. I was referring to this line:
Round(Rand(CHECKSUM(NEWID())) * 28,0) END END,ArrivalDate)
Which I believe will throw some 0s, 2s and 3s into the mix.
I know a way to generate 15% random cancellations as you say you need, however to do it I need you to provide me with some DDL for the table and some test data in consumable form. It is not something I can just write up without testing and expect to get it right.
The way I do it might be easier or harder depending on what key fields are available to work with (e.g., if a unique booking number is present).
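To put rough numbers on the point about the ROUND line: every WHEN in the nested CASE makes a fresh call to NEWID(), so the second test only runs on the 70% of rows that failed the first, and it passes 20% of the time. That gives about 0.7 * 0.2 = 14% three-night stays from that branch, not 20%. The remaining 56% of rows fall through to ROUND(... * 28, 0), which can return any whole number from 0 to 28. The sketch below simply tallies that expression so the spread can be seen; the aliases are made up for the example.

```sql
-- Tally what ROUND(RAND(...) * 28, 0) produces over 100,000 draws.
-- Expect every value from 0 to 28 to appear, including 0, 2 and 3,
-- with 0 and 28 turning up about half as often as the others.
SELECT Nights, Draws = COUNT(*)
FROM (
    SELECT TOP (100000)
           Nights = CAST(ROUND(RAND(CHECKSUM(NEWID())) * 28, 0) AS INT)
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
) d
GROUP BY Nights
ORDER BY Nights;
```

Each value from 1 to 27 covers a range one unit wide before rounding, while 0 and 28 only cover half a unit each, which is why the two ends come up less often.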