April 30, 2021 at 1:28 pm
Im in the process of creating test data
Now I have a field in the test data called SerialNo
this has the following allowed values '111111','222222', '333333','444444','555555','666666'
Now I want to be able insert rows randomly into a target table which can take any one of the values above but
im not sure how to do this?
I thought of using a local temp table like the following
CREATE TABLE [#LoggerNotifications](
[LoggerNotifcationID] [int] IDENTITY(1,1) NOT NULL,
[SerialNumber] [nvarchar](60) NOT NULL
)
insert into #LoggerNotifications
Select '111111'
union all
Select '222222'
union all
Select '333333'
union all
Select '444444'
union all
Select '555555'
union all
Select '666666'
Then I can randonmly select from this table between 1 and 6 and grab the serial number and insert it into the table
But this is quite a bit of work to insert a serial number range and I was wondering if there is any easier way to just randomly
select from a selection of values?
April 30, 2021 at 2:16 pm
SELECT TOP (n) ID
FROM MyTable
ORDER BY NewID()
then build the insert from that. ORDER BY NEWID()
April 30, 2021 at 2:54 pm
Here's an easy way that doesn't require much in the form of prep...
--===== Demonstrate one of many "right" ways to do this.
DROP TABLE IF EXISTS #MyHead;
SELECT TOP 1000000
SomeSerialNo = CHOOSE(ca.N,'111111','222222', '333333','444444','555555','666666')
,SomeOtherRandomCols = 'SomeOtherRandomCols'
INTO #MyHead
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CROSS APPLY (VALUES (ABS(CHECKSUM(NEWID())%6)+1))ca(N)
;
--===== Demonstrate that the distribution is random but reasonably flat.
SELECT SomeSerialNo, COUNT(*)
FROM #MyHead
GROUP BY SomeSerialNo
ORDER BY SomeSerialNo
;
Notice that I say it's one "right" way meaning that the distribution is random but reasonably flat like you'd expect random values to be.
Results from above (and the numbers will change each time you run it but will always be close to "flat" distribution)...
There are "wrong" ways to do it. CHOOSE() actually resolves to multiple cases with multiple executions of the NEWID() formula (and so will a supposed single value CASE, which is annoying as hell). That will result in NULLs and value usage being severely skewed if you use the formula directly in the CHOOSE. You can see that in the execution plan but here's the code for one of the many ways to do it incorrectly.
--===== Demonstrate one of many "wrong" ways to do this.
DROP TABLE IF EXISTS #MyHead;
SELECT TOP 1000000
SomeSerialNo = CHOOSE(ABS(CHECKSUM(NEWID())%6)+1,'111111','222222', '333333','444444','555555','666666')
,SomeOtherRandomCols = 'SomeOtherRandomCols'
INTO #MyHead
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
--===== Demonstrate that the distribution is random but improperly skewed.
SELECT SomeSerialNo, COUNT(*)
FROM #MyHead
GROUP BY SomeSerialNo
ORDER BY SomeSerialNo
;
Here are the results from that run, which shows the skews and the NULLs I was talking about.
The difference between the two snippets is that the CROSS APPLY in the first "correct" snippet forces the result of the NEWID()-based formula to materialize before it is used by the evaluation in the CHOOSE. The second "wrong" snippet allows the multi-CASE statements behind the scenes of CHOOSE to actually generate as many as 6 times if it doesn't match an evaluation right away.
So here's from the execution plan from the first "correct" snipped. You can see that it's using a pre-caclulated expression in the CASE statement behind the scenes and so the cascading CASE statement actually works correctly.
[Expr1204] = Scalar Operator(CASE
WHEN [Expr1203] = (1)
THEN '111111'
ELSE CASE
WHEN [Expr1203] = (2)
THEN '222222'
ELSE CASE
WHEN [Expr1203] = (3)
THEN '333333'
ELSE CASE
WHEN [Expr1203] = (4)
THEN '444444'
ELSE CASE
WHEN [Expr1203] = (5)
THEN '555555'
ELSE CASE
WHEN [Expr1203] = (6)
THEN '666666'
ELSE NULL
END
END
END
END
END
END)
Here's what happens (again, you can pick it up from the "Compute Scalar" block properties in the execution plan) if you do it one of the "wrong" ways and you can clearly see why it's a problem.
Scalar Operator(CASE
WHEN (abs(checksum(newid()) % (6)) + (1)) = (1)
THEN '111111'
ELSE CASE
WHEN (abs(checksum(newid()) % (6)) + (1)) = (2)
THEN '222222'
ELSE CASE
WHEN (abs(checksum(newid()) % (6)) + (1)) = (3)
THEN '333333'
ELSE CASE
WHEN (abs(checksum(newid()) % (6)) + (1)) = (4)
THEN '444444'
ELSE CASE
WHEN (abs(checksum(newid()) % (6)) + (1)) = (5)
THEN '555555'
ELSE CASE
WHEN (abs(checksum(newid()) % (6)) + (1)) = (6)
THEN '666666'
ELSE NULL
END
END
END
END
END
END)
--Jeff Moden
Change is inevitable... Change for the better is not.
April 30, 2021 at 3:00 pm
Thanks Guys
Some very useful stuff there
I elected to use this as I need to generate a 2nd column of random values
SELECT TOP 10
SomeSerialNo = CHOOSE(ca.N,'111111','222222', '333333','444444','555555','666666')
,SomeOtherRandomCols = CHOOSE(ca2.N,'Open','Closed', 'Polling','Listening','Error','Transmitting')
-- INTO #MyHead
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CROSS APPLY (VALUES (ABS(CHECKSUM(NEWID())%6)+1))ca(N)
CROSS APPLY (VALUES (ABS(CHECKSUM(NEWID())%6)+1))ca2(N)
April 30, 2021 at 3:11 pm
Thanks for the feedback. Looks to me like you totally "get it"! 😀
Just in case you want to do even more in the area of rapidly generating random but constrained data of a bunch of different types, have a look at the following articles... the methods are also useful for generating things like on-the-fly calendar tables, etc.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply