December 19, 2013 at 6:42 pm
My overarching goal is to generate sets of random Symptom records for each Enrollee in a drug study, so that for each cycle (period of time), the code will insert a random number of random records for each enrollee.
I'm trying to return a number of random records from a table, but inside a table-valued function... (which could be my problem).
CREATE FUNCTION dbo.ufn_GetTopSymptoms (
@enrollID INT
, @CTCVersion VARCHAR(20)
, @NumRecords INT
)
RETURNS TABLE
AS
RETURN
SELECT TOP(@NumRecords) ID
, @enrollID AS EnrolledPatientID
FROM dbo.Data
WHERE [Version] = @CTCVersion
ORDER BY NEWID();
GO
but that ORDER BY NEWID()
clause is illegal apparently, because here's the error it throws:
Msg 443, Level 16, State 1, Procedure ufn_GetTopSymptoms, Line 13
Invalid use of a side-effecting operator 'newid' within a function.
I was hoping I could return a set of enrollmentIDs and then use CROSS APPLY to generate a random set of records for each enrollmentID... is this not possible with APPLY? I was trying to avoid using a cursor...
The idea is basically to create all the Symptom records for all the patients in treatment cycle at once by using Enrollee OUTER APPLY dbo.ufn_GetTopSymtoms(dbo.Enrollment.EnrolleeID)
but that's clearly not working. Is there a way to do this without resorting to a cursor? I saw Paul White's (outer and cross apply) articles, but maybe I misapplied what I read.
Any pointers on how to do this right?
Thanks!
Pieter
December 19, 2013 at 11:30 pm
Unfortunately I don't think you're going to be able to do it within the FUNCTION because of the error it is throwing but there is no reason you can't do it with a CROSS APPLY like this:
CREATE TABLE #Data
(
ID INT
,N INT
);
CREATE INDEX i1 ON #Data (ID, N);
CREATE TABLE #Patients
(
PatientID INT
)
-- Uncomment below to set up 1,000,000 rows
DECLARE @N INT = 10000; -- 10,000 = ~1,000,000 rows
WITH Tally (n) AS
(
SELECT TOP (@N) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #Data
SELECT a.n, ABS(CHECKSUM(NEWID()))%@N
-- Always create exactly 100 IDs from the Tally table
FROM (SELECT TOP 100 n FROM Tally) a
CROSS APPLY
(
SELECT TOP (@N) n
FROM Tally
)b;
WITH Tally (n) AS
(
SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #Patients
SELECT n
FROM Tally;
DECLARE @Nrecs INT = 10;
SELECT PatientID, ID, N
FROM #Patients a
CROSS APPLY
(
SELECT TOP (@Nrecs) ID, N
FROM #Data b
WHERE a.PatientID = b.ID
ORDER BY NEWID()
) b;
--GO
--DROP TABLE #Data;
--DROP TABLE #Patients;
Note the commented out DROP TABLE statements. You can compare the results of the first run with a second run where you only execute the statements starting with the DECLARE of @Nrecs. When I did it, it seemed to produce a different set of Ns on each run. You may want to add an ORDER BY on the query to ensure you clearly see what is getting returned (and that it is different each time).
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 20, 2013 at 8:28 pm
Dwain,
Good to have a definitive answer. Thanks! Looks like I need a good reasonably simple tutorial on CROSS APPLY. I guess I could start with Paul White's two articles. The one thing I really wanted to be able to do was to create a random number of symptoms for each (enrolleeID, TxCycle) combination... but the catch is that all symptoms in the Data table (hey, I didn't name the thing!) where the Grade=5 are fatal, so if one of those is encountered, I would stop adding symptoms for that enrollee. Do I have to resort to an RBAR solution for that? Essentially, the pseudocode would look like this:
FOR a series of TimeSlots (Cycles)
add new enrollees (grab random number of random patients)
Add New Symptom Records for all enrolled Patients
Update all Enrolled Patients (enrollees) that are still enrolled (Enrollment.LeaveDate IS NULL) Where there exists a fatal Symptom (grade 5)
NEXT TimeSlot
Is this necessary, or can I do some of this with CROSS APPLY?
Thanks!
Pieter
December 20, 2013 at 9:48 pm
I'm not sure I'm getting the full meaning of your business rules, this being a Sat morning and me having a hangover. However...
Looking at the code that I provided, within the CROSS APPLY it is returning always the same number of rows (just different ones). Is what you're looking for to return a variable number of rows, stopping when a value on the row has a certain value and then returning no more rows?
I think I have a piece of code laying around somewhere that does something like that. I'll need to find it. Never used it inside of a CROSS APPLY but I see no reason why it wouldn't work.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 21, 2013 at 8:32 am
Well, at least you have a reason for your brain not working!
Yes... you have it right. Think of a patient receiving cancer treatments. He will receive treatments and record "symptoms" until he either drops out of the program or dies (in data terms, he has a grade 5 symptom).
Thanks!
December 21, 2013 at 8:29 pm
I can generate purely random records really easily. TOP (@N) ... ORDER BY CHECKSUM(NewID)
The really hard part (for me!) is fixing my code so that as the grade increases, the frequency decreases maybe exponentially.
This is as far as I got...
USE scridb;
GO
DECLARE @NumRecs INT;
DECLARE @Lower INT = 1;
DECLARE @Upper INT = 9;
SELECT @NumRecs = ROUND(((@Upper - @Lower - 1) * RAND() * @Lower),1);
/*
SELECT @NumRecs = ABS(CHECKSUM(NEWID()))/250000000;
*/
SELECT TOP (@NumRecs)
Data.ID
, Data.Grade
, Data.Category
, Data.Subcategory
, Tally.N
FROM Data INNER JOIN Tally ON (6-Data.Grade)>=Tally.N
WHERE Grade IS NOT NULL
AND [Version] = 'CTCAE v4.03 (MedDRAv12.0)'
ORDER BY NEWID();
I think the part I'm doing wrong is not adding enough copies of the Tally table so that I get a distribution roughly like this:
Grade 5: 1 / 1000
Grade 4: 50 / 1000
Grade 3: 200 / 1000
Grade 2: 300 / 1000
Grade 1: 449/ 1000
December 22, 2013 at 5:21 pm
You can try this:
CREATE TABLE #Data
(
ID INT
,N INT
,Flag INT
);
CREATE INDEX i1 ON #Data (ID, N);
CREATE TABLE #Patients
(
PatientID INT
)
-- Uncomment below to set up 1,000,000 rows
DECLARE @N INT = 10000; -- 10,000 = ~1,000,000 rows
WITH Tally (n) AS
(
SELECT TOP (@N) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #Data
SELECT a.n, ABS(CHECKSUM(NEWID()))%@N, CASE WHEN b.n > 4000 THEN 0 ELSE b.n%100 END
-- Always create exactly 100 IDs from the Tally table
FROM (SELECT TOP 100 n FROM Tally) a
CROSS APPLY
(
SELECT TOP (@N) n
FROM Tally
)b;
WITH Tally (n) AS
(
SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #Patients
SELECT n
FROM Tally;
DECLARE @Nrecs INT = 100;
SELECT PatientID, ID, N, Flag
FROM #Patients a
CROSS APPLY
(
SELECT TOP (@Nrecs) ID, N, Flag
FROM
(
SELECT ID, N, Flag
FROM (
SELECT ID, N, Flag, rn
,rn0=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY N)
,mrn=MIN(rn) OVER (PARTITION BY ID)
FROM (
SELECT ID, N, Flag
-- Flag = 0 is the trigger to stop returning rows
,rn=CASE WHEN Flag = 0 THEN ROW_NUMBER() OVER (PARTITION BY ID ORDER BY N) END
FROM #Data b
WHERE a.PatientID = b.ID
) a
) a
WHERE rn0 <= mrn
) b
ORDER BY NEWID()
) b;
--GO
--DROP TABLE #Data;
--DROP TABLE #Patients;
For my case, when Flag=0 the symptom is fatal.
Which makes me wonder what constitutes a fatal symptom. Presumably it is a bit more specific than "patient drops dead." 😛
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 22, 2013 at 10:35 pm
Dwain,
First off, thanks for taking the time to crank that out... Of course, it'll take me a while to figure it out!!!
The grades are from the National Institute of Health - they published a database of Cancer treatment symptoms, and they were the ones to come up with the "1 to 5" grading scale. To me, it looks more like a Richter scale, but that's my opinion! By definition, a "5" in their scale is fatal. (And hence the odd rule... I didn't make it up! I'm nowhere near that creative!)
In your example, the ID column is analogous to what in mine, a SymptomID? (I think.) So N would correspond to a time value (like a cycle number or week number).... right?
I didn't make up this craziness... I got it from here: http://www.theradex.com/ctms/downloads.aspx (I just incorporated the database into my stuff, since I figured out that it was a MS Access database in disguise.)
Thanks!
Pieter
December 22, 2013 at 11:14 pm
pietlinden (12/22/2013)
Dwain,First off, thanks for taking the time to crank that out... Of course, it'll take me a while to figure it out!!!
...
In your example, the ID column is analogous to what in mine, a SymptomID? (I think.) So N would correspond to a time value (like a cycle number or week number).... right?
...
You're welcome. Like I said, I had the code on file just needed to find and adapt it.
And I believe the answer to your question is no. My ID is your PatientID. N is the symptom ID.
The "Flag" column should then be equivalent to your symptom rating.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 22, 2013 at 11:26 pm
Okay, thanks for the clarification. I'll post again if I can't figure something else out about the solution.
December 31, 2013 at 8:19 pm
Dwain,
Okay, must be that the candy canes and eggnog are working... This seems to work okay, and pretty quickly (haven't tested it, but it's less than a second to run the stored procedure.
Here's some code that works reasonably well... not perfect, but it's fake data...
The view just lets me use the semi-random number (RandomNo) to generate a different number of Symptom records for each (enrollmentID, cycleNo) combination. It's still flawed, I guess, because I never included any code to grab the MAX(CycleNo) per enrollmentID, but the rest works, and that was the hard part!! Without further ado ...
CREATE VIEW vw_EnrolledPatients
AS
SELECTenrollmentID
, e_PatientID AS PatientID
, e_ProtocolNo AS ProtocolNo
, ROW_NUMBER() OVER(PARTITION BY e_PatientID ORDER BY e_PatientID) AS TxCycle
, ABS(CHECKSUM(NEWID())%10) AS RandomNo
FROM Enroll
WHERE LeaveDate IS NULL;
-- Generate a random number of Symptoms for each PatientID for random treatment cycles.
CREATE PROC uspAddSymptoms
AS
BEGIN
-- Generate a random number of Symptoms for each PatientID for random treatment cycles.
INSERT INTO Symptom(enrollmentID, cycle,SymptomID)
SELECT EnrollmentID
, ABS(CHECKSUM(NEWID())%10)+1 AS CycleNo
, a.ID As SymptomID
FROM vw_EnrolledPatients
CROSS APPLY (SELECT TOP(ABS(CHECKSUM(NEWID())%10)) Data.ID
FROM Data
WHERE [Version]='CTCAE v4.03 (MedDRAv12.0)'
AND Grade IS NOT NULL
ORDER BY NEWID()) a
ORDER BY enrollmentID, CycleNo, ID;
-- remove deceased patients
UPDATE Enroll
SET LeaveDate = GETDATE()
WHERE LeaveDate IS NULL -- dont update Patients who have already been removed.
AND EnrollmentID IN (SELECT enrollmentID
FROM Symptom INNER JOIN Data ON Symptom.SymptomID = Data.ID
WHERE Data.Grade = 5); -- <-- Has a fatal symptom
END
Thanks!
Pieter
January 1, 2014 at 5:27 am
Different route, similar results.
First some sample data:
SELECT TOP (50000)
ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
[Version] = 'CTCAE v4.03 (MedDRAv12.0)',
Grade = 1+ABS(CHECKSUM(NEWID()))%5
INTO Data
FROM SYSCOLUMNS a, SYSCOLUMNS b
Second, a view which returns a suitable set from that data:
CREATE VIEW vwRandomSymptoms AS
SELECT TOP(1+ABS(CHECKSUM(NEWID()))%10)
*
FROM (
SELECT * FROM (SELECT TOP 449 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 1 ORDER BY NEWID()) d
UNION ALL
SELECT * FROM (SELECT TOP 300 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 2 ORDER BY NEWID()) d
UNION ALL
SELECT * FROM (SELECT TOP 200 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 3 ORDER BY NEWID()) d
UNION ALL
SELECT * FROM (SELECT TOP 50 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 4 ORDER BY NEWID()) d
UNION ALL
SELECT * FROM (SELECT TOP 1 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 5 ORDER BY NEWID()) d
) e
ORDER BY NEWID()
Third, an inline function to run the view. Note the funny stuff in the WHERE clause to force the optimiser to "run" the function for each outer row rather than once for the query:
CREATE FUNCTION IF_GetRandomSymptoms
(
@Param INT
)
RETURNS TABLE
AS
RETURN
(
SELECT
v.*,
CycleNo = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM vwRandomSymptoms v
WHERE v.ID = (v.ID * @Param/@Param)
)
Test it like this:
SELECT *
FROM (SELECT n = 1 UNION ALL SELECT 2) d
CROSS APPLY dbo.IF_GetRandomSymptoms (d.n)
ORDER BY n
- where derived table d represents your Enroll table. Give it a whirl Pieter, see how you get on.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 1, 2014 at 6:06 pm
ChrisM@home (1/1/2014)
Different route, similar results.First some sample data:
SELECT TOP (50000)
ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
[Version] = 'CTCAE v4.03 (MedDRAv12.0)',
Grade = 1+ABS(CHECKSUM(NEWID()))%5
INTO Data
FROM SYSCOLUMNS a, SYSCOLUMNS b
Second, a view which returns a suitable set from that data:
CREATE VIEW vwRandomSymptoms AS
SELECT TOP(1+ABS(CHECKSUM(NEWID()))%10)
*
FROM (
SELECT * FROM (SELECT TOP 449 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 1 ORDER BY NEWID()) d
UNION ALL
SELECT * FROM (SELECT TOP 300 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 2 ORDER BY NEWID()) d
UNION ALL
SELECT * FROM (SELECT TOP 200 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 3 ORDER BY NEWID()) d
UNION ALL
SELECT * FROM (SELECT TOP 50 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 4 ORDER BY NEWID()) d
UNION ALL
SELECT * FROM (SELECT TOP 1 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 5 ORDER BY NEWID()) d
) e
ORDER BY NEWID()
Third, an inline function to run the view. Note the funny stuff in the WHERE clause to force the optimiser to "run" the function for each outer row rather than once for the query:
CREATE FUNCTION IF_GetRandomSymptoms
(
@Param INT
)
RETURNS TABLE
AS
RETURN
(
SELECT
v.*,
CycleNo = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM vwRandomSymptoms v
WHERE v.ID = (v.ID * @Param/@Param)
)
Test it like this:
SELECT *
FROM (SELECT n = 1 UNION ALL SELECT 2) d
CROSS APPLY dbo.IF_GetRandomSymptoms (d.n)
ORDER BY n
- where derived table d represents your Enroll table. Give it a whirl Pieter, see how you get on.
I see that you are raising the bar for the New Year Chris! Nice one +1.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 2, 2014 at 4:03 pm
You can also just create a view called (for example) ANewID that generates a GUID as a single column, single row result and simply join to it as a CROSS JOIN or CROSS APPLY.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2014 at 6:00 pm
Jeff,
I think that's how I got the random number of records per enrollment so that the distribution is irregular. I noticed that Chris's code returns the same number of results per Protocol. I guess this is where I have to re-read parts of Guru's Guide to T-SQL - might be an old book, but it's easy to follow and he deals with a lot of the standard problems in T-SQL (like this one).
Think I'll read that and post back. Otherwise I may have to resort to the dreaded cursor... (might have to try it just to see how bad it is!) Might work okay as I only need it to run once ...
Pieter
Viewing 15 posts - 1 through 15 (of 34 total)
You must be logged in to reply to this topic. Login to reply