Return random records in a table-valued function?

  • pietlinden (1/2/2014)


    Jeff,

    I think that's how I got the random number of records per enrollment so that the distribution is irregular. I noticed that Chris's code returns the same number of results per Protocol. I guess this is where I have to re-read parts of Guru's Guide to T-SQL - might be an old book, but it's easy to follow and he deals with a lot of the standard problems in T-SQL (like this one).

    Think I'll read that and post back. Otherwise I may have to resort to the dreaded cursor... (might have to try it just to see how bad it is!) Might work okay as I only need it to run once ...

    Pieter

    The sample code works, and I know the theory behind it works too - I regularly use something similar for returning a random number of rows at random from a list of words to construct a random sentence:

    -- Returns about 100,000 rows per second

    WITH

    [4] AS (SELECT n = 0 FROM (VALUES (0), (0), (0), (0)) d (n)),

    [16] AS (SELECT n = 0 FROM [4] a, [4] b),

    [256] AS (SELECT n = 0 FROM [16] a, [16] b),

    [65536] AS (SELECT n = 0 FROM [256] a, [256] b),

    iTally AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM [65536] a, [16] b)

    SELECT TOP 10000

    -- Shamelessly nicked from JBM's random data generator

    t.n,

    -- 10 years worth of dates ranging from 1/1/2000 to 12/31/2009

    [RandomDate] = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    -- 100 different account numbers

    [RandomAccount] = ABS(CHECKSUM(NEWID()))%100+1,

    -- Amounts from -99.99 to + 99.99

    [RandomValue] = CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY),

    -- Randomised number of words picked randomly from list

    iTVF.Sentence

    FROM iTally t

    CROSS APPLY ( -- note 'n' in the TOP expression forces evaluation for every row

    SELECT TOP ((t.n+ABS(CHECKSUM(NEWID())))%10+1)

    [data()] = word

    FROM (VALUES ('the'),('quick'), ('brown'), ('fox'), ('jumped'), ('over'), ('the'), ('lazy'), ('dog')) Words (word)

    ORDER BY NEWID()

    FOR XML PATH('')

    ) iTVF(Sentence)

    I'm not sure what the problem is but if you post up the code you've used I'm sure we can work something out.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (1/2/2014)


    You can also just create a view called (for example) ANewID that generates a GUID as a single column, single row result and simply join to it as a CROSS JOIN or CROSS APPLY.

    Absolutely, it works a treat and there are less hoops to go through than black-boxing the works inside an IF.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (1/3/2014)


    Jeff Moden (1/2/2014)


    You can also just create a view called (for example) ANewID that generates a GUID as a single column, single row result and simply join to it as a CROSS JOIN or CROSS APPLY.

    Absolutely, it works a treat and there are less hoops to go through than black-boxing the works inside an IF.

    It also prevents resorting to cursors of any kind. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/3/2014)


    ChrisM@Work (1/3/2014)


    Jeff Moden (1/2/2014)


    You can also just create a view called (for example) ANewID that generates a GUID as a single column, single row result and simply join to it as a CROSS JOIN or CROSS APPLY.

    Absolutely, it works a treat and there are less hoops to go through than black-boxing the works inside an IF.

    It also prevents resorting to cursors of any kind. 😀

    Some of my best friends are cursers 😀

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for the ideas, everybody.

    I got it to work, which is good, but I had to resort to <whisper>a dreaded cursor.</whisper> I know, it's a total cheat, but I'm just trying to pump a table full of fake data. I ended up resorting to the Dark Side because I couldn't get the random number stuff working any other way. (Maybe I just did something wrong?)

    Here's how I did it... (not entirely sure the CROSS APPLY was necessary... but that's what happens when you Frankenstein a solution together.) In all fairness, except for the CROSS APPLY, I understood it all. I did a bunch of reading on CROSS APPLY, but apparently not enough to get it through my head.

    Here's what I came up with (warning, cursor ahead!):

    ALTER PROC AddSomeSymptoms

    @EnrollmentID INT

    AS

    BEGIN

    SET NOCOUNT ON;

    DECLARE @NextCycle INT;

    -- get the next cycle for this enrollment.

    SELECT @NextCycle = COALESCE(MAX(Cycle),0)+1

    FROM Symptom

    WHERE enrollmentID = @EnrollmentID;

    -- Is enrollee dead?

    IF NOT EXISTS (SELECT 1

    FROM Symptom s INNER JOIN Data d ON s.symptomID = d.ID

    WHERE d.Grade = 5

    AND s.enrollmentID = @EnrollmentID)

    BEGIN

    -- get next cycle for this enrollee

    INSERT INTO Symptom(Cycle, EnrollmentID, SymptomID)

    SELECT

    @NextCycle AS Cycle

    , @EnrollmentID

    , ID

    /*, n

    , Category

    , Subcategory

    , Term

    , Grade

    */

    FROM (SELECT n=1 /*UNION ALL SELECT 2*/) d

    CROSS APPLY dbo.IF_GetRandomSymptoms3 (d.n,@EnrollmentID)

    ORDER BY n;

    END

    SET NOCOUNT OFF;

    END

    I guess my only questions left are:

    (1) If I do a terrible thing and use a cursor, can I tell if an enrollee is deceased (has a Grade 5) without having to test for existence each time?

    (2) Do I even need the Cross Apply?

    I just couldn't figure out how to get the AddSymptom stuff to work for all the cycles and get different numbers of symptoms each time. Pretty much everything I tried didn't like using non-deterministic functions (Table-Valued Functions), so that's why I resorted to the Dark Side...

    Thanks for the schooling!,

    Pieter

  • pietlinden (1/8/2014)


    Thanks for the ideas, everybody.

    I got it to work, which is good, but I had to resort to <whisper>a dreaded cursor.</whisper> I know, it's a total cheat, but I'm just trying to pump a table full of fake data. I ended up resorting to the Dark Side because I couldn't get the random number stuff working any other way. (Maybe I just did something wrong?)

    You gave up?

    Here's how I did it... (not entirely sure the CROSS APPLY was necessary... but that's what happens when you Frankenstein a solution together.) In all fairness, except for the CROSS APPLY, I understood it all. I did a bunch of reading on CROSS APPLY, but apparently not enough to get it through my head.

    APPLY is similar to JOIN. Paul White's excellent articles are linked in my sig. Run through his examples and you're done.

    Here's what I came up with (warning, cursor ahead!):

    ALTER PROC AddSomeSymptoms

    @EnrollmentID INT

    AS

    BEGIN

    SET NOCOUNT ON;

    DECLARE @NextCycle INT;

    -- get the next cycle for this enrollment.

    SELECT @NextCycle = COALESCE(MAX(Cycle),0)+1

    FROM Symptom

    WHERE enrollmentID = @EnrollmentID;

    -- Is enrollee dead?

    IF NOT EXISTS (SELECT 1

    FROM Symptom s INNER JOIN Data d ON s.symptomID = d.ID

    WHERE d.Grade = 5

    AND s.enrollmentID = @EnrollmentID)

    BEGIN

    -- get next cycle for this enrollee

    INSERT INTO Symptom(Cycle, EnrollmentID, SymptomID)

    SELECT

    @NextCycle AS Cycle

    , @EnrollmentID

    , ID

    /*, n

    , Category

    , Subcategory

    , Term

    , Grade

    */

    FROM (SELECT n=1 /*UNION ALL SELECT 2*/) d

    CROSS APPLY dbo.IF_GetRandomSymptoms3 (d.n,@EnrollmentID)

    ORDER BY n;

    END

    SET NOCOUNT OFF;

    END

    I guess my only questions left are:

    (1) If I do a terrible thing and use a cursor, can I tell if an enrollee is deceased (has a Grade 5) without having to test for existence each time?

    (2) Do I even need the Cross Apply?

    I just couldn't figure out how to get the AddSymptom stuff to work for all the cycles and get different numbers of symptoms each time. Pretty much everything I tried didn't like using non-deterministic functions (Table-Valued Functions), so that's why I resorted to the Dark Side...

    I'd be more than happy to help you with solving the problems you were experiencing with a set-based solution, but not with the fallback. It's a little too Vichy France.

    Thanks for the schooling!,

    Pieter

    I can't help thinking you're skipping a class...

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I probably am skipping a class. The challenge I couldn't get around was this one:

    the "continue adding treatment cycles (groups of "symptom" records) until either an enrolled patient has a grade 5 (fatal) or the patient completes some number of cycles of treatment.

    I tried a TVF, but it wouldn't allow any non-determinant functions (like NEWID). Everything else I came up with had a fixed number of records per treatment cycle.

    I actually did read Paul's articles (Cross/Outer Apply I & II). I even remember he was calculating averages of grades... But I couldn't get it to do a random TOP VALUES query with different random @TOP values for each enrollee. So eventually I resorted to cheating off the guy next to me and using a cursor.

  • Try again with the variable number of rows. Here's a script to run in tempdb (so it won't screw anything up). After running the whole script, rerun the final query a few times.

    USE TEMPDB

    -- sample table

    SELECT TOP (50000)

    ID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    [Version] = 'CTCAE v4.03 (MedDRAv12.0)',

    Grade = 1+ABS(CHECKSUM(NEWID()))%5

    INTO Data -- mockup of symptom table

    FROM SYSCOLUMNS a, SYSCOLUMNS b

    go

    CREATE VIEW vwRandomSymptoms AS

    SELECT TOP(1+ABS(CHECKSUM(NEWID()))%10) -- between 1 and 10 rows

    *

    FROM (

    SELECT * FROM (SELECT TOP 449 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 1 ORDER BY NEWID()) d

    UNION ALL

    SELECT * FROM (SELECT TOP 300 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 2 ORDER BY NEWID()) d

    UNION ALL

    SELECT * FROM (SELECT TOP 200 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 3 ORDER BY NEWID()) d

    UNION ALL

    SELECT * FROM (SELECT TOP 50 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 4 ORDER BY NEWID()) d

    UNION ALL

    SELECT * FROM (SELECT TOP 1 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 5 ORDER BY NEWID()) d

    ) e

    ORDER BY NEWID()

    GO

    CREATE FUNCTION IF_GetRandomSymptoms

    (

    @Param INT

    )

    RETURNS TABLE

    AS

    RETURN

    (

    SELECT

    v.*,

    CycleNo = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM vwRandomSymptoms v

    WHERE v.ID = (v.ID * @Param/@Param)

    )

    GO

    -- usage

    SELECT *

    FROM (SELECT n = 1 UNION ALL SELECT 2 UNION ALL SELECT 3) d

    CROSS APPLY dbo.IF_GetRandomSymptoms (d.n)

    ORDER BY n

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Here's another method using just a view as Jeff suggested. Same table as previously, different view.

    CREATE VIEW vwSymptoms AS

    SELECT * FROM (SELECT TOP 449 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 1 ORDER BY NEWID()) d

    UNION ALL

    SELECT * FROM (SELECT TOP 300 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 2 ORDER BY NEWID()) d

    UNION ALL

    SELECT * FROM (SELECT TOP 200 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 3 ORDER BY NEWID()) d

    UNION ALL

    SELECT * FROM (SELECT TOP 50 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 4 ORDER BY NEWID()) d

    UNION ALL

    SELECT * FROM (SELECT TOP 1 * FROM Data WHERE [Version] = 'CTCAE v4.03 (MedDRAv12.0)' AND Grade = 5 ORDER BY NEWID()) d

    GO

    -- usage

    SELECT *

    FROM (SELECT EnrollmentID = 1 UNION ALL SELECT 2 UNION ALL SELECT 3) d

    CROSS APPLY (

    SELECT TOP(1+(d.EnrollmentID+ABS(CHECKSUM(NEWID())))%10)

    ID, [Version], Grade

    FROM vwSymptoms

    ORDER BY NEWID()

    ) s

    ORDER BY d.EnrollmentID

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Cruel Taskmaster <g>,

    I have a dumb question - if I use a non-looping solution, how do I indicate that no more treatment cycles should be added for an enrollee after he has a grade 5 symptom?

    thanks!

    Pieter

  • pietlinden (1/9/2014)


    Cruel Taskmaster <g>,

    I have a dumb question - if I use a non-looping solution, how do I indicate that no more treatment cycles should be added for an enrollee after he has a grade 5 symptom?

    thanks!

    Pieter

    I knew you were going to ask that! I'd suggest you ignore it for the moment and concentrate on a functional subsystem with the intention of revisiting. The thing is, without a working subsystem, it's hard to tell exactly where to interrupt the process.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Okay, I re-read SQLKiwi's articles on APPLY. Basically, the "left" side is the set of enrollees to "process", and the right side is the symptoms. Got that part. I think I tried doing APPLY, but I couldn't figure out how to use ORDER BY NewID() to randomize. I'm pretty sure that's where I gave up and cheated. (Jeez, do I have to go to the Principal's office too?)

  • Each enrollee has a (random) small number of cycles, each cycle has a (random) number of symptoms - is this correct?

    If a group 5 symptom is allocated at any point, it's marked as the last symptom of the set of symptoms/cycles for the enrollee?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Correct. It's modeling cancer treatment therapy, so sometimes they get fatal complications, and after that you can't test them anymore. (Cheery huh?)

  • pietlinden (1/9/2014)


    Correct. It's modeling cancer treatment therapy, so sometimes they get fatal complications, and after that you can't test them anymore. (Cheery huh?)

    1989/90 I worked for a while in clinical trials analysis. It wasn't much fun and I've kept clear of it since. Whilst I was there, a bubbly young lass, Maria, was taken on to key in the results of an antidepressant study. These included a paragraph or two filled in by the patient. Maria grew quieter and quieter as the months passed, and within six months of starting work she'd taken her own life. What made it doubly tragic was that her brother Tony worked there too.

    Anyway, back to business. What's the maximum number of cycles? Can they overlap? Presumably you want a datetime on each symptom row?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 15 posts - 16 through 30 (of 34 total)

You must be logged in to reply to this topic. Login to reply