Return random records in a table-valued function?

  • What's the maximum number of cycles? Can they overlap? Presumably you want a datetime on each symptom row?

    As best I can remember, protocols run for a non-predetermined period of time, usually over a year. "Cycles" are more at the individual patient level. For example, Patient X will enroll on 2/1 and then be treated every two weeks until he leaves the protocol for some reason or the protocol ends (enough data has been collected to prove/disprove the protocol's hypothesis. (Is that vague or what?) What this means when generating fake data is that each protocol has a somewhat random time limit. Especially for enrollment. Also for treatment. So for the active enrollment period, enrolled patients are receiving treatment and new patients are enrolling on a weekly basis.

    If I were going to do this in a semi-realistic way, I would loop days and add a random number of enrollments and then maybe go back and find all the people enrolled as of Protocol Week N and generate some symptoms for each one. (so think of a bunch of number lines that mostly end at the Close date, but start on or after the Open To Enroll Date. Then they repeat on two week periods. The time doesn't have to be perfect - it can be an approximation, really. The basic idea is that each enrollee reports a set of symptoms in each cycle. The whole cycle thing was weird at first until I realized it was essentially a proxy for a date -- the starting date and ending date didn't matter, it was the reported symptoms in the middle that did. So it went like this:

    Enroll----(C1)---(C2)---(C3)....--(C(n)) where C(n) was the final treatment Cycle. And for each n, there would always be symptoms.

    Was definitely a creepy job in some ways. Was okay as long as you didn't look too closely at what you were doing. If you stopped and thought about the patients being people, then it got kinda eerie. Not something I wanted to do long term, for sure.

  • Made a little progress. I added some columns just to understand what's going on (since I'm sort of stumbling around in the Dark Ages). Here's my latest attempt:

    DECLARE @MyProtocol VARCHAR(20) = 'LUN90';

    SELECT *

    FROM (SELECT EnrollmentID FROM enroll WHERE e_ProtocolNo = @MyProtocol AND LeaveDate IS NULL) d

    CROSS APPLY (

    SELECT TOP(1+(d.EnrollmentID+ABS(CHECKSUM(NEWID())))%20)

    vwSymptoms.ID

    , Data.Category

    , Data.Subcategory

    , Data.Term

    , Data.Grade

    , 1+ABS(CHECKSUM(NEWID()))%20 AS FakeCycle

    FROM vwSymptoms INNER JOIN Data ON vwSymptoms.ID = Data.ID

    ORDER BY NEWID()

    ) s

    ORDER BY d.enrollmentID,

    FakeCycle;

    The @Protocol parameter is just to make it easier to get my head around (by selecting a protocol with only 3 enrollees). The one part I can't sort out is how to group the symptoms into (enrollmentID,Cycle) groups. Once I have that part, I'm home free... I can turn this into an INSERT and I'm good to go.

    Thanks!

    "My mother told me not to ... use cursors!

    My mother told me not to ... use cursors!..."

  • okay... I think this fixes the "you can't report symptoms after you're dead" issue...

    CREATE VIEW vwEnrolleeFatalCycle

    AS

    SELECT enrollmentID

    , MIN(cycle) AS FatalCycle

    FROM Symptom INNER JOIN Data ON Symptom.SymptomID = Data.ID

    WHERE Grade = 4

    GROUP BY enrollmentID;

    -- get everything AFTER this and delete it.

    DELETE FROM Symptom

    FROM Symptom s inner join vwEnrolleeFatalCycle AS efc

    ON s.enrollmentID = efc.enrollmentID

    WHERE s.Cycle > efc.FatalCycle;

    That FROM... FROM... syntax threw me!

    Okay, now back to your regularly scheduled programming. And no cursors. (Cursers, sure!)

  • pietlinden (1/9/2014)


    ...

    "My mother told me not to ... use cursors!

    My mother told me not to ... use cursors!..."

    My mother cursed all the bl00dy time!

    How are you fixed for this now Pieter? I've got an idea or two for tackling those cases where a grade 4 occurs, but it looks like you've found a solution already.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for the help. I think it's closed. I tested it with grade 4, just to short-circuit the record creation, since the "fatal" grade is somewhat arbitrary. In a nutshell, generate the symptoms by querying system tables, and then just use a delete query to remove the "impossible" records (the ones that occur after a fatal event). Might not be pretty, but it works, and the data is fake anyway.

    For a minute I was wondering why there was only one protocol with data, but then I realized that I limited the create stored procedure to a single protocol... I guess I can safely remove that now. It was just one of those "more data is not necessarily better" ... at least not at that moment.

    But it works a champ now. I guess for fun, I should do the performance comparisions between the cursing and the proper way to do it... <g> Just to convince myself never to play with that stuff again!

Viewing 5 posts - 31 through 34 (of 34 total)

You must be logged in to reply to this topic. Login to reply