Creating partially matched datasets

  • I want to create an experimental and control group to analyze statistically. I want the match to be random with the following constraints:

    id1<>id2

    age1=age2

    sex1=sex2

    I don't want id2 or id1 to repeat, even though there may be different numbers of them in table 1 and table 2

    How can I best accomplish this and have unique pairs of id numbers. Tables 1 and 2 are already created as temps with the critical conditions.

  • Have you looked at the RAND function (Random number generation) for example to generate your Id value ?:

    DECLARE @I DECIMAL(10,8)

    SET @I = RAND( (DATEPART(mm, GETDATE()) * 100000 )

    + (DATEPART(ss, GETDATE()) * 1000 )

    + DATEPART(ms, GETDATE()) )

    SET @I = @I * 100

    SELECT @I

    WAITFOR DELAY '00:00:01'

    GO 10

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Thanks, but the problem is not the randomization--I can accomplish that with a local variable and newid().

    the problem is getting unique values from a table that has

    aaa sss

    aaa ttt

    bbb ttt

    bbb sss

    What I want is:

    aaa sss

    bbb ttt

    or

    aaa ttt

    bbb sss

    where the two columns represent the unique id's of individuals

  • Use NTILE() to divide each partition into two parts (control and experimental) and then use ROW_NUMBER() to assign a number to each record within the group and match corresponding numbers. If you want tested code, provide DDL and sample data with expected results.

    Another option is to just use ROW_NUMBER() and then match based on rn%2 = (rn + 1)%2 (where rn is the alias for the ROW_NUMBER() results).

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks, Drew. My CIO, who was in meetings all day, suggested row_number--I'll look at NTILE as well as his version and your version of row_Number. Pretty sure what I need is there.

    Have a great, happy, healthy, and ABOVE ALL prosperous 2012.

    Sam

  • Since you are wanting to get each value only once something like this should work.

    ;with data (col1, col2)

    as (

    select 'aaa', 'sss' union all

    select 'aaa', 'ttt' union all

    select 'bbb', 'ttt' union all

    select 'bbb', 'sss'

    )

    select col1, col2

    from

    (

    select col1, row_number() over(order by col1) as RowNum1

    from

    (

    select distinct col1 from data

    ) s

    ) x

    left join

    (

    select col2, row_number() over(order by col2) as RowNum2

    from

    (

    select distinct col2 from data

    ) s2

    )y on y.RowNum2 = x.RowNum1

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • thanks, I'll try it, albeit without the first part as I have 162,000 records. 🙂

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply