Help with Data transformation

  • I have a table with column that has codes in a consolidtaed form as follows

    create table Code(

    id int

    codecol varchar(20))

    data is as follows

    1 'ABCDE'

    2 'DCE'

    3 'EFGH'

    4 'ABC'

    The code col needs to be transformed into a target table which has columns as

    ID A B C D E F G H

    1 1 1 1 1 1 0 0 0

    2 0 0 1 1 1 0 0 0

    3 0 0 0 0 1 1 1 1

    4 1 1 1 0 0 0 0 0

    Basically every character gets transformed to a bit column if value exists/not for a specific id. since this data doesnt have a seperator in the source table whats teh easiest/best way to transform it to the target table. any help will be greatly apprecaited. TIA

  • Hi,

    Well I'm not sure if this is the 'optimal' solution... But it does work! If you are running this across millions of rows it might be a bit slow.

    create table #Code(

    id int,

    codecol varchar(20))

    insert #code

    select 1, 'ABCDE'

    union all select 2, 'DCE'

    union all select 3 , 'EFGH'

    union all select 4 , 'ABC'

    select id , a= case when codecol like '%a%' then 1 else 0 end ,

    b= case when codecol like '%b%' then 1 else 0 end ,

    c= case when codecol like '%c%' then 1 else 0 end ,

    d= case when codecol like '%d%' then 1 else 0 end ,

    e= case when codecol like '%e%' then 1 else 0 end ,

    f= case when codecol like '%f%' then 1 else 0 end ,

    g= case when codecol like '%g%' then 1 else 0 end ,

    h= case when codecol like '%h%' then 1 else 0 end

    from #code

    Does that help?

    B

  • Seeing no way to improve on the CASE logic in the proposed solution, I tried using a CHARINDEX test instead of a LIKE test just for grins.

    a= case when charindex('a',codecol) > 0 then 1 else 0 end

    Testing with 400,000 rows, the LIKE test ran consistently faster if only by a fraction of a second.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (12/2/2008)


    Seeing no way to improve on the CASE logic in the proposed solution

    Heh... "Must look eye..." 😛

    SET STATISTICS TIME ON

    SELECT ID,

    SIGN(CHARINDEX('a',CodeCol)) AS a,

    SIGN(CHARINDEX('b',CodeCol)) AS b,

    SIGN(CHARINDEX('c',CodeCol)) AS c,

    SIGN(CHARINDEX('d',CodeCol)) AS d,

    SIGN(CHARINDEX('e',CodeCol)) AS e,

    SIGN(CHARINDEX('f',CodeCol)) AS f,

    SIGN(CHARINDEX('g',CodeCol)) AS g

    FROM #Code

    SET STATISTICS TIME OFF

    Shaves ~20% off the CPU time... duration is limited by the display, in this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ... and now I know there is a SIGN function... which of course returns just exactly what is needed for setting the "bits". Remind me never to take you on at Trivial Pursuit. 🙂

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks Guys i used the charindex without sign function to resolve this. Thanks for everybody;s input.

  • Bob Hovious (12/3/2008)


    ... and now I know there is a SIGN function... which of course returns just exactly what is needed for setting the "bits". Remind me never to take you on at Trivial Pursuit. 🙂

    Heh... aren't forums great? Right or wrong, I learn something new every day here. Glad to pass it forward.

    My (too short a time) programming mentor way way back (rocks don't live as long) in High School taught me, "If you want to learn a {computer} language, the very first thing you should do is become very familiar with it's instrinsic functions." He was right.

    As always, thanks for the constructive feedback, Bob. I really appreciate it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ishaan99 (12/3/2008)


    Thanks Guys i used the charindex without sign function to resolve this. Thanks for everybody;s input.

    You bet... thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply