Denormalizing Data

  • Hello,

    I have data in a normalized form:

    KEY1 | STRING1

    KEY1 | STRING2

    KEY1 | STRING3

    I would like to format it as:

    KEY1 | STRING1 STRING2 STRING3

    I wrote the following tsql chunk to do this, but it takes a long time. I've re-tuned it already once and was thinking of a couple of ways to make it better (removing the initial insert and making the updates themselves individual record inserts after building up the string in the loop). I'm concerned that those will only yield marginal improvements and that I need a fundamentally new way to do this.

    Anyone have any ideas?

    -----------------------

    create table DENORMALIZED(KEY CHAR(16), DENORM_STRING varchar(200))

    insert into DENORMALIZED

    select distinct KEY, ''

    from NORMALIZED_SOURCE

    --ACTUAL DENORMALIZING

    declare @charMyKey CHAR(16)

    declare @strValue varchar(20)

    declare @dteStart datetime

    declare @dteEnd datetime

    set @dteStart = (SELECT GETDATE())

    declare curmyKey CURSOR FOR

    select DISTINCT NORMALIZED_SOURCE.KEY, ORIG_STRING

    FROM NORMALIZED_SOURCE

    open curmyKey

    fetch next from curmyKey INTO @charMyKey, @strValue

    while @@FETCH_STATUS = 0

    begin

    update DENORMALIZED

    set DENORM_STRING = case DENORM_STRING

    when '' then @strValue

    else DENORM_STRING + ', ' + @strValue

    end

    where DENORMALIZED.KEY = @charMyKey

    fetch next from curmyKey into @charMyKey, @strValue

    end

    close curmyKey

    deallocate curmyKey

    set @dteEnd= (SELECT GETDATE())

    select @dteEnd - @dteStart

    ------------------------------------

  • First Try not to use a crusor.

    Put it in a function

    There are several examples on this site.

    --Creating a test table with some test data

    create table TestTable ( char(4), String varchar(10))

    -- Insert test data

    insert into TestTable

    Select 'Key1', 'String1' union

    Select 'Key1', 'String2' union

    Select 'Key1', 'String3' union

    Select 'Key2', 'xString1' union

    Select 'Key2', 'xString2' union

    Select 'Key2', 'xString3' union

    Select 'Key2', 'xString4'

    GO

    -- Create the function

    drop function ConcatStrings

    go

    create function ConcatStrings (@Key char(4))

    Returns varchar(8000)

    as

    Begin

    declare @RetString varchar(8000)

    select @RetString = coalesce(@RetString,'') + String + ' ' -- Separator is

    -- a space, you can use whatever you want

    from TestTable

    where = @key

    Return @RetString

    end

    GO

    -- Put the function inline with your select statement

    select , dbo.ConcatStrings() as String

    from TestTable

    group by

    -- need the group by because there's many keys in your table

    -- Result

    Key, String

    Key1 String1 String2 String3

    Key2 xString1 xString2 xString3 xString4

    drop table testtable

  • I'll give that a try tomorrow, but with millions of records I have a hard time thinking that a function doing a couple hundred thousand clustered index scans in that select is going to be faster.

    Either way - thanks for the idea.

  • Yeah, so I tried that system and compared it to a modified version of the modified script and the initial script is far better than either.

    The method you described using a UDF was running for 11 minutes when I killed it. The script I have pasted below takes 30 seconds on the same hardware and same data.

    Another point, it helped to set nocount on/off in my script as well, but that's no surprise.

    You mentioned that there are other solutions on the site - I can't find them so I must be using the wrong search terms. Can someone point me in the right direction?

    Thanks,

    Greg

    -----------------------

    create table DENORMALIZED(KEY CHAR(16), DENORM_STRING varchar(200))

    SET NOCOUNT ON

    declare @charMyKey CHAR(16)

    declare @charOldKey char(16)

    declare @strValue varchar(20)

    declare @strBigValue varchar(200)

    declare @dteStart datetime

    declare @dteEnd datetime

    set @dteStart = (SELECT GETDATE())

    declare curmyKey CURSOR FOR

    select DISTINCT , STRING

    from TestTAble

    order by , STRING

    open curmyKey

    fetch next from curmyKey INTO @charMyKey, @strValue

    set @charOldKey = (select '')

    while @@FETCH_STATUS = 0

    begin

    if @charOldKey = @charMyKey

    begin

    --keys are same, append to big string

    set @strBigValue = (select @strBigValue + ' '+ @strValue)

    end

    else

    begin

    --keys are different

    --update table

    if @charOldKey != ''

    begin

    insert into DENORMALIZED

    values (@charOldKey, @strBigValue)

    end

    --restart key/value

    set @charOldKey = (select @charMyKey)

    set @strBigValue = (select @strValue)

    end

    fetch next from curmyKey into @charMyKey, @strValue

    end

    --do the update one last time for the last record

    insert into DENORMALIZED

    values (@charOldKey, @strBigValue)

    close curmyKey

    deallocate curmyKey

    set @dteEnd= (SELECT GETDATE())

    select @dteEnd - @dteStart

    SET NOCOUNT OFF

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply