Denormalizing Data

Question

Post reply

Denormalizing Data

Greg Knaddison

SSC Eights!

Points: 859
More actions
October 10, 2005 at 4:39 pm

#95377

Hello,
I have data in a normalized form:
KEY1 | STRING1
KEY1 | STRING2
KEY1 | STRING3
I would like to format it as:
KEY1 | STRING1 STRING2 STRING3
I wrote the following tsql chunk to do this, but it takes a long time. I've re-tuned it already once and was thinking of a couple of ways to make it better (removing the initial insert and making the updates themselves individual record inserts after building up the string in the loop). I'm concerned that those will only yield marginal improvements and that I need a fundamentally new way to do this.
Anyone have any ideas?
-----------------------
create table DENORMALIZED(KEY CHAR(16), DENORM_STRING varchar(200))
insert into DENORMALIZED
select distinct KEY, ''
from NORMALIZED_SOURCE
--ACTUAL DENORMALIZING
declare @charMyKey CHAR(16)
declare @strValue varchar(20)
declare @dteStart datetime
declare @dteEnd datetime
set @dteStart = (SELECT GETDATE())
declare curmyKey CURSOR FOR
select DISTINCT NORMALIZED_SOURCE.KEY, ORIG_STRING
FROM NORMALIZED_SOURCE
open curmyKey
fetch next from curmyKey INTO @charMyKey, @strValue
while @@FETCH_STATUS = 0
begin
update DENORMALIZED
set DENORM_STRING = case DENORM_STRING
when '' then @strValue
else DENORM_STRING + ', ' + @strValue
end
where DENORMALIZED.KEY = @charMyKey
fetch next from curmyKey into @charMyKey, @strValue
end
close curmyKey
deallocate curmyKey
set @dteEnd= (SELECT GETDATE())
select @dteEnd - @dteStart
------------------------------------

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Ray M SSC-Insane Points: 21093 More actions · Answer 1

First Try not to use a crusor.

Put it in a function

There are several examples on this site.

--Creating a test table with some test data

create table TestTable ( char(4), String varchar(10))

-- Insert test data

insert into TestTable

Select 'Key1', 'String1' union

Select 'Key1', 'String2' union

Select 'Key1', 'String3' union

Select 'Key2', 'xString1' union

Select 'Key2', 'xString2' union

Select 'Key2', 'xString3' union

Select 'Key2', 'xString4'

GO

-- Create the function

drop function ConcatStrings

go

create function ConcatStrings (@Key char(4))

Returns varchar(8000)

as

Begin

declare @RetString varchar(8000)

select @RetString = coalesce(@RetString,'') + String + ' ' -- Separator is

-- a space, you can use whatever you want

from TestTable

where = @key

Return @RetString

end

GO

-- Put the function inline with your select statement

select , dbo.ConcatStrings() as String

from TestTable

group by

-- need the group by because there's many keys in your table

-- Result

Key, String

Key1 String1 String2 String3

Key2 xString1 xString2 xString3 xString4

drop table testtable

Greg Knaddison SSC Eights! Points: 859 More actions · Answer 2

I'll give that a try tomorrow, but with millions of records I have a hard time thinking that a function doing a couple hundred thousand clustered index scans in that select is going to be faster.

Either way - thanks for the idea.

Greg Knaddison SSC Eights! Points: 859 More actions · Answer 3

Yeah, so I tried that system and compared it to a modified version of the modified script and the initial script is far better than either.

The method you described using a UDF was running for 11 minutes when I killed it. The script I have pasted below takes 30 seconds on the same hardware and same data.

Another point, it helped to set nocount on/off in my script as well, but that's no surprise.

You mentioned that there are other solutions on the site - I can't find them so I must be using the wrong search terms. Can someone point me in the right direction?

Thanks,

Greg

-----------------------

create table DENORMALIZED(KEY CHAR(16), DENORM_STRING varchar(200))

SET NOCOUNT ON

declare @charMyKey CHAR(16)

declare @charOldKey char(16)

declare @strValue varchar(20)

declare @strBigValue varchar(200)

declare @dteStart datetime

declare @dteEnd datetime

set @dteStart = (SELECT GETDATE())

declare curmyKey CURSOR FOR

select DISTINCT , STRING

from TestTAble

order by , STRING

open curmyKey

fetch next from curmyKey INTO @charMyKey, @strValue

set @charOldKey = (select '')

while @@FETCH_STATUS = 0

begin

if @charOldKey = @charMyKey

begin

--keys are same, append to big string

set @strBigValue = (select @strBigValue + ' '+ @strValue)

end

else

begin

--keys are different

--update table

if @charOldKey != ''

begin

insert into DENORMALIZED

values (@charOldKey, @strBigValue)

end

--restart key/value

set @charOldKey = (select @charMyKey)

set @strBigValue = (select @strValue)

end

fetch next from curmyKey into @charMyKey, @strValue

end

--do the update one last time for the last record

insert into DENORMALIZED

values (@charOldKey, @strBigValue)

close curmyKey

deallocate curmyKey

set @dteEnd= (SELECT GETDATE())

select @dteEnd - @dteStart

SET NOCOUNT OFF