October 10, 2005 at 4:39 pm
Hello,
I have data in a normalized form:
KEY1 | STRING1
KEY1 | STRING2
KEY1 | STRING3
I would like to format it as:
KEY1 | STRING1 STRING2 STRING3
I wrote the following tsql chunk to do this, but it takes a long time. I've re-tuned it already once and was thinking of a couple of ways to make it better (removing the initial insert and making the updates themselves individual record inserts after building up the string in the loop). I'm concerned that those will only yield marginal improvements and that I need a fundamentally new way to do this.
Anyone have any ideas?
-----------------------
create table DENORMALIZED(KEY CHAR(16), DENORM_STRING varchar(200))
insert into DENORMALIZED
select distinct KEY, ''
from NORMALIZED_SOURCE
--ACTUAL DENORMALIZING
declare @charMyKey CHAR(16)
declare @strValue varchar(20)
declare @dteStart datetime
declare @dteEnd datetime
set @dteStart = (SELECT GETDATE())
declare curmyKey CURSOR FOR
select DISTINCT NORMALIZED_SOURCE.KEY, ORIG_STRING
FROM NORMALIZED_SOURCE
open curmyKey
fetch next from curmyKey INTO @charMyKey, @strValue
while @@FETCH_STATUS = 0
begin
update DENORMALIZED
set DENORM_STRING = case DENORM_STRING
when '' then @strValue
else DENORM_STRING + ', ' + @strValue
end
where DENORMALIZED.KEY = @charMyKey
fetch next from curmyKey into @charMyKey, @strValue
end
close curmyKey
deallocate curmyKey
set @dteEnd= (SELECT GETDATE())
select @dteEnd - @dteStart
------------------------------------
October 10, 2005 at 5:07 pm
First Try not to use a crusor.
Put it in a function
There are several examples on this site.
--Creating a test table with some test data
create table TestTable ( char(4), String varchar(10))
-- Insert test data
insert into TestTable
Select 'Key1', 'String1' union
Select 'Key1', 'String2' union
Select 'Key1', 'String3' union
Select 'Key2', 'xString1' union
Select 'Key2', 'xString2' union
Select 'Key2', 'xString3' union
Select 'Key2', 'xString4'
GO
-- Create the function
drop function ConcatStrings
go
create function ConcatStrings (@Key char(4))
Returns varchar(8000)
as
Begin
declare @RetString varchar(8000)
select @RetString = coalesce(@RetString,'') + String + ' ' -- Separator is
-- a space, you can use whatever you want
from TestTable
where = @key
Return @RetString
end
GO
-- Put the function inline with your select statement
select , dbo.ConcatStrings() as String
from TestTable
group by
-- need the group by because there's many keys in your table
-- Result
Key, String
Key1 String1 String2 String3
Key2 xString1 xString2 xString3 xString4
drop table testtable
October 10, 2005 at 5:16 pm
I'll give that a try tomorrow, but with millions of records I have a hard time thinking that a function doing a couple hundred thousand clustered index scans in that select is going to be faster.
Either way - thanks for the idea.
October 14, 2005 at 3:38 pm
Yeah, so I tried that system and compared it to a modified version of the modified script and the initial script is far better than either.
The method you described using a UDF was running for 11 minutes when I killed it. The script I have pasted below takes 30 seconds on the same hardware and same data.
Another point, it helped to set nocount on/off in my script as well, but that's no surprise.
You mentioned that there are other solutions on the site - I can't find them so I must be using the wrong search terms. Can someone point me in the right direction?
Thanks,
Greg
-----------------------
create table DENORMALIZED(KEY CHAR(16), DENORM_STRING varchar(200))
SET NOCOUNT ON
declare @charMyKey CHAR(16)
declare @charOldKey char(16)
declare @strValue varchar(20)
declare @strBigValue varchar(200)
declare @dteStart datetime
declare @dteEnd datetime
set @dteStart = (SELECT GETDATE())
declare curmyKey CURSOR FOR
select DISTINCT , STRING
from TestTAble
order by , STRING
open curmyKey
fetch next from curmyKey INTO @charMyKey, @strValue
set @charOldKey = (select '')
while @@FETCH_STATUS = 0
begin
if @charOldKey = @charMyKey
begin
--keys are same, append to big string
set @strBigValue = (select @strBigValue + ' '+ @strValue)
end
else
begin
--keys are different
--update table
if @charOldKey != ''
begin
insert into DENORMALIZED
values (@charOldKey, @strBigValue)
end
--restart key/value
set @charOldKey = (select @charMyKey)
set @strBigValue = (select @strValue)
end
fetch next from curmyKey into @charMyKey, @strValue
end
--do the update one last time for the last record
insert into DENORMALIZED
values (@charOldKey, @strBigValue)
close curmyKey
deallocate curmyKey
set @dteEnd= (SELECT GETDATE())
select @dteEnd - @dteStart
SET NOCOUNT OFF
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply