February 8, 2005 at 7:59 pm
I am converting some denormalized data to a normalized structure. In the process I've run into a situation where multiple values exist in a single column (separated by spaces) and I need to convert that single column into multiple rows of data. Now multiply that problem by 50 columns by 50k source rows.
The solution I came up with requires cursors... *waits for the boos to die down* ... I'd love to simplify this so it doesn't require cursors at all; many people in other threads have suggested that cursors are the devil, and similar sentiments.
Here's your chance! Below is a test script that illustrates what I'm trying to do, along with the user-function + cursor solution and the correct output. It works just fine with cursors, and since this is a one-time conversion the performance is plenty fast enough. Still, I'm always up for learning a better way, so feel free to suggest improvements, and thanks for any input!
CREATE FUNCTION split_multi_column(@id varchar(16), @list varchar(8000), @delimiter varchar(1) = ',')
RETURNS @tbl TABLE(
    my_id varchar(16),
    my_str varchar(80)) AS BEGIN
/*
split_multi_column
Author: JT Lovell
Based on functions written by Erland Sommarskog, SQL Server MVP from http://www.sommarskog.se/arrays-in-sql.html
*/
DECLARE
    @pos int,
    @textpos int,
    @chunklen smallint,
    @tmpstr varchar(8000),
    @leftover varchar(8000),
    @tmpval varchar(8000)
SET @textpos = 1
SET @leftover = ''
/* walk @list in chunks so the working string stays a manageable size */
WHILE @textpos <= datalength(@list) BEGIN
    SET @chunklen = 4000 - datalength(@leftover)
    SET @tmpstr = @leftover + substring(@list, @textpos, @chunklen)
    SET @textpos = @textpos + @chunklen
    /* emit one row per complete delimited value in the current chunk */
    SET @pos = charindex(@delimiter, @tmpstr)
    WHILE @pos > 0 BEGIN
        SET @tmpval = ltrim(rtrim(left(@tmpstr, @pos - 1)))
        INSERT @tbl(my_id, my_str) VALUES(@id, @tmpval)
        SET @tmpstr = substring(@tmpstr, @pos + 1, len(@tmpstr))
        SET @pos = charindex(@delimiter, @tmpstr)
    END
    /* carry any partial value over into the next chunk */
    SET @leftover = @tmpstr
END
/* the last value has no trailing delimiter */
INSERT @tbl(my_id, my_str) VALUES (@id, ltrim(rtrim(@leftover)))
RETURN
END
GO
CREATE TABLE #source(
    source_id varchar(16),
    source_text varchar(80)
)
INSERT INTO #source(source_id, source_text)
VALUES('1','Alpha Bravo Charlie Delta Echo')
INSERT INTO #source(source_id, source_text)
VALUES('2','Fox Golf Hotel India Juliet')
CREATE TABLE #target(
    target_id integer IDENTITY(1,1) NOT NULL,
    source_id varchar(16),
    target_type varchar(20),
    target_text varchar(80)
)
DECLARE @pk varchar(16)
DECLARE @txt varchar(80)
/* normalize source data */
DECLARE cur_row CURSOR FAST_FORWARD
FOR
    SELECT s.source_id, s.source_text
    FROM #source s
OPEN cur_row
FETCH NEXT FROM cur_row INTO @pk, @txt
/* loop through each source row */
WHILE (@@FETCH_STATUS = 0) BEGIN
    INSERT INTO #target(source_id, target_type, target_text)
    SELECT smc.my_id, 'Test', smc.my_str
    FROM split_multi_column(@pk, @txt, ' ') smc
    /* get next source row */
    FETCH NEXT FROM cur_row INTO @pk, @txt
END /* cursor loop */
/* close and deallocate cursor */
CLOSE cur_row
DEALLOCATE cur_row
/* this output is exactly what I am looking for */
SELECT * FROM #target
/*
output results:
target_id source_id target_type target_text
----------- ---------------- -------------------- --------------------------------------------------------------------------------
1 1 Test Alpha
2 1 Test Bravo
3 1 Test Charlie
4 1 Test Delta
5 1 Test Echo
6 2 Test Fox
7 2 Test Golf
8 2 Test Hotel
9 2 Test India
10 2 Test Juliet
*/
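For reference, the split function can also be sanity-checked on its own with constant arguments:
SELECT my_id, my_str
FROM split_multi_column('1', 'Alpha Bravo Charlie', ' ')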
February 9, 2005 at 2:41 am
Hi,
You can accomplish the loop without cursors:-
------------------------------------------------------
CREATE TABLE #source(
    source_id varchar(16),
    source_text varchar(80)
)
INSERT INTO #source(source_id, source_text)
VALUES('1','Alpha Bravo Charlie Delta Echo')
INSERT INTO #source(source_id, source_text)
VALUES('2','Fox Golf Hotel India Juliet')
CREATE TABLE #target(
    target_id integer IDENTITY(1,1) NOT NULL,
    source_id varchar(16),
    target_type varchar(20),
    target_text varchar(80)
)
DECLARE @txt varchar(80)
DECLARE @MinPK varchar(16)
DECLARE @MaxPK varchar(16)
SET @MinPK = (SELECT MIN(s.source_id) FROM #source s)
SET @MaxPK = (SELECT MAX(s.source_id) FROM #source s)
/* loop through each source row (assumes source_id values are unique) */
WHILE @MinPK <= @MaxPK
BEGIN
    SET @txt = (SELECT s.source_text FROM #source s WHERE s.source_id = @MinPK)
    INSERT INTO #target(source_id, target_type, target_text)
    SELECT smc.my_id, 'Test', smc.my_str
    FROM split_multi_column(@MinPK, @txt, ' ') smc
    /* get next source row */
    SET @MinPK = (SELECT MIN(s.source_id) FROM #source s WHERE s.source_id > @MinPK)
END /* loop */
/* this output is exactly what I am looking for */
SELECT * FROM #target
DROP TABLE #source
DROP TABLE #target
------------------------------------------------------
Gives exactly the same result.
The pros and cons of this approach would have to be explained by someone on a higher plane than me, though.
Have fun
Steve
We need men who can dream of things that never were.
February 9, 2005 at 2:53 am
If you use a user-defined function in this way, then I believe SQL Server will use an implicit cursor anyway, because the function has to work on a row-by-row basis.
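For what it's worth, here is a sketch of the set-based call one would like to write. It is not valid on SQL Server 2000, where a table-valued function cannot take another table's columns as arguments, but the upcoming SQL Server 2005 adds a CROSS APPLY operator for exactly this (untested here):
/* hypothetical set-based call; needs SQL Server 2005's CROSS APPLY */
INSERT INTO #target(source_id, target_type, target_text)
SELECT smc.my_id, 'Test', smc.my_str
FROM #source s
CROSS APPLY dbo.split_multi_column(s.source_id, s.source_text, ' ') smc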
February 9, 2005 at 8:38 am
Any other non-cursor method seems to be an academic exercise. Looping around a SELECT statement is a poor substitute for a cursor; it may not use as many resources, but I wouldn't expect it to perform very well.
Here's the general idea of how to do it with a set-based operation, if the values in the combined text field come from a known, limited set.
You could do INSERTs with values from a JOIN using LIKE (a fleshed-out sketch follows the pseudocode):
INSERT INTO newtable SELECT ...,'KNOWN',... JOIN ... ON a.combined_value like '%KNOWN%';
INSERT INTO newtable SELECT ...,'KNOWN2',... JOIN ... ON a.combined_value like '%KNOWN2%';
INSERT INTO newtable SELECT ...,'KNOWN3',... JOIN ... ON a.combined_value like '%KNOWN3%';
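To make that concrete against the sample tables from the first post, here is a sketch that assumes the possible values can be enumerated in a hypothetical #known table. Padding both sides with the delimiter keeps 'Fox' from matching inside a longer word such as 'Foxtrot':
CREATE TABLE #known(known_value varchar(80))
INSERT INTO #known(known_value) VALUES('Alpha')
INSERT INTO #known(known_value) VALUES('Bravo')
INSERT INTO #known(known_value) VALUES('Fox')
/* ...and so on, one row per possible value */
INSERT INTO #target(source_id, target_type, target_text)
SELECT s.source_id, 'Test', k.known_value
FROM #source s
JOIN #known k
    ON ' ' + s.source_text + ' ' LIKE '% ' + k.known_value + ' %'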
February 9, 2005 at 8:45 am
Looping around a SELECT statement works quite well because the locks are only held for a short period of time.
February 9, 2005 at 9:09 am
Thanks for the replies! I have a few questions, though:
Steve> The WHILE loop operates in nearly the same way as the cursor does. Is it really more efficient, or just an alternate approach?
Peter> I take a similar approach for the columns that have a known value set. The ones I'm doing via cursors could contain anything, though.
Thanks again!
February 9, 2005 at 10:06 am
Unfortunately, I didn't really answer the original question; it was just to give you an alternative approach.
I am fairly sure they are very similar in terms of efficiency in this case.
There are other cases where you can really improve efficiency by reducing the number of SELECTs / database calls within the loop. It depends on circumstances... and, as David says, the lock time is minimal.
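For example, the @txt lookup and the next-key lookup in the loop above can be collapsed into a single SELECT per pass (a sketch, untested; assumes source_id values are non-empty):
DECLARE @pk varchar(16), @txt varchar(80)
SET @pk = '' /* sorts lower than any real source_id */
WHILE 1 = 1
BEGIN
    /* one database call fetches both the next key and its text */
    SELECT TOP 1 @pk = s.source_id, @txt = s.source_text
    FROM #source s
    WHERE s.source_id > @pk
    ORDER BY s.source_id
    IF @@ROWCOUNT = 0 BREAK
    INSERT INTO #target(source_id, target_type, target_text)
    SELECT smc.my_id, 'Test', smc.my_str
    FROM split_multi_column(@pk, @txt, ' ') smc
END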
Peter is closest to answering the 'set-based approach' question; it's just a pity he didn't code the answer fully...
Have fun
Steve
We need men who can dream of things that never were.
February 9, 2005 at 10:12 am
Sorry. I would have, but it seemed it was not pertinent to the original post (see JT's last post). Also, I'm neck-deep in mapping a government housing inspection business process onto the finite-state machine of a particular CRM system. My brain is turning to mud.
February 9, 2005 at 3:50 pm
JT
If the loop is performing an expensive SELECT on each pass, cursors are a great alternative. Locking can exist while you walk through the cursor, but if it lets you make only one pass over the source table, things can get pretty interesting: those are the rare cases where a cursor can outperform the loop.
HTH
* Noel
February 9, 2005 at 5:52 pm
Noel,
That's refreshing to hear! I think too many people throw out the baby with the bath water when it comes to cursors. They have their place as long as they aren't being used where set-based approaches are better.
Do cursors still lock the source rows when using a FAST_FORWARD (read-only, forward-only) cursor? I was under the impression that they don't lock unless the FOR UPDATE clause is used. I may be mistaken, though.
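In the meantime, one way to be sure the source rows aren't touched during the fetch loop is a STATIC cursor, which populates a copy in tempdb when opened and reads from that (a sketch; it trades tempdb space for source-table locks):
DECLARE cur_row CURSOR FORWARD_ONLY STATIC READ_ONLY
FOR
    SELECT s.source_id, s.source_text
    FROM #source s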