April 24, 2009 at 8:08 am
Hello, I have written a loop to transfer data from one table to another. In the code below, what can I do to have it move on to the next 100000 rows on each pass, so that it does not insert duplicate rows into the target table?
DECLARE
@rows INT
BEGIN TRY
WHILE 1 = 1
BEGIN
insert into outlet_sku_xref_test
([retailer_id]
,[sku_id]
,[outlet_id]
,[threshold]
,[qty]
,[sku_price]
,[tax_rate]
,[sell_thru]
,[inv_status_1]
,[inv_status_2]
,[inv_status_3]
,[inv_status_4]
,[inv_status_5]
,[rec_update_date]
,[rec_create_date]
,[rec_update_id])
select top (100000)
[retailer_id]
,[sku_id]
,[outlet_id]
,[threshold]
,[qty]
,[sku_price]
,[tax_rate]
,[sell_thru]
,[inv_status_1]
,[inv_status_2]
,[inv_status_3]
,[inv_status_4]
,[inv_status_5]
,[rec_update_date]
,[rec_create_date]
,[rec_update_id]
from outlet_sku_xref
IF @@RowCount = 0 BREAK
END
END TRY
BEGIN CATCH
SELECT ERROR_NUMBER() ErrorNBR, ERROR_SEVERITY() Severity,
ERROR_LINE() ErrorLine, ERROR_MESSAGE() Msg
PRINT 'The Insert Did Not Complete Successfully...'
END CATCH
GO
Best Regards,
~David
April 24, 2009 at 8:16 am
Is there a column that you can use for something like "Where ID between 1000 * (@Iteration-1) + 1 and 1000 * @Iteration"? @Iteration = 1 for the first time through the loop, and increments by 1 each pass through the loop. Thus, it will start with 1-1000, then 1001-2000, and so on.
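A minimal sketch of that pattern, using the 100,000-row batch size from the original post, and assuming (purely for illustration, since neither is given in this thread) a gap-free integer column called id on the source table and identical column layouts in both tables:
DECLARE @Iteration INT
SET @Iteration = 1

WHILE 1 = 1
BEGIN
    -- column lists omitted for brevity; keep the explicit lists from the original post
    INSERT INTO outlet_sku_xref_test
    SELECT *
    FROM outlet_sku_xref
    WHERE id BETWEEN 100000 * (@Iteration - 1) + 1 AND 100000 * @Iteration

    IF @@ROWCOUNT = 0 BREAK
    SET @Iteration = @Iteration + 1
END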
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 24, 2009 at 8:21 am
I have written a loop to transfer data from one table to another. In the code below, what can I do to have it move on to the next 100000 rows on each pass, so that it does not insert duplicate rows into the target table?
The issue with this is that a set has no inherent ordering, so you cannot really speak of "the first 100,000 rows" and "the next 100,000 rows". What you could do is apply a WHERE clause, for example ... WHERE retailer_id BETWEEN 100001 AND 200000, provided the values are numbered without gaps. Otherwise ROW_NUMBER() might help, but you will have to check whether performance is still acceptable, since you would then be reading the full table content just to assign a row number to every row.
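A rough sketch of the ROW_NUMBER() route, assuming the composite key is (retailer_id, sku_id, outlet_id) and that the target table has the same columns, in the same order, as the original post's INSERT list:
DECLARE @Iteration INT
SET @Iteration = 1

WHILE 1 = 1
BEGIN
    ;WITH numbered AS
    (
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY retailer_id, sku_id, outlet_id) AS rn
        FROM outlet_sku_xref
    )
    INSERT INTO outlet_sku_xref_test
    SELECT retailer_id, sku_id, outlet_id, threshold, qty, sku_price, tax_rate,
           sell_thru, inv_status_1, inv_status_2, inv_status_3, inv_status_4,
           inv_status_5, rec_update_date, rec_create_date, rec_update_id
    FROM numbered
    WHERE rn BETWEEN 100000 * (@Iteration - 1) + 1 AND 100000 * @Iteration

    IF @@ROWCOUNT = 0 BREAK
    SET @Iteration = @Iteration + 1
END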
brgds
Philipp Post
April 24, 2009 at 8:23 am
There is only a composite unique key.
Best Regards,
~David
April 24, 2009 at 8:25 am
Can you insert the whole table into a temp table, with an ID on it, and then transfer to the other table with the temp table as the source, in batches at that step?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 24, 2009 at 9:02 am
You have a unique index, so you could use it.
INSERT YOURTABLE( blah,blah2,blah3 etc)
SELECT blah,blah2,blah3 etc
FROM YOUROTHERTABLE T1
LEFT OUTER JOIN YOURTABLE T2
ON T1.unique_index_part1=T2.unique_index_part1
AND T1.unique_index_part2=T2.unique_index_part2
AND T1.unique_index_part3=T2.unique_index_part3
WHERE T2.some_column_that_does_not_allow_nulls IS NULL
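Applied to the tables in this thread, and assuming the composite unique key is (retailer_id, sku_id, outlet_id) and matching column layouts (column lists again omitted for brevity), the body of the original WHILE loop might look something like this in place of the plain SELECT TOP:
INSERT INTO outlet_sku_xref_test
SELECT TOP (100000) src.*
FROM outlet_sku_xref AS src
LEFT OUTER JOIN outlet_sku_xref_test AS tgt
    ON  src.retailer_id = tgt.retailer_id
    AND src.sku_id      = tgt.sku_id
    AND src.outlet_id   = tgt.outlet_id
WHERE tgt.retailer_id IS NULL   -- part of the key, so NULL here means "not copied yet"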
MM
April 26, 2009 at 2:48 pm
[font="Tahoma"]Using a left join is one way, but the code will progressively slow down (er, that doesn't mean it's invalid, and sometimes it's still the best performing solution because other options generally involve additional I/O.)
Another option is to create a temporary table that lists the key values and has an identity key. You can track the number of the identity key to start from with a simple counter variable, and join the temporary table to the source table to get the range of keys.
A third option is to use an order by in your insert, and start the loop by getting the last values for the composite key fields in the destination table. You then have a where clause in the insert statement to skip past those keys. That generally takes the form: (A0 > A1 or (A0 = A1 and B0 > B1)).
[/font]
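A sketch of that third option, assuming the leading key columns are retailer_id and sku_id with integer types (extend the same pattern for the remaining key column), column lists omitted:
DECLARE @last_retailer_id INT, @last_sku_id INT

-- last key already copied; both stay NULL on the very first pass
SELECT @last_retailer_id = MAX(retailer_id) FROM outlet_sku_xref_test
SELECT @last_sku_id = MAX(sku_id)
FROM outlet_sku_xref_test
WHERE retailer_id = @last_retailer_id

INSERT INTO outlet_sku_xref_test
SELECT TOP (100000) *
FROM outlet_sku_xref
WHERE @last_retailer_id IS NULL
   OR retailer_id > @last_retailer_id
   OR (retailer_id = @last_retailer_id AND sku_id > @last_sku_id)
ORDER BY retailer_id, sku_id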
April 27, 2009 at 7:06 am
Actually, in SQL 2005, another solution that sometimes works is to use an Output clause to insert into a temp table, then use that to manage the loop. That works best if you have well-ordered data.
For example, with an ID column, you can Output Into a temp table, select the max value from that and assign it to a variable, truncate the temp table, and insert top 10000 where ID > the variable, then repeat as needed.
If the key columns involved have an order to them of some sort, then that method can be really, really fast.
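A sketch of that approach (illustrative only, since it assumes both tables carry an integer ID column, which they don't in this thread), column lists omitted:
CREATE TABLE #last_batch (id INT)
DECLARE @last_id INT
SET @last_id = 0

WHILE 1 = 1
BEGIN
    INSERT INTO outlet_sku_xref_test
    OUTPUT inserted.id INTO #last_batch (id)
    SELECT TOP (10000) *
    FROM outlet_sku_xref
    WHERE id > @last_id
    ORDER BY id

    IF @@ROWCOUNT = 0 BREAK

    -- remember where this batch ended, then reset for the next pass
    SELECT @last_id = MAX(id) FROM #last_batch
    TRUNCATE TABLE #last_batch
END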
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 27, 2009 at 8:52 am
Follow-up to GSquared's first idea:
You could add an Identity field to a temp table loaded with your source data. Then load your destination table from that.
-- create temp source table
SELECT * INTO #tmpSku FROM outlet_sku_xref
-- add unique ID field
Alter table #tmpSku add sku_row_id integer IDENTITY (1,1)
insert into outlet_sku_xref_test
select
....
from #tmpSku
where sku_row_id between 1 and 10000
insert into outlet_sku_xref_test
select
....
from #tmpSku
where sku_row_id between 10001 and 20000
...and so on
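The same idea, generalized into a loop so the INSERT doesn't have to be repeated once per range (a sketch; the SELECT list matches the column order of the original post):
DECLARE @from INT, @batch INT
SET @from = 1
SET @batch = 10000

WHILE 1 = 1
BEGIN
    INSERT INTO outlet_sku_xref_test
    SELECT retailer_id, sku_id, outlet_id, threshold, qty, sku_price, tax_rate,
           sell_thru, inv_status_1, inv_status_2, inv_status_3, inv_status_4,
           inv_status_5, rec_update_date, rec_create_date, rec_update_id
    FROM #tmpSku
    WHERE sku_row_id BETWEEN @from AND @from + @batch - 1

    IF @@ROWCOUNT = 0 BREAK
    SET @from = @from + @batch
END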
April 27, 2009 at 9:31 am
I must be missing something...would it not work if you just removed the loop and inserted the records?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 28, 2009 at 9:04 am
slange (4/27/2009)
I must be missing something...would it not work if you just removed the loop and inserted the records?
There are times when it's more efficient to do batches like this, usually if you're transferring huge amounts of data.
One that I ran into recently was a table with over 100-million rows of data, and I needed to archive about 75% of it into another table. Partitioning was not an option. Just trying to run it all as a single insert followed by a delete would literally have locked up the server for days (low end server). Breaking it up into small batches, however, worked beautifully. Still took a long time, but the regular processes on the server could continue while it chugged away at the archive process.
Tried various row-counts on the insert, found that it got the most done per hour at about 7k rows per batch, with the least interference with the regular jobs on there.
Used Insert...Output Into...Select to archive, then used the contents of the temp table I was outputting into for a delete command, all wrapped in a transaction. Looped that with While @@Trancount > 0, and it just ran and ran and got the job done efficiently.
SQL can handle those huge operations, but many servers aren't up to it.
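Roughly the shape of that archive loop, with made-up table names (dbo.BigTable / dbo.ArchiveTable), an assumed single-column key id, and a placeholder archiving predicate:
CREATE TABLE #moved (id INT PRIMARY KEY)

WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION

    -- copy one batch into the archive and capture the keys that moved
    INSERT INTO dbo.ArchiveTable
    OUTPUT inserted.id INTO #moved (id)
    SELECT TOP (7000) *
    FROM dbo.BigTable
    WHERE archive_date < '20090101'   -- whatever the archiving rule is

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END

    -- remove exactly the rows that were archived in this batch
    DELETE b
    FROM dbo.BigTable AS b
    INNER JOIN #moved AS m ON m.id = b.id

    COMMIT TRANSACTION
    TRUNCATE TABLE #moved
END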
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 28, 2009 at 3:35 pm
GSquared (4/28/2009)
slange (4/27/2009)
I must be missing something...would it not work if you just removed the loop and inserted the records?
There are times when it's more efficient to do batches like this, usually if you're transferring huge amounts of data.
[font="Tahoma"]As a follow up: when you do one big insert, SQL Server does it all as a single transaction. Which means you can blow out the transaction logs and potentially tempdb as well as it rebuilds indexes. While SQL Server is rebuilding indexes and the like, all sorts of locking and resource contention is caused, and for a large update that can have significant impact on performance for anything else running.
If you break it into smaller "chunks" or "batches", then you don't run the risk of SQL Server running out of capacity or turning your transaction logs into mega-gigabyte behemoths.
Another valid approach for this sort of operation is to pull the data out into a file and use SSIS or BCP or BULK INSERT to load it and avoid most of the logging. You still get the issue with index rebuilds, but there are options to commit every certain number of rows.
[/font]
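For instance, BULK INSERT can commit in batches as it loads (a sketch; the file path and format options here are placeholders, and the file would have been exported first with bcp, SSIS, or similar):
BULK INSERT dbo.outlet_sku_xref_test
FROM 'C:\exports\outlet_sku_xref.dat'
WITH (
    BATCHSIZE = 100000,       -- commit every 100,000 rows
    TABLOCK,                  -- helps enable minimal logging under the right conditions
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n'
)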
April 29, 2009 at 7:13 am
Yeah, I've used the export...import method too. That works well in some situations. I've found it quite advantageous when the data has to move across a slow network. FTPing a text file can be a lot faster than any sort of linked server type solution.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon