June 6, 2011 at 1:34 pm
I have a HUGE table (TableA), and I need to insert its data into TableB. I have written the script below, but it loops back to the beginning of the table every time the row count reaches 10,000. How do I get it to commit 10,000 rows and then carry on from the next row rather than starting over at the beginning?
----script----
declare @CommitSize int
set @CommitSize = 10000

while @CommitSize = 10000
begin
    begin tran

    Insert into dbo.TableB
    select top 10000 ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME
    from TableA a
    --omit duplicate records--
    where not exists (Select ACCESS_INSTANT
                      from dbo.TableB b
                      where a.ACCESS_INSTANT = b.ACCESS_INSTANT)

    set @CommitSize = @@RowCount
    commit tran
end
June 7, 2011 at 10:12 am
Do tell why you decided to use a loop. Also, please provide the DDL for TableA and TableB.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 8, 2011 at 6:45 am
I'm using a loop so the insert commits after every 10,000 rows. TableA is around 800 million rows, so I have to commit after a set number of rows so my logs don't fill up (see the log-space check sketched after the script below).
1. TableA and TableB are in different databases on the same server
2. Both tables are exactly the same, but TableA is going to be decommissioned after we get the data into TableB
3. ACCESS_INSTANT (PK) in both tables
4. ACCESS_INSTANT is the field used to omit duplicate records
5. Need the loop to start at the next row after commit...not start at the beginning like the script below does (select top 10000)
-----Disregard my script in the other post - there was a small error...here is the correct script-----
declare @CommitSize int
set @CommitSize = 10000

while @CommitSize = 10000
begin
    begin tran

    Insert into dbo.TableB
    select top 10000 ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME
    from TableA a
    --omit duplicate records--
    where not exists (Select ACCESS_INSTANT
                      from dbo.TableA a
                      where a.ACCESS_INSTANT = b.ACCESS_INSTANT)

    set @CommitSize = @@RowCount
    commit tran
end
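Since the worry here is the log filling up, a quick way to keep an eye on it while a batched copy like this runs (standard commands; nothing below is specific to these tables):

-- Log size and percent used for every database on the instance
DBCC SQLPERF(LOGSPACE)

-- Recovery model per database (SIMPLE reuses log space after each
-- committed batch; FULL only clears it after a log backup)
SELECT name, recovery_model_desc
FROM sys.databases

Whether batching actually caps log growth also depends on the recovery model: under SIMPLE the space is reused after each committed batch, while under FULL it only clears after log backups.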
June 8, 2011 at 8:51 am
Also, please provide the DDL for TableA and TableB.
I still think your query has a problem...where is the table with alias b?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 8, 2011 at 8:54 am
No, the script is running, but it's taking over a week because it loops back to the beginning of the table after committing 10,000 rows. In my original script I use dbname.tablea and dbname.tableb.
June 8, 2011 at 8:55 am
I just need direction on how to commit N rows and move onto the next row without looping to the beginning of the table after the commit.
June 8, 2011 at 9:05 am
sqluser_8119 (6/8/2011)
I just need direction on how to commit N rows and move onto the next row without looping to the beginning of the table after the commit.
Third try...please provide DDL for your tables and a working query. Your query references a table with an alias "b" but there is no table in your query with that alias.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 8, 2011 at 9:07 am
Have you considered using SSIS to do this? It could handle the commits as you wish.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
June 8, 2011 at 9:08 am
No, I haven't thought about SSIS... that's a great idea. Thank you!
June 8, 2011 at 9:30 am
If you go with SSIS, consider having it run on one thread, selecting the data out of TableA in clustered index order and doing a sorted insert into TableB.
If you run it on multiple threads, or you run it on one thread but don't follow the sequence above, you'll end up with ridiculous table fragmentation at the end of the sync process for a table of that size.
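As an aside, the fragmentation claim is easy to check after the load with a standard DMV query, run in the context of the database holding TableB; 'LIMITED' keeps the scan cheap:

SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.TableB'), NULL, NULL, 'LIMITED')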
Nothing against the suggestion, because it will get the job done and be relatively simple to implement, but all things considered (single-threaded or multi-threaded, plus the post-process table maintenance), T-SQL will blow the doors off SSIS in terms of performance and ease of implementation. SSIS takes the data out of the SQL memory space only to put it right back in. You've pretty much got the task done in 15 lines of T-SQL code, but either way, good luck.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 9, 2011 at 3:26 am
Hello,
As I deduce from your query, the field ACCESS_INSTANT has unique values. Is TableA indexed on this field? Is TableB indexed on it too?
If so, you can alter your code to fetch the data in sorted order:
declare @CommitSize int
set @CommitSize = 10000
DECLARE @LastInstant AS ???

while @CommitSize = 10000
begin
    begin tran

    SELECT @LastInstant = MAX(ACCESS_INSTANT) FROM dbo.TableB
    -- Add here some code to give a value when TableB is empty

    Insert into dbo.TableB
    select top 10000 ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME
    from TableA a
    --omit duplicate records--
    where ACCESS_INSTANT > @LastInstant
    ORDER BY ACCESS_INSTANT

    set @CommitSize = @@RowCount
    commit tran
end
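One way to fill in the "TableB is empty" placeholder above, offered only as a sketch: let a NULL @LastInstant pass everything through, so the first batch starts at the lowest key and no sentinel value is needed for the unknown data type. Only the statements inside the transaction change:

    SELECT @LastInstant = MAX(ACCESS_INSTANT) FROM dbo.TableB

    Insert into dbo.TableB
    select top 10000 ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME
    from TableA
    --omit duplicate records--
    where @LastInstant IS NULL          -- TableB empty: start from the lowest key
       or ACCESS_INSTANT > @LastInstant
    ORDER BY ACCESS_INSTANT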
There is still a problem if your apps can insert new data out of ascending order, but that can be left for a final catch-up step.
Regards,
Francesc
June 9, 2011 at 9:12 am
That will still require another access to the base table on every pass. There is a way to do it without touching that table again, but the query he shared is broken and he won't share any DDL, so it's difficult to provide a working suggestion.
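With no DDL to work from, the following is only a guess at what "without touching the table again" might look like: stage each batch in a temp table, insert from it, and take the next starting key from that batch instead of re-querying MAX() against the big table. The key's data type (bigint) and the #Batch name are assumptions for illustration.

declare @CommitSize int
set @CommitSize = 10000
declare @LastInstant bigint        -- data type assumed; use ACCESS_INSTANT's real type

while @CommitSize = 10000
begin
    begin tran

    -- stage one batch from the source, in key order
    select top 10000 ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME
    into #Batch
    from dbo.TableA
    where @LastInstant is null or ACCESS_INSTANT > @LastInstant
    order by ACCESS_INSTANT

    set @CommitSize = @@ROWCOUNT

    insert into dbo.TableB (ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME)
    select ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME from #Batch

    -- next starting key comes from the batch just copied, not from either base table
    select @LastInstant = max(ACCESS_INSTANT) from #Batch

    drop table #Batch
    commit tran
end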
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 10, 2011 at 4:55 am
As has been suggested: why, why, why touch the source table again?
What you are doing is telling 10,000 people to step forward, checking whether you've spoken to them yet, and then deciding what to do.
On plenty of occasions I have used loops to either move or delete massive amounts of data. They work well. However, you need something distinguishable in the source.
You don't need to use TOP either. I assume you have something to identify each record in the source (share the DDL...), like an identity or a date? Why not loop by day and move one day at a time.
The concept is the same.
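A rough sketch of the loop-by-day idea, using the ACCESS_TIME column from the earlier scripts. It assumes ACCESS_TIME is a datetime and that TableB starts empty (or that already-copied days are excluded some other way):

declare @Day datetime
declare @EndDay datetime

-- one-off scan to find the date range to walk through
select @Day = min(ACCESS_TIME), @EndDay = max(ACCESS_TIME) from dbo.TableA
set @Day = dateadd(day, datediff(day, 0, @Day), 0)   -- round down to midnight

while @Day <= @EndDay
begin
    begin tran

    insert into dbo.TableB (ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME)
    select ACCESS_INSTANT, PROCESS_ID, ACCESS_TIME
    from dbo.TableA
    where ACCESS_TIME >= @Day
      and ACCESS_TIME <  dateadd(day, 1, @Day)

    commit tran
    set @Day = dateadd(day, 1, @Day)
end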
Another, and quicker, way would be bcp:
bcp mydb..mytable out c:\SpaceForAveryLargeFile.bcp -S(local) -T -n
and to push it back in:
bcp mydb..mytable in c:\SpaceForAveryLargeFile.bcp -S(local) -T -n
If the file is on the local server (or a fast network share), it'll be much, much quicker than moving the data table to table.
You can also compress the bcp file quite nicely with rar or zip if you need to move it to a new location.
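Worth noting: if the import leg is done from T-SQL instead of the bcp command line, BULK INSERT's BATCHSIZE option gives the same every-N-rows commits the original loop was after, which keeps the log in check. A sketch, assuming a native-format export (bcp -n) and the same placeholder path:

BULK INSERT dbo.TableB
FROM 'c:\SpaceForAveryLargeFile.bcp'
WITH (
    DATAFILETYPE = 'native',   -- matches a bcp -n export
    BATCHSIZE    = 10000,      -- commit every 10,000 rows
    TABLOCK                    -- single table lock for the duration of the load
)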
Credit to you for waiting 2 weeks to see if it runs - I'm so lazy I would have found a faster way 1.9 weeks ago 😉
edit:
IIRC, I moved around 300m rows from a very wide table in an hour or so. (Phil can count the rows as it's not his table:-P)
Cheers, Crispin
I can't die, there are too many people who still have to meet me!
It's not a bug, SQL just misunderstood me!
June 10, 2011 at 6:26 am
Crispin Proctor (6/10/2011)
As has been suggested: why, why, why touch the source table again? [...]
Hey Crispin, we're on the same page about the additional table access. I am a little confused by parts of your post, though... why are we talking about networks and compression? <sarcastic>I'll assume that was a teaching item thrown out there for the benefit of the passerby, because these are two DBs on the same instance. I'm setting myself up here, but I am intrigued... I agree that when moving data between disparate instances bcp is a good option, because you can compress the file before shipping it over, but if a fast local network is involved, or no network at all as in this case, how does bcp (disk I/O) become a better option than SSIS (RAM only)?
Seventh try, including my previous post, Crispin's post and now this one... please share the DDL... help us help you 😀
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 10, 2011 at 6:39 am
Yup, purely for passers-by and "covering all the bases". I would bang my head against a wall if the response was "well, it takes too long to copy the file".
My normal approach is to split the file with WinRAR into 100MB chunks plus recovery volumes. Makes it easy to copy the file, and if there is a problem, WinRAR can fix it 😀
I'm still keen on bcp even on the same instance for massive tables, because of the batching and the lack of contention within the system.
BCP will hammer SQL and the disk on the way out; the reverse happens on the way in. SSIS needs memory (and a lot of it if it gets half a chance).
To be honest, SSIS vs bcp might be neck and neck; one might win over the other, but I think they would be close regardless. Given that it takes < 30 seconds to write the batch file, I normally opt for that.
If I were a rich man with lots of memory, I would create a RAM disk and dump to that 😎
edit:
The fast network thing stems from someone once doing it (on my say-so) across a WAN: a 6GB file, but it compressed down to < 1. 😉
Cheers, Crispin
I can't die, there are too many people who still have to meet me!
It's not a bug, SQL just misunderstood me!