October 23, 2016 at 6:27 am
Hi all,
I must transfer a big table (231k rows, which isn't that big, but 735GB because of a blob column) to another location by INSERT INTO ... SELECT FROM ...
It appears that the whole source table is loaded first, before any record is actually inserted into the destination table. But there's not enough disk space, because tempdb just grows and grows.
So to find a solution for this, I thought: if I loop through batches of, let's say, 1000 or 5000 records, maybe tempdb wouldn't grow that much? Or do I need the space (double the table size?) anyway?
So, how do I set a counter from record 1 to 1000, select and insert those records, then advance the counter to 1001-2000, and so on?
Thanks in advance!
October 23, 2016 at 6:38 am
Super_Grover (10/23/2016)
Hi all, I must transfer a big table (231k rows, which isn't that big, but 735GB because of a blob column) to another location by INSERT INTO ... SELECT FROM ...
It appears that the whole source table is loaded first, before any record is actually inserted into the destination table. But there's not enough disk space, because tempdb just grows and grows.
So to find a solution for this, I thought: if I loop through batches of, let's say, 1000 or 5000 records, maybe tempdb wouldn't grow that much? Or do I need the space (double the table size?) anyway?
So, how do I set a counter from record 1 to 1000, select and insert those records, then advance the counter to 1001-2000, and so on?
Thanks in advance!
Do you have a unique key on the table? If so then it's easy to use that key for the batch sizes.
😎
231K rows totaling 735GB equals about 3.25 MB per row; what kind of data are you storing there?
October 23, 2016 at 7:31 am
Eirikur Eiriksson (10/23/2016)
Super_Grover (10/23/2016)
Hi all, I must transfer a big table (231k rows, which isn't that big, but 735GB because of a blob column) to another location by INSERT INTO ... SELECT FROM ...
It appears that the whole source table is loaded first, before any record is actually inserted into the destination table. But there's not enough disk space, because tempdb just grows and grows.
So to find a solution for this, I thought: if I loop through batches of, let's say, 1000 or 5000 records, maybe tempdb wouldn't grow that much? Or do I need the space (double the table size?) anyway?
So, how do I set a counter from record 1 to 1000, select and insert those records, then advance the counter to 1001-2000, and so on?
Thanks in advance!
Do you have a unique key on the table? If so then it's easy to use that key for the batch sizes.
😎
231K rows totaling 735GB equals about 3.25 MB per row; what kind of data are you storing there?
Oh sorry, I forgot to describe the table.
It's for testing/staging/comparing purposes and it's a first rough version (not under our control/design), so there are no keys or indexes defined. Nothing.
It's just 3 columns:
-doc_number (nvarchar(20))
-doc_name (nvarchar(255))
-blob (varbinary(max))
Some data:
doc_number | doc_name | BLOB
PSD1 | | 0x025546955565D31546B..........
PSD2 | | 0x025546955565D31546B...........
PSD3 | | 0x025546955565D31546B............
PSD4 | | 0x025546955565D31546B............
I'm not concerned with the actual data, but these are healthcare images.
So, no ID or index, nothing. I think it must be done by row count or something? Any ideas how I can loop through batches of records?
October 23, 2016 at 7:49 am
Is doc_number unique?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 23, 2016 at 7:54 am
Phil Parkin (10/23/2016)
Is doc_number unique?
It is unique AFAICS, but it's not an unbroken sequence. In other words, it does not stay in sync with the row numbers (some doc_numbers are missing, so to speak).
October 23, 2016 at 8:06 am
Super_Grover (10/23/2016)
Phil Parkin (10/23/2016)
Is doc_number unique? It is unique AFAICS, but it's not an unbroken sequence. In other words, it does not stay in sync with the row numbers (some doc_numbers are missing, so to speak).
OK, then you could easily create a temp table with an identity column and insert all of the doc_numbers into that. Then join from the temp table on doc_number and you have your numbering scheme.
Untested code to get you started
Create table #doc_numbers(RowNum int identity(1,1) primary key clustered, doc_number nvarchar(20) not null)
insert #doc_numbers (doc_number)
select doc_number
from tbl
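To make the numbering concrete: once #doc_numbers is populated, a batch is just a contiguous range of RowNum values. A minimal sketch, using the same placeholder name 'tbl' as above:
-- First batch of 1,000 documents (sketch only; 'tbl' is the placeholder source table)
SELECT t.*
FROM tbl AS t
JOIN #doc_numbers AS n ON n.doc_number = t.doc_number
WHERE n.RowNum BETWEEN 1 AND 1000;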
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 23, 2016 at 8:29 am
Phil Parkin (10/23/2016)
Super_Grover (10/23/2016)
Phil Parkin (10/23/2016)
Is doc_number unique? It is unique AFAICS, but it's not an unbroken sequence. In other words, it does not stay in sync with the row numbers (some doc_numbers are missing, so to speak).
OK, then you could easily create a temp table with an identity column and insert all of the doc_numbers into that. Then join from the temp table on doc_number and you have your numbering scheme.
Untested code to get you started
Create table #doc_numbers(RowNum int identity(1,1) primary key clustered, doc_number nvarchar(20) not null)
insert #doc_numbers (doc_number)
select doc_number
from tbl
Thank you Phil. But wouldn't that cost me even more disk space by creating and filling another table?
I think I'm more looking for a way to do the INSERT INTO ... SELECT FROM in small batches.
October 23, 2016 at 9:57 am
What I'm trying to achieve is something like this untested (and incorrect) code (I just wrote it down):
DECLARE @counter INT = 0
DECLARE @RwCnt INT = @@rowcount
WHILE @counter <= @RwCnt
BEGIN
INSERT INTO SELECT TOP (1000) FROM .........
SET @counter = @counter + 1000
END
The only problem is I don't know how to specify the record batch in the SELECT statement, i.e. how to link the SELECT to @counter.
I'm trying to make loops of 1000 records.
October 23, 2016 at 11:26 am
Super_Grover (10/23/2016)
Phil Parkin (10/23/2016)
Super_Grover (10/23/2016)
Phil Parkin (10/23/2016)
Is doc_number unique? It is unique AFAICS, but it's not an unbroken sequence. In other words, it does not stay in sync with the row numbers (some doc_numbers are missing, so to speak).
OK, then you could easily create a temp table with an identity column and insert all of the doc_numbers into that. Then join from the temp table on doc_number and you have your numbering scheme.
Untested code to get you started
Create table #doc_numbers(RowNum int identity(1,1) primary key clustered, doc_number nvarchar(20) not null)
insert #doc_numbers (doc_number)
select doc_number
from tbl
Thank you Phil. But wouldn't that cost me even more disk space by creating and filling another table?
I think I'm more looking for a way to do the INSERT INTO ... SELECT FROM in small batches.
I think you missed my point somewhat. Yes, it will use disk space, of course. But not much. And once you have the temp table populated, you can do something like this (also untested):
set nocount, xact_abort on;
declare @MinId int, @MaxId int;
select @MinId = min(RowNum), @MaxId = max(RowNum)
from #doc_numbers;
declare @Increment int = 1000;
declare @StartRange int = @MinId;
declare @EndRange int = @StartRange + @Increment;
declare @NumInserted int;
while @StartRange <= @MaxId
begin
begin transaction;
insert target(col1, col2)
select t.col1, t.col2
from tbl t join #doc_numbers n on t.doc_number = n.doc_number
where n.RowNum >= @StartRange and n.RowNum < @EndRange;
commit transaction;
set @StartRange = @EndRange;
set @EndRange = @StartRange + @Increment;
end
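Once the loop finishes, a quick sanity check is to compare row counts between the two tables (same placeholder names 'tbl' and 'target' as above; swap in your real object names):
-- Verify that every batch landed (sketch only; placeholder names)
SELECT (SELECT COUNT(*) FROM tbl)    AS SourceRows,
       (SELECT COUNT(*) FROM target) AS TargetRows;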
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 23, 2016 at 12:16 pm
Thanks again Phil. Yes I missed your point. Sorry for that.
Now with the extra code example I think it might be an excellent solution.
I'll start with it right away and hope I can get the code right.
I'll let you know how it goes.
October 23, 2016 at 1:47 pm
Okay, so I created the temp table and populated it. No problem.
I altered the other script so that, in my opinion, it should work. But now it states 'Column name or number of supplied values does not match table definition'. So basically it says that the source and destination tables do not match.
But I used the exact same INSERT query that worked fine before.
Is it possible that the extra join clause causes this?
Here is my original working statement:
INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
SELECT * FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD]
and I just altered it to this:
INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
SELECT * FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
JOIN #doc_numbers AS dn ON ca.DOCUMENTNUMMER = dn.doc_number
WHERE dn.RowNum >= @StartRange and dn.RowNum < @EndRange
So is it the extra join?
October 23, 2016 at 2:58 pm
You've just found a pretty good example of why SELECT * is often regarded as bad practice.
Usually, I would suggest that you specify all of the column names explicitly. But in this case, the shortcut is changing this line
SELECT * FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
to this
SELECT ca.* FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
Qualify the * with the source table alias and it will no longer mean that columns from the temp table are being included in the SELECT.
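For completeness, the altered statement with just that one change applied (combining the lines quoted above, still untested) would read:
INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
SELECT ca.*
FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
JOIN #doc_numbers AS dn ON ca.DOCUMENTNUMMER = dn.doc_number
WHERE dn.RowNum >= @StartRange AND dn.RowNum < @EndRange;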
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 23, 2016 at 3:14 pm
Phil Parkin (10/23/2016)
You've just found a pretty good example of why SELECT * is often regarded as bad practice. Usually, I would suggest that you specify all of the column names explicitly. But in this case, the shortcut is changing this line
SELECT * FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
to this
SELECT ca.* FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
Qualify the * with the source table alias and it will no longer mean that columns from the temp table are being included in the SELECT.
Thank you again! Yes, I knew it was bad practice, but the temptation was stronger than me 😉 Learned my lesson, I guess 🙂
Unfortunately I ran into another message which is (I guess) of some whole other kind: The operation could not be performed because OLE DB provider "SQLNCLI11" for linked server "P2WIDB002" was unable to begin a distributed transaction.
The destination server is a linked server over a VPN tunnel. But it worked fine before.
I found some info on it but it probably has to do with the server config, so I need to see our sysadmin. On the remote linked server we cannot administer anything...
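From what I've found so far, it's the explicit BEGIN/COMMIT TRANSACTION around an insert into a linked server that promotes it to a distributed (MSDTC) transaction. If the sysadmin can't get MSDTC working, one thing I may try (untested, just a sketch based on the loop posted earlier) is dropping the explicit transaction and letting each batch auto-commit:
-- Possible workaround (untested sketch): no explicit transaction, so no distributed transaction is started
WHILE @StartRange <= @MaxId
BEGIN
    INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
    SELECT ca.*
    FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS ca
    JOIN #doc_numbers AS dn ON ca.DOCUMENTNUMMER = dn.doc_number
    WHERE dn.RowNum >= @StartRange AND dn.RowNum < @EndRange;

    SET @StartRange = @EndRange;
    SET @EndRange = @StartRange + @Increment;
END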
October 23, 2016 at 11:37 pm
Super_Grover (10/23/2016)
I knew it is bad practice ...
As with everything else in SQL Server, "It Depends".
--Jeff Moden
Change is inevitable... Change for the better is not.
October 24, 2016 at 7:09 am
Okay, we're still trying to solve this, but I'll mark Phil's post as the solution.
Thanks again!