April 28, 2009 at 12:52 am
Hi,
I've loaded 170 million rows of data from an old Oracle DW to a staging DB (SQL 2008, simple recovery model), and I'm now trying to update the table with the FK keys from the new DW. My problem is that the log grows so much that I have to split the data by years/weeks and release the disk space before the next update interval starts. What would be a good way to do this?
I would try the following:
update STG
Set ItemKEY = D.ItemKEY
From dbo.STG_OldDW STG JOIN DW.dbo.D_ITEM D ON (STG.ItemCode=D.ItemCode)
Where YEAR = 2004 and WEEK <= 26
go
DBCC SHRINKFILE(N'STG_OldDW_log', ???) -- shrink to a specific size, or let it go to the minimum?
GO
update STG
Set ItemKEY = D.ItemKEY
From dbo.STG_OldDW STG JOIN DW.dbo.D_ITEM D ON (STG.ItemCode=D.ItemCode)
Where YEAR = 2004 and WEEK > 26
go
DBCC SHRINKFILE(N'STG_OldDW_log', ???)
GO
Etc…….
Any help would be appreciated, Ville
April 28, 2009 at 1:02 am
Your approach of updating in batches is the right method, and following it up with a shrink suggests you have limited disk space.
When you split by date range, do the batches come out to around 50,000 rows or fewer per update? That would make it easier on a 4 GB RAM server.
If that assumption is right, I don't see any problem with your approach.
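As a quick check of the slice sizes, something like this (using the table and column names from your script) would show how many rows each year/week holds:
SELECT [YEAR], [WEEK], COUNT(*) AS RowsInSlice
FROM dbo.STG_OldDW
GROUP BY [YEAR], [WEEK]
ORDER BY [YEAR], [WEEK];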
Cheer Satish 🙂
April 28, 2009 at 1:17 am
There are still about 15 million rows per batch when split like in my example.
OK, thanks. I'll just continue as I planned, or maybe I need to split it even further.
Ville
April 28, 2009 at 7:37 am
Ville Lucander (4/28/2009)
Hi, I've loaded 170 million rows of data from an old Oracle DW to a staging DB (SQL 2008, simple recovery model), and I'm now trying to update the table with the FK keys from the new DW. My problem is that the log grows so much that I have to split the data by years/weeks and release the disk space before the next update interval starts. What would be a good way to do this?
I would try the following:
update STG
Set ItemKEY = D.ItemKEY
From dbo.STG_OldDW STG JOIN DW.dbo.D_ITEM D ON (STG.ItemCode=D.ItemCode)
Where YEAR = 2004 and WEEK <= 26
go
DBCC SHRINKFILE(N'STG_OldDW_log', ???)
GO
Etc…….
Any help would be appreciated, Ville
Is your database recovery model set to full?
If so, you can write a loop with a transaction inside it so the data is updated one week at a time. Ensure that a transaction log backup runs every 15 minutes or so; that way all inactive log entries are flushed and the transaction log releases space to accommodate new transactions.
Given below is the code.
declare @minweek tinyint, @maxweek tinyint, @sqlstring nvarchar(200)
set @minweek = 1
set @maxweek = 52
while @minweek <= @maxweek
begin
set @sqlstring = N'update STG set ItemKEY = D.ItemKEY from dbo.STG_OldDW STG join DW.dbo.D_ITEM D on (STG.ItemCode = D.ItemCode) where WEEK = ''' + convert(char(2), @minweek) + ''''
print @sqlstring
Begin Transaction
-- the actual execution is commented out; to run it, uncomment the line below
-- exec sp_executesql @sqlstring
Commit
set @minweek = @minweek + 1
end
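If you do run in full recovery, the log backup between batches is just a scheduled statement along these lines (the database name and backup path here are only placeholders):
BACKUP LOG [StagingDB]
TO DISK = N'D:\Backups\StagingDB_log.trn';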
April 28, 2009 at 7:51 am
Hi
No, it's running on the Simple recovery model. But I could still use a loop construct, expand it a little to cover years 2004-2008, and just execute DBCC SHRINKFILE... after each week.
Thanks, ville
April 29, 2009 at 10:41 am
You shouldn't need to do a dbcc shrinkfile if you're in simple recovery mode - once each transaction commits, the log space will be available to be re-used for the next transaction. Manually shrinking the log creates much more work for the server than is required.
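If you want to see that happening, a quick way to watch the log between batches is the standard command below; in simple recovery the "Log Space Used (%)" figure should drop back down shortly after each batch commits.
DBCC SQLPERF(LOGSPACE);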
April 29, 2009 at 10:58 am
Howard is right. You shouldn't need to release space. Size the log to hold the largest batch, based on the year/week range you are doing. After each commit, the space will be reused.
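For example, you could pre-grow the log once to roughly the size of the largest batch so it never has to autogrow mid-update; something like this, where the database name and size are only examples and the logical log file name is taken from your script:
ALTER DATABASE [StagingDB]
MODIFY FILE (NAME = N'STG_OldDW_log', SIZE = 10GB);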
April 30, 2009 at 12:59 am
Hi,
I'm on vacation, but briefly: yes, what you say makes a lot of sense... but do I need to add an explicit 'Begin Tran - Commit Tran' when running it in a loop like the one below?
declare @STARTweek tinyint, @ENDweek tinyint, @sqlstring nvarchar(200), @Year smallint
set @Year = 2004
WHILE @Year <= 2009
BEGIN
    Set @STARTweek = 1
    Set @ENDweek = 5
    WHILE @ENDweek <= 55
    BEGIN
        Set @sqlstring = N'update STG Set TuoteKEY = D.TuoteKEY From dbo.STG_KIINTIO_VK_vDW STG JOIN DW.dbo.D_TUOTE D ON (STG.TuoteKoodi = D.TuoteKoodi) Where VUOSI = ' + CAST(@Year as varchar(4)) + N' and VIIKKO >= ' + CAST(@STARTweek as varchar(2)) + N' and VIIKKO <= ' + CAST(@ENDweek as varchar(2))
        raiserror (@sqlstring,0,1) with nowait
        execute (@sqlstring)
        raiserror ('SHRINKFILE(N''STG_VanhaDW_log'')',0,1) with nowait
        DBCC SHRINKFILE(N'STG_VanhaDW_log')
        Set @STARTweek += 5
        Set @ENDweek += 5
    END
    Set @Year += 1
END
Ville
April 30, 2009 at 2:01 am
Hi Ville,
No, you don't necessarily need the begin tran, commit tran. Take the shrinkfile out though, it's not required.
By definition, if the file can be shrunk, then the disk space you're recovering is unused anyway. If you shrink the file, the next update has to claim the space from the OS all over again and depending on your log file growth settings, that could be very expensive and also cause fragmentation.
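If the log does have to grow, a fixed, reasonably large growth increment is much cheaper than many small ones; as a sketch (the database name is an example, the log file name is from your loop):
ALTER DATABASE [StagingDB]
MODIFY FILE (NAME = N'STG_VanhaDW_log', FILEGROWTH = 512MB);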
Cheers,
Howard
April 30, 2009 at 2:15 am
Hello Howard.
I'm still wondering when the server has time to clear the log if I run the loop over the whole 170 million rows. It's one batch/transaction, isn't it?
Steve mentioned earlier that I should make the log space big enough to hold the data of the biggest period of rows, but I'd like to run this as a whole (the loop example). (I might have to run this and similar updates against other dimensions in the history fact several times before all is done.)
Is there any way to accomplish this, or do I have to run it in smaller sets?
Ville
April 30, 2009 at 2:27 am
In Simple recovery mode, the log space is recovered as soon as the transaction ends - this is a very lightweight process that simply marks the log space as available to be overwritten.
The log needs to be big enough to hold the largest transaction you'll run, so if you run one update statement that updates 170 million rows, then there's no getting around the fact that your transaction log needs to be big enough to accommodate this.
If you split it up into multiple updates (even if contained within a loop) then you only need a log big enough to accommodate each individual update statement.
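As a rough sketch of the same idea batched by row count instead of by week (this assumes ItemKEY is NULL until it has been mapped, which may or may not match your data):
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) STG
    SET ItemKEY = D.ItemKEY
    FROM dbo.STG_OldDW STG
    JOIN DW.dbo.D_ITEM D ON STG.ItemCode = D.ItemCode
    WHERE STG.ItemKEY IS NULL;
    SET @rows = @@ROWCOUNT;
END
Each iteration commits on its own, so the log only ever has to hold about 50,000 row updates at a time.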
Howard
April 30, 2009 at 2:32 am
Hi,
Sounds reasonable. I'll just start a test loop and see how the log file behaves.
thanks, ville
April 30, 2009 at 2:58 am
Hi Howard,
Yes, it works just like you described:
update STG
Set TuoteKEY = D.TuoteKEY
From dbo.STG_KIINTIO_VK_vDW STG JOIN DW.dbo.D_TUOTE D ON (STG.TuoteKoodi=D.TuoteKoodi)
Where VUOSI = 2004 and VIIKKO >= 1 and VIIKKO <= 5
update STG
Set TuoteKEY = D.TuoteKEY
From dbo.STG_KIINTIO_VK_vDW STG JOIN DW.dbo.D_TUOTE D ON (STG.TuoteKoodi=D.TuoteKoodi)
Where VUOSI = 2004 and VIIKKO >= 6 and VIIKKO <= 10
Etc...
Thanks again, ville
April 30, 2009 at 6:51 pm
Hello
I am new to SQL Server 2008 bulk loads (an Oracle DBA entrusted with SQL Server now). Soon I will be dealing with 2 TB of data a year, about 17 GB/day assuming 365 days of data loads, so I am reading about similar case scenarios. Did you give any thought to changing the database to BULK_LOGGED recovery prior to the data load and then putting it back to Simple? When the db is in SIMPLE mode I would not expect this much logging to be taking place, so I am surprised it is happening. Again, I am no expert in SQL Server and new to this type of bulk data load work.
HTH
May 4, 2009 at 12:21 am
Hi,
Changing to bulk-logged would not have helped in this scenario; updating existing column values is always fully logged. Bulk-logged only helps with operations such as bcp, SELECT * INTO table FROM table2, etc.
All transactions are always logged even when you are running the db in the Simple recovery model, but the log space is marked reusable after each transaction finishes.
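For reference, switching the model back and forth is just this (the database name is an example), but again it wouldn't change how an UPDATE is logged:
ALTER DATABASE [StagingDB] SET RECOVERY BULK_LOGGED;
-- bcp / SELECT ... INTO work here would be minimally logged
ALTER DATABASE [StagingDB] SET RECOVERY SIMPLE;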
Ville