April 20, 2010 at 7:40 am
I had trouble removing the clustered index on InvoicedDate (it was taking a long time), but I was finally able to do it. I tested the stored procedure with all indexes removed except InvoicedDate, and the log file still jumped to 7G. Once I removed the InvoicedDate index, the log file only reached slightly over 2G. It is interesting to see how much of an impact the clustered (non-unique) index had on the transaction log.
I still have an identity key that I was not able to remove without removing the field itself. The identity key is required because the source data provides no guaranteed unique keys. The primary key is non-clustered because its order means nothing; I instead opted to cluster on the InvoicedDate field. The primary key doesn't appear to have a significant impact on the transaction log. Slightly over 2G is consistent with the other steps, which use slightly over 1G of log space. The extra log space can easily be accounted for by the increased number of rows, as well as the increased row size (the addition of several fields via lookup).
My preference is to get it under 2G, with enough room to spare that I don't have to worry about it growing. So it looks like I may still have to break the data into smaller chunks, and now it becomes a matter of which approach provides the best performance. From what I'm seeing, recreating indexes seems expensive; on the other hand, building the data in chunks took over twice as long as building it all at once. It looks like the difference in performance between the two approaches will be pretty nominal. Since building the data in chunks is simpler overall than some hybrid solution, it looks like separating the data is the way to go.
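For reference, the layout I'm describing boils down to something like this. This is just a sketch: every name other than fact.CustomerOrder, temp.OrderPrep and InvoicedDate is made up, and the real table has many more fields. The final CREATE CLUSTERED INDEX is also the statement that makes the "recreate indexes" option expensive.
-- Sketch only: column and constraint names are illustrative, not the real schema.
CREATE TABLE fact.CustomerOrder
(
    CustomerOrderKey INT IDENTITY(1,1) NOT NULL,  -- surrogate key; the source has no guaranteed unique key
    InvoicedDate     DATETIME NOT NULL,
    CustomerID       INT      NOT NULL,
    -- ...measure and lookup columns omitted...
    CONSTRAINT PK_CustomerOrder
        PRIMARY KEY NONCLUSTERED (CustomerOrderKey)  -- key order means nothing, so no clustering here
);

-- Non-unique clustered index on the date the fact rows are actually queried by
CREATE CLUSTERED INDEX CIX_CustomerOrder_InvoicedDate
    ON fact.CustomerOrder (InvoicedDate);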
Thanks again for all your help!
April 20, 2010 at 9:32 am
Thanks for the update, and I am surprised at the difference in the log with the clustered index removal. I would tend to agree that it doesn't necessarily make sense.
I'm wondering, in terms of your timing and chunking, whether you could insert the data differently. Maybe do something to avoid page splitting?
Likely this is a challenging problem, and I'd actually encourage you to start another thread and post some code, and see what suggestions people might have for changing the way the data is being loaded in the proc.
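Something along these lines, for instance (purely a sketch reusing the table and column names from your proc), so the rows arrive in clustered-key order and mostly get appended rather than splitting pages in the middle of the index:
-- Sketch: present the rows in clustered index key order (InvoicedDate).
-- Rows then tend to land at the end of each key range instead of
-- forcing page splits partway through the index.
INSERT INTO fact.CustomerOrder (InvoicedDate, CustomerID)
SELECT OP.InvoicedDate, OP.CustomerID
FROM temp.OrderPrep OP
ORDER BY OP.InvoicedDate;   -- matches the clustered index key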
April 20, 2010 at 10:12 am
Well, ultimately breaking it into blocks gives me complete control over the transaction log size. In this specific case, disk space is more important than performance. I can trade performance for disk space by reducing the block size. Also, the fixed block size means that the transaction log will not increase in size as a result of more source data. This solution seems to be acceptable.
I ended up with this:
DECLARE @i INT, @max INT, @blocksize INT
SET @i = 0
SET @blocksize = 50000
SELECT @max = MAX(id) FROM temp.OrderPrep

WHILE @i <= @max
BEGIN
    IF @debug = 1 SELECT @i AS [i], @max AS [MAX]   -- @debug comes from the surrounding proc

    INSERT INTO fact.CustomerOrder
    SELECT
        -- SELECT list removed for brevity and privacy
    FROM
        temp.OrderPrep OP
        LEFT JOIN dim.Customer C ON C.CustomerID = OP.CustomerID
        OUTER APPLY Lookup.dbo.fn_SalesRegion(
            CASE
                WHEN C.CustomerType = 'OEM-N' THEN 'NAT'
                WHEN C.Channel = 'EXPORT' THEN 'EXP'
                ELSE ShipToState
            END, ShipToCountry) SR
        OUTER APPLY Lookup.dbo.fn_IRRegion(ShipToState, ShipToZipcode, ShipToCountry) IRR
    WHERE OP.[id] > @i AND OP.[id] <= @i + @blocksize   -- one block of ids per pass

    IF @debug = 1 PRINT 'Checkpoint ' + CAST(@i AS VARCHAR(50))
    CHECKPOINT

    SET @i = @i + @blocksize
END
Which really isn't much more code than the original script. It's not the most elegant solution in the world, but it works. Frankly, I've already spent too much time on this. File bloat, however, can become serious, so it was imperative that I spend however much time was needed to solve the problem.
Besides SSIS, I can't think of any other ways of moving the data. I've avoided SSIS for a couple of reasons. The first is that I'm unsure how well the OUTER APPLY calls to the UDFs would translate. The other is that this code works, and I want to minimize the possibility of introducing new bugs. That second reason applies to any other method of moving the data.
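One caveat on the CHECKPOINT in that loop, in case anyone reuses it: it only frees log space for reuse when the database is in the SIMPLE recovery model. If the warehouse were in FULL or BULK_LOGGED recovery, a per-batch log backup would be needed instead. A minimal sketch, assuming a placeholder database name and path:
-- Placeholder names: MyWarehouseDB and the file path are assumptions.
-- In FULL or BULK_LOGGED recovery, run this between batches instead of
-- relying on CHECKPOINT, so the log space used by one batch can be
-- reused by the next.
BACKUP LOG MyWarehouseDB
    TO DISK = N'X:\LogBackups\MyWarehouseDB_batch.trn';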
February 20, 2011 at 11:22 pm
I agree that doing this in a loop, in chunks, gives complete control. I have experienced the same problem.
I have a question here: is bcp logged or not while inserting data into a table?
February 21, 2011 at 12:29 am
There is no such thing as an unlogged operation. Every data modification will log.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 21, 2011 at 12:55 am
Yes, my assumption was correct.
There is nothing that can't be logged. But most people think bcp is not logged whereas a simple INSERT is, and that this is why bcp is fast and doesn't eat disk space during insertion.
The fact is that bcp does the same inserts in batches, or chunks, so we don't see much disk (log file) usage even when inserting huge amounts of data.
February 21, 2011 at 2:13 am
brainy (2/21/2011)
But most people think bcp is not logged whereas a simple INSERT is, and that this is why bcp is fast and doesn't eat disk space during insertion.
I've not seen too many people assume that bcp isn't logged. In fact, if the DB is in full recovery, it'll be logged much like a normal insert.
The fact is that bcp does the same inserts in batches, or chunks, so we don't see much disk (log file) usage even when inserting huge amounts of data.
Not necessarily. If the DB is in full or bulk-logged recovery and a log backup doesn't run between the batches, the log can grow. bcp is minimally logged in bulk-logged and simple recovery, not in full recovery.
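To illustrate the conditions, a minimal sketch (database, table, file name and batch size are all made up): for a bulk load to be minimally logged you generally need the SIMPLE or BULK_LOGGED recovery model, a table-level lock, and a heap or an empty clustered-index target.
-- Sketch with made-up names. Typical requirements for minimal logging:
--   * SIMPLE or BULK_LOGGED recovery model
--   * a table-level lock (TABLOCK)
--   * a heap target, or an empty table if it has a clustered index
ALTER DATABASE StagingDB SET RECOVERY BULK_LOGGED;

BULK INSERT dbo.StageOrders
FROM 'C:\loads\orders.dat'
WITH (TABLOCK, BATCHSIZE = 50000);  -- each batch commits separately

ALTER DATABASE StagingDB SET RECOVERY FULL;
-- Take a log backup afterwards; point-in-time restore isn't possible
-- within a log backup that contains the minimally logged load.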
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 21, 2011 at 3:14 am
Hi Gail,
Thanks for your comments.
What is the difference between minimally logged and fully logged? As I understand it, the only difference is that with minimal logging the log is reused after the running transaction commits, whereas with full logging it is not; it is reused only after taking a full backup.
(This is the main difference.)
Correct me if I am wrong.
February 21, 2011 at 4:10 am
brainy (2/21/2011)
What is the difference between minimally logged and fully logged? As I understand it, the only difference is that with minimal logging the log is reused after the running transaction commits, whereas with full logging it is not; it is reused only after taking a full backup. (This is the main difference.)
Completely wrong.
Fully logged and minimally logged operations have nothing at all to do with when the log space is reused.
Certain operations (check Books Online for details) can be minimally logged. This means that instead of SQL logging the complete details of the change (e.g. in a fully logged insert it writes the values of the columns into the transaction log), SQL just logs the details of which pages it changed. This can massively reduce the amount of transaction log used.
Minimally logged operations are (with a couple of exceptions) only possible in the bulk-logged or simple recovery models. In full recovery, SQL logs the changes fully. Even in those recovery models, only some operations can be minimally logged (bcp, bulk insert, index rebuilds), and only under certain circumstances.
The reuse of the log has to do with the recovery model, not the type of operation performed. Only in simple recovery is the log reused after a checkpoint; in all other recovery models a log backup is required. See today's headline article for more on transaction log reuse.
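An easy way to see the effect for yourself (just a sketch; run it against whatever test database you like) is to compare log usage before and after the load:
-- Log file size and percentage used, reported for every database.
DBCC SQLPERF(LOGSPACE);

-- ...run the fully logged or minimally logged load here...

DBCC SQLPERF(LOGSPACE);  -- compare the "Log Space Used (%)" column with the first run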
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 22, 2011 at 6:43 am
It is amazing that 9GB of disk space is a problem these days. You can fit that on a USB key now! 😀 Hopefully the batching process keeps working for a long time and you can continue to meet your build window.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 23, 2011 at 3:35 am
TheSQLGuru (2/22/2011)
It is amazing that 9GB of disk space is a problem these days. You can fit that on a USB key now! 😀
While true, I'm really not sure I'd want to run a database off one! (Oh, and USB keys are usually formatted as FAT32, which limits you to 4GB files... :-P).
March 21, 2013 at 4:27 pm
I experienced something similar. How do you explain it?
Data file size: 17 GB
Log size: 12 GB
Objective: migration from 2005 to 2008 by an external vendor on site.
I noticed the 2005 instance was using the 2000 compatibility level, so I changed the compatibility level to 100 and ran a consistency check.
As the DB was in full recovery mode, I changed it to simple recovery mode, since I knew they were going to drop and rebuild indexes, update stats, make changes to objects, etc.
I truncated the log and set it to 5 GB.
My expectation: because it is in simple recovery, the log should not grow much.
Actual: it grew to 48 GB.
I could not find the reason why it grew that big.
How do you explain this?
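For what it's worth, next time I plan to check what the log is waiting on while it grows. A rough sketch of what I have in mind (the database name is a placeholder):
-- Placeholder database name. Even in SIMPLE recovery the log has to hold
-- everything for the oldest open transaction, so one large index rebuild
-- (a single transaction) can still grow the log considerably.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MigrationDB';

-- Oldest active transaction in that database (often the rebuild itself)
DBCC OPENTRAN (N'MigrationDB');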
March 21, 2013 at 5:25 pm
Please post new questions in a new thread. Thanks
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability