September 13, 2018 at 5:10 am
I have a data mart with one main fact table that is about 45 GB of data and 9 GB of index.
It contains 220 million rows.
Within one MERGE statement, we inserted/updated/deleted 80 million rows, resulting in a net increase of only one million rows.
This single statement used 135 GB of log space???
I understand the log keeps the pages before and after an update, so shouldn't the log be at most about twice the size of that table (including indexes)?
While I watched the log grow, I noticed that it went steadily up to 84 GB, stayed there for quite a while, and finally reached 135 GB.
Is there a rational explanation for this?
Thanks
Eric
PS: Further details, in case they are relevant
- the merge compares a "Staging" version to the final version of that fact table (about 220 million rows each side)
- there is no trigger on the fact table
September 13, 2018 at 7:44 am
How many indexes are there, and how big are they? All the updates/inserts/deletes also have to happen to each of the indexes as well as the table itself, and those changes are logged too.
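If you're not sure, something along these lines will tell you (just a sketch using the standard catalog views; the table name is a placeholder, substitute your own):
SELECT i.name AS IndexName,
       i.type_desc AS IndexType,
       SUM(ps.used_page_count) * 8 / 1024.0 AS UsedMB,   -- pages are 8 KB
       SUM(ps.row_count) AS [Rows]
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.YourFactTable')       -- placeholder name
GROUP BY i.name, i.type_desc;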
September 13, 2018 at 8:40 am
To add to what Lynn correctly stated, there's a hidden overhead that most people don't even know about: the effect of "Bad" page splits on the log file. When one of those occurs, roughly half of the data on the page (normally) is moved to a new page, and every single row that is moved is fully logged. The smaller the rows, the more of them fit on a page, and the more logging must be done.
There's also a super nasty kind of "Bad" page split that isn't as rare as people may think, and those are "skewed" page splits. This is where the initial page split doesn't free enough room to add the row to either page, and more pages are split in the process. Normal "Bad" page splits can easily log 4.5 times or more the number of bytes of the original change, and "skewed" page splits can easily take 44 times more log file space. Paul Randal over at SQLSkills.com has a wonderful article on the hidden cost of "Bad" page splits. I don't have the link handy, though, so you'll need to look for it.
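If you want a rough indication of whether page splits are hitting you during the MERGE, something like this is a starting point (just a sketch; the allocation counters include ordinary end-of-index allocations as well, so treat them as a hint rather than an exact split count):
SELECT OBJECT_NAME(ios.object_id) AS TableName,
       i.name AS IndexName,
       ios.leaf_allocation_count,      -- leaf-level page allocations (includes splits)
       ios.nonleaf_allocation_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
  ON i.object_id = ios.object_id
 AND i.index_id = ios.index_id
WHERE ios.object_id = OBJECT_ID('dbo.YourFactTable');    -- placeholder name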
--Jeff Moden
Change is inevitable... Change for the better is not.
September 14, 2018 at 7:41 am
There is a clustered index on a composite Primary Key, plus a non-unique index on a timestamp (DATETIME).
September 14, 2018 at 8:16 am
Eric Mamet - Friday, September 14, 2018 7:41 AM: There is a clustered index on a composite Primary Key, plus a non-unique index on a timestamp (DATETIME).
Mind telling us what the columns are for that Primary Key? It's a bit important to this process. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
September 14, 2018 at 8:24 am
Also, the information in the transaction log is not stored the way it is in the data files. Comparing the size of the transaction log to the database files themselves is really an apples-and-oranges comparison.
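If you want to see how much of the log is actually in use while the statement runs (as opposed to just looking at the file size on disk), something like this gives a rough picture:
DBCC SQLPERF(LOGSPACE);   -- log size and percent used, for every database

-- or, on more recent versions, for the current database only:
SELECT total_log_size_in_bytes / 1048576.0 AS LogSizeMB,
       used_log_space_in_bytes / 1048576.0 AS UsedMB,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;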
September 14, 2018 at 8:46 am
I was "assuming" pages are entirely copied to the log (before and after)
For the columns, here it is
CREATE TABLE ... (
[FactDPRetailerMonthlyId] INT NOT NULL,
[CountryKey] INT NOT NULL,
[ReportingProductGroupKey] INT NOT NULL,
[ReportingChannelKey] SMALLINT NOT NULL,
[DistributionTypeKey] SMALLINT NOT NULL,
[BrandKey] INT NOT NULL,
[PeriodKey] INT NOT NULL,
[RawUnitsSold] DECIMAL (19, 4) NOT NULL,
[RawSalesValueEUR] DECIMAL (19, 4) NOT NULL,
[RawSalesValueNSEEUR] DECIMAL (19, 4) NOT NULL,
[RawSalesValueUSD] DECIMAL (19, 4) NOT NULL,
[RawSalesValueNSEUSD] DECIMAL (19, 4) NOT NULL,
[RawSalesValueLocal] DECIMAL (19, 4) NOT NULL,
[RawSalesValueNSELocal] DECIMAL (19, 4) NOT NULL,
[POSDataQuality] TINYINT NOT NULL,
[Coverage] DECIMAL (18, 15) NOT NULL,
[CoverageDataQuality] TINYINT NOT NULL,
[PriceClassLocalKey] INT NOT NULL,
[SmallTransactionId] INT NOT NULL,
[CreatedOn] DATETIME CONSTRAINT [DF_factDPRetailerMonthly_CreatedOn] DEFAULT (getdate()) NOT NULL,
[CreatedBy] NVARCHAR (100) CONSTRAINT [DF_factDPRetailerMonthly_CreatedBy] DEFAULT (suser_sname()) NOT NULL,
[ForcedBySAFE] BIT NULL,
CONSTRAINT [PK_DPRetailer.DPRetailerDatastoreMonthly] PRIMARY KEY CLUSTERED ([CountryKey] ASC, [ReportingProductGroupKey] ASC, [PeriodKey] ASC,
[ReportingChannelKey] ASC, [DistributionTypeKey] ASC, [BrandKey] ASC,
[PriceClassLocalKey] ASC),
CONSTRAINT [FK_factDPRetailerMonthly_BrandKey] FOREIGN KEY ([BrandKey]) REFERENCES [dpretailer].[dimDPRetailerBrand] ([BrandId]),
CONSTRAINT [FK_factDPRetailerMonthly_CountryKey] FOREIGN KEY ([CountryKey]) REFERENCES [dpretailer].[dimDPRetailerCountry] ([CountryId]),
CONSTRAINT [FK_factDPRetailerMonthly_DistributionTypeKey] FOREIGN KEY ([DistributionTypeKey]) REFERENCES [shared].[dimDistributionType] ([DistributionTypeId]),
CONSTRAINT [FK_factDPRetailerMonthly_PeriodKey] FOREIGN KEY ([PeriodKey]) REFERENCES [shared].[dimPeriod] ([PeriodId]),
CONSTRAINT [FK_factDPRetailerMonthly_ReportingChannelKey] FOREIGN KEY ([ReportingChannelKey]) REFERENCES [shared].[dimReportingChannel] ([ReportingChannelId]),
CONSTRAINT [FK_factDPRetailerMonthly_ReportingProductGroupKey] FOREIGN KEY ([ReportingProductGroupKey]) REFERENCES [dpretailer].[dimDPRetailerReportingProductGroup] ([ReportingProductGroupId])
);
GO
CREATE INDEX IX_DPRetailerDatastoreMonthly_CreatedOn ON dpretailer.factDPRetailerMonthly(CreatedOn);
September 14, 2018 at 9:07 am
You could try running this in the database:
select * from fn_dblog(null,null);
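If the raw output is too much to wade through, you can also aggregate it to see which operations and allocation units used the most log space (just a sketch; fn_dblog is undocumented, so use it with care):
SELECT Operation,
       AllocUnitName,
       COUNT(*) AS LogRecords,
       SUM([Log Record Length]) AS TotalBytes
FROM fn_dblog(NULL, NULL)
GROUP BY Operation, AllocUnitName
ORDER BY TotalBytes DESC;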
September 14, 2018 at 9:30 am
I'll do that next week because I have just created another log file (on a different drive) and shrank the original (which I'll delete when it becomes inactive).
September 17, 2018 at 5:03 am
"Unfortunately", we don't seem to have much updates going through the system now...
Would the output still be useful?
Lynn Pettis - Friday, September 14, 2018 9:07 AM: You could try running this in the database:
select * from fn_dblog(null,null);
Regards
Eric
September 17, 2018 at 9:15 am
Eric Mamet - Monday, September 17, 2018 5:03 AM: "Unfortunately", we don't seem to have many updates going through the system at the moment... Would the output still be useful?
Lynn Pettis - Friday, September 14, 2018 9:07 AM: You could try running this in the database:
select * from fn_dblog(null,null);
Regards
Eric
I just wanted you to see what information is in the transaction log. It gives you an idea of why it may get bigger than you expect.
September 17, 2018 at 3:41 pm
MERGE is notorious for some performance issues. You might want to try "UPSERT" -- an equivalent UPDATE followed by an INSERT.
For overall performance, make sure you have the best clustering key. Nothing at all wrong with a multi-column PK, but that does not necessarily mean it should be the clustering key.
What column(s) do you (almost) always specify when querying the table?
CountryKey or ( CountryKey, ReportingProductGroupKey ) or
( CountryKey, ReportingProductGroupKey, [PeriodKey] )
OR
CreatedOn?
If it's CreatedOn, cluster on that, followed by the PK column(s) frequently used for lookup. You could still keep a separate non-clustered PK with the same columns if you wanted/needed to.
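To be clear about the UPSERT idea above, the shape is roughly this (a sketch only -- the staging table name is a placeholder, the column lists are abbreviated, and you'd join on the full clustered PK):
-- UPDATE the rows that already exist
UPDATE f
SET    f.RawUnitsSold = s.RawUnitsSold            -- ...plus the other measure columns
FROM   dpretailer.factDPRetailerMonthly AS f
JOIN   staging.factDPRetailerMonthly AS s         -- placeholder staging table name
  ON   s.CountryKey = f.CountryKey
 AND   s.ReportingProductGroupKey = f.ReportingProductGroupKey
 AND   s.PeriodKey = f.PeriodKey
 AND   s.ReportingChannelKey = f.ReportingChannelKey
 AND   s.DistributionTypeKey = f.DistributionTypeKey
 AND   s.BrandKey = f.BrandKey
 AND   s.PriceClassLocalKey = f.PriceClassLocalKey;

-- then INSERT the rows that don't exist yet
INSERT INTO dpretailer.factDPRetailerMonthly
       (CountryKey, ReportingProductGroupKey, PeriodKey, ReportingChannelKey,
        DistributionTypeKey, BrandKey, PriceClassLocalKey,
        RawUnitsSold /* ...remaining columns... */)
SELECT s.CountryKey, s.ReportingProductGroupKey, s.PeriodKey, s.ReportingChannelKey,
       s.DistributionTypeKey, s.BrandKey, s.PriceClassLocalKey,
       s.RawUnitsSold /* ...remaining columns... */
FROM   staging.factDPRetailerMonthly AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dpretailer.factDPRetailerMonthly AS f
                   WHERE f.CountryKey = s.CountryKey
                     AND f.ReportingProductGroupKey = s.ReportingProductGroupKey
                     AND f.PeriodKey = s.PeriodKey
                     AND f.ReportingChannelKey = s.ReportingChannelKey
                     AND f.DistributionTypeKey = s.DistributionTypeKey
                     AND f.BrandKey = s.BrandKey
                     AND f.PriceClassLocalKey = s.PriceClassLocalKey);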
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 18, 2018 at 2:27 am
ScottPletcher - Monday, September 17, 2018 3:41 PM: MERGE is notorious for some performance issues. You might want to try "UPSERT" -- an equivalent UPDATE followed by an INSERT. For overall performance, make sure you have the best clustering key. Nothing at all wrong with a multi-column PK, but that does not necessarily mean it should be the clustering key.
What column(s) do you (almost) always specify when querying the table?
CountryKey or ( CountryKey, ReportingProductGroupKey ) or
( CountryKey, ReportingProductGroupKey, [PeriodKey] )
OR
CreatedOn? If it's CreatedOn, cluster on that, followed by the PK column(s) frequently used for lookup. You could still keep a separate non-clustered PK with the same columns if you wanted/needed to.
The fact that the log usage grew in two distinct steps made me think it was indeed the UPSERT aspect.
I think the clustered index is fine with regard to our requirements, and yes, we could split the MERGE into its constituent statements (including the DELETE).
Not only could it help with the speed, but it would also mean three transactions instead of one (which is good in this case, since we use the SIMPLE recovery model and can take care of integrity ourselves).
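What I have in mind is roughly this (just a sketch -- the staging table name is a placeholder, and the UPDATE/INSERT bodies would follow the same pattern as the sketch above):
-- Each statement commits on its own; a CHECKPOINT afterwards lets SIMPLE
-- recovery mark that log space as reusable before the next statement runs.
DELETE f
FROM   dpretailer.factDPRetailerMonthly AS f
WHERE  NOT EXISTS (SELECT 1
                   FROM staging.factDPRetailerMonthly AS s   -- placeholder staging table name
                   WHERE s.CountryKey = f.CountryKey
                     AND s.ReportingProductGroupKey = f.ReportingProductGroupKey
                     AND s.PeriodKey = f.PeriodKey
                     AND s.ReportingChannelKey = f.ReportingChannelKey
                     AND s.DistributionTypeKey = f.DistributionTypeKey
                     AND s.BrandKey = f.BrandKey
                     AND s.PriceClassLocalKey = f.PriceClassLocalKey);
CHECKPOINT;

-- UPDATE of the matched rows goes here, then CHECKPOINT;
-- INSERT of the new rows goes here, then CHECKPOINT;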
Thanks
Eric
PS: for us, speed is less of an issue than the potential size of the log.