September 13, 2018 at 5:10 am
I have a data mart with one main fact table that is about 45 GB of data and 9 GB of index.
It contains 220 million rows.
Within one MERGE statement, we inserted/updated/deleted 80 million rows, resulting in a net increase of only one million rows.
This single statement used 135 GB of log space???
I understand the log keeps the pages before and after an update, so shouldn't the log be at most about twice the size of that table (including indexes)?
While I watched the log grow, I noticed that it went steadily up to 84 GB, stayed there for quite a while, and finally reached 135 GB.
Is there a rational explanation for this?
Thanks
Eric
PS: Further details, in case they are relevant
- the merge compares a "Staging" version to the final version of that fact table (about 220 million rows each side)
- there is no trigger on the fact table
September 13, 2018 at 7:44 am
How many indexes are there, and how big are they? All the updates/inserts/deletes also have to happen to each of the indexes as well as the table itself, and those changes are logged too.
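If you're not sure, something along these lines will tell you (just a sketch using the standard catalog views; the table name is a placeholder, substitute your own):
SELECT i.name AS IndexName,
       i.type_desc AS IndexType,
       SUM(ps.used_page_count) * 8 / 1024.0 AS UsedMB,   -- pages are 8 KB
       SUM(ps.row_count) AS [Rows]
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.YourFactTable')       -- placeholder name
GROUP BY i.name, i.type_desc;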
September 13, 2018 at 8:40 am
To add to what Lynn correctly stated, there's a hidden overhead that most people don't even know about: the effect of "Bad" page splits on the log file. When one of those occurs, roughly half of the data on the page (normally) is moved to a new page, and every single row that is moved is fully logged. The smaller the rows, the more of them fit on a page, and the more logging must be done.
There's also a super nasty kind of "Bad" page split that isn't as rare as people may think, and those are "skewed" page splits. This is where the initial page split doesn't free enough room to add the row to either page, and more pages are split in the process. Normal "Bad" page splits can easily log 4.5 times or more the number of bytes of the original change, and "skewed" page splits can easily take 44 times more log file space. Paul Randal over at SQLSkills.com has a wonderful article on the hidden cost of "Bad" page splits. I don't have the link handy, though, so you'll need to look for it.
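If you want a rough indication of whether page splits are hitting you during the MERGE, something like this is a starting point (just a sketch; the allocation counters include ordinary end-of-index allocations as well, so treat them as a hint rather than an exact split count):
SELECT OBJECT_NAME(ios.object_id) AS TableName,
       i.name AS IndexName,
       ios.leaf_allocation_count,      -- leaf-level page allocations (includes splits)
       ios.nonleaf_allocation_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
  ON i.object_id = ios.object_id
 AND i.index_id = ios.index_id
WHERE ios.object_id = OBJECT_ID('dbo.YourFactTable');    -- placeholder name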
--Jeff Moden
Change is inevitable... Change for the better is not.
September 14, 2018 at 7:41 am
There is a clustered index on a composite Primary Key, plus a non-unique index on a timestamp (DATETIME).
September 14, 2018 at 8:16 am
Eric Mamet - Friday, September 14, 2018 7:41 AM: There is a clustered index on a composite Primary Key, plus a non-unique index on a timestamp (DATETIME).
Mind telling us what the columns are for that Primary Key? It's a bit important to this process. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
September 14, 2018 at 8:24 am
Also, the information in the transaction log is not stored the way it is in the data files. Comparing the size of the transaction log to the database files themselves is really an apples-and-oranges comparison.
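If you want to see how much of the log is actually in use while the statement runs (as opposed to just looking at the file size on disk), something like this gives a rough picture:
DBCC SQLPERF(LOGSPACE);   -- log size and percent used, for every database

-- or, on more recent versions, for the current database only:
SELECT total_log_size_in_bytes / 1048576.0 AS LogSizeMB,
       used_log_space_in_bytes / 1048576.0 AS UsedMB,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;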
September 14, 2018 at 8:46 am
I was "assuming" pages are entirely copied to the log (before and after)
For the columns, here it is
CREATE TABLE ... (
[FactDPRetailerMonthlyId] INT NOT NULL,
[CountryKey] INT NOT NULL,
[ReportingProductGroupKey] INT NOT NULL,
[ReportingChannelKey] SMALLINT NOT NULL,
[DistributionTypeKey] SMALLINT NOT NULL,
[BrandKey] INT NOT NULL,
[PeriodKey] INT NOT NULL,
[RawUnitsSold] DECIMAL (19, 4) NOT NULL,
[RawSalesValueEUR] DECIMAL (19, 4) NOT NULL,
[RawSalesValueNSEEUR] DECIMAL (19, 4) NOT NULL,
[RawSalesValueUSD] DECIMAL (19, 4) NOT NULL,
[RawSalesValueNSEUSD] DECIMAL (19, 4) NOT NULL,
[RawSalesValueLocal] DECIMAL (19, 4) NOT NULL,
[RawSalesValueNSELocal] DECIMAL (19, 4) NOT NULL,
[POSDataQuality] TINYINT NOT NULL,
[Coverage] DECIMAL (18, 15) NOT NULL,
[CoverageDataQuality] TINYINT NOT NULL,
[PriceClassLocalKey] INT NOT NULL,
[SmallTransactionId] INT NOT NULL,
[CreatedOn] DATETIME CONSTRAINT [DF_factDPRetailerMonthly_CreatedOn] DEFAULT (getdate()) NOT NULL,
[CreatedBy] NVARCHAR (100) CONSTRAINT [DF_factDPRetailerMonthly_CreatedBy] DEFAULT (suser_sname()) NOT NULL,
[ForcedBySAFE] BIT NULL,
CONSTRAINT [PK_DPRetailer.DPRetailerDatastoreMonthly] PRIMARY KEY CLUSTERED ([CountryKey] ASC, [ReportingProductGroupKey] ASC, [PeriodKey] ASC,
[ReportingChannelKey] ASC, [DistributionTypeKey] ASC, [BrandKey] ASC,
[PriceClassLocalKey] ASC),
CONSTRAINT [FK_factDPRetailerMonthly_BrandKey] FOREIGN KEY ([BrandKey]) REFERENCES [dpretailer].[dimDPRetailerBrand] ([BrandId]),
CONSTRAINT [FK_factDPRetailerMonthly_CountryKey] FOREIGN KEY ([CountryKey]) REFERENCES [dpretailer].[dimDPRetailerCountry] ([CountryId]),
CONSTRAINT [FK_factDPRetailerMonthly_DistributionTypeKey] FOREIGN KEY ([DistributionTypeKey]) REFERENCES [shared].[dimDistributionType] ([DistributionTypeId]),
CONSTRAINT [FK_factDPRetailerMonthly_PeriodKey] FOREIGN KEY ([PeriodKey]) REFERENCES [shared].[dimPeriod] ([PeriodId]),
CONSTRAINT [FK_factDPRetailerMonthly_ReportingChannelKey] FOREIGN KEY ([ReportingChannelKey]) REFERENCES [shared].[dimReportingChannel] ([ReportingChannelId]),
CONSTRAINT [FK_factDPRetailerMonthly_ReportingProductGroupKey] FOREIGN KEY ([ReportingProductGroupKey]) REFERENCES [dpretailer].[dimDPRetailerReportingProductGroup] ([ReportingProductGroupId])
);
GO
CREATE INDEX IX_DPRetailerDatastoreMonthly_CreatedOn ON dpretailer.factDPRetailerMonthly(CreatedOn);
September 14, 2018 at 9:07 am
You could try running this in the database:
select * from fn_dblog(null,null);
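If the raw output is too much to wade through, you can also aggregate it to see which operations and allocation units used the most log space (just a sketch; fn_dblog is undocumented, so use it with care):
SELECT Operation,
       AllocUnitName,
       COUNT(*) AS LogRecords,
       SUM([Log Record Length]) AS TotalBytes
FROM fn_dblog(NULL, NULL)
GROUP BY Operation, AllocUnitName
ORDER BY TotalBytes DESC;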
September 14, 2018 at 9:30 am
I'll do that next week because I have just created another log file (on a different drive) and shrank the original (which I'll delete when it becomes inactive).
September 17, 2018 at 5:03 am
"Unfortunately", we don't seem to have much updates going through the system now...
Would the output still be useful?
Lynn Pettis - Friday, September 14, 2018 9:07 AM: You could try running this in the database:
select * from fn_dblog(null,null);
Regards
Eric
September 17, 2018 at 9:15 am
Eric Mamet - Monday, September 17, 2018 5:03 AM: "Unfortunately", we don't seem to have many updates going through the system at the moment... Would the output still be useful?
Lynn Pettis - Friday, September 14, 2018 9:07 AM: You could try running this in the database:
select * from fn_dblog(null,null);
Regards
Eric
I just wanted you to see what information is in the transaction log. It gives you an idea of why it may get bigger than you expect.
September 17, 2018 at 3:41 pm
MERGE is notorious for some performance issues. You might want to try "UPSERT" -- an equivalent UPDATE followed by an INSERT.
For overall performance, make sure you have the best clustering key. Nothing at all wrong with a multi-column PK, but that does not necessarily mean it should be the clustering key.
What column(s) do you (almost) always specify when querying the table?
CountryKey or ( CountryKey, ReportingProductGroupKey ) or
( CountryKey, ReportingProductGroupKey, [PeriodKey] )
OR
CreatedOn?
If it's CreatedOn, cluster on that, followed by the PK column(s) frequently used for lookup. You could still keep a separate non-clustered PK with the same columns if you wanted/needed to.
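To be clear about the UPSERT idea above, the shape is roughly this (a sketch only -- the staging table name is a placeholder, the column lists are abbreviated, and you'd join on the full clustered PK):
-- UPDATE the rows that already exist
UPDATE f
SET    f.RawUnitsSold = s.RawUnitsSold            -- ...plus the other measure columns
FROM   dpretailer.factDPRetailerMonthly AS f
JOIN   staging.factDPRetailerMonthly AS s         -- placeholder staging table name
  ON   s.CountryKey = f.CountryKey
 AND   s.ReportingProductGroupKey = f.ReportingProductGroupKey
 AND   s.PeriodKey = f.PeriodKey
 AND   s.ReportingChannelKey = f.ReportingChannelKey
 AND   s.DistributionTypeKey = f.DistributionTypeKey
 AND   s.BrandKey = f.BrandKey
 AND   s.PriceClassLocalKey = f.PriceClassLocalKey;

-- then INSERT the rows that don't exist yet
INSERT INTO dpretailer.factDPRetailerMonthly
       (CountryKey, ReportingProductGroupKey, PeriodKey, ReportingChannelKey,
        DistributionTypeKey, BrandKey, PriceClassLocalKey,
        RawUnitsSold /* ...remaining columns... */)
SELECT s.CountryKey, s.ReportingProductGroupKey, s.PeriodKey, s.ReportingChannelKey,
       s.DistributionTypeKey, s.BrandKey, s.PriceClassLocalKey,
       s.RawUnitsSold /* ...remaining columns... */
FROM   staging.factDPRetailerMonthly AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dpretailer.factDPRetailerMonthly AS f
                   WHERE f.CountryKey = s.CountryKey
                     AND f.ReportingProductGroupKey = s.ReportingProductGroupKey
                     AND f.PeriodKey = s.PeriodKey
                     AND f.ReportingChannelKey = s.ReportingChannelKey
                     AND f.DistributionTypeKey = s.DistributionTypeKey
                     AND f.BrandKey = s.BrandKey
                     AND f.PriceClassLocalKey = s.PriceClassLocalKey);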
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 18, 2018 at 2:27 am
ScottPletcher - Monday, September 17, 2018 3:41 PM: MERGE is notorious for some performance issues. You might want to try "UPSERT" -- an equivalent UPDATE followed by an INSERT. For overall performance, make sure you have the best clustering key. Nothing at all wrong with a multi-column PK, but that does not necessarily mean it should be the clustering key.
What column(s) do you (almost) always specify when querying the table?
CountryKey or ( CountryKey, ReportingProductGroupKey ) or
( CountryKey, ReportingProductGroupKey, [PeriodKey] )
OR
CreatedOn? If it's CreatedOn, cluster on that, followed by the PK column(s) frequently used for lookup. You could still keep a separate non-clustered PK with the same columns if you wanted/needed to.
The fact that the log usage grew in two distinct steps made me think it was indeed the UPSERT aspect.
I think the clustered index is fine with regard to our requirements, and yes, we could split the MERGE into its constituent statements (including the DELETE).
Not only could it help with the speed, but it would also mean three transactions instead of one (which is good in this case, since we use the SIMPLE recovery model and can take care of integrity ourselves).
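What I have in mind is roughly this (just a sketch -- the staging table name is a placeholder, and the UPDATE/INSERT bodies would follow the same pattern as the sketch above):
-- Each statement commits on its own; a CHECKPOINT afterwards lets SIMPLE
-- recovery mark that log space as reusable before the next statement runs.
DELETE f
FROM   dpretailer.factDPRetailerMonthly AS f
WHERE  NOT EXISTS (SELECT 1
                   FROM staging.factDPRetailerMonthly AS s   -- placeholder staging table name
                   WHERE s.CountryKey = f.CountryKey
                     AND s.ReportingProductGroupKey = f.ReportingProductGroupKey
                     AND s.PeriodKey = f.PeriodKey
                     AND s.ReportingChannelKey = f.ReportingChannelKey
                     AND s.DistributionTypeKey = f.DistributionTypeKey
                     AND s.BrandKey = f.BrandKey
                     AND s.PriceClassLocalKey = f.PriceClassLocalKey);
CHECKPOINT;

-- UPDATE of the matched rows goes here, then CHECKPOINT;
-- INSERT of the new rows goes here, then CHECKPOINT;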
Thanks
Eric
PS: for us, speed is less of an issue than the potential size of the log.