January 31, 2020 at 7:25 am
Ahoi,
I have the following table:
CREATE TABLE [DST].[MD_Materialwerksdaten_Historisierte_Eigenschaften](
[TimeDayId] [date] NULL,
[Material_Werksdaten_key] [int] NOT NULL,
[ABC_Kennzeichen_Key] [int] NOT NULL,
[Herstellerwerk_key] [int] NOT NULL,
[Vertriebstatus_key] [int] NOT NULL,
[Disponent_Key] [int] NOT NULL,
[Sonderbeschaffungsart] [nvarchar](2) NULL
) ON [PRIMARY]
GO
Data will only be inserted at the end of each month: 4 million new rows per month, in a single insert.
I have various tables/views that always access this table in the same way:
select a.Columns....
,ABC_Kennzeichen_Key=isnull(b.ABC_Kennzeichen_Key,1)
,Sonderbeschaffungsart=isnull(b.Sonderbeschaffungsart,1)
,Disponent_Key=isnull(b.Disponent_Key,1)
,Vertriebstatus_key=isnull(b.Vertriebstatus_key,1)
,Herstellerwerk_key=isnull(b.Herstellerwerk_key,1)
from dst.Example_Table a
left join DST.MD_Materialwerksdaten_Historisierte_Eigenschaften b
on a.Material_Werk_Key = b.Material_Werksdaten_key
and year(a.TimeDayId) = year(b.TimeDayId)
and month(a.TimeDayId) = month(b.TimeDayId)
My question is: how should I index this table?
From my basic understanding of indexes, I would have gone for something like this:
CREATE NONCLUSTERED INDEX [TestIndex] ON [DST].[MD_Materialwerksdaten_Historisierte_Eigenschaften]
(
[Material_Werksdaten_key] ASC,
[TimeDayId] ASC
)
INCLUDE (
[ABC_Kennzeichen_Key],
[Herstellerwerk_key],
[Vertriebstatus_key],
[Disponent_Key],
[Sonderbeschaffungsart]
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
I want to be the very best
Like no one ever was
January 31, 2020 at 7:50 am
Based on what you said, I would change the table itself and add two new columns made up of the year and month parts of TimeDayId, so the index can be used correctly; otherwise the functions will slow it down.
/*
create 2 new computed columns based on TimeDayId
*/
create table DST.MD_Materialwerksdaten_Historisierte_Eigenschaften
( YearId as year(TimeDayId) persisted
, MonthId as month(TimeDayId) persisted
, TimeDayId date null
, Material_Werksdaten_key int not null
, ABC_Kennzeichen_Key int not null
, Herstellerwerk_key int not null
, Vertriebstatus_key int not null
, Disponent_Key int not null
, Sonderbeschaffungsart nvarchar(2) null
)
go
/*
this index assumes that TimeDayId is always for the current month being inserted.
if that is incorrect, but Material_Werksdaten_key is an ever-ascending key, then we could move Material_Werksdaten_key to be the first column of the index
*/
create clustered index CI_MD_Materialwerksdaten_Historisierte_Eigenschaften on DST.MD_Materialwerksdaten_Historisierte_Eigenschaften
(YearId
, MonthId
, Material_Werksdaten_key
)
-- with (data_compression = page) -- if you use compression - probably advisable
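Assuming the computed columns and clustered index above, the lookup query from earlier in the thread could be rewritten along these lines (a sketch; Example_Table and its columns are taken from the original post):

```sql
-- Sketch: join on the persisted computed columns so the predicates on b
-- are plain column comparisons and the clustered index above can seek.
-- The YEAR()/MONTH() functions now run only against the outer table a.
select a.Material_Werk_Key
     , ABC_Kennzeichen_Key   = isnull(b.ABC_Kennzeichen_Key, 1)
     , Sonderbeschaffungsart = isnull(b.Sonderbeschaffungsart, 1)
     , Disponent_Key         = isnull(b.Disponent_Key, 1)
     , Vertriebstatus_key    = isnull(b.Vertriebstatus_key, 1)
     , Herstellerwerk_key    = isnull(b.Herstellerwerk_key, 1)
from dst.Example_Table a
left join DST.MD_Materialwerksdaten_Historisierte_Eigenschaften b
       on b.YearId  = year(a.TimeDayId)
      and b.MonthId = month(a.TimeDayId)
      and b.Material_Werksdaten_key = a.Material_Werk_Key
```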
January 31, 2020 at 8:23 am
The idea of splitting out the year and month sounds logical.
Why go for a clustered index in this situation?
Because of the uniqueness granted by Material and Time?
I guess INCLUDE is not relevant here, since the clustered index orders the data itself, so there is no "covering index" problem like with a nonclustered index?
January 31, 2020 at 10:18 am
How will the data be read most of the time? Since the clustered index defines data storage, I usually lean towards making the clustered index the most common access path to the data. Then, as needed, we can add other indexes, with or without include (every index does not have to be covering).
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 31, 2020 at 10:41 am
How will the data be read most of the time? Since the clustered index defines data storage, I usually lean towards making the clustered index the most common access path to the data. Then, as needed, we can add other indexes, with or without include (every index does not have to be covering).
It is always a big fact table/view, usually with more rows than this table.
What I need to do is add the 5 columns mentioned to each row of the fact table.
There is not a single other access or query outside of this single access structure and there will not be.
January 31, 2020 at 3:16 pm
Then that first suggestion is probably the better one. Still, testing, testing, testing. Don't guess. Know.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 31, 2020 at 6:12 pm
You should cluster the table on:
( Material_Werksdaten_key, TimeDayId )
Your join is very confusing. You need to specify the TimeDayId(s) directly, not use a function on it.
Is TimeDayId one value per day, or one value for the entire month (such as only the first or last day of the month)?
Why are you checking for any date within the month? Is it so that if the specific day is missing, you can join on some other day in the month?
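If each monthly load really does carry a single TimeDayId per month (an assumption; say, the month-end date), the clustering and join suggested above could look like this sketch. EOMONTH is available from SQL Server 2012 on:

```sql
-- Cluster on the key plus the date, per the suggestion above.
create clustered index CI_MD_Materialwerksdaten_Historisierte_Eigenschaften
    on DST.MD_Materialwerksdaten_Historisierte_Eigenschaften
       (Material_Werksdaten_key, TimeDayId);

-- If TimeDayId is always the month-end date, the join needs no function
-- on b.TimeDayId at all; EOMONTH normalizes only the outer side.
select a.Material_Werk_Key
     -- ... other columns as in the original query ...
from dst.Example_Table a
left join DST.MD_Materialwerksdaten_Historisierte_Eigenschaften b
       on b.Material_Werksdaten_key = a.Material_Werk_Key
      and b.TimeDayId = eomonth(a.TimeDayId)
```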
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 31, 2020 at 6:31 pm
You should cluster the table on:
( Material_Werksdaten_key, TimeDayId )
Your join is very confusing. You need to specify the TimeDayId(s) directly, not use a function on it.
Is TimeDayId one value per day, or one value for the entire month (such as only the first or last day of the month)?
Why are you checking for any date within the month? Is it so that if the specific day is missing, you can join on some other day in the month?
Should it? If "Material_Werksdaten_key" is what I think it is (an FK to Material plan data), having it as the first column of the clustered index, followed by a date key, would create an index that needs page splits for its inserts.
January 31, 2020 at 7:04 pm
Yes, there will be some limited page splits. But the reads against the table will be vastly more efficient, even with the splits.
It's certainly possible that [TimeDayId] then [Material_Werksdaten_key] would be better clustering, but that would take considerably more details and analysis. And that wouldn't help with performance at all until the TimeDayId issue is resolved, so that a function is not needed against the column to do a lookup.
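One way to resolve the TimeDayId issue would be to normalize the value during the monthly load, so every row of a month carries the same date and lookups can use plain equality. A sketch, assuming the month-end convention; the staging table name here is hypothetical:

```sql
-- Sketch: normalize TimeDayId to the month-end date during the monthly load,
-- so later lookups can compare dates with = instead of YEAR()/MONTH().
insert into DST.MD_Materialwerksdaten_Historisierte_Eigenschaften
    (TimeDayId, Material_Werksdaten_key, ABC_Kennzeichen_Key,
     Herstellerwerk_key, Vertriebstatus_key, Disponent_Key, Sonderbeschaffungsart)
select eomonth(s.TimeDayId)
     , s.Material_Werksdaten_key
     , s.ABC_Kennzeichen_Key
     , s.Herstellerwerk_key
     , s.Vertriebstatus_key
     , s.Disponent_Key
     , s.Sonderbeschaffungsart
from dst.Staging_Materialwerksdaten s;  -- hypothetical staging source
```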
January 31, 2020 at 7:16 pm
So "TimeDayId" is not a particularly accurate name for a DATE type column imo. My suggestion would be to rename the column to "DayDate" or something similar
No one shall be able to drive us out of the paradise that Cantor has created for us
January 31, 2020 at 7:26 pm
Yes, there will be some limited page splits. But the reads against the table will be vastly more efficient, even with the splits.
It's certainly possible that [TimeDayId] then [Material_Werksdaten_key] would be better clustering, but that would take considerably more details and analysis. And that wouldn't help with performance at all until the TimeDayId issue is resolved, so that a function is not needed against the column to do a lookup.
No, it wouldn't solve it. But the page splits are probably not that limited, as 4 million records are added in one go.
And the use of the function is what led me to my suggestion to add the 2 new columns, to avoid part of the function issue.