Index suggestions for table

Question

Post reply

Index suggestions for table

ktflash

Default port

Points: 1400

January 31, 2020 at 7:25 am

#3721897

Ahoi,

i have the following table:

CREATE TABLE [DST].[MD_Materialwerksdaten_Historisierte_Eigenschaften](
[TimeDayId] [date] NULL,
[Material_Werksdaten_key] [int] NOT NULL,
[ABC_Kennzeichen_Key] [int] NOT NULL,
[Herstellerwerk_key] [int] NOT NULL,
[Vertriebstatus_key] [int] NOT NULL,
[Disponent_Key] [int] NOT NULL,
[Sonderbeschaffungsart] [nvarchar](2) NULL
) ON [PRIMARY]
GO

Data is only planned to be inserted at the end of the month, 4 million new rows each month in one insert.

I have different Tables/Views that have always the same way to access this table:

Unique identifier through left join using the first 2 columns of the table
- Material_Werksdaten_key
- month(TimeDayId) and year(TimeDayId)
- these 2 columsn are not needed in the query output
The other 5 columns left in the table are all required in the output of the query and in case there is no match they are replaced by 1

select a.Columns.... 
  ,ABC_Kennzeichen_Key=isnull(b.ABC_Kennzeichen_Key,1)
  ,Sonderbeschaffungsart=isnull(b.Sonderbeschaffungsart,1)
  ,Disponent_Key=isnull(b.Disponent_Key,1)
  ,Vertriebstatus_key=isnull(b.Vertriebstatus_key,1)
  ,Herstellerwerk_key=isnull(b.Herstellerwerk_key,1)
from dst.Example_Table a
left join DST.MD_Materialwerksdaten_Historisierte_Eigenschaften b
                                            on a.Material_Werk_Key = b.Material_Werksdaten_key
                            and year(a.TimeDayId) = year(b.TimeDayId)
                            and month(a.TimeDayId) = month(b.TimeDayId

My question is how should i index this table?

From my basic understand of indexes i would have gone for something like this.

CREATE NONCLUSTERED INDEX [TestIndex] ON [DST].[MD_Materialwerksdaten_Historisierte_Eigenschaften]
(
[Material_Werksdaten_key] ASC,
[TimeDayId] ASC
)
INCLUDE (
        [ABC_Kennzeichen_Key],
[Herstellerwerk_key],
[Vertriebstatus_key],
[Disponent_Key],
[Sonderbeschaffungsart]
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

I want to be the very best
Like no one ever was

https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

frederico_fonseca SSCoach Points: 16167 More actions · Answer 1

Based on what you said I would change the table itself and have 2 new columns made up of the year and and month part of TimeDayId so the index could be used correctly. otherwise the functions will slow it down

/*
create 2 new computed clumns based on TimeDayId


*/
create table DST.MD_Materialwerksdaten_Historisierte_Eigenschaften
    ( YearId                  as year(TimeDayId) persisted
    , MonthId                 as month(TimeDayId) persisted
    , TimeDayId               date        null
    , Material_Werksdaten_key int         not null
    , ABC_Kennzeichen_Key     int         not null
    , Herstellerwerk_key      int         not null
    , Vertriebstatus_key      int         not null
    , Disponent_Key           int         not null
    , Sonderbeschaffungsart   nvarchar(2) null
    )
go
/*
 this index assumes that TimeDayID is always for the current month being inserted

if this is incorrect but if Material_Werksdaten_key is a ever ascending key then we could move Material_Werksdaten_key to be the first column of the index

*/

create clustered index CI_MD_Materialwerksdaten_Historisierte_Eigenschaften on DST.MD_Materialwerksdaten_Historisierte_Eigenschaften
(YearId
, MonthId
, Material_Werksdaten_key
)
-- with (data_compression = page) -- if you use compression - probably advisable

ktflash Default port Points: 1400 More actions · Answer 2

The idea with splitting sounds logical.

Why go for a clustered index in this situation?

Because the uniques granted through Material and Time?

I guess include is not relevant in this since the clustered orders the data itself and there is not the problem with "covering index" in like with the nonclustered?

I want to be the very best
Like no one ever was

https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 3

How will the data be read most of the time? Since the clustered index defines data storage, I usually lean towards making the clustered index the most common access path to the data. Then, as needed, we can add other indexes, with or without include (every index does not have to be covering).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ktflash Default port Points: 1400 More actions · Answer 4

Grant Fritchey wrote:

How will the data be read most of the time? Since the clustered index defines data storage, I usually lean towards making the clustered index the most common access path to the data. Then, as needed, we can add other indexes, with or without include (every index does not have to be covering).

It is always a big fact table/view with usually more rows than this table.

What i need to do is add the 5 columns mentioned to each row of the fact table.

There is not a single other access or query outside of this single access structure and there will not be.

I want to be the very best
Like no one ever was

https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 5

Then that first suggestion is probably the better one. Still, testing, testing, testing. Don't guess. Know.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ScottPletcher SSC Guru Points: 101122 More actions · Answer 6

You should cluster the table on:

( Material_Werksdaten_key, TimeDayId )

Your join is very confusing. You need to specify the TimeDayId(s) directly, not use a function on it.

Is TimeDayId one value per day or only value for the entire month (such as the first day or the last day of the month only)?

Why are you checking for any date within the month? Is it so that if the specific day is missing, you can join on some other day in the month?

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

frederico_fonseca SSCoach Points: 16167 More actions · Answer 7

ScottPletcher wrote:

You should cluster the table on:
( Material_Werksdaten_key, TimeDayId )
Your join is very confusing. You need to specify the TimeDayId(s) directly, not use a function on it.
Is TimeDayId one value per day or only value for the entire month (such as the first day or the last day of the month only)?
Why are you checking for any date within the month? Is it so that if the specific day is missing, you can join on some other day in the month?

should it? if "Material_Werksdaten_key" is what I think it is (fk to Material plan data) having it as first column on the clustered index, followed by a date key would kind of create an index that needs page splits for its inserts

ScottPletcher SSC Guru Points: 101122 More actions · Answer 8

Yes, there will be some limited page splits. But the reads against the table will be vastly more efficient, even with the splits.

It's certainly possible that [TimeDayId] then [Material_Werksdaten_key] would be better clustering, but that would take considerably more details and analysis. And that wouldn't help with performance at all until the TimeDayId issue is resolved, so that a function is not needed against the column to do a lookup.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Steve Collins SSCrazy Points: 2174 More actions · Answer 9

So "TimeDayId" is not a particularly accurate name for a DATE type column imo. My suggestion would be to rename the column to "DayDate" or something similar

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

frederico_fonseca SSCoach Points: 16167 More actions · Answer 10

ScottPletcher wrote:

Yes, there will be some limited page splits. But the reads against the table will be vastly more efficient, even with the splits.
It's certainly possible that [TimeDayId] then [Material_Werksdaten_key] would be better clustering, but that would take considerably more details and analysis. And that wouldn't help with performance at all until the TimeDayId issue is resolved, so that a function is not needed against the column to do a lookup.

No it wouldn't solve it. but pages split probably not that limited as it is 4 million records added in one go

And the use of the function is what lead me to my suggestion to add the 2 new columns to avoid part of the function issue.