Index suggestions for table

  • Ahoi,

    i have the following table:

    CREATE TABLE [DST].[MD_Materialwerksdaten_Historisierte_Eigenschaften](
    [TimeDayId] [date] NULL,
    [Material_Werksdaten_key] [int] NOT NULL,
    [ABC_Kennzeichen_Key] [int] NOT NULL,
    [Herstellerwerk_key] [int] NOT NULL,
    [Vertriebstatus_key] [int] NOT NULL,
    [Disponent_Key] [int] NOT NULL,
    [Sonderbeschaffungsart] [nvarchar](2) NULL
    ) ON [PRIMARY]
    GO

    Data is only planned to be inserted at the end of the month, 4 million new rows each month in one insert.

    I have different Tables/Views that have always the same way to access this table:

    • Unique identifier through left join using the first 2 columns of the table

      • Material_Werksdaten_key
      • month(TimeDayId) and year(TimeDayId)
      • these 2 columsn are not needed in the query output

    • The other 5 columns left in the table are all required in the output of the query and in case there is no match they are replaced by 1
    select a.Columns.... 
    ,ABC_Kennzeichen_Key=isnull(b.ABC_Kennzeichen_Key,1)
    ,Sonderbeschaffungsart=isnull(b.Sonderbeschaffungsart,1)
    ,Disponent_Key=isnull(b.Disponent_Key,1)
    ,Vertriebstatus_key=isnull(b.Vertriebstatus_key,1)
    ,Herstellerwerk_key=isnull(b.Herstellerwerk_key,1)
    from dst.Example_Table a
    left join DST.MD_Materialwerksdaten_Historisierte_Eigenschaften b
    on a.Material_Werk_Key = b.Material_Werksdaten_key
    and year(a.TimeDayId) = year(b.TimeDayId)
    and month(a.TimeDayId) = month(b.TimeDayId

    My question is how should i index this table?

    From my basic understand of indexes i would have gone for something like this.

    CREATE NONCLUSTERED INDEX [TestIndex] ON [DST].[MD_Materialwerksdaten_Historisierte_Eigenschaften]
    (
    [Material_Werksdaten_key] ASC,
    [TimeDayId] ASC
    )
    INCLUDE (
    [ABC_Kennzeichen_Key],
    [Herstellerwerk_key],
    [Vertriebstatus_key],
    [Disponent_Key],
    [Sonderbeschaffungsart]
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
  • Based on what you said I would change the table itself and have 2 new columns made up of the year and and month part of TimeDayId so the index could be used correctly. otherwise the functions will slow it down

     

    /*
    create 2 new computed clumns based on TimeDayId


    */
    create table DST.MD_Materialwerksdaten_Historisierte_Eigenschaften
    ( YearId as year(TimeDayId) persisted
    , MonthId as month(TimeDayId) persisted
    , TimeDayId date null
    , Material_Werksdaten_key int not null
    , ABC_Kennzeichen_Key int not null
    , Herstellerwerk_key int not null
    , Vertriebstatus_key int not null
    , Disponent_Key int not null
    , Sonderbeschaffungsart nvarchar(2) null
    )
    go
    /*
    this index assumes that TimeDayID is always for the current month being inserted

    if this is incorrect but if Material_Werksdaten_key is a ever ascending key then we could move Material_Werksdaten_key to be the first column of the index

    */

    create clustered index CI_MD_Materialwerksdaten_Historisierte_Eigenschaften on DST.MD_Materialwerksdaten_Historisierte_Eigenschaften
    (YearId
    , MonthId
    , Material_Werksdaten_key
    )
    -- with (data_compression = page) -- if you use compression - probably advisable

     

  • The idea with splitting sounds logical.

    Why go for a clustered index in this situation?

    Because the uniques granted through Material and Time?

    I guess include is not relevant in this since the clustered orders the data itself and there is not the problem with "covering index" in like with the nonclustered?

  • How will the data be read most of the time? Since the clustered index defines data storage, I usually lean towards making the clustered index the most common access path to the data. Then, as needed, we can add other indexes, with or without include (every index does not have to be covering).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey wrote:

    How will the data be read most of the time? Since the clustered index defines data storage, I usually lean towards making the clustered index the most common access path to the data. Then, as needed, we can add other indexes, with or without include (every index does not have to be covering).

    It is always a big fact table/view with usually more rows than this table.

    What i need to do is add the 5 columns mentioned to each row of the fact table.

    There is not a single other access or query outside of this single access structure and there will not be.

  • Then that first suggestion is probably the better one. Still, testing, testing, testing. Don't guess. Know.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • You should cluster the table on:

    ( Material_Werksdaten_key, TimeDayId )

    Your join is very confusing.  You need to specify the TimeDayId(s) directly, not use a function on it.

    Is TimeDayId one value per day or only value for the entire month (such as the first day or the last day of the month only)?

    Why are you checking for any date within the month?  Is it so that if the specific day is missing, you can join on some other day in the month?

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    You should cluster the table on:

    ( Material_Werksdaten_key, TimeDayId )

    Your join is very confusing.  You need to specify the TimeDayId(s) directly, not use a function on it.

    Is TimeDayId one value per day or only value for the entire month (such as the first day or the last day of the month only)?

    Why are you checking for any date within the month?  Is it so that if the specific day is missing, you can join on some other day in the month?

    should it? if "Material_Werksdaten_key" is what I think it is (fk to Material plan data) having it as first column on the clustered index, followed by a date key would kind of create an index that needs page splits for its inserts

  • Yes, there will be some limited page splits.  But the reads against the table will be vastly more efficient, even with the splits.

    It's certainly possible that [TimeDayId] then [Material_Werksdaten_key] would be better clustering, but that would take considerably more details and analysis.  And that wouldn't help with performance at all until the TimeDayId issue is resolved, so that a function is not needed against the column to do a lookup.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • So "TimeDayId" is not a particularly accurate name for a DATE type column imo.  My suggestion would be to rename the column to "DayDate" or something similar

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • ScottPletcher wrote:

    Yes, there will be some limited page splits.  But the reads against the table will be vastly more efficient, even with the splits.

    It's certainly possible that [TimeDayId] then [Material_Werksdaten_key] would be better clustering, but that would take considerably more details and analysis.  And that wouldn't help with performance at all until the TimeDayId issue is resolved, so that a function is not needed against the column to do a lookup.

    No it wouldn't solve it. but pages split probably not that limited as it is 4 million records added in one go

    And the use of the function is what lead me to  my suggestion to add the 2 new columns to avoid part of the function issue.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply