Partitioned Indexes

  • hi guys,

    I have a DWH were I partitioned by customerno. (blocks like this 10000, 20000, 30000 etc.)

    Now I have a lot of queries that are using historizied table, so they check the valid_from, valid_to dates.

    So you usually built Indexes that sort the data in the right way

    My e.g. valid_from asc, valid_to asc, customerno asc (include a few attributes)

    As I want to align the indexes with my table I have to use the same partition_function and _scheme.

    Lets make an example, where we just look at one partition:

    customerno valid_from valid_to

    110101 2010-12-01 2010-12-05

    110101 2010-12-06 2010-12-07

    110101 2010-12-08 2010-12-09

    110102 2010-12-01 2010-12-04

    110102 2010-12-05 2010-12-09

    110102 2010-12-10 2010-12-12

    My Index should sort it like that:

    110102 2010-12-01 2010-12-04

    110101 2010-12-01 2010-12-05

    110102 2010-12-05 2010-12-09

    110101 2010-12-06 2010-12-07

    110101 2010-12-08 2010-12-09

    110102 2010-12-10 2010-12-12

    Now as my partition-key is the customerno it could implies a certain sort. So I was wondering, if the partitioned index is doing the sort over my selected index colums (valid_from, valid_to) and just uses the partition-scheme as "hint" in which partition to put the data, or if he first uses the partitionfunction "sort over customerno" and than my index-sort?

    This would have quite an impact as I should re-think to use a more "general" partion-column that only gives hints which partition to use but doesn't implies a sort in depth. So that I could built indexes that are really of use

    Thanx for your advice

  • So Many Hits no answer... Is that a Problem that no One has or no One knows an answer to? :w00t:

  • mitch.fh (12/23/2010)


    So Many Hits no answer... Is that a Problem that no One has or no One knows an answer to? :w00t:

    I think people may be waiting for you to look at the actual execution plan and determine the answer yourself.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for your advice.

    I would like to read sth out of the execution plan, but I do not see anything out of the index creation index plan 🙁 or which execution plan do you mean?

    <?xml version="1.0" encoding="utf-16"?>

    <ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.4000.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

    <BatchSequence>

    <Batch>

    <Statements>

    <StmtSimple StatementCompId="1" StatementId="1" StatementText="

    CREATE NONCLUSTERED INDEX [IX_ST_address_valid_from_valid_to_user_id] ON [dbo].[ST_address]

    (

    [valid_from] ASC,

    [valid_to] ASC,

    [user_id] ASC

    )

    INCLUDE ( [country_id],

    [state_id],

    [zip_code],

    [addressline1],

    [city], [part_key_id]) WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90, DATA_COMPRESSION = PAGE)

    ON [ST_address_index_partition_scheme]([part_key_id])

    " StatementType="CREATE INDEX" />

    </Statements>

    </Batch>

    </BatchSequence>

    </ShowPlanXML>

  • mitch.fh (12/23/2010)


    So Many Hits no answer... Is that a Problem that no One has or no One knows an answer to? :w00t:

    You know the old saying... "if you have nothing good to say better say nothing" but, if you insist I would say that there is an underlying - more basic - design issue. What would be the reason to partition DWH tables by customer blocks? I bet that partition strategy doesn't help during archiving and purging and doesn't help during reporting either - those are the two primary reasons to focus on when choosing a partition key.

    Well... that's the reason why I'm saying nothing 😀

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Thanks Paul, I am open for constructive criticism though.

    The idea for partitioning is that I have about 100 million users from different portals united in one dwh.

    Load processes are seperated by portal and in the beginning most queries where filtered by a portal. So partitioning made sense in my head.

    But I more and more have queries that are portal overlapping. (60% now)

    In my example I simplyfied my partitioning rule. I have an partitioning-key that is combinding portal and customerno

    portal_id * 10,000,000,000 + customerno. And the function is using the portal to create the blocks. so e.g.

    right from

    portal1 10,000,000,000

    portal2 20,000,000,000

    portal3 30,000,000,000

  • mitch.fh (12/27/2010)


    Thanks for your advice.

    I would like to read sth out of the execution plan, but I do not see anything out of the index creation index plan 🙁 or which execution plan do you mean?

    <?xml version="1.0" encoding="utf-16"?>

    <ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.4000.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

    <BatchSequence>

    <Batch>

    <Statements>

    <StmtSimple StatementCompId="1" StatementId="1" StatementText="

    CREATE NONCLUSTERED INDEX [IX_ST_address_valid_from_valid_to_user_id] ON [dbo].[ST_address]

    (

    [valid_from] ASC,

    [valid_to] ASC,

    [user_id] ASC

    )

    INCLUDE ( [country_id],

    [state_id],

    [zip_code],

    [addressline1],

    [city], [part_key_id]) WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90, DATA_COMPRESSION = PAGE)

    ON [ST_address_index_partition_scheme]([part_key_id])

    " StatementType="CREATE INDEX" />

    </Statements>

    </Batch>

    </BatchSequence>

    </ShowPlanXML>

    Right idea but I was looking for the execution plan of the query against the data that caused you to ask the following question...

    Now as my partition-key is the customerno it could implies a certain sort. So I was wondering, if the partitioned index is doing the sort over my selected index colums (valid_from, valid_to) and just uses the partition-scheme as "hint" in which partition to put the data, or if he first uses the partitionfunction "sort over customerno" and than my index-sort?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    I created an index to test it. I realised that I had to put the part_key_id in to the "sorted" columns, so it looks like that:

    CREATE NONCLUSTERED INDEX [IX_ST_activity_valid_from_valid_to_customerno] ON [dbo].[ST_activity]

    (

    [valid_from] ASC,

    [valid_to] ASC,

    [portal_id] ASC

    [customerno] ASC,

    [part_key_id] ASC <-- combination of portal and customer as discribed above

    )

    INCLUDE ( [act_status_id])

    WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90)

    ON [ST_activity_index_partition_scheme]([part_key_id])

    This is my example-query:

    DECLARE @data_date date = '2008-06-15';

    SELECT sa.customerno, sa.act_status_id

    FROM dbo.ST_activity AS sa

    WHERE sa.valid_from <= @data_date AND sa.valid_to >= @data_date

    The Execution Plan of the example-Querie looks like that

    <?xml version="1.0" encoding="utf-16"?>

    <ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.4000.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

    <BatchSequence>

    <Batch>

    <Statements>

    <StmtSimple StatementCompId="1" StatementId="1" StatementText="DECLARE @data_date date = '2010-06-15'; " StatementType="ASSIGN" />

    </Statements>

    <Statements>

    <StmtSimple StatementCompId="2" StatementEstRows="17418000" StatementId="2" StatementOptmLevel="FULL" StatementSubTreeCost="235.169" StatementText=" SELECT sa.customerno, sa.act_status_id FROM dbo.ST_activity AS sa WHERE sa.valid_from <= @data_date AND sa.valid_to >= @data_date " StatementType="SELECT" QueryHash="0xE4B0A2E72089E207" QueryPlanHash="0xA03C41B788CD8D5B">

    <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

    <QueryPlan CachedPlanSize="16" CompileTime="2" CompileCPU="2" CompileMemory="240">

    <RelOp AvgRowSize="17" EstimateCPU="6.06058" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="17418000" LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="235.169">

    <OutputList>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="customerno" />

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="act_status_id" />

    </OutputList>

    <Parallelism>

    <RelOp AvgRowSize="20" EstimateCPU="7.98406" EstimateIO="217.641" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="17418000" LogicalOp="Index Seek" NodeId="1" Parallel="true" Partitioned="true" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="225.625" TableCardinality="193534000">

    <OutputList>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="customerno" />

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="act_status_id" />

    </OutputList>

    <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">

    <DefinedValues>

    <DefinedValue>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="customerno" />

    </DefinedValue>

    <DefinedValue>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="act_status_id" />

    </DefinedValue>

    </DefinedValues>

    <Object Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Index="[IX_ST_activity_valid_from_valid_to_portal_id_customerno]" Alias="[sa]" IndexKind="NonClustered" />

    <SeekPredicates>

    <SeekPredicateNew>

    <SeekKeys>

    <StartRange ScanType="GE">

    <RangeColumns>

    <ColumnReference Column="PtnId1000" />

    </RangeColumns>

    <RangeExpressions>

    <ScalarOperator ScalarString="(1)">

    <Const ConstValue="(1)" />

    </ScalarOperator>

    </RangeExpressions>

    </StartRange>

    <EndRange ScanType="LE">

    <RangeColumns>

    <ColumnReference Column="PtnId1000" />

    </RangeColumns>

    <RangeExpressions>

    <ScalarOperator ScalarString="(6)">

    <Const ConstValue="(6)" />

    </ScalarOperator>

    </RangeExpressions>

    </EndRange>

    </SeekKeys>

    <SeekKeys>

    <EndRange ScanType="LE">

    <RangeColumns>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="valid_from" />

    </RangeColumns>

    <RangeExpressions>

    <ScalarOperator ScalarString="[@data_date]">

    <Identifier>

    <ColumnReference Column="@data_date" />

    </Identifier>

    </ScalarOperator>

    </RangeExpressions>

    </EndRange>

    </SeekKeys>

    </SeekPredicateNew>

    </SeekPredicates>

    <Predicate>

    <ScalarOperator ScalarString="[DWH_DB].[dbo].[ST_activity].[valid_to] as [sa].[valid_to]>=[@data_date]">

    <Compare CompareOp="GE">

    <ScalarOperator>

    <Identifier>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="valid_to" />

    </Identifier>

    </ScalarOperator>

    <ScalarOperator>

    <Identifier>

    <ColumnReference Column="@data_date" />

    </Identifier>

    </ScalarOperator>

    </Compare>

    </ScalarOperator>

    </Predicate>

    </IndexScan>

    </RelOp>

    </Parallelism>

    </RelOp>

    </QueryPlan>

    </StmtSimple>

    </Statements>

    </Batch>

    </BatchSequence>

    </ShowPlanXML>

    Then I created a index that is just using the portal_id for partitioning (with another function and scheme!)

    CREATE NONCLUSTERED INDEX [IX_ST_activity_valid_from_valid_to_customerno] ON [dbo].[ST_activity]

    (

    [valid_from] ASC,

    [valid_to] ASC,

    [portal_id] ASC

    [customerno] ASC

    )

    INCLUDE ( [act_status_id])

    WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90)

    ON [ST_activity_index_with_portal_id_partition_scheme]([portal_id])

    Using this index the execution plan looks like that:

    <?xml version="1.0" encoding="utf-16"?>

    <ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.4000.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">

    <BatchSequence>

    <Batch>

    <Statements>

    <StmtSimple StatementCompId="1" StatementId="1" StatementText="DECLARE @data_date date = '2010-06-15'; " StatementType="ASSIGN" />

    </Statements>

    <Statements>

    <StmtSimple StatementCompId="2" StatementEstRows="17418000" StatementId="2" StatementOptmLevel="FULL" StatementSubTreeCost="235.169" StatementText=" SELECT sa.customerno, sa.act_status_id FROM dbo.ST_activity AS sa WHERE sa.valid_from <= @data_date AND sa.valid_to >= @data_date " StatementType="SELECT" QueryHash="0xE4B0A2E72089E207" QueryPlanHash="0x958222DFD826C9A1">

    <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

    <QueryPlan CachedPlanSize="16" CompileTime="5" CompileCPU="5" CompileMemory="240">

    <RelOp AvgRowSize="17" EstimateCPU="6.06058" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="17418000" LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="235.169">

    <OutputList>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="customerno" />

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="act_status_id" />

    </OutputList>

    <Parallelism>

    <RelOp AvgRowSize="20" EstimateCPU="7.98406" EstimateIO="217.641" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="17418000" LogicalOp="Index Seek" NodeId="1" Parallel="true" Partitioned="true" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="225.625" TableCardinality="193534000">

    <OutputList>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="customerno" />

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="act_status_id" />

    </OutputList>

    <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">

    <DefinedValues>

    <DefinedValue>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="customerno" />

    </DefinedValue>

    <DefinedValue>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="act_status_id" />

    </DefinedValue>

    </DefinedValues>

    <Object Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Index="[IX_ST_activity_valid_from_valid_to_portal_id_customerno2]" Alias="[sa]" IndexKind="NonClustered" />

    <SeekPredicates>

    <SeekPredicateNew>

    <SeekKeys>

    <StartRange ScanType="GE">

    <RangeColumns>

    <ColumnReference Column="PtnId1004" />

    </RangeColumns>

    <RangeExpressions>

    <ScalarOperator ScalarString="(1)">

    <Const ConstValue="(1)" />

    </ScalarOperator>

    </RangeExpressions>

    </StartRange>

    <EndRange ScanType="LE">

    <RangeColumns>

    <ColumnReference Column="PtnId1004" />

    </RangeColumns>

    <RangeExpressions>

    <ScalarOperator ScalarString="(6)">

    <Const ConstValue="(6)" />

    </ScalarOperator>

    </RangeExpressions>

    </EndRange>

    </SeekKeys>

    <SeekKeys>

    <EndRange ScanType="LE">

    <RangeColumns>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="valid_from" />

    </RangeColumns>

    <RangeExpressions>

    <ScalarOperator ScalarString="[@data_date]">

    <Identifier>

    <ColumnReference Column="@data_date" />

    </Identifier>

    </ScalarOperator>

    </RangeExpressions>

    </EndRange>

    </SeekKeys>

    </SeekPredicateNew>

    </SeekPredicates>

    <Predicate>

    <ScalarOperator ScalarString="[DWH_DB].[dbo].[ST_activity].[valid_to] as [sa].[valid_to]>=[@data_date]">

    <Compare CompareOp="GE">

    <ScalarOperator>

    <Identifier>

    <ColumnReference Database="[DWH_DB]" Schema="[dbo]" Table="[ST_activity]" Alias="[sa]" Column="valid_to" />

    </Identifier>

    </ScalarOperator>

    <ScalarOperator>

    <Identifier>

    <ColumnReference Column="@data_date" />

    </Identifier>

    </ScalarOperator>

    </Compare>

    </ScalarOperator>

    </Predicate>

    </IndexScan>

    </RelOp>

    </Parallelism>

    </RelOp>

    </QueryPlan>

    </StmtSimple>

    </Statements>

    </Batch>

    </BatchSequence>

    </ShowPlanXML>

    Could you tell me, where I can read sth out of it?

  • I was hoping that someone more skilled at reading XML execution plans than I would stop by to answer your question.

    I see that the partitions are coming into play after the index but I don't believe it's working to any advantage because the query is necessarily doing an IndexScan.

    I'm not sure that answers your original question, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mitch.fh (12/27/2010)


    Thanks Paul, I am open for constructive criticism though.

    The idea for partitioning is that I have about 100 million users from different portals united in one dwh.

    Load processes are seperated by portal and in the beginning most queries where filtered by a portal. So partitioning made sense in my head.

    But I more and more have queries that are portal overlapping. (60% now)

    In my example I simplyfied my partitioning rule. I have an partitioning-key that is combinding portal and customerno

    portal_id * 10,000,000,000 + customerno. And the function is using the portal to create the blocks. so e.g.

    right from

    portal1 10,000,000,000

    portal2 20,000,000,000

    portal3 30,000,000,000

    That's a very different story Mitch - Thank you for the clarification.

    I can see current partitioning strategy being helpful during ETL - particularly if ETL for different portals are running concurrently; provided different partitions are sitting in different spins.

    Question. Do you rely in partition switching or any feature that forces you to have indexes aligned with base table partitioning strategy?

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • There is one VERY important point you are missing about table partitioning: it is NOT primarily about performance, but rather for MAINTENANCE/MANAGEMENT. If it also happens to allow for partition elimination for some queries, that is great.

    If you want best query performance - partition by date. If you want best loading and managment, partition by customerno. You really cannot have it both ways unless you make a separate copy of the data after loading into another table partitioned by date.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (12/28/2010)


    If you want best query performance - partition by date. If you want best loading and managment, partition by customerno. You really cannot have it both ways unless you make a separate copy of the data after loading into another table partitioned by date.

    I agree you can't have it both ways - as I already pointed out - but, isn't proposed strategy a self-defeating one?

    Lets see... ETL against first table to take advantage of ETL-friendly partitioning strategy then, to maximize reporting performance ETL against second table - which has a reporting-friendly partitioning strategy.

    mmhhh... isn't first ETL process 100% overhead? 😀

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (12/28/2010)


    Question. Do you rely in partition switching or any feature that forces you to have indexes aligned with base table partitioning strategy?

    There is just one table that needs to be aligned as I use a dynamic partition switching (big table 1 billion rows) and this one is partitioned by portal + date. Every day I through out one date and put a new one in, so I have a "classic" sliding window there.

    The other tables don't need alignment in the sliding-window-way as I don't have to delete whole partitions. All normal tables have separate filegroups for the indexes.

    There is another question poping up though: Isn't it better to use alignment for performance reasons?

    I agree you can't have it both ways - as I already pointed out - but, isn't proposed strategy a self-defeating one?

    But I want BOTH 😀

    Now seriously: I cannot effort to copies of the data as it would take to much space. So I need a solution that serves the Load-ETLs and the Report-ETLs.

    My original question was aiming on the partitioning rule.

    The partition-key I constructed had to goals: 1. Partitioning by Portal 2. Sorting the data in the clustered index by customerno because new customers get always a higher cus_no and it is my main join connection.

    If the delicate partitionkey is causing a sort problem for indexes that sort by valid_from/valid_to than I would use another partitionkey - maybe just the portal_id. The data won't be sorted by customerno anymore in the clustered index but the non clustered index will take care sorting.

    If the delicate partitionkey is not creating any sort problem then I will keep my partitioning key.

    A compromise between both solutions would be - as I have no necessity for alignment - to use a different partition function/scheme combination for indexes. In this function I would just use the portal for partitioning and the index will be sorted by the columns needed.

  • IMO aligned indexes provide both performance and recovery advantages:

    - performance: because the are physically split like their corresponding table, so the engine will only use the part needed by the query, split per partition if possible. Eliminating IO.

    parallelism will also take advantage of this.

    - recovery: if you perform a partial restore, all aligned indexes might be used with the restored partition if you can direct the query to use only that partition.

    a non-aligned index would need the full table to be restored to be usable.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • TheSQLGuru (12/28/2010)


    There is one VERY important point you are missing about table partitioning: it is NOT primarily about performance, but rather for MAINTENANCE/MANAGEMENT. If it also happens to allow for partition elimination for some queries, that is great.

    If you want best query performance - partition by date. If you want best loading and managment, partition by customerno. You really cannot have it both ways unless you make a separate copy of the data after loading into another table partitioned by date.

    Hmmm... I wonder if there IS a way to have it both ways without duplicating the data. I've gotta try a couple of things when I get some time because this actually comes up a lot in the world of heavy metal.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply