Understanding partitioning, using joins and indexes?

  • Hi All

    I'm new to partitioning and need a bit of assistance. Here are the basics:

    1. I've partitioned six tables by date : MONTH_END

    2. They receive around 10 million rows with each load into one MONTH_END, e.g. 2011/01/31 gets 10mil, 2011/02/28 gets 10 mil and so on.

    3. These tables join to each other by various keys, e.g TABLE A to TABLE B on ID1 = ID1, TABLE A to TABLE B on ID2 = ID2, etc.

    4. When these tables join, it is only for one MONTH_END at a time

    My question regarding joins

    a. When joining these tables, should I be specifying the TABLE A.MONTH_END= TABLE_B.MONTH_END in the join clause?

    b. Or should I filter these in the WHERE clause, e.g WHERE '2011/01/31' IN (TABLE A.MONTH_END, TABLE B.MONTH_END)

    c. Do a and b

    My questions on indexing.

    1. When I create non-clustered indexes on the various tables, should they include MONTH_END?

    2. Currently, I don't include month end, I bulk load these tables from SSIS, and when I have indexes on them, besides the partition index, it takes hours to load. I have used Database Analyser and it always suggestions indexes without the MONTH_END - this is probably because of how I join the tables.

    Assistance is much appreciated.

  • For you joining question i will go with B approach as it will use the indexes better.

    Second YES you non clus index must have MONTH_END column as data will be search/filter on its basis.

    Above of all i would like to see you query and index definition.specially whats you clus index.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Thanks - b is how I currently have it.

    The clustered index on all the tables is currently only on MONTH_END.

    The tables that join don't all necessarily join on the same keys. e.g

    SELECT *

    FROM TABLE_A a

    LEFT

    JOIN TABLE_B b

    ON a.COL_1 = b.COL_1

    INNER

    JOIN TABLE_C c

    ON a.COL_2 = c.COL_2

    WHERE '2011/01/31' IN (a.MONTH_END, b.MONTH_END, c.MONTH_END)

    Regarding the additional indexes, I am still deciding what and where to create them - currently any attempt has made the load process to slow, not sure if this is because of the exclusion of MONTH_END

  • xrapidx (5/20/2011)


    Hi All

    1. When I create non-clustered indexes on the various tables, should they include MONTH_END?

    That is not required because clus index will do the job here.

    xrapidx (5/20/2011)


    Hi All

    2. Currently, I don't include month end, I bulk load these tables from SSIS, and when I have indexes on them, besides the partition index, it takes hours to load. I have used Database Analyser and it always suggestions indexes without the MONTH_END - this is probably because of how I join the tables.

    if you don't use MONTH_END for filter the data then your SSIS package will work on complete dataset.and How you decide that it is taking longer time ? because heavy vol data always takes time and another thing if data is traveling through network then clock will run slower.

    Again you query of join doesn't help here to check how you are using the indexes or columns.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • xrapidx (5/20/2011)


    currently any attempt has made the load process to slow, not sure if this is because of the exclusion of MONTH_END

    YOu cant remove MONTH_END column from indexes.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I think we're misunderstanding each other.

    I currently have two issues.

    Before loading an additional month:

    1.

    Currently, the non-clustered indexes being recommended by Database Engine Tuning Advisor do not include MONTH_END - If I create these indexes and attempt to load data into the table using SSIS it takes around 3 hours to load. If I delete these indexes the table loads in about 30 minutes.

    After loading:

    2.

    When I try and join the tables, if the non-clustered index recommended by Database Engine Tuning Adviso are present, the query returns in about 30 minutes, if the non-clustered indexes are not there - the query takes hours to return the dataset.

  • xrapidx (5/20/2011)


    I think we're misunderstanding each other.

    i cant comment/suggest until i will see the query and index definition

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • The query itself is 300 lines long, consisting of 13 different tables (five of which are partitioned on month_end) - my main reason for not posting it is the size of the query....

    Basically, I'm trying to understand how indexes work when it comes to partitioned tables, and how the partitioned column affects them.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply