Best primarykey and index

  • frederico_fonseca wrote:

    ScottPletcher wrote:

    Jeff Moden wrote:

    ScottPletcher wrote:

    To be absolutely clear, you want a:

    2) unique clustered [index] on (tenant,year,id)

    And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.

    Supposedly, "Tenant" has more than 10 thousand members to it.  Why would you partition on it?

    I would so that it matches the clustered index (leading column) and how you query the table.  Again, you typically wouldn't cluster this table, but if you did ...

    taking in consideration the known fact that there are 10k tenants on the table, partitioning by tenant and year is a very bad advise as they would only be able to have a single year worth of data on that table due to the again know limit of 15k partitions per table.

    as the OP clearly stated that the filtering is ALWAYS by Tenant and Year having a partition by year would work well without reaching the 15k partition limit, would do partition elimination as desired, and then further filtering based on tenant.

    But by partitioning on year, you will have to go thru ALL rows for that year to find those for only the 1 tenant you're trying to read, i.e., orders of magnitude more rows than you need to read.

    If you have too many tenants to partition by tenant, then don't partition at all is a much better choice than partitioning by year, esp. if some tenants have much fewer rows than others.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    frederico_fonseca wrote:

    ScottPletcher wrote:

    Jeff Moden wrote:

    ScottPletcher wrote:

    To be absolutely clear, you want a:

    2) unique clustered [index] on (tenant,year,id)

    And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.

    Supposedly, "Tenant" has more than 10 thousand members to it.  Why would you partition on it?

    I would so that it matches the clustered index (leading column) and how you query the table.  Again, you typically wouldn't cluster this table, but if you did ...

    taking in consideration the known fact that there are 10k tenants on the table, partitioning by tenant and year is a very bad advise as they would only be able to have a single year worth of data on that table due to the again know limit of 15k partitions per table.

    as the OP clearly stated that the filtering is ALWAYS by Tenant and Year having a partition by year would work well without reaching the 15k partition limit, would do partition elimination as desired, and then further filtering based on tenant.

    But by partitioning on year, you will have to go thru ALL rows for that year to find those for only the 1 tenant you're trying to read, i.e., orders of magnitude more rows than you need to read.

    If you have too many tenants to partition by tenant, then don't partition at all is a much better choice than partitioning by year, esp. if some tenants have much fewer rows than others.

    if what you are stating is that with partitioning by year alone any query (where tenant = y and year = z) means that SQL will need to query every single record belonging to that year then I truly believe you are incorrect and have not thought it through.

    Please have a look at the following table (and code to populate it) as well as the explain plan here https://www.brentozar.com/pastetheplan/?id=ryPzXe6tj

    -- create partition function and scheme to use on the table
    CREATE PARTITION FUNCTION order_by_year_function (date)
    AS RANGE LEFT
    FOR VALUES ('1997-12-31' , '1998-12-31' , '1999-12-31' , '2000-12-31' , '2001-12-31' , '2002-12-31' , '2003-12-31' , '2004-12-31' , '2005-12-31' , '2006-12-31' , '2007-12-31' , '2008-12-31' , '2009-12-31' , '2010-12-31' , '2011-12-31' , '2012-12-31' , '2013-12-31' , '2014-12-31' , '2015-12-31' , '2016-12-31' , '2017-12-31' , '2018-12-31' , '2019-12-31' , '2020-12-31' , '2021-12-31' , '2022-12-31' , '2023-12-31' , '2024-12-31' , '2025-12-31' , '2026-12-31' , '2027-12-31' , '2028-12-31' , '2029-12-31' )
    ;

    CREATE PARTITION SCHEME order_by_year_scheme
    AS PARTITION order_by_year_function
    all TO ([primary]);

    -- create table and clustered index on partition defined above

    CREATE TABLE dbo.accountmovement
    (
    tenant bigint NULL
    ,movementdate date NOT NULL
    ,movement bigint NULL
    ,value1 bigint NULL
    ,value2 bigint NULL
    ) ON order_by_year_scheme(movementdate)
    WITH (DATA_COMPRESSION = PAGE)
    ;


    CREATE CLUSTERED INDEX ci_tenant_movementdate ON dbo.accountmovement
    (
    tenant ASC
    , movementdate ASC
    )
    WITH (DATA_COMPRESSION = PAGE) ON order_by_year_scheme(movementdate)
    ;

    -- run code to populate table - 66 million rows in total
    WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ), --10E+1 or 10 rows
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
    cteTally(N) AS ( --=== This provides the "zero base" and limits the number of rows right up front
    -- for both a performance gain and prevention of accidental "overruns"
    SELECT 0 UNION ALL
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
    insert into accountmovement with (tablock)
    select n + 1 as tenant
    , t.movementdate
    , t2.movement
    , t2.movement * 235 as value1
    , t2.movement * 435 as value2
    from cteTally
    cross apply (values ( '1997-12-31')
    , ( '1998-12-31') , ( '1999-12-31') , ( '2000-12-31') , ( '2001-12-31') , ( '2002-12-31') , ( '2003-12-31')
    , ( '2004-12-31') , ( '2005-12-31') , ( '2006-12-31') , ( '2007-12-31') , ( '2008-12-31') , ( '2009-12-31')
    , ( '2010-12-31') , ( '2011-12-31') , ( '2012-12-31') , ( '2013-12-31') , ( '2014-12-31') , ( '2015-12-31')
    , ( '2016-12-31') , ( '2017-12-31') , ( '2018-12-31') , ( '2019-12-31') , ( '2020-12-31') , ( '2021-12-31')
    , ( '2022-12-31') , ( '2023-12-31') , ( '2024-12-31') , ( '2025-12-31') , ( '2026-12-31') , ( '2027-12-31')
    , ( '2028-12-31') , ( '2029-12-31')
    ) t(movementdate)
    cross apply (select top 200 n + 1 as movement
    from cteTally
    ) t2

    /*
    resulting table contains 33 partitions with 2000200 rows per partition for a total of 66006600 rows on the table
    Space taken by the table is 6.4 GB
    */

    /* finally run the following 2 queries to match the explain plan attached.

    outpout from statistics io

    (600 rows affected)
    Table 'accountmovement'. Scan count 3, logical reads 13, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row affected)

    (200 rows affected)
    Table 'accountmovement'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row affected)


    */

    declare @tenant bigint = 200
    declare @movementdate date = '2014-12-31'
    declare @tenant2 bigint = 400
    declare @movementdate2 date = '2018-12-31'
    declare @tenant3 bigint = 780
    declare @movementdate3 date = '2021-12-31'

    set statistics io on

    select *
    from [dbo].[accountmovement]
    where (movementdate = @movementdate and tenant = @tenant)
    or (movementdate = @movementdate2 and tenant = @tenant2)
    or (movementdate = @movementdate3 and tenant = @tenant3)


    set @tenant = 9000
    set @movementdate = '1998-12-31'

    select *
    from [dbo].[accountmovement]
    where (movementdate = @movementdate and tenant = @tenant)

    by looking at the plans it seems very clear that

    1 - partition elimination is taking place as expected and desirable

    2 - tenant is also being used as a SEEK predicate greatly eliminating the need to scan all rows on the partition. Again expected as Tenant was specified as being the first column on the index.

    choice of index is based on the fact that the OP stated, very clearly, that all queries always use Tenant and Year.

    I did not bother creating a column with just the year and used a date instead, but final effect would be the same if it has just been a year (smallint).

    for the OP - the desired clustered index would be (tenant, year, id) with partition function changed to be of same datatype you define for year and with the required values - ALWAYS leave an empty partition on both extremes of the ranges.

  • This was removed by the editor as SPAM

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply