Best primarykey and index

Question

Best primarykey and index

January 16, 2023 at 10:44 am

This was removed by the editor as SPAM

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101134 More actions · Answer 1

frederico_fonseca wrote:

ScottPletcher wrote:
Jeff Moden wrote:
ScottPletcher wrote:
To be absolutely clear, you want a:
2) unique clustered [index] on (tenant,year,id)
And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.
Supposedly, "Tenant" has more than 10 thousand members to it. Why would you partition on it?
I would so that it matches the clustered index (leading column) and how you query the table. Again, you typically wouldn't cluster this table, but if you did ...
taking in consideration the known fact that there are 10k tenants on the table, partitioning by tenant and year is a very bad advise as they would only be able to have a single year worth of data on that table due to the again know limit of 15k partitions per table.
as the OP clearly stated that the filtering is ALWAYS by Tenant and Year having a partition by year would work well without reaching the 15k partition limit, would do partition elimination as desired, and then further filtering based on tenant.

But by partitioning on year, you will have to go thru ALL rows for that year to find those for only the 1 tenant you're trying to read, i.e., orders of magnitude more rows than you need to read.

If you have too many tenants to partition by tenant, then don't partition at all is a much better choice than partitioning by year, esp. if some tenants have much fewer rows than others.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

frederico_fonseca SSCoach Points: 16174 More actions · Answer 2

ScottPletcher wrote:

frederico_fonseca wrote:
ScottPletcher wrote:
Jeff Moden wrote:
ScottPletcher wrote:
To be absolutely clear, you want a:
2) unique clustered [index] on (tenant,year,id)
And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.
Supposedly, "Tenant" has more than 10 thousand members to it. Why would you partition on it?
I would so that it matches the clustered index (leading column) and how you query the table. Again, you typically wouldn't cluster this table, but if you did ...
taking in consideration the known fact that there are 10k tenants on the table, partitioning by tenant and year is a very bad advise as they would only be able to have a single year worth of data on that table due to the again know limit of 15k partitions per table.
as the OP clearly stated that the filtering is ALWAYS by Tenant and Year having a partition by year would work well without reaching the 15k partition limit, would do partition elimination as desired, and then further filtering based on tenant.
But by partitioning on year, you will have to go thru ALL rows for that year to find those for only the 1 tenant you're trying to read, i.e., orders of magnitude more rows than you need to read.
If you have too many tenants to partition by tenant, then don't partition at all is a much better choice than partitioning by year, esp. if some tenants have much fewer rows than others.

if what you are stating is that with partitioning by year alone any query (where tenant = y and year = z) means that SQL will need to query every single record belonging to that year then I truly believe you are incorrect and have not thought it through.

Please have a look at the following table (and code to populate it) as well as the explain plan here https://www.brentozar.com/pastetheplan/?id=ryPzXe6tj

-- create partition function and scheme to use on the table
CREATE PARTITION FUNCTION order_by_year_function (date)
AS RANGE LEFT 
FOR VALUES  ('1997-12-31' , '1998-12-31' , '1999-12-31' , '2000-12-31' , '2001-12-31' , '2002-12-31' , '2003-12-31' , '2004-12-31' , '2005-12-31' , '2006-12-31' , '2007-12-31' , '2008-12-31' , '2009-12-31' , '2010-12-31' , '2011-12-31' , '2012-12-31' , '2013-12-31' , '2014-12-31' , '2015-12-31' , '2016-12-31' , '2017-12-31' , '2018-12-31' , '2019-12-31' , '2020-12-31' , '2021-12-31' , '2022-12-31' , '2023-12-31' , '2024-12-31' , '2025-12-31' , '2026-12-31' , '2027-12-31' , '2028-12-31' , '2029-12-31' )
;

CREATE PARTITION SCHEME order_by_year_scheme
AS PARTITION order_by_year_function
all TO ([primary]);

-- create table and clustered index on partition defined above

CREATE TABLE dbo.accountmovement
(
 tenant bigint NULL
,movementdate date NOT NULL
,movement bigint NULL
,value1 bigint NULL
,value2 bigint NULL
) ON order_by_year_scheme(movementdate)
WITH (DATA_COMPRESSION = PAGE)
;


CREATE CLUSTERED INDEX ci_tenant_movementdate ON dbo.accountmovement
(
tenant ASC
, movementdate ASC
)
WITH (DATA_COMPRESSION = PAGE) ON order_by_year_scheme(movementdate)
;

-- run code to populate table - 66 million rows in total
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS ( --=== This provides the "zero base" and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT 0 UNION ALL
                 SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                )
insert into accountmovement with (tablock)
select n + 1 as tenant
     , t.movementdate
     , t2.movement
     , t2.movement * 235 as value1
     , t2.movement * 435 as value2
from cteTally
cross apply (values ( '1997-12-31')
, ( '1998-12-31') , ( '1999-12-31') , ( '2000-12-31') , ( '2001-12-31') , ( '2002-12-31') , ( '2003-12-31')
, ( '2004-12-31') , ( '2005-12-31') , ( '2006-12-31') , ( '2007-12-31') , ( '2008-12-31') , ( '2009-12-31')
, ( '2010-12-31') , ( '2011-12-31') , ( '2012-12-31') , ( '2013-12-31') , ( '2014-12-31') , ( '2015-12-31')
, ( '2016-12-31') , ( '2017-12-31') , ( '2018-12-31') , ( '2019-12-31') , ( '2020-12-31') , ( '2021-12-31')
, ( '2022-12-31') , ( '2023-12-31') , ( '2024-12-31') , ( '2025-12-31') , ( '2026-12-31') , ( '2027-12-31')
, ( '2028-12-31') , ( '2029-12-31')
) t(movementdate)
cross apply (select top 200 n + 1 as movement
             from cteTally
            ) t2

/*
resulting table contains 33 partitions with 2000200 rows per partition for a total of 66006600 rows on the table
Space taken by the table is 6.4 GB
*/

/* finally run the following 2 queries to match the explain plan attached.

outpout from statistics io

(600 rows affected)
Table 'accountmovement'. Scan count 3, logical reads 13, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

(200 rows affected)
Table 'accountmovement'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)


*/

declare @tenant bigint = 200
declare @movementdate date = '2014-12-31'
declare @tenant2 bigint = 400
declare @movementdate2 date = '2018-12-31'
declare @tenant3 bigint = 780
declare @movementdate3 date = '2021-12-31'

set statistics io on

select *
from [dbo].[accountmovement]
where (movementdate = @movementdate and tenant = @tenant)
or (movementdate = @movementdate2 and tenant = @tenant2)
or (movementdate = @movementdate3 and tenant = @tenant3)


set @tenant = 9000
set @movementdate = '1998-12-31'

select *
from [dbo].[accountmovement]
where (movementdate = @movementdate and tenant = @tenant)

by looking at the plans it seems very clear that

1 - partition elimination is taking place as expected and desirable

2 - tenant is also being used as a SEEK predicate greatly eliminating the need to scan all rows on the partition. Again expected as Tenant was specified as being the first column on the index.

choice of index is based on the fact that the OP stated, very clearly, that all queries always use Tenant and Year.

I did not bother creating a column with just the year and used a date instead, but final effect would be the same if it has just been a year (smallint).

for the OP - the desired clustered index would be (tenant, year, id) with partition function changed to be of same datatype you define for year and with the required values - ALWAYS leave an empty partition on both extremes of the ranges.