February 27, 2019 at 10:07 pm
Comments posted to this topic are about the item Index Column Order – Be Happy!
Mike Byrd
February 28, 2019 at 5:45 am
I miss one important version:
If you regularly filter for Status = 5 (and not 3 or 4 or 6 or anything else) and OrderDate BETWEEN x and y, the best solution would be a filtered index:
CREATE NONCLUSTERED INDEX IX_SalesOrderHeaderBig_OrderDate_Status5
ON Sales.SalesOrderHeaderBig (OrderDate)
INCLUDE (SalesPersonId, SubTotal, Status) -- Status has to been included, even if it is the filter column (otherwise you will have key lookups)
WHERE Status = 5
WITH (DATA_COMPRESSION = PAGE) -- Since there are usually many Orders at the same day, PAGE compression would be very effective
This way you would decrease the index size (depending on the distribution of the stati) and have much better statistics (since they are created always on the first column in the index)
God is real, unless declared integer.
March 1, 2019 at 12:45 pm
What is the table definition? I was at a Brent Ozar session at PASS 2018 and they said the column order in index recommendations is based on the order they exist in the table and nothing else. So in your case, I'd have to assume that the index recommendation was just luckily in the right order.
I'd be interested to see other examples where the index recommendation is actually correct on purpose as a learning exercise.
March 1, 2019 at 6:38 pm
Table definition is
CREATE TABLE Sales.SalesOrderHeaderBig(
SalesOrderID int IDENTITY(1,1) NOT NULL,
RevisionNumber tinyint NOT NULL,
OrderDate datetime NOT NULL,
DueDate datetime NOT NULL,
ShipDate datetime NULL,
[Status] tinyint NOT NULL,
OnlineOrderFlag dbo.Flag NOT NULL,
SalesOrderNumber AS (isnull(N'SO'+CONVERT(nvarchar(23),SalesOrderID,(0)),N'*** ERROR ***')),
PurchaseOrderNumber dbo.OrderNumber NULL,
AccountNumber dbo.AccountNumber NULL,
CustomerID int NOT NULL,
SalesPersonID int NULL,
TerritoryID int NULL,
BillToAddressID int NOT NULL,
ShipToAddressID int NOT NULL,
ShipMethodID int NOT NULL,
CreditCardID int NULL,
CreditCardApprovalCode varchar(15) NULL,
CurrencyRateID int NULL,
SubTotal money NOT NULL,
TaxAmt money NOT NULL,
Freight money NOT NULL,
TotalDue AS (isnull((SubTotal+TaxAmt)+Freight,(0))),
Comment nvarchar(128) NULL,
rowguid uniqueidentifier ROWGUIDCOL NOT NULL,
ModifiedDate datetime NOT NULL,
CONSTRAINT PK_SalesOrderHeaderBig_SalesOrderID PRIMARY KEY NONCLUSTERED
( SalesOrderID ASC))
In this particular instance it appears that OrderDate is defined before Status. IMHO and as much as I respect Ben Ozark, I still think the optimizer picked the best column order for this particular query. This is not to say the optimizer always picks the best order, but it did in this case. However, my point is (as you will see in Part 2) that each index is just part of the bigger picture and that query combinations and index definitions may require some testing depending on the data and usage. Cheers!
Mike Byrd
March 2, 2019 at 10:30 am
thomas_franz - Thursday, February 28, 2019 5:45 AMI miss one important version:
If you regularly filter for Status = 5 (and not 3 or 4 or 6 or anything else) and OrderDate BETWEEN x and y, the best solution would be a filtered index:
CREATE NONCLUSTERED INDEX IX_SalesOrderHeaderBig_OrderDate_Status5
ON Sales.SalesOrderHeaderBig (OrderDate)
INCLUDE (SalesPersonId, SubTotal, Status) -- Status has to been included, even if it is the filter column (otherwise you will have key lookups)
WHERE Status = 5
WITH (DATA_COMPRESSION = PAGE) -- Since there are usually many Orders at the same day, PAGE compression would be very effectiveThis way you would decrease the index size (depending on the distribution of the stati) and have much better statistics (since they are created always on the first column in the index)
I am always careful with filtered indexes; sometimes (especially in the newer editions) the optimizer parameterizes the WHERE clause and the filtered index is not used. You are correct if Status always is 5. I'm also very careful with DATA_COMPRESSION = PAGE. If the newer data is undergoing many updates then there may be a performance hit. I always use DATA_COMPRESSION = ROW since there is almost no performance hit and logical reads are usually greatly reduced. PAGE does offer possible better performance but not if there are many updates to the data. Usually I look at PAGE Compression when I have a table partitioned and make the older data partitions PAGE and the newer ones ROW.
Mike Byrd
March 3, 2019 at 1:07 pm
Mike, when producing its missing index recommendations, SQL Server groups the columns into 2 buckets: all columns involved in equality predicates come first and then all columns involved in inequality predicates (a date range in this case) come second. Within each of these 2 buckets, the column order is based on the ordinal position of the columns within the actual table definition. In other words, the order of columns in the missing index recommendations has nothing to do with perceived selectivity or cardinality.
In your scenario, the Status column came first, because it was involved in an equality predicate. The OrderDate column came in second, because it was involved in an inequality predicate.
More details of the algorithm used and repro scripts can be found here: https://dba.stackexchange.com/questions/208947/how-does-sql-server-determine-key-column-order-in-missing-index-requests
March 3, 2019 at 1:18 pm
Also, be aware that the query uses local variables for the date range. This behavior is completely different than if you used a stored procedure with 2 date parameters or sp_executesql with 2 date parameters. Using local variables causes SQL Server to use the density vector in the statistics to produce the cardinality estimates during query optimization, because the values of local variables are not known until runtime. Using a stored procedure or sp_executesql causes SQL Server to sniff the parameter values during initial query optimization and use those parameter values to produce cardinality estimates from the statistics.
See https://www.brentozar.com/archive/2018/03/troubleshooting-parameter-sniffing-issues-the-right-way-part-3/ for more information.
March 4, 2019 at 7:50 am
Is that Bryan Nike? My old cube neighbor?
March 4, 2019 at 8:20 am
Yep. It's me, Matt. I chuckled to myself when I saw that you had responded in this forum a few days ago. It's a small world after all...
March 4, 2019 at 8:30 am
Small world indeed...
December 4, 2020 at 4:44 am
The link for Jonathan's script, once you find it on SQLSkills, is broken. Do you have another source for the script that makes the two sales tables larger?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 4, 2020 at 12:18 pm
A covering index with Status column first followed by OrderDate is inevitably a subject of severe page splits on every ongoing INSERT's. Very soon during normal operation the index will become badly fragmented, causing performance degradation and requiring "beloved" index maintenance.
When an index with OrderStatus at first place will be populated gracefully, keeping the right order of the rows in the index naturally, with no extra effort required.
minor gains in logical reads on a static data would be overcompensated by extra reads from fragmented index, ineffective use of memory and huge overhead of re-writing the index during maintenance excersises.
So, I would not say the suggestion from Microsoft is the best one in this case. As well, as pretty much any other case.
_____________
Code for TallyGenerator
December 6, 2020 at 4:52 am
The link for Jonathan's script, once you find it on SQLSkills, is broken. Do you have another source for the script that makes the two sales tables larger?
And, I just got word from Jonathan that the link has been repaired and I tested that as correct. It seems they're moving from Windows to Linux and things like spaces in links and case sensitivity are breaking this kind of thing.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2020 at 4:55 am
A covering index with Status column first followed by OrderDate is inevitably a subject of severe page splits on every ongoing INSERT's.
I can definitely vouch for that... in my earlier days, I made such an index on a table that brought Expedia.com to it's knees for about 5 minutes by the very reason you cite above.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2020 at 10:49 pm
where the Primary Key/Clustered Index is based on SalesOrderID (identity column)
And that's the issue.
If you will often query by OrderDate BETWEEN ..., then cluster the table on:
( OrderDate, SalesOrderID )
in that order. The PK can stay the same, it will just be NONCLUSTERED.
Page compression is often extremely valuable on large tables. If page compression shows as useful -- based on EXEC sys.sp_estimate_data_compression_savings and your own results afterward -- I'd use it, esp. with so few varchar and so many fixed columns. You'd have to do an extraordinarily large number of UPDATEs to offset it.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply