poor performing report query against table without a PK

Question

poor performing report query against table without a PK

polkadot

SSChampion

Points: 13664
More actions
November 25, 2018 at 11:02 pm

#378711

The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs.
Whenever I run my query the execution plan shows that the clustered index is always used (all 32 million rows scanned everytime), nevermind my query doesn't even search against the clustered index column, and despite the presence of non clustered indexes for each of the filtered columns.
During daytime my report query takes 3-4 minutes to return 9 rows and at night 5 seconds. Business needs it to return in 1 second at all times.
Clustered Index: OfferID
Non Clustered Index1: Account#
Non Clustered Index3: BulkId
Non Clustered Index4: IMEI
Non Clustered Index2: Date
My query:SELECT <25 columns> WHERE Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate AND Date <= @enddate.
Execution plan used shows three operations: Clustered Index Scan, Parallelism, and SELECT.
Please suggest how to optimize. Can the table be optimized for my query if it has no PK, and clustered index permits NULLs, and all columns being filtered also allow nulls?
--Quote me

Viewing 15 posts - 1 through 15 (of 54 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

polkadot - Sunday, November 25, 2018 11:02 PM
The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs.
Whenever I run my query the execution plan shows that the clustered index is always used (all 32 million rows scanned everytime), nevermind my query doesn't even search against the clustered index column, and despite the presence of non clustered indexes for each of the filtered columns.
During daytime my report query takes 3-4 minutes to return 9 rows and at night 5 seconds. Business needs it to return in 1 second at all times.
Clustered Index: OfferID
Non Clustered Index1: Account#
Non Clustered Index3: BulkId
Non Clustered Index4: IMEI
Non Clustered Index2: Date
My query:SELECT <25 columns> WHERE Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate AND Date <= @enddate.
Execution plan used shows three operations: Clustered Index Scan, Parallelism, and SELECT.
Please suggest how to optimize. Can the table be optimized for my query if it has no PK, and clustered index permits NULLs, and all columns being filtered also allow nulls?

How many columns are in the table?
Which of the filter columns is most selective?
How many rows are returned by the query? Obviously this will vary - a low and high figure will be useful.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

muthukkumaran Kaliyamoorthy SSCoach Points: 19026 More actions · Answer 2

Can you share the execution plan. PK and CI is different. You can also check the fragmentation, since it's a heap.

Muthukkumaran Kaliyamoorthy
https://www.sqlserverblogforum.com/

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 3

muthukkumaran Kaliyamoorthy - Monday, November 26, 2018 5:05 AM
Can you share the execution plan. PK and CI is different. You can also check the fragmentation, since it's a heap.

One, it's not a heap if it has a clustered index (however bad the clustered index may be). Two, fragmentation won't affect plan creation and index choices (there's an exception to this, but not worth getting into here). Be cautious on advice along these lines.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 4

polkadot - Sunday, November 25, 2018 11:02 PM
The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs.
Whenever I run my query the execution plan shows that the clustered index is always used (all 32 million rows scanned everytime), nevermind my query doesn't even search against the clustered index column, and despite the presence of non clustered indexes for each of the filtered columns.
During daytime my report query takes 3-4 minutes to return 9 rows and at night 5 seconds. Business needs it to return in 1 second at all times.
Clustered Index: OfferID
Non Clustered Index1: Account#
Non Clustered Index3: BulkId
Non Clustered Index4: IMEI
Non Clustered Index2: Date
My query:SELECT <25 columns> WHERE Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate AND Date <= @enddate.
Execution plan used shows three operations: Clustered Index Scan, Parallelism, and SELECT.
Please suggest how to optimize. Can the table be optimized for my query if it has no PK, and clustered index permits NULLs, and all columns being filtered also allow nulls?

Without the plan and the structure, any advice is just guesses. So, all those OR clauses make this look like a catch-all query. You should read this article by Gail Shaw on how to better deal with this.

One question on the logic there. Is it this: (Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate) AND Date <= @enddate

Or, is it this: Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR (Date >= @startdate AND Date <= @enddate)

Or what you already have?

I'd suggest using parenthesis to quantify that for the optimizer. It will make a difference in the indexes you build.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Eirikur Eiriksson SSC Guru Points: 182916 More actions · Answer 5

polkadot - Sunday, November 25, 2018 11:02 PM
The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs.
Whenever I run my query the execution plan shows that the clustered index is always used (all 32 million rows scanned everytime), nevermind my query doesn't even search against the clustered index column, and despite the presence of non clustered indexes for each of the filtered columns.
During daytime my report query takes 3-4 minutes to return 9 rows and at night 5 seconds. Business needs it to return in 1 second at all times.
Clustered Index: OfferID
Non Clustered Index1: Account#
Non Clustered Index3: BulkId
Non Clustered Index4: IMEI
Non Clustered Index2: Date
My query:SELECT <25 columns> WHERE Account# = @account# OR BulkID = @BulkID OR IMEI = @IMEI OR Date >= @startdate AND Date <= @enddate.
Execution plan used shows three operations: Clustered Index Scan, Parallelism, and SELECT.
Please suggest how to optimize. Can the table be optimized for my query if it has no PK, and clustered index permits NULLs, and all columns being filtered also allow nulls?

Post the DDL for the table and all indices please, makes it easier to contribute to the problem's solution.
😎
Suggest you also post the execution plan😉

Eirikur Eiriksson SSC Guru Points: 182916 More actions · Answer 6

polkadot - Sunday, November 25, 2018 11:02 PM
Non Clustered Index4: IMEI

Is this mobile CDR data?
😎

Jonathan AC Roberts SSCoach Points: 18363 More actions · Answer 7

Your query is equivalent to this:
SELECT * FROM myTable WHERE Account# = @account# UNION SELECT * FROM myTable WHERE BulkID = @BulkID UNION SELECT * FROM myTable WHERE IMEI = @IMEI UNION SELECT * FROM myTable WHERE Date BETWEEN @startdate AND Date @enddate

You might be able to get it to use the indexes if you rewrite the query like that.
The indexes should also include the 25 columns you are selecting.

muthukkumaran Kaliyamoorthy SSCoach Points: 19026 More actions · Answer 8

Grant Fritchey - Monday, November 26, 2018 6:44 AM
muthukkumaran Kaliyamoorthy - Monday, November 26, 2018 5:05 AM
Can you share the execution plan. PK and CI is different. You can also check the fragmentation, since it's a heap.
One, it's not a heap if it has a clustered index (however bad the clustered index may be). Two, fragmentation won't affect plan creation and index choices (there's an exception to this, but not worth getting into here). Be cautious on advice along these lines.

The table is practically a heap because its' clustered index is on a column that allows NULLs, and there are hundreds of thousands of NULLs.

Oh bad, I thought it was a heap. ya, right fragmentation will not affect the plan creation and it slow the data fetch.
Sure. I will read carefully.

Muthukkumaran Kaliyamoorthy
https://www.sqlserverblogforum.com/

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

Jonathan AC Roberts - Monday, November 26, 2018 7:52 AM
Your query is equivalent to this:
SELECT * FROM myTable WHERE Account# = @account# UNION SELECT * FROM myTable WHERE BulkID = @BulkID UNION SELECT * FROM myTable WHERE IMEI = @IMEI UNION SELECT * FROM myTable WHERE Date >= @startdate A AND Date <= @enddat
You might be able to get it to use the indexes if you rewrite the query like that.
The indexes should also include the 25 columns you are selecting.

25 columns is likely to be most or all of the columns (guessing of course).
Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 10

ChrisM@Work - Monday, November 26, 2018 8:06 AM
25 columns is likely to be most or all of the columns (guessing of course).
Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]

Which does bring up the question, are those variables we're seeing or parameters? That too could affect index choice based on the differences in how the row estimates are arrived at. There just are no easy answers without all the details.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 11

Grant Fritchey - Monday, November 26, 2018 8:11 AM
ChrisM@Work - Monday, November 26, 2018 8:06 AM
25 columns is likely to be most or all of the columns (guessing of course).
Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]
Which does bring up the question, are those variables we're seeing or parameters? That too could affect index choice based on the differences in how the row estimates are arrived at. There just are no easy answers without all the details.

Can I run this again in case it was missed?

"There just are no easy answers without all the details."
Guessing can be amusing, however a faster (and probably better) solution will be arrived at with all of the necessary details up front.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jonathan AC Roberts SSCoach Points: 18363 More actions · Answer 12

Grant Fritchey - Monday, November 26, 2018 8:11 AM
ChrisM@Work - Monday, November 26, 2018 8:06 AM
25 columns is likely to be most or all of the columns (guessing of course).
Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]
Which does bring up the question, are those variables we're seeing or parameters? That too could affect index choice based on the differences in how the row estimates are arrived at. There just are no easy answers without all the details.

The OP did say the query returns just 9 rows. So they should be able to get it to use an index.

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 13

Jonathan AC Roberts - Monday, November 26, 2018 8:47 AM
Grant Fritchey - Monday, November 26, 2018 8:11 AM
ChrisM@Work - Monday, November 26, 2018 8:06 AM
25 columns is likely to be most or all of the columns (guessing of course).
Next guess - a small number of rows would be best returned with a seek of a narrow index and key lookup.
Some idea of the data distribution might help, however I suspect the OP actually has the usual "match or null" logic in the query i.e. [variable = NULL or variable = column]
Which does bring up the question, are those variables we're seeing or parameters? That too could affect index choice based on the differences in how the row estimates are arrived at. There just are no easy answers without all the details.
OP did say the query returns just 9 rows. So they should be able to get it to use an index.

No arguments there. We just don't have enough data to do more than speculate... somewhat wildly.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

polkadot SSChampion Points: 13664 More actions · Answer 14

Answers to Qs:
Grant Fritchey: (1) I added parenthesis around Date. (2) using parameters. Attached query.
ChrisM: (1) table has 98 columns (2) selectivity of columns will change over time (3) rows returned will be between 5 to 50
Erik Erikson: (1) IMEI is mobile. (2) you'll see from DDL that some columns are quite huge and all but one, Bill_Process_Failure_Reason @ varchar(4000) is justified.
Jonathan AC Roberts: I don't think I will take UNION route because report query actually has a number of JOINS that I have commented out for sake of addressing bottlenecks one step at a time

attached exec plan / inserted DDL, and report query.

DDL

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[QR_Device_Trade_In_Master](

[Offer_ID] [varchar](500) NULL,

[Offer_Generate_Date] [datetime] NULL,

[Offer_Update_Date] [datetime] NULL,

[Manufacturer] [varchar](300) NOT NULL,

[Model] [varchar](300) NOT NULL,

[Serial_Number] [varchar](100) NOT NULL,

[Serial_Number_Type] [varchar](50) NULL,

[Application_User_ID] [varchar](500) NULL,

[Application_ID] [varchar](500) NULL,

[device_Evaluation_Date] [datetime] NULL,

[Store_ID] [varchar](100) NULL,

[Purchase_Device_Sku] [varchar](100) NULL,

[BAN] [numeric](9, 0) NULL,

[Subscriber_No] [varchar](20) NULL,

[Offer_Amount] [numeric](8, 4) NULL,

[RMA_Number] [varchar](100) NULL,

[Offer_Expiration_Date] [datetime] NULL,

[Customer_Email_Address] [varchar](100) NULL,

[Customer_Phone_Number] [varchar](30) NULL,

[Offer_Status] [varchar](100) NULL,

[Rejection_Reason] [varchar](300) NULL,

[RMA_Status] [varchar](100) NULL,

[RMA_Expiration_Date] [datetime] NULL,

[Settlement_Amount] [numeric](8, 4) NULL,

[Network_Operator] [varchar](100) NOT NULL,

[Shipping_Label_URL] [varchar](2000) NULL,

[Eligibility_Status] [varchar](100) NULL,

[Eligibility_Reason] [varchar](1000) NULL,

[Is_Customer_Billed] [varchar](1) NULL,

[Adjustment_Reason] [varchar](2000) NULL,

[Bill_Status] [varchar](1) NULL,

[Bill_Process_Failure_Reason] [varchar](4000) NULL,

[Eligibility_Reason_Code] [varchar](100) NULL,

[Sender_ID] [varchar](500) NULL,

[Channel_ID] [varchar](500) NULL,

[Session_ID] [varchar](500) NULL,

[Work_Flow_ID] [varchar](500) NULL,

[Activity_ID] [varchar](500) NULL,

[Dealer_Code] [varchar](500) NULL,

[Credit_Issue_Date] [datetime] NULL,

[Create_Date] [datetime] NOT NULL,

[Created_By] [varchar](500) NULL,

[Update_Date] [datetime] NULL,

[Updated_By] [varchar](500) NULL,

[Image_Url] [varchar](4000) NULL,

[Sales_Document_Number] [varchar](100) NULL,

[Purchase_Order_Number] [varchar](100) NULL,

[Sap_Channel] [varchar](50) NULL,

[Sap_Return_Type] [varchar](10) NULL,

[Sap_Return_Message] [varchar](1000) NULL,

[Transaction_Type] [varchar](100) NULL,

[ProcessLogID] [int] NOT NULL,

[ImportFileDate] [date] NOT NULL,

[Event_Type] [varchar](100) NULL,

[Retail_Store_Type] [varchar](50) NULL,

[Assurant_Model] [varchar](300) NULL,

[Assurant_Manufacturer] [varchar](300) NULL,

[Cancel_Reason] [varchar](500) NULL,

[Program_ID] [varchar](10) NOT NULL,

[Fulfillment_Ord_Number_Forward] [varchar](30) NULL,

[EIP_Balance] [numeric](8, 4) NULL,

[EIP_Plan_ID] [varchar](50) NULL,

[EIP_Device_ID] [varchar](50) NULL,

[Claim_Auth_Code] [varchar](50) NULL,

[Claim_Req] [varchar](2) NULL,

[Claim_Auth_Date] [date] NULL,

[Claim_Auth_Available] [varchar](2) NULL,

[Deductible_Amount] [numeric](8, 4) NULL,

[Device_Collection_Required] [varchar](2) NULL,

[Claimed_Device_Collected] [varchar](1) NULL,

[Jump_TradeIn_Value] [numeric](8, 4) NULL,

[Claim_Submitted_By] [varchar](100) NULL,

[TOTAL_DAMAGE_AMOUNT] [numeric](10, 4) NULL,

[INSURANCE_DEDUCTABLE_AMOUNT] [numeric](10, 4) NULL,

[DEVICE_SHIPMENT_DATE] [date] NULL,

[DEVICE_RETURN_EXPIRATION_DATE] [date] NULL,

[DEVICE_RETURN_DATE] [date] NULL,

[DEVICE_RETURN_SESSION_ID] [varchar](100) NULL,

[ORIGINAL_RMA_NUMBER] [varchar](100) NULL,

[UPDATED_RMA_EXPIRATION_DATE] [date] NULL,

[Adjustment_Id] [varchar](50) NULL,

[Tracking_Carrier] [varchar](100) NULL,

[Tracking_Number] [varchar](50) NULL,

[Return_Tracking_Number] [varchar](100) NULL,

[Trade_In_Sub_Status] [varchar](50) NULL,

[Trade_In_Status] [varchar](50) NULL,

[Return_Tracking_Carrier] [varchar](100) NULL,

[Bill_Credit_Balance] [numeric](8, 4) NULL,

[Device_Received_Date] [date] NULL,

[Trade_In_Credit_Date] [date] NULL,

[Credited_Down_Payment] [numeric](8, 4) NULL,

[Returned_Date] [date] NULL,

[TRADEIN_INTENT] [varchar](50) NULL,

[ORDER_LINE_ID] [varchar](50) NULL,

[BulkID] [varchar](50) NULL

) ON [PRIMARY]
GO

SET ANSI_PADDING

OFF

GO

Report Query
In report query you'll notice I am pulling from table with '_2016onward' which is a copy of the table, containing records from year 2016 onward, so as not to interfere with production table while testing new indexes in production.
Have commented out the JOINS to focus on primary table

DECLARE

@BULKID varchar(50) = 5006,

@BANvarchar(50) = null,-- '955932237',

@FROM_DATEdate = NULL,

@TO_DATEdate = NULL,

@IMEI varchar(100) = NULL

SELECT QR.[Offer_ID]

,QR.[Offer_Generate_Date]

,QR.[Offer_Update_Date]

,QR.[Manufacturer]

,QR.[Model]

,QR.[Serial_Number]

,QR.[Application_User_ID]

,QR.[Application_ID]

,QR.[device_Evaluation_Date]

,QR.[Store_ID]

,QR.[BAN]

,QR.[Subscriber_No]

,QR.[Offer_Amount]

,QR.[RMA_Number]

,QR.[Offer_Status]

,QR.[Rejection_Reason]

,QR.[RMA_Status]

,QR.[Settlement_Amount]

,QR.[Eligibility_Status]

,QR.[Eligibility_Reason]

,QR.[Adjustment_Reason]

,QR.[Bill_Status]

,QR.[Channel_ID]

,QR.[Credit_Issue_Date]

,QR.[Program_ID]

,QR.[BulkID]

--,QP.MARKET_ASSESSED_AMOUNT as [Fair Market Value]

--,QP.PRM_NAME AS Promo

,QR.Assurant_Manufacturer as [Assurant Manufacturer Received]

,QR.Assurant_Model as [Assurant Model Received]

--,COALESCE(Q1.Answer_Name,'N/A') AS [FMiP OFF]

--,COALESCE(Q6.Answer_Name,'N/A') AS [AntiTheft OFF]

--,COALESCE(Q2.Answer_Name,'N/A') AS [Liquid Damage]

--,COALESCE(Q3.Answer_Name,'N/A') AS [Promo Accepted]

--,COALESCE(Q4.Answer_Name,'N/A') AS [LCD Acceptable]

--,COALESCE(Q5.Answer_Name,'N/A') AS [Powers On]

FROM DistributionCenterFeed.[dbo].[QR_Device_Trade_In_Master_2016onward] QR

--LEFT JOIN ODS.[dbo].[QR_DR_TradeIn_Promo] QP ON QR.Offer_ID = QP.Offer_ID

--LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q1 ON QR.Offer_ID = Q1.Offer_ID AND Q1.Question_Name = 'Find My Iphone'--

--LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q2 ON QR.Offer_ID = Q2.Offer_ID AND Q2.Question_Name = 'Liquid Damage'

--LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q3 ON QR.Offer_ID = Q3.Offer_ID AND Q3.Question_Name = 'ATLANTIS_PROMO'

--LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q4 ON QR.Offer_ID = Q4.Offer_ID AND Q4.Question_Name = 'LCD Issue'

--LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q5 ON QR.Offer_ID = Q5.Offer_ID AND Q5.Question_Name = 'Powers On'

--LEFT JOIN ODS.[dbo].[QR_DR_Transaction_Ques_Answers] Q6 ON QR.Offer_ID = Q6.Offer_ID AND Q6.Question_Name = 'Anti Theft Disabled'

WHERE

BAN = @BAN

OR BulkID = @BULKID

OR Serial_Number= @IMEI

OR ([Offer_Generate_Date] >= @FROM_DATE AND [Offer_Generate_Date] <= @TO_DATE)

GO

--Quote me