Query Performance Question

Question

Query Performance Question

DB_Newbie2007

SSCarpal Tunnel

Points: 4449
More actions
August 3, 2010 at 6:41 am

#236138

We are having a problem with the performance of a query and I was wondering why there might be a difference in the performance between Query 1 and Query 2. Please, this is not about the design of the database objects but on the query performance.
I would have thought that the join between the InvSummary table and the VIEW_InvoiceData (on InvoiceDate) would have prevented scanning all of the horizontally partitioned tables (dbo.Jan10, dbo.Feb10, etc.).
Here is a very simple example of the table structures:
CREATE TABLE dbo.InvSummary (
InvoiceNumber varchar(12) NOT NULL,
InvoiceDate smalldatetime NOT NULL,
Records int NOT NULL,
CustomerNumber varchar(10) NOT NULL,
CONSTRAINT [PK_InvSum] PRIMARY KEY CLUSTERED
(InvoiceDate, InvoiceNumber)
WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
CREATE TABLE dbo.Jan10 (
RowNumber int NOT NULL,
InvoiceNumber varchar(15) NOT NULL,
InvoiceDate smalldatetime NOT NULL,
InvoiceItem varchar(24) NOT NULL,
InvoiceAmount float NOT NULL,
InvoiceChargeItem char(3) NOT NULL,
ItemCharge float NOT NULL,
CustomerNumber varchar(10) NOT NULL,
CustNote1 varchar(30) NULL,
CustNote2 varchar(30) NULL,
CONSTRAINT [PK_Jan10] PRIMARY KEY CLUSTERED
(InvoiceDate, InvoiceNumber, RowNumber)
WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
CREATE TABLE dbo.Feb10 (
RowNumber int NOT NULL,
InvoiceNumber varchar(15) NOT NULL,
InvoiceDate smalldatetime NOT NULL,
InvoiceItem varchar(24) NOT NULL,
InvoiceAmount float NOT NULL,
InvoiceChargeItem char(3) NOT NULL,
ItemCharge float NOT NULL,
CustomerNumber varchar(10) NOT NULL,
CustNote1 varchar(30) NULL,
CustNote2 varchar(30) NULL,
CONSTRAINT [PK_Feb10] PRIMARY KEY CLUSTERED
(InvoiceDate, InvoiceNumber, RowNumber)
WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
CREATE VIEW dbo.VIEW_InvoiceData
AS
SELECT * FROM dbo.Jan10 UNION SELECT * FROM dbo.Feb10
GO
----- Query 1: This query takes 24 minutes
SELECT ID.InvoiceNumber, ID.InvoiceDate, ID.CustomerNumber, SUM(ID.ItemCharge) as SumCharges
FROM dbo.VIEW_InvoiceData ID
INNER JOIN dbo.InvSummary IS
ON ID.InvoiceNumber = RIGHT(RTRIM(InvoiceNumber),12)
AND ID.InvoiceDate = IS.InvoiceDate
WHERE IS.CustomerNumber = '12345678'
AND ID.InvoiceChargeItem = 'abc'
AND IS.InvoiceDate BETWEEN '1/1/10' AND '2/28/10'
----- Query 2: This query takes 42 seconds
SELECT ID.InvoiceNumber, ID.InvoiceDate, ID.CustomerNumber, SUM(ID.ItemCharge) as SumCharges
FROM dbo.VIEW_InvoiceData ID
INNER JOIN dbo.InvSummary IS
ON ID.InvoiceNumber = RIGHT(RTRIM(InvoiceNumber),12)
AND ID.InvoiceDate = IS.InvoiceDate
WHERE IS.CustomerNumber = '12345678'
AND ID.InvoiceChargeItem = 'abc'
AND ID.InvoiceDate BETWEEN '1/1/10' AND '2/28/10'
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Stefan_G SSCertifiable Points: 6609 More actions · Answer 1

----- Query 1: This query takes 24 minutes
SELECT ID.InvoiceNumber, ID.InvoiceDate, ID.CustomerNumber, SUM(ID.ItemCharge) as SumCharges
FROM dbo.VIEW_InvoiceData ID
INNER JOIN dbo.InvSummary IS
ON ID.InvoiceNumber = RIGHT(RTRIM(InvoiceNumber),12)
AND ID.InvoiceDate = IS.InvoiceDate
WHERE IS.CustomerNumber = '12345678'
AND ID.InvoiceChargeItem = 'abc'
AND IS.InvoiceDate BETWEEN '1/1/10' AND '2/28/10'
----- Query 2: This query takes 42 seconds
SELECT ID.InvoiceNumber, ID.InvoiceDate, ID.CustomerNumber, SUM(ID.ItemCharge) as SumCharges
FROM dbo.VIEW_InvoiceData ID
INNER JOIN dbo.InvSummary IS
ON ID.InvoiceNumber = RIGHT(RTRIM(InvoiceNumber),12)
AND ID.InvoiceDate = IS.InvoiceDate
WHERE IS.CustomerNumber = '12345678'
AND ID.InvoiceChargeItem = 'abc'
AND ID.InvoiceDate BETWEEN '1/1/10' AND '2/28/10'

I dont see any difference at all between Query 1 and Query 2

Did you make a mistake in your post ?

Also, please include the actual execution plans for your two queries. Please read the following article for an explanation:

http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 2

I'm not sure that I see a difference either, but, this:

RIGHT(RTRIM(InvoiceNumber),12)

Is going to absolutely kill performance. You can only ever get scans because of that. Functions on columns is a major no-no.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

DB_Newbie2007 SSCarpal Tunnel Points: 4449 More actions · Answer 3

Ok... included are the scrubbed query plans (removing as much confidential information as possible). The query in the actual execution plans is a bit more involved than the example given above, as I was trying to remove as much confidential/proprietary information as possible, but the example is extremely close.

The only difference in the two queries is the InvoiceDate being used, IS.Invoicedate versus ID.InvoiceDate in the WHERE clause.

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

DB_Newbie2007 SSCarpal Tunnel Points: 4449 More actions · Answer 4

Grant,

I agree with the previous statement (RIGHT(RTRIM(InvoiceNumber),12) is a performance issue)... the VIEW_InvoiceData table was never intended to be queried against. Unfortunately, the original architect (2007) did not do a very good job in normalizing the data from that table into other "normalized" tables, so report queries have to go back to the "source" data to grab missing information.

There is a project (after two years of my "barking"... LOL) to finally correctly normalize the data, as the goal is to not have to run any queries on the VIEW_InvoiceData table. You have NO idea how many reports are "dogs" due to normalization issues!!!!!

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 5

The faster query is getting a parallel execution (assuming I'm reading which of the two plans is the faster). That could explain the difference in execution times right there. But the bigger concern for me is all the key lookup operations. Just as an experiment, can you try adding the ChargeType to the clustered index? Also, you might try creating the missing index that it's suggesting, but that's a lot of columns for an INCLUDE statement.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 6

DB_Newbie2007 (8/3/2010)
Grant,
I agree with the previous statement (RIGHT(RTRIM(InvoiceNumber),12) is a performance issue)... the VIEW_InvoiceData table was never intended to be queried against. Unfortunately, the original architect (2007) did not do a very good job in normalizing the data from that table into other "normalized" tables, so report queries have to go back to the "source" data to grab missing information.
There is a project (after two years of my "barking"... LOL) to finally correctly normalize the data, as the goal is to not have to run any queries on the VIEW_InvoiceData table. You have NO idea how many reports are "dogs" due to normalization issues!!!!!

I hear where you're coming from. We go through the same sorts of fights where I work. Best of luck on it.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

DB_Newbie2007 SSCarpal Tunnel Points: 4449 More actions · Answer 7

Grant,

Unfortunately, "ChargeType" is actually my name for a bunch of columns added together in this particular table to list a charge. The tables (dbo.Jan10, dbo.Feb10, etc) are wide tables, greater than 150 columns per record (which is why they were normalized).

Yes, Query2.sqlplan is the faster of the two queries.

The question I have is why changing just which table is being used to define the search conditions affects the query in such a dramatic fashion.

I would have thought the INNER JOIN and using InvoiceDate as part of the join condition would have caused both queries to have the same performance.

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 8

DB_Newbie2007 (8/3/2010)
Grant,
Unfortunately, "ChargeType" is actually my name for a bunch of columns added together in this particular table to list a charge. The tables (dbo.Jan10, dbo.Feb10, etc) are wide tables, greater than 150 columns per record (which is why they were normalized).
Yes, Query2.sqlplan is the faster of the two queries.
The question I have is why changing just which table is being used to define the search conditions affects the query in such a dramatic fashion.
I would have thought the INNER JOIN and using InvoiceDate as part of the join condition would have caused both queries to have the same performance.

No, a different column in the WHERE clause means a different set of statitistics on a different table, which, can, and did, lead to differences in the execution plan.

Hmm... What's the cost threshold for parallelism on the server? I'm kind of surprised the slower query isn't firing off in parallel too.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Stefan_G SSCertifiable Points: 6609 More actions · Answer 9

DB_Newbie2007 (8/3/2010)
Grant,
Unfortunately, "ChargeType" is actually my name for a bunch of columns added together in this particular table to list a charge. The tables (dbo.Jan10, dbo.Feb10, etc) are wide tables, greater than 150 columns per record (which is why they were normalized).

Hmm. There is an index on ChargeType that is being used by both queries. If ChargeType is not a real column in your original tables it seems odd that you have defined an index on that column.

Right now it feels like your posted execution plans are not really representative for the actual plans used by the real queries.

It would be a lot easier to help if you could post the real table definitions, indexes, queries and execution plans.

Yes, Query2.sqlplan is the faster of the two queries.
The question I have is why changing just which table is being used to define the search conditions affects the query in such a dramatic fashion.
I would have thought the INNER JOIN and using InvoiceDate as part of the join condition would have caused both queries to have the same performance.

It is correct that your two queries are logically eqivalent. Unfortunately the optimizer will not always reach the same execution plan for two different but equivalent queries.

The optimizer will not analyze all possible plans - it will stop when it reaches a plan that it judges as good enough.

Now, there are some problems with your posted tables and views that might or might not be present in the real tables. Since you have not posted the actual table definitions it is ompossible to tell.

When using a partitioned view you should always have CHECK-constraints on the tables to tell the optimizer that it can avoid searching some tables completely. You should also use UNION ALL instead of UNION in the partitioned view.

See the following article for more details: http://msdn.microsoft.com/en-us/library/ms190019.aspx

DB_Newbie2007 SSCarpal Tunnel Points: 4449 More actions · Answer 10

Thanks for the comments, Grant.

The cost threshold for parallelism option on this server is set to 5.

Window Server 2008 Standard (64-bit).

SQL Server 2008 Standard (64-bit), Version 10.0.1600.22.

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 11

Stefan, good thought on the optimizer dropping out, but both queries went through a FULL optimization process. That means the optimizer thought it had them both right. I'm still wondering about the parallelism, because that's the one thing that's making the most difference here, from what I can see.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 12

DB_Newbie2007 (8/3/2010)
Thanks for the comments, Grant.
The cost threshold for parallelism option on this server is set to 5.
Window Server 2008 Standard (64-bit).
SQL Server 2008 Standard (64-bit), Version 10.0.1600.22.

Well nuts. That doesn't help.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Stefan_G SSCertifiable Points: 6609 More actions · Answer 13

What is the actual execution times of the two query plans you posted ?

The estimated cost for Query1 is 109.4 while the estimated cost for Query2 is 46.3

With these numbers it seems very strange that Query1 would really take more than 30 times more time than Query2

DB_Newbie2007 SSCarpal Tunnel Points: 4449 More actions · Answer 14

Stefan,

1) We do have constraints on the tables... here is an example:

ALTER TABLE [dbo].[Jan10] WITH NOCHECK ADD CONSTRAINT [CK_Jan10] CHECK ((datepart(month,[InvoiceDate])=(1) AND datepart(year,[InvoiceDate])=(2010)))

GO

ALTER TABLE [dbo].[Jan10] CHECK CONSTRAINT [CK_Jan10]

GO

2) The VIEW_InvoiceData table listed in the original post was incorrect. It should have been

CREATE VIEW dbo.VIEW_InvoiceData

AS

SELECT * FROM dbo.Jan10 UNION ALL SELECT * FROM dbo.Feb10

3) The actual execution plans are correct. "The names (of the fields and tables) have been changed to protect the innocent" (namely, ME 😉 ). A confidentiality agreement prevents me from posting the actual table design, indexes, etc.

4) There is a field (which I renamed to ChargeType), but again, you have to understand the table design to figure out how that field (plus other fields) combine together to generate a "charge" record (again, this is where things get proprietary). This particular report used that (plus one other) field to generate it's data (which you can see in the query plans). 😉 I just had to "scrub" the plans before being posted.

This is also only one of over 100 queries/reports run on the database(s). If I were to create "missing indexes" based on every recommendation by the query analyzer or DTA, I would have real performance issues 🙁 I try (with my limited skills and knowledge to date) to create indexes that make sense and try to cover as many of the different situations/queries, but none of them will be perfect. This is why the normalized tables need to be fixed :).

Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)