Query Performance Question

  • Stefan_G (8/3/2010)


    What are the actual execution times of the two query plans you posted?

    The estimated cost for Query1 is 109.4, while the estimated cost for Query2 is 46.3.

    With these numbers it seems very strange that Query1 would really take more than 30 times as long as Query2.

    That was the reason for my original posting... I could not believe that a simple change in which table was used for the search conditions/constraints would have such a dramatic effect on the queries. In fact, I re-ran these tests on two different servers (and two different databases) just to be sure, as well as running the queries themselves three or four times. I also included my

    DBCC DROPCLEANBUFFERS -- frees the buffer pool
    DBCC FREEPROCCACHE -- frees the procedure cache; should not matter here

    statements just to be sure nothing was being cached to give me incorrect results.

    Just as an FYI, for this particular example/database there are 18,670,308 records in the VIEW_InvoiceData view and 40,015 records in the InvSummary table.

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • Query 1 took 00:24:33 (+/- 1 minute in other trials) and Query 2 took 00:00:50 (+/- 2 seconds in other trials). Both were run on the server to minimize any network delays.
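    For what it's worth, a common way to capture those wall-clock numbers server-side (rather than reading the SSMS status bar) is the STATISTICS TIME session option; a minimal sketch:

    SET STATISTICS TIME ON;
    -- run Query1 or Query2 here; CPU time and elapsed time are reported per statement
    SET STATISTICS TIME OFF;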

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • DB_Newbie2007 (8/3/2010)


    Stefan,

    1) We do have constraints on the tables... here is an example:

    ALTER TABLE [dbo].[Jan10] WITH NOCHECK ADD CONSTRAINT [CK_Jan10] CHECK ((datepart(month,[InvoiceDate])=(1) AND datepart(year,[InvoiceDate])=(2010)))
    GO
    ALTER TABLE [dbo].[Jan10] CHECK CONSTRAINT [CK_Jan10]
    GO

    I am not sure, but I think it would work better with a constraint like this:

    ALTER TABLE [dbo].[Jan10] ADD CONSTRAINT [CK_Jan10] CHECK (InvoiceDate >= '20100101' AND InvoiceDate < '20100201')
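    Also worth noting: because the original constraint was added WITH NOCHECK, SQL Server marks it as not trusted, and the optimizer will not use an untrusted constraint to eliminate member tables from a partitioned view. A minimal check-and-fix sketch, using the constraint name from the example above:

    -- is_not_trusted = 1 means the optimizer ignores this constraint
    SELECT name, is_not_trusted FROM sys.check_constraints WHERE name = 'CK_Jan10';

    -- Re-validate the existing data so the constraint becomes trusted
    ALTER TABLE [dbo].[Jan10] WITH CHECK CHECK CONSTRAINT [CK_Jan10];
    GO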

    2) The VIEW_InvoiceData table listed in the original post was incorrect. It should have been

    CREATE VIEW dbo.VIEW_InvoiceData
    AS
    SELECT * FROM dbo.Jan10 UNION ALL SELECT * FROM dbo.Feb10

    Good

    3) The actual execution plans are correct. "The names (of the fields and tables) have been changed to protect the innocent" (namely, ME 😉 ). A confidentiality agreement prevents me from posting the actual table design, indexes, etc.

    That is too bad. It makes it much harder to really understand your problem.

    There is a field (which I renamed to ChargeType), but again, you have to understand the table design to figure out how that field (plus other fields) combine to generate a "charge" record (again, this is where things get proprietary). This particular report used that (plus one other) field to generate its data (which you can see in the query plans). 😉 I just had to "scrub" the plans before they were posted.

    Hmm, exactly how did you produce the "scrubbed" plans?

    Did you first execute the real query, capture the execution plan, and then perform a search & replace in the plan XML?

    Or did you create a set of modified tables, indexes, and queries, run the modified queries, and then post the actual plans?

    The queries you posted in your first post contained a GROUP BY. The queries shown in the actual plans did not - at least no aggregation is visible in the query plans.

    Do your real problem queries contain GROUP BY or not?

  • Stefan_G (8/3/2010)


    Hmm, exactly how did you produce the "scrubbed" plans?

    Did you first execute the real query, capture the execution plan, and then perform a search & replace in the plan XML?

    Or did you create a set of modified tables, indexes, and queries, run the modified queries, and then post the actual plans?

    The queries you posted in your first post contained a GROUP BY. The queries shown in the actual plans did not - at least no aggregation is visible in the query plans.

    Do your real problem queries contain GROUP BY or not?

    Executed the real query, captured the execution plan(s), then did a search and replace on specific names (Database name, table name, field name).
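    For reference, a simple way to do that capture in SSMS is the STATISTICS XML session option, which returns the actual execution plan as an XML document that can be saved and scrubbed with search & replace:

    SET STATISTICS XML ON;
    GO
    -- run the query here; the actual plan comes back as an extra XML result set
    SET STATISTICS XML OFF;
    GO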

    There is no grouping on the original query.

    *SIGH* OK, if you really must have the original query (again, it is available in the posted execution plans), here it is after scrubbing the field names. This is the query IN ITS ENTIRETY (i.e., run directly in SSMS as is): no grouping, no sub-queries, etc.

    USE ScratchDB
    GO
    DBCC DROPCLEANBUFFERS -- frees the buffer pool
    DBCC FREEPROCCACHE -- frees the procedure cache

    SELECT
        ISU.InvoiceNumber,
        InvoiceItem,
        EnteredWeight,
        IndItemWeight,
        CONVERT(varchar(3), CONVERT(int, SUBSTRING(ID.PkgDim,1,5))) + ' x ' +
        CONVERT(varchar(3), CONVERT(int, SUBSTRING(ID.PkgDim,7,5))) + ' x ' +
        CONVERT(varchar(3), CONVERT(int, SUBSTRING(ID.PkgDim,13,5))) AS PkgDim,
        ISNULL(CustNote1,'') + ' ' + ISNULL(CustNote2,'') + ' ' + ISNULL(CustNote3,'') AS Notes,
        TransactionDate AS PickUp_Date,
        ItemCharge AS Charges
    FROM VIEW_InvoiceData ID
    LEFT OUTER JOIN [dbo].[BilledWtType] WT ON ID.[WtType] = WT.[Code]
    INNER JOIN dbo.InvSummary ISU
        ON ID.InvoiceDate = ISU.InvoiceDate
        AND RIGHT(ID.InvoiceNumber,9) = ISU.InvoiceNumber
    WHERE ISU.InvoiceDate BETWEEN '7/15/2010' AND '8/3/2010'
    --AND ID.InvoiceDate BETWEEN '7/15/2010' AND '8/3/2010'
    AND InvoiceChargeItem = 'abc'
    AND ChargeType = 'def'
    AND CustNote1 IS NOT NULL
    ---- A parameter is passed to this report to determine how to run it,
    ---- which makes the query dynamic
    AND ((PkgDim IS NOT NULL AND 'Yes' = 'Yes')
        OR (PkgDim IS NULL AND 'No' = 'Yes'))

    I understand that having the DB objects, query plans, etc. helps in diagnosis, but I was interested AT A HIGH LEVEL in knowing why there was such a dramatic difference in the execution times. The reason for posting the execution plans in this case was to compare the same query using the two different conditions, to help in diagnosing the differences. My original post was an example of the table structure and indexes, plus the focus was "not about the design of the database objects but on the query performance".

    Part of the question was answered:

    Grant Fritchey (8/3/2010)


    No, a different column in the WHERE clause means a different set of statistics on a different table, which can, and did, lead to differences in the execution plan.

    plus

    Grant Fritchey (8/3/2010)


    Stefan, good thought on the optimizer dropping out, but both queries went through a FULL optimization process. That means the optimizer thought it had them both right. I'm still wondering about the parallelism, because that's the one thing that's making the most difference here, from what I can see.

    Now I have some areas to research to learn more! 😀

    Thanks! 🙂

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • DB_Newbie2007 (8/3/2010)


    I understand that having the DB objects, query plans, etc. helps in diagnosis, but I was interested AT A HIGH LEVEL in knowing why there was such a dramatic difference in the execution times. The reason for posting the execution plans in this case was to compare the same query using the two different conditions, to help in diagnosing the differences. My original post was an example of the table structure and indexes, plus the focus was "not about the design of the database objects but on the query performance".

    Sorry for asking so many questions. The reason I did not trust the query in the actual plans was that you said you had scrubbed the plans - I was not sure what that meant.

    Now let me rephrase your actual question slightly:

    1) Why is there such a dramatic difference in execution time between two different (but logically equivalent) formulations of the same query?

    The simple answer is that there is no guarantee that two different queries will get the exact same execution plan. Most of the time you will get very similar plans, but in this case one plan was fairly fast while the other was extremely slow.

    That leads to another question:

    2) Why is the plan generated for Query1 so slow?

    The reason for this is probably bad statistics.

    If you look at the plan for Query1, you see that the clustered index seek on InvSummary has an estimated number of rows of 1, while the actual number of rows is 915. This means the actual cost of this plan is roughly 915 times higher than the estimated cost.

    If that estimate were correct, this would be a very good plan. Since the estimate is so far off, the plan turns out to be very bad.

    Please make sure that the statistics for the primary key on InvSummary are up to date.

    While you are at it, also make sure that all other statistics on the relevant tables are up to date (especially the index on ChargeType in all the partitions).

    If you do this, I believe you will get good performance from both of your queries. At the very least, it would surprise me very much if the same plan were selected for Query1 if the optimizer knew the real cost was not 109 but 109 * 915 = 99,735.
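    For example, a minimal way to force that refresh (using the object names from this thread; FULLSCAN reads every row instead of sampling):

    UPDATE STATISTICS dbo.InvSummary WITH FULLSCAN;
    UPDATE STATISTICS dbo.Jan10 WITH FULLSCAN;
    UPDATE STATISTICS dbo.Feb10 WITH FULLSCAN;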

  • I ran some tests, and they confirm my suspicions:

    * It is very important to formulate the CHECK condition on the individual tables appropriately.

    * If you have correct statistics, it does not matter how you formulate the query.

    Try the script below.

    Change the constraints and see the difference in execution plan.

    Change the predicate in the query and see that it makes no difference with correct statistics.

    Enable the STATISTICS_NORECOMPUTE option and see that now you get different plans depending on the predicate.

    -- Let us play somewhere safe
    USE tempdb
    GO

    if object_id('dbo.InvSummary') is not null DROP TABLE dbo.InvSummary
    GO

    CREATE TABLE dbo.InvSummary (
        InvoiceNumber varchar(12) NOT NULL,
        InvoiceDate smalldatetime NOT NULL,
        CONSTRAINT [PK_InvSum] PRIMARY KEY CLUSTERED (InvoiceDate, InvoiceNumber)
        WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
        --WITH (IGNORE_DUP_KEY = ON, STATISTICS_NORECOMPUTE = ON) ON [PRIMARY]
    ) ON [PRIMARY];
    GO

    GO

    if object_id('dbo.Jan10') is not null DROP TABLE dbo.Jan10
    GO

    CREATE TABLE dbo.Jan10 (
        RowNumber int NOT NULL,
        InvoiceNumber varchar(15) NOT NULL,
        InvoiceDate smalldatetime NOT NULL,
        CONSTRAINT [PK_Jan10] PRIMARY KEY CLUSTERED (InvoiceDate, InvoiceNumber, RowNumber) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
        -- Try switching between these two different constraints. Note that only the first performs as expected
        ,CONSTRAINT CK_Jan10 CHECK (InvoiceDate >= '20100101' and InvoiceDate < '20100201')
        --,CONSTRAINT CK_Jan10 CHECK (datepart(year,InvoiceDate)=2010 and datepart(month,InvoiceDate)=1)
    ) ON [PRIMARY];
    GO

    if object_id('dbo.Feb10') is not null DROP TABLE dbo.Feb10
    GO

    CREATE TABLE dbo.Feb10 (
        RowNumber int NOT NULL,
        InvoiceNumber varchar(15) NOT NULL,
        InvoiceDate smalldatetime NOT NULL,
        CONSTRAINT [PK_Feb10] PRIMARY KEY CLUSTERED (InvoiceDate, InvoiceNumber, RowNumber) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
        ,CONSTRAINT CK_Feb10 CHECK (InvoiceDate >= '20100201' and InvoiceDate < '20100301')
        --,CONSTRAINT CK_Feb10 CHECK (datepart(year,InvoiceDate)=2010 and datepart(month,InvoiceDate)=2)
    ) ON [PRIMARY];
    GO

    if object_id('dbo.VIEW_InvoiceData') is not null DROP VIEW dbo.VIEW_InvoiceData
    GO

    CREATE VIEW dbo.VIEW_InvoiceData
    AS
    SELECT * FROM dbo.Jan10 UNION ALL
    SELECT * FROM dbo.Feb10
    GO

    -- generate a tally table
    if object_id('dbo.Tally') is not null drop table dbo.Tally
    go

    create table dbo.Tally (n int primary key clustered)

    insert into dbo.Tally
    select top (1000) row_number() over (order by (select null)) as n
    from sys.columns c1, sys.columns c2, sys.columns c3

    -- generate sample data - 450 rows per invoice
    insert into dbo.Jan10 (InvoiceNumber, RowNumber, InvoiceDate)
    select r1.n as InvoiceNumber, r2.n as RowNumber, dateadd(day, r1.n % 28, '20100101') as InvoiceDate
    from dbo.Tally r1
    cross join dbo.Tally r2
    where r1.n <= 1000 and r2.n <= 450

    insert into dbo.Feb10 (InvoiceNumber, RowNumber, InvoiceDate)
    select r1.n as InvoiceNumber, r2.n as RowNumber, dateadd(day, r1.n % 28, '20100201') as InvoiceDate
    from dbo.Tally r1
    cross join dbo.Tally r2
    where r1.n <= 1000 and r2.n <= 450

    insert into dbo.InvSummary (InvoiceNumber, InvoiceDate)
    select distinct InvoiceNumber, InvoiceDate
    from VIEW_InvoiceData
    go

    ----- Query 1
    SET STATISTICS IO ON

    SELECT ID.InvoiceNumber, ID.InvoiceDate
    FROM dbo.VIEW_InvoiceData ID
    INNER JOIN dbo.InvSummary ISU
        ON ID.InvoiceNumber = RIGHT(RTRIM(ISU.InvoiceNumber), 12)
        AND ID.InvoiceDate = ISU.InvoiceDate
    WHERE
    -- Try switching between these two predicates. It should not matter which you use
    ISU.InvoiceDate BETWEEN '20100101' and '20100105'
    --ID.InvoiceDate BETWEEN '20100101' and '20100105'
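    After loading the data, you can also inspect what the optimizer actually knows about the clustered key; a quick check, assuming the names from the script above:

    -- Show the header, density, and histogram for the PK statistics on InvSummary
    DBCC SHOW_STATISTICS ('dbo.InvSummary', 'PK_InvSum');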

  • Stefan_G (8/4/2010)


    2) Why is the plan generated for Query1 so slow?

    The reason for this is probably bad statistics.

    Ok... I updated the statistics for the primary key on the dbo.InvSummary table, then re-ran the queries! WOW!

    Query 1 ran in 00:01:37 and used parallelism.

    Query 2 ran in 00:00:15.

    I do not know much about statistics and I would have thought (I dislike the word "assumed", but that is what this is) that the AUTO_UPDATE_STATISTICS setting on the database would be taking care of this, but obviously I am wrong. :w00t:

    I think I need to spend some time researching methods to review and update statistics on all of my databases.

    WOW again! I am stunned!

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • Glad I could help! 😎

    Just out of curiosity, could you please post the actual query plans of Query1 and Query2 after the statistics update?

  • Execution plans after updating the statistics on the InvSummary table only.

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • >>I do not know much about statistics and I would have thought (I dislike the word "assumed", but that is what this is) that the AUTO_UPDATE_STATISTICS setting on the database would be taking care of this, but obviously I am wrong.

    Most people aren't aware that for almost every table in existence you must modify TWENTY PERCENT OF THE ROWS (500 rows plus 20 percent, to be precise) before you get an auto-fired statistics update. And even when that fires, it will normally only sample the values. So for your 500M-row table you must modify 100 MILLION rows of data to get a stats update. This is a crippling issue for MANY of my clients who rely on it. You simply MUST do manual stats updates on some interval that makes sense for each table in your system. I have some client tables that get stats updates every HOUR.
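    A minimal sketch of this kind of manual refresh (the table name is a placeholder; sp_updatestats is the built-in, database-wide option and only touches statistics with row modifications since the last update):

    -- Targeted, full-scan refresh for one hot table (placeholder name):
    UPDATE STATISTICS dbo.SomeBigTable WITH FULLSCAN;

    -- Database-wide refresh using default sampling:
    EXEC sp_updatestats;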

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Kevin,

    Thanks for the post. I found the same information elsewhere on the web about the 20% threshold for 2005 and 2008.

    I found this website that needs more review: http://www.sqlstatistics.com/page/2/

    I also started on my new journey to get a statistics update process running for all of my databases. Hopefully this query will help identify the statistics that need to be updated:

    SELECT o.name AS [Table Name]
        , i.name AS [Index Name]
        , i.rowmodctr AS [Rows Modified]
        , STATS_DATE(ST.object_id, ST.stats_id) AS [Last Updated]
    FROM sys.sysobjects o
    INNER JOIN sys.sysindexes i
        ON o.id = i.id
    INNER JOIN sys.stats ST
        ON o.id = ST.object_id
        AND i.indid = ST.stats_id
    WHERE i.rowmodctr > 0 ---- rows inserted, deleted, or updated since statistics were last updated for the table
        AND o.xtype = 'U' ---- U = user table
        AND STATS_DATE(ST.object_id, ST.stats_id) < DATEADD(day, -5, GETDATE()) ---- statistics not updated within the last 5 days
    ORDER BY o.name, i.rowmodctr DESC

    I will need to modify the query slightly to exclude certain tables that I know do not need to be updated (staging tables, temp tables, or older static tables).
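    A sketch of that exclusion (these table names are purely illustrative placeholders):

    AND o.name NOT LIKE 'Staging%' ---- skip staging tables
    AND o.name NOT IN ('TempLoad', 'Archive2005') ---- known scratch / static tables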

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
