Making Sense of Statistics

Question

Post reply

Making Sense of Statistics

Neil Burton

SSC-Insane

Points: 23001
More actions
June 2, 2014 at 5:09 am

#292008

I'm not sure I'm posting this in the correct forum but anyway....I came across a proc written by one of my colleagues that used a subquery repeatedly. It ran in less than a second but it just seemed like a very cack-handed way of doing something simple. I decided to change it to use a join (because that's the way you're supposed to do things right) and everything worked fine. Just out of interest I thought I'd have a look at the statistics. I don't know the first thing about how to read and make sense of them but I know if one number is bigger than another. I was expecting to see something that would show the new set based approach to be better but what I got was not that.
(13 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Problems'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The statistics above are for the new approach and these below are for the old subquery method.
(13 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Problems'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
To my untrained eye, it appears that the statistics for the old method are better than those for the new and that seems illogical. Can anybody help me make sense of statistics in general please? I don't understand why the 'right' way to do something produces an a apparently worse set of statistics than a subquery fest. I'm not too worried about improving either query specifically, they both run in less than a second and they'll never have to scale up more than high tens of rows.
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, Passages from the Life of a Philosopher
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Igor Micev SSC-Dedicated Points: 33109 More actions · Answer 1

Hi,

Data that you posted is not the right statistics data. It's some statistics of the IO (set statistics IO on).

Can you run this set (do it on test)

dbcc dropcleanbuffers

dbcc freeproccache

dbcc freesessioncache

and tell back if you're seeing the same logical and physical reads?

Igor Micev,My blog: www.igormicev.com

Neil Burton SSC-Insane Points: 23001 More actions · Answer 2

That's kind of what I meant, I'm not sure what I'm looking for! However, I don't actually have permission to run those. I'm part of the BI team so we've not got much in the way of access.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, Passages from the Life of a Philosopher

How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

Igor Micev (6/2/2014)
Can you run this set (do it on test)
dbcc dropcleanbuffers
dbcc freeproccache
dbcc freesessioncache

Why?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Igor Micev SSC-Dedicated Points: 33109 More actions · Answer 4

GilaMonster (6/2/2014)
Igor Micev (6/2/2014)
Can you run this set (do it on test)
dbcc dropcleanbuffers
dbcc freeproccache
dbcc freesessioncache
Why?

Sometimes when you run the same query again it uses less physical reads, since the retrieved pages to use are already in memory. That's the idea, and think he is confused because of it.

Igor Micev,My blog: www.igormicev.com

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

In general, lower numbers are better, however those stats values are so low there's no point in tweaking.

As for subqueries vs joins, depends on where the subqueries are, how many there are, how they're written and more. Without actually seeing the query it's hard to say anything about the changes made and their effectiveness.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

Igor Micev (6/2/2014)
GilaMonster (6/2/2014)
Igor Micev (6/2/2014)
Can you run this set (do it on test)
dbcc dropcleanbuffers
dbcc freeproccache
dbcc freesessioncache
Why?
Sometimes when you run the same query again it uses less physical reads, since the retrieved pages to use are already in memory. That's the idea, and think he is confused because of it.

But neither of the stats he posted has any physical reads...

Also, there's no need to clear all the caches to compare like with like. Running the query twice and ignoring the first result does the same, and is generally more representative of how a query runs in production, since it is generally hoped that queries don't usually run on a cold cache.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Neil Burton SSC-Insane Points: 23001 More actions · Answer 7

GilaMonster (6/2/2014)
In general, lower numbers are better, however those stats values are so low there's no point in tweaking.
As for subqueries vs joins, depends on where the subqueries are, how many there are, how they're written and more. Without actually seeing the query it's hard to say anything about the changes made and their effectiveness.

Thanks Gail, I should probably have realised that the subqueries vs joins argument would be an 'it depends' case. It was just that conventional wisdom is that set based is best so I thought that there would be an improvement. As you say, the numbers are so low it's not worth worrying about, although the code is much easier to read. I suspect, probably wrongly, that it's something to do with the fact I'm joining to the same table twice. I'd be interested in direction to some advice on the difference between reads, seeks and scans if somebody could suggest a could link please?

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, Passages from the Life of a Philosopher

How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Igor Micev SSC-Dedicated Points: 33109 More actions · Answer 8

GilaMonster (6/2/2014)
Igor Micev (6/2/2014)
GilaMonster (6/2/2014)
Igor Micev (6/2/2014)
Can you run this set (do it on test)
dbcc dropcleanbuffers
dbcc freeproccache
dbcc freesessioncache
Why?
Sometimes when you run the same query again it uses less physical reads, since the retrieved pages to use are already in memory. That's the idea, and think he is confused because of it.
But neither of the stats he posted has any physical reads...
Also, there's no need to clear all the caches to compare like with like. Running the query twice and ignoring the first result does the same, and is generally more representative of how a query runs in production, since it is generally hoped that queries don't usually run on a cold cache.

You're right, no physical reads, I mixed it up with the logical reads.

However the idea was to see results under cold cache, and of course this is not recommended on production.

Thanks anyway.

Igor Micev,My blog: www.igormicev.com

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 9

BWFC (6/2/2014)
It was just that conventional wisdom is that set based is best so I thought that there would be an improvement.

What does set-based have to do with changing subqueries to joins?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Neil Burton SSC-Insane Points: 23001 More actions · Answer 10

GilaMonster (6/2/2014)
BWFC (6/2/2014)
It was just that conventional wisdom is that set based is best so I thought that there would be an improvement.
What does set-based have to do with changing subqueries to joins?

This is definitely my inexperience showing. I know what I mean and what I want to do but not necessarily the correct name for it. I thought that joining to the whole table and returning data from that was the set based approach because I was doing something to the whole set. It seemed logically different from looking up something each time needed it with the subqueries. As always though I'm more than willing to learn different.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, Passages from the Life of a Philosopher

How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

BWFC (6/2/2014)
It seemed logically different from looking up something each time needed it with the subqueries.

But that's not how subqueries behave....

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Neil Burton SSC-Insane Points: 23001 More actions · Answer 12

GilaMonster (6/2/2014)
BWFC (6/2/2014)
It seemed logically different from looking up something each time needed it with the subqueries.
But that's not how subqueries behave....

And that's inexperience again. I think it's fair to say that I can write the code to get the results I want but I have no idea how things work in the background. As a team we're largely, if not entirely, self-taught and my boss is a great believer in sub-queries. He taught me and so I know what a sub-query does but not how it does it.

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
—Charles Babbage, Passages from the Life of a Philosopher

How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537