February 17, 2011 at 1:57 am
I have a query, a fairly straightforward, single table query with a slightly nasty WHERE clause that looks like this:
'right(column_name,1) != 'X'
(not my query)
The problem I have is that this query runs and returns around 6M rows in 1:30 on a test server, but running against the same dataset on a production box is taking 22:30. The servers are almost identical in spec, they're in different datacentres but are both SAN attached with similar configuration.
So far:
-its not related to blocking as this occurs even when no other process is accessing the server
-Neither table has any indexes (its Fact table in a cube build) and Execution plans are simple and identical
-Both servers have optimisation jobs running nightly which update stats
-I've run SQLIO against the disks with comparable results
-I see nothing freaky when I check out the basic counters in Perfmon
I'm running out of ideas - does anyone else have any?
February 17, 2011 at 2:41 am
Network? Streaming 6 million rows across the network won't be quick. Were you running local on the test server and remote on production perhaps?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 17, 2011 at 3:31 am
All local. The issue scales with volume anyway and the particular query is largely irrelevant. A query that runs <1 sec on the test box takes 23 seconds on Prod.
February 17, 2011 at 3:48 am
What happens if you try this on both servers (post results back)?
set statistics io on;
set statistics time on;
go
set showplan_text on;
go
select count_big(all column_name)
from your_table
go
set showplan_text off;
go
set statistics time off;
set statistics io off;
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
February 17, 2011 at 4:14 am
mister.magoo (2/17/2011)
select count_big(all column_name)
Odd syntax. Why not count(*)?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 17, 2011 at 4:29 am
I have noticed in the past that COUNT(*) does not always force a full count of all rows
I remember pushing the FIZZBUZZ problem to a trillion rows and used count instead of selecting all the columns - during that process it became apparent that count(*) can take shortcuts, while count_big(ALL....) seemed to force a full count.
I guess in this case - as the table is a heap it should work out the same, (no indexes to use for the rowcount) but for my own sanity I use count_big to avoid overflow problems and (ALL column_name) to be sure of counting every row instead of shortcutting an answer.
I am prepared to be be taught how woefully wrong I am, but this is what I currently believe.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
February 17, 2011 at 5:16 am
Ok, I have had a quick check and can't find any noticable difference between them now, so maybe I just remembered something wrong....never mind - no harm done 😛
edit: checking BOL, I see that ALL is the default anyway, so I must have had an internal error or something in my head - just ignore me!
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
February 17, 2011 at 10:32 am
mister.magoo (2/17/2011)
I have noticed in the past that COUNT(*) does not always force a full count of all rows
It does. It reads the leaf level of the smallest index to count the rows. Count_big will be needed if you have over 2 billion rows. It can be inaccurate if there concurrent activity, but that's all.
I use count_big to avoid overflow problems and (ALL column_name) to be sure of counting every row instead of shortcutting an answer.
Count (column_name) does not count all rows, even with All (which is the alternative to distinct and is the default anyway). Count(column_name) returns the number of rows where column_name is not null.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 17, 2011 at 11:32 am
GilaMonster (2/17/2011)
mister.magoo (2/17/2011)
I have noticed in the past that COUNT(*) does not always force a full count of all rowsIt does. It reads the leaf level of the smallest index to count the rows. Count_big will be needed if you have over 2 billion rows. It can be inaccurate if there concurrent activity, but that's all.
I use count_big to avoid overflow problems and (ALL column_name) to be sure of counting every row instead of shortcutting an answer.
Count (column_name) does not count all rows, even with All (which is the alternative to distinct and is the default anyway). Count(column_name) returns the number of rows where column_name is not null.
Yes, thanks for clarifying that Gail, as I said in my previous post, I don't know what happened to make me think like that!
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
February 17, 2011 at 11:46 am
MissTipps-320357 (2/17/2011)
I have a query, a fairly straightforward, single table query with a slightly nasty WHERE clause that looks like this:'
right(column_name,1) != 'X'
(not my query)
The problem I have is that this query runs and returns around 6M rows in 1:30 on a test server, but running against the same dataset on a production box is taking 22:30. The servers are almost identical in spec, they're in different datacentres but are both SAN attached with similar configuration.
So far:
-its not related to blocking as this occurs even when no other process is accessing the server
-Neither table has any indexes (its Fact table in a cube build) and Execution plans are simple and identical
-Both servers have optimisation jobs running nightly which update stats
-I've run SQLIO against the disks with comparable results
-I see nothing freaky when I check out the basic counters in Perfmon
I'm running out of ideas - does anyone else have any?
I would start checking server and connection settings in this case.
For example, does one server have a different parallelism threshold than the other? How about max memory settings? Resource governor on one and not the other? Default and actual isolation levels?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 18, 2011 at 7:39 pm
The issue scales with volume anyway and the particular query is largely irrelevant. A query that runs <1 sec on the test box takes 23 seconds on Prod...
February 18, 2011 at 10:07 pm
yuanyelss (2/18/2011)
The issue scales with volume anyway and the particular query is largely irrelevant. A query that runs <1 sec on the test box takes 23 seconds on Prod...
What are you seeing for waits on the process as it is running?
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply