January 15, 2009 at 9:00 pm
Which of the following do you think would run the quickest (I am thinking the 3rd - but I wanted to check)? Each of x and y has around 10 million rows.
[font="Courier New"]Query - 1
========[/font]
Select
x.c1,
x.c2,
MIN(case when y.c3 = '' then '9' else coalesce(y.c3,'*') end) as c3,
MIN(case when y.c4 = '' then '9' else coalesce(y.c4,'*') end) as c4,
MIN(case when y.c5 = '' then '9' else coalesce(y.c5,'*') end) as c5,
MIN(case when y.c6 = '' then '9' else coalesce(y.c6,'*') end) as c6,
MIN(case when y.c7 = '' then '9' else coalesce(y.c7,'*') end) as c7
from
x
inner join
y
on
x.c1 = y.c1 and x.c2 = y.c2
group by x.c1, x.c2
[font="Courier New"]Query - 2
========[/font]
Select
x.c1,
x.c2,
MIN(c3) as c3,
MIN(c4) as c4,
MIN(c5) as c5,
MIN(c6) as c6,
MIN(c7) as c7
from
x
inner join
(select y.c1, y.c2,
case when y.c3 = '' then '9' else coalesce(y.c3,'*') end as c3,
case when y.c4 = '' then '9' else coalesce(y.c4,'*') end as c4,
case when y.c5 = '' then '9' else coalesce(y.c5,'*') end as c5,
case when y.c6 = '' then '9' else coalesce(y.c6,'*') end as c6,
case when y.c7 = '' then '9' else coalesce(y.c7,'*') end as c7
from y) as y
on
x.c1 = y.c1 and x.c2 = y.c2
group by x.c1, x.c2
[font="Courier New"]Query - 3
========[/font]
With a scalar user-defined function dbo.udf_coal defined along the lines sketched below -
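(The function body wasn't posted, so something like this is assumed - it simply mirrors the inline CASE/COALESCE expression from Queries 1 and 2, with VARCHAR(50) as a placeholder type.)

CREATE FUNCTION dbo.udf_coal (@val VARCHAR(50))
RETURNS VARCHAR(50)
AS
BEGIN
    -- Assumed body: same logic as the inline expression in Queries 1 and 2
    RETURN CASE WHEN @val = '' THEN '9' ELSE COALESCE(@val, '*') END;
END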
Select
x.c1,
x.c2,
MIN(c3) as c3,
MIN(c4) as c4,
MIN(c5) as c5,
MIN(c6) as c6,
MIN(c7) as c7
from
x
inner join
(select y.c1, y.c2,
dbo.udf_coal(y.c3) as c3,
dbo.udf_coal(y.c4) as c4,
dbo.udf_coal(y.c5) as c5,
dbo.udf_coal(y.c6) as c6,
dbo.udf_coal(y.c7) as c7
from y) as y
on
x.c1 = y.c1 and x.c2 = y.c2
group by x.c1, x.c2
January 15, 2009 at 11:28 pm
Why don't you just run it and tell us? 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2009 at 12:58 am
Well, running them would be a far quicker way of seeing which is faster than asking.
I can say that the third will probably be the slowest. Don't know about the other 2.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 16, 2009 at 7:07 am
I'd guess 1. 3 will be slowest.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 16, 2009 at 7:13 am
Make it a poll instead... 🙂 I think 1 and 2 would be equal in performance (it seems that the same number of rows are touched, and every row is COALESCEd just once and then MINned), the third slowest.
Ronald Hensbergen
Help us, help yourself... Post data so we can read and use it: http://www.sqlservercentral.com/articles/Best+Practices/61537/
-------------------------------------------------------------------------
2+2=5 for significant large values of 2
January 16, 2009 at 7:17 am
I think queries 1 & 2 are similar, so I'd bet on both, and more on 1... 3 is out of the question here for me. :)
--Ramesh
January 16, 2009 at 7:25 am
r.hensbergen (1/16/2009)
Make it a poll instead... 🙂 I think 1 and 2 would be equal in performance (it seems that the same number of rows are touched, and every row is COALESCEd just once and then MINned), the third slowest.
Heh... no... please don't make it a poll. I'm trying to be as nice as pie... I don't understand why people post code and ask which will be faster... that's pretty lazy... just run the bloody code and see. Then, if you don't understand why, post the results, the code, and ask why. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2009 at 7:32 am
I guess number one.
What is the prize if I get this right?
January 16, 2009 at 7:33 am
alrite ... first thanks for 'all the guidance' ....
second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process ..... would have had to build 2 tables with around 10 million rows (not that it is a big deal) ... I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....
January 16, 2009 at 7:34 am
Jeff Moden (1/16/2009)
I'm trying to be as nice as pie... I don't understand why people post code and ask which will be faster... that's pretty lazy...
Cause it's easier than setting up a test environment?
Then, if you don't understand why, post the results, the code, and ask why. 😉
I wonder how many of the people who think that one or the other will be faster can explain why they say that. 😉
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 16, 2009 at 7:40 am
leonp (1/16/2009)
second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process
Got a local copy of SQL and a test DB? Taking half an hour or so and building up a small sample DB (or just taking adventureworks) can save you massively in the future when you need to test stuff out and aren't allowed to do so on the production server. It's also a great learning environment for fooling around with SQL and seeing how it behaves.
I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....
It's the fishing and teaching to fish thing again. I can (and did) tell you that 3 would probably be slower. If you don't know why, that fact in itself is useless as it doesn't help in the future.
Run the stuff, see how it works, see if you can figure it out, and next time you won't have to ask the 'experts', you'll be able to figure it out yourself.
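One simple way to do that - just a sketch, using the standard SET STATISTICS switches and Query 1 as written above - is to run each variant with timing and I/O reporting turned on, then compare the elapsed time and logical reads on the Messages tab:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Query 1 as posted; wrap Queries 2 and 3 the same way and compare the output
SELECT x.c1,
       x.c2,
       MIN(CASE WHEN y.c3 = '' THEN '9' ELSE COALESCE(y.c3, '*') END) AS c3,
       MIN(CASE WHEN y.c4 = '' THEN '9' ELSE COALESCE(y.c4, '*') END) AS c4,
       MIN(CASE WHEN y.c5 = '' THEN '9' ELSE COALESCE(y.c5, '*') END) AS c5,
       MIN(CASE WHEN y.c6 = '' THEN '9' ELSE COALESCE(y.c6, '*') END) AS c6,
       MIN(CASE WHEN y.c7 = '' THEN '9' ELSE COALESCE(y.c7, '*') END) AS c7
FROM   x
INNER JOIN y
       ON x.c1 = y.c1 AND x.c2 = y.c2
GROUP BY x.c1, x.c2;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;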
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 16, 2009 at 11:30 am
ok ok ..... spare me the barrage, smart, sweet people ..... you were right .... the 1st and 2nd were quicker than the 3rd, and in fact the 1st was a little quicker than the 2nd ...
I changed the coalesce in the 1st to ISNULL and that was even quicker ..... (not asking why - just an FYI)
January 16, 2009 at 12:13 pm
leonp (1/16/2009)
alrite ... first thanks for 'all the guidance' ....second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process ..... would have had to build 2 tables with around 10 million rows (not that it is a big deal) ... I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....
Sorry, Leon... I didn't even consider that. I've recently been involved with a rash of posts where people just don't try and think that folks on this forum actually get paid for their answers. We don't, by the way. Anyway, please accept my apologies and a small "gift" for my bit of rudeness (don't know what else to call it)... dunno if you have one already, but I've found the following data generator to be a big help for things like this. Feel free to warp the code to your needs...
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
--===== Add a clustered index to keep it from being a "heap". This one happens to also be a PK.
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
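Warped to this thread's scenario, it might look something like the following - a sketch only; the c1-c7 column types, value ranges, and the mix of NULL and empty-string values are guesses, and TOP would need to be scaled up to roughly 10,000,000 to match the tables in the original question:

--===== Build a test "y" table with join columns and a few nullable/blank-prone columns
SELECT TOP 1000000
       c1 = ABS(CHECKSUM(NEWID()))%50000+1,
       c2 = ABS(CHECKSUM(NEWID()))%100+1,
       c3 = CASE ABS(CHECKSUM(NEWID()))%4 WHEN 0 THEN '' WHEN 1 THEN NULL ELSE CHAR(ABS(CHECKSUM(NEWID()))%26+65) END,
       c4 = CASE ABS(CHECKSUM(NEWID()))%4 WHEN 0 THEN '' WHEN 1 THEN NULL ELSE CHAR(ABS(CHECKSUM(NEWID()))%26+65) END,
       c5 = CASE ABS(CHECKSUM(NEWID()))%4 WHEN 0 THEN '' WHEN 1 THEN NULL ELSE CHAR(ABS(CHECKSUM(NEWID()))%26+65) END,
       c6 = CASE ABS(CHECKSUM(NEWID()))%4 WHEN 0 THEN '' WHEN 1 THEN NULL ELSE CHAR(ABS(CHECKSUM(NEWID()))%26+65) END,
       c7 = CASE ABS(CHECKSUM(NEWID()))%4 WHEN 0 THEN '' WHEN 1 THEN NULL ELSE CHAR(ABS(CHECKSUM(NEWID()))%26+65) END
  INTO dbo.y
  FROM Master.dbo.SysColumns t1
 CROSS JOIN Master.dbo.SysColumns t2

--===== One "x" row per distinct (c1, c2) pair in "y"
SELECT DISTINCT c1, c2
  INTO dbo.x
  FROM dbo.y

--===== Clustered indexes on the join columns so neither table is a heap
CREATE CLUSTERED INDEX IX_y_c1_c2 ON dbo.y (c1, c2)
CREATE CLUSTERED INDEX IX_x_c1_c2 ON dbo.x (c1, c2)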
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2009 at 12:17 pm
leonp (1/16/2009)
ok ok ..... spare me the barrage, smart, sweet people ..... you were right .... the 1st and 2nd were quicker than the 3rd, and in fact the 1st was a little quicker than the 2nd ... I changed the coalesce in the 1st to ISNULL and that was even quicker ..... (not asking why - just an FYI)
I obviously haven't reviewed the source code for ISNULL or COALESCE, but I've found the same thing to be true especially in concatenation functions. The only thing I can figure is that ISNULL is designed to take two and only two operands. COALESCE is designed to take at least two up to some relatively unknown number. I can only imagine that the code to determine how many operands COALESCE has and to "split them out" is a wee bit longer than the code to split only two operands that are required.
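For reference, the swap amounts to nothing more than replacing the function in the ELSE branch - shown here for the c3 expression from Query 1 (c4 through c7 follow the same pattern):

-- Original expression from Query 1:
MIN(case when y.c3 = '' then '9' else coalesce(y.c3,'*') end) as c3
-- ISNULL variant that tested slightly faster:
MIN(case when y.c3 = '' then '9' else ISNULL(y.c3,'*') end) as c3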
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2009 at 12:18 pm
Truly appreciate it Jeff .....