Which SQL would run faster?

  • Which of the following you think would run the quickest (I am thinking the 3rd - but I wanted to check)? Each of x and y has around 10 million rows

    [font="Courier New"]Query - 1

    ========[/font]

    Select

    x.c1,

    x.c2,

    MIN(case when y.c3 = '' then '9' else coalesce(y.c3,'*') end) as c3,

    MIN(case when y.c4 = '' then '9' else coalesce(y.c4,'*') end) as c4,

    MIN(case when y.c5 = '' then '9' else coalesce(y.c5,'*') end) as c5,

    MIN(case when y.c6 = '' then '9' else coalesce(y.c6,'*') end) as c6,

    MIN(case when y.c7 = '' then '9' else coalesce(y.c7,'*') end) as c7

    from

    x

    inner join

    y

    on

    x.c1 = y.c1 and x.c2 = y.c2

    group by x.c1, x.c2

    [font="Courier New"]Query - 2

    ========[/font]

    Select

    x.c1,

    x.c2,

    MIN(c3) as c3,

    MIN(c4) as c4,

    MIN(c5) as c5,

    MIN(c6) as c6,

    MIN(c7) as c7

    from

    x

    inner join

    (select y.c1, y.c2,

    case when y.c3 = '' then '9' else coalesce(y.c3,'*') end as c3,

    case when y.c4 = '' then '9' else coalesce(y.c4,'*') end as c4,

    case when y.c5 = '' then '9' else coalesce(y.c5,'*') end as c5,

    case when y.c6 = '' then '9' else coalesce(y.c6,'*') end as c6,

    case when y.c7 = '' then '9' else coalesce(y.c7,'*') end as c7

    y)

    on

    x.c1 = y.c1 and x.c2 = y.c2

    group by x.c1, x.c2

    [font="Courier New"]Query - 3

    ========[/font]

    With a scalar user defined function defined dbo.udf_coal(x) as -

    Select

    x.c1,

    x.c2,

    MIN(c3) as c3,

    MIN(c4) as c4,

    MIN(c5) as c5,

    MIN(c6) as c6,

    MIN(c7) as c7

    from

    x

    inner join

    (select y.c1, y.c2,

    dbo.udf_coal(c3) end as c3,

    dbo.udf_coal(c4) end as c4,

    dbo.udf_coal(c5) end as c5,

    dbo.udf_coal(c6) end as c6,

    dbo.udf_coal(c7) end as c7

    y)

    on

    x.c1 = y.c1 and x.c2 = y.c2

    group by x.c1, x.c2

  • Why don't you just run it and tell us? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, running them would be a far quicker way of seeing which is faster than asking.

    I can say that the third will probably be the slowest. Don't know about the other 2.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'd guess 1. 3 will be slowest.

  • Make it a poll instead... 🙂 I think 1 and 2 would be equal in performance (it seems that the same amount of rows are touched, and every row is just once COALESCed and then SUMmed), the third slowest.

    Ronald HensbergenHelp us, help yourself... Post data so we can read and use it: http://www.sqlservercentral.com/articles/Best+Practices/61537/-------------------------------------------------------------------------2+2=5 for significant large values of 2

  • I think the queries 1 & 2 are similar, so I'd bet on both & more on 1..., 3 is out of question here for me.:)

    --Ramesh


  • r.hensbergen (1/16/2009)


    Make it a poll instead... 🙂 I think 1 and 2 would be equal in performance (it seems that the same amount of rows are touched, and every row is just once COALESCed and then SUMmed), the third slowest.

    Heh... no... please don't make it a poll. I'm trying to be as nice as pie... I don't understand why people post code and ask which will be faster... that's pretty lazy... just run the bloody code and see. Then, if you don't understand why, post the results, the code, and ask why. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I guess number one.

    What is the prize if I get this right?

  • alrite ... first thanks for 'all the guidance' ....

    second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process ..... would have had to build 2 tables with around 10 million rows (not that it is a big deal) ... I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....

  • Jeff Moden (1/16/2009)


    I'm trying to be as nice as pie... I don't understand why people post code and ask which will be faster... that's pretty lazy...

    Cause it's easier than setting up a test environment?

    Then, if you don't understand why, post the results, the code, and ask why. 😉

    I wonder how many of the people who think that one or the other will be faster can explain why they say that. 😉

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • leonp (1/16/2009)


    second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process

    Got a local copy of SQL and a test DB? Taking half an hour or so and building up a small sample DB (or just taking adventureworks) can save you massively in the future when you need to test stuff out and aren't allowed to do so on the production server. It's also a great learning environment for fooling around with SQL and seeing how it behaves.

    I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....

    It's the fishing and teaching to fish thing again. I can (and did) tell you that 3 would probably be slower. If you don't know why, that fact in itself is useless as it doesn't help in the future.

    Run the stuff, see how it work, see if you can figure it out, and next time you won't have to ask the 'experts', you'll be able to figure it out yourself.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ok ok ..... spare me the barrage, smart, sweet people ..... you were right .... 1st and 2nd were quicker than 3rd and in fact 1st was quicker than 2nd a little bit ...

    I changed the coalesce in the 1st to ISNULL and that was even quicker ..... (not asking why - just an FYI)

  • leonp (1/16/2009)


    alrite ... first thanks for 'all the guidance' ....

    second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process ..... would have had to build 2 tables with around 10 million rows (not that it is a big deal) ... I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....

    Sorry, Leon... I didn't even consider that. I've recently been involved with a rash of posts where people just don't try and think that folks on this forum actually get paid for their answers. We don't, by the way. Anyway, please accept my apologies and a small "gift" for my bit of rudeness (don't know what else to call it)... dunno if you have one already, but I've found the following data generator to be a big help for things like this. Feel free to warp the code to your needs...

    --===== Create and populate a 1,000,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1

    CROSS JOIN Master.dbo.SysColumns t2

    --===== Add a clustered index to keep it from being a "heap". This one happens to also be a PK.

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • leonp (1/16/2009)


    ok ok ..... spare me the barrage, smart, sweet people ..... you were right .... 1st and 2nd were quicker than 3rd and in fact 1st was quicker than 2nd a little bit ...

    I changed the coalesce in the 1st to ISNULL and that was even quicker ..... (not asking why - just an FYI)

    I obviously haven't reviewed the source code for ISNULL or COALESCE, but I've found the same thing to be true especially in concatenation functions. The only thing I can figure is that ISNULL is designed to take two and only two operands. COALESCE is designed to take at least two up to some relatively unknown number. I can only imagine that the code to determine how many operands COALESCE has and to "split them out" is a wee bit longer than the code to split only two operands that are required.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Truly appreciate it Jeff .....

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply