When is a scan cheaper than a seek?

  • It would also help to know what indexes exist on both tables, and whether any foreign keys are defined.

  • Gail and Lynn, it is also worth noting that the scan is feeding directly into a merge join, so no intermediate sorting or hashing is required for the join. There's no blocking operator in the plan: as the table is scanned, the join happens with almost no overhead. If a merge join wasn't possible, the optimizer might choose not to scan the major table.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
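The plan shape being described can be sketched as follows. Table and index names here are hypothetical; the point is that when both inputs are already ordered on the join key, a scan can feed a merge join with no intermediate sort or hash:

```sql
-- Hypothetical schema: both tables clustered on ID, so each side can be
-- scanned in key order and fed straight into a merge join (no Sort, no
-- Hash Match, no blocking operator in the plan).
SELECT  b.ID, b.SomeValue, s.OtherValue
FROM    dbo.BigTable   AS b   -- clustered index on ID
JOIN    dbo.SmallTable AS s   -- clustered index on ID
        ON s.ID = b.ID
OPTION (MERGE JOIN);          -- hint used only to pin the plan shape for study
```

Checking the actual execution plan should show a Clustered Index Scan feeding the Merge Join on each input, with no Sort operator in between.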

  • The Dixie Flatline (11/3/2011)


    Gail and Lynn, it is also worth noting that the scan is feeding directly into a merge join, so no intermediate sorting or hashing is required for the join.

    Yes, I noticed.

    If a merge join wasn't possible, the optimizer might choose not to scan the major table.

    If a merge join wasn't possible, the optimiser would likely have gone for a hash join with a scan instead. More overhead, but less I/O than the multiple seeks.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
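The trade-off Gail describes can be observed directly by forcing each join strategy on the same (hypothetical) query and comparing the STATISTICS IO/TIME output:

```sql
SET STATISTICS IO, TIME ON;

SELECT b.ID FROM dbo.BigTable b JOIN dbo.SmallTable s ON s.ID = b.ID
OPTION (MERGE JOIN);  -- ordered scans; cheap per row, minimal memory grant

SELECT b.ID FROM dbo.BigTable b JOIN dbo.SmallTable s ON s.ID = b.ID
OPTION (HASH JOIN);   -- scans both inputs, builds a hash table: more overhead, still low I/O

SELECT b.ID FROM dbo.BigTable b JOIN dbo.SmallTable s ON s.ID = b.ID
OPTION (LOOP JOIN);   -- one seek into the inner table per outer row: the "multiple seeks" case

SET STATISTICS IO, TIME OFF;
```

The hints are for comparison on a test box only; in production the optimiser should normally be left to choose.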
  • The other thing to keep in mind is that the time results are likely from a test copy of the database. The production server is likely to have more I/O bottlenecks, which means lower I/O can result in reduced overall runtime, unlike on a test server with no load on it. I've seen that happen many times.

    Might not be the case, if he's testing in production.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I agree with G^2. I would rather err on the side of reducing reads over picking the "fastest" time unless it is really disproportionate.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help

  • dva2007 (11/3/2011)


    This sentence clarified a lot of things for me.

    I always thought that for a big table an index seek would be faster. It is clear that for a smaller table even a table scan can be faster than an index seek. But this case is different. This is probably why the answer to most questions about performance tuning and indexing is "it depends" (i.e. table size, execution plan, existing indexes, etc.). Really interesting.

    Also keep in mind that when SQL Server creates an execution plan, it doesn't take into account whether the necessary data is cached or not. The difference in performance between a scan and a seek may be far more severe when the cache is empty and all the requests go to the physical disk.


    Alex Suprun

  • Thanks for everyone's input, I'm happy to leave it as is then. My tests were conducted on a test server.

    The SQL Guy @ blogspot

    @SeanPearceSQL

    About Me

  • bteraberry (11/3/2011)


    I agree with G^2. I would rather err on the side of reducing reads over picking the "fastest" time unless it is really disproportionate.

    I agree - it's a very rare case that I'll choose either duration or CPU over reads+writes; IOPS and throughput are generally the bottlenecks on major systems. I've seen a CPU-bottlenecked SQL Server in only two situations: one was a parallel reindex with compression, which was CPU-bound on a 16-core machine, and the other was a very complex, very large analysis query.

    Other than that, I'll almost always take lower reads - when many things run at once, I want as few reads as are required for the aggregate workload.
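One way to put that policy into practice is to rank the cached workload by reads rather than by duration, using the sys.dm_exec_query_stats DMV (SQL Server 2005 and later; the TOP count is arbitrary):

```sql
SELECT TOP (10)
        qs.total_logical_reads,
        qs.total_worker_time  AS total_cpu_microsec,
        qs.total_elapsed_time AS total_duration_microsec,
        qs.execution_count,
        SUBSTRING(st.text, 1, 200) AS query_text_start
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;  -- worst I/O consumers first, not slowest
```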

  • bteraberry (11/3/2011)


    I agree with G^2. I would rather err on the side of reducing reads over picking the "fastest" time unless it is really disproportionate.

    I'd have to add that "It Depends" on the type of reads. Logical reads aren't as nasty as some would think.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/4/2011)


    bteraberry (11/3/2011)


    I agree with G^2. I would rather err on the side of reducing reads over picking the "fastest" time unless it is really disproportionate.

    I'd have to add that "It Depends" on the type of reads. Logical reads aren't as nasty as some would think.

    Indeed, and in this case the scan is likely going to read more distinct pages, while the multiple seeks read a smaller set of pages multiple times. That said, additional page reads mean locks taken and released, so a higher chance of blocking.

    Gail Shaw
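The rough arithmetic behind Gail's point, using the row counts from this thread (the B-tree depth and rows-per-page figures are assumptions for illustration only):

```sql
-- 40,371 seeks into a B-tree of (assumed) depth 3:
--   ~40,371 * 3 = ~121,113 logical reads, but the root and intermediate pages
--   are read over and over, so the DISTINCT page set is small and stays cached.
--
-- Scanning 4,367,682 rows at an assumed 100 rows/page:
--   ~43,677 logical reads, each against a DIFFERENT page -- fewer logical
--   reads, but more distinct pages touched, hence more locks taken/released
--   and potentially more physical I/O on a cold cache.
```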
  • Artoo22 (11/3/2011)


    Is it really more efficient to scan 4,367,682 rows and perform a merge join than it is to perform 40,371 seeks?

    According to SQL Server's model of reality, the answer is apparently yes - at least in this specific case.

    It's important to realize that the model used is limited and generic; in particular it takes no account of the physical characteristics of your particular machine (for example CPU speed, I/O capability and memory configuration). The model often produces query plans that run acceptably well on most hardware, and that's pretty much as far as its ambition goes. The optimizer does not try to come up with the best possible plan (for whatever value of 'best' you care to define), it just tries to get things horribly wrong as infrequently as possible, without taking excessive time or resources to do so. Various people have varying opinions concerning how successful that effort is.

    There are all sorts of things you can do to produce a better plan than the optimizer - after all, the DBA tends to know more about the data, its distribution and correlations, which queries are vital, and whether ultimate execution speed is more important than resource use. He or she will also probably have a good feel for the whole of the workload, and can better judge whether a query should be optimized for speed, CPU usage, I/O impact, buffer pool space usage...or one of the other 5,000 factors that make query tuning so much of an art.
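For reference, the knobs that let a DBA override the model are ordinary, documented T-SQL hints. A sketch with hypothetical table, index, and variable names:

```sql
DECLARE @MinID INT = 1000;

SELECT  b.ID, s.OtherValue
FROM    dbo.BigTable   AS b WITH (INDEX (IX_BigTable_ID))  -- force an index choice
JOIN    dbo.SmallTable AS s ON s.ID = b.ID
WHERE   b.ID >= @MinID
OPTION (MAXDOP 1,                   -- trade raw speed for lower CPU impact
        OPTIMIZE FOR (@MinID = 1),  -- compile for a known-typical value
        RECOMPILE);                 -- fresh plan each run, at compile cost
```

Each hint narrows the optimizer's choices, so they should be used sparingly and re-examined when data distributions change.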

  • GilaMonster (11/5/2011)


    Jeff Moden (11/4/2011)


    bteraberry (11/3/2011)


    I agree with G^2. I would rather err on the side of reducing reads over picking the "fastest" time unless it is really disproportionate.

    I'd have to add that "It Depends" on the type of reads. Logical reads aren't as nasty as some would think.

    Indeed, and in this case the scan is likely going to read more distinct pages, while the multiple seeks read a smaller set of pages multiple times. That said, additional page reads mean locks taken and released, so a higher chance of blocking.

    Agreed. Hmmmm... shifting gears a bit but still in the same car, perhaps some experimentation with blocking needs to be done with utility tables, such as the Tally Table.

    --Jeff Moden

  • Jeff Moden (11/24/2011)


    perhaps some experimentation with blocking needs to be done with utility tables, such as the Tally Table.

    Those kinds of tables should be read-only in use (I don't mean on a read-only filegroup), hence there should never be blocking. Things like updating or inserting into Tally/Numbers/Calendar tables should be rare and unusual operations.

    Gail Shaw
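For anyone following along, a classic build of such a read-only tally table (the size and names are a matter of taste):

```sql
-- Built once, then only ever read -- so only shared locks, and rarely even
-- those once its pages are cached.
SELECT TOP (1000000)
        IDENTITY(INT, 1, 1) AS N
INTO    dbo.Tally
FROM    sys.all_columns AS a      -- cross join just to generate enough rows
CROSS JOIN sys.all_columns AS b;

ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N);
```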
  • GilaMonster (11/24/2011)


    Jeff Moden (11/24/2011)


    perhaps some experimentation with blocking needs to be done with utility tables, such as the Tally Table.

    Those kinds of tables should be read-only in use (I don't mean on a read-only filegroup), hence there should never be blocking. Things like updating or inserting into Tally/Numbers/Calendar tables should be rare and unusual operations.

    Ah... my apologies. I didn't mean "blocking". I meant simple shared locking and the escalation of such locks on larger Tally Tables.

    I also didn't want people to think that reads were a cause of blocking. Having a larger number of reads simply means more of a chance for those reads to be blocked by someone writing to the table.
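Those shared locks (and any escalation) can be watched from a second session via the sys.dm_tran_locks DMV; the session id below is an assumption to be replaced:

```sql
-- Run a large read in session 1 under REPEATABLE READ (or with HOLDLOCK)
-- so its S locks are held long enough to observe, then from session 2:
SELECT  resource_type,      -- OBJECT / PAGE / KEY
        request_mode,       -- S = shared
        COUNT(*) AS lock_count
FROM    sys.dm_tran_locks
WHERE   request_session_id = 52   -- session 1's SPID (replace as appropriate)
GROUP BY resource_type, request_mode;
```

Once a single statement holds roughly 5,000 locks on one object, SQL Server attempts to escalate them to a single table-level lock.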

    --Jeff Moden
