Optimize hint has huge implications. Why?

  • Ninja's_RGR'us (12/2/2011)


    nico van niekerk (12/2/2011)


    The original query's construct, somehow, allowed SS to rely on the statistics alone and not look at the query from a new perspective. The OPTIMIZE hint steered it in the right direction. The object of this exercise was to see if a query with a different construct would steer SS to look at the query rather than the statistics.

    as said before really the last, last, last, after the headbanging last resort option and with a gun to your head.

    I am not sure what you mean by this. I think I was clear from the beginning that I wanted to pursue the fine-tuning of the query with the warped statistics in place to see if I could sharpen the query construct. It was just an exercise. No head-banging or gun required.

    nico van niekerk (12/2/2011)The temp table was not #tmp but @Tmp, a table definition and one cannot create indexes on table definitions. Its residence in memory makes it faster than an indexed table, which is why I favored it in the final solution, and because it is very narrow and only contains about 2,500 rows. My previous use of a table definition to host FileSummaries, didn't change the query's construct and still left SS in the dark as to the intent of the query.

    So wrong it's not even funny. Do yourself a huge favor and read this amazing articles.

    http://www.sqlservercentral.com/articles/Temporary+Tables/66720/

    All tables can spill on disks, no expections.

    Yes you can create an index in a table variable. You can create multiple unique constraints. You just need to reinclude the PK explicitely (it's there anyways). That allows you to build that extra index(es).

    I really wish you would quit the snide remarks. The decorum on this site is of a professional nature and your attitude does not pursue that goal. It turns out you are dead wrong.

    I was clear that one cannot "create indexes" on a table variable; nothing was ever said or intended to refer to constraints. I purely responded to another post that asked me if I created indexes on the #temp table by pointing out it was not a #temp table but a @temp table. The link you cited confirmed my position.

    Table variables are created in memory with the current batch as scope exclusively. #temp and other tables are explicitly created "on disk" as you put it, and they don't "spill" over onto disk as table variables do. That table variables can spill over onto disk is a true statement, but with a table of about a fixed 30K, one can safely assume that this table would not spill on to disk, which was a major consideration in this solution.

    2500 rows is not narrow in any way. The consequence here always lies in how you use that table. If you do simple in flight logging, or return all rows at the end of the proc then no problem. But as soon as that table is used in a join, where, group by & many more cases you're dead to get a bad plan most of the time. Getting a good plan is actually more lucky than anything else.

    The width of a table is not measured by its number of rows but rather by the number of bytes of each row. This table had an 11-byte row-width, which makes it a very narrow table. You may want to make a note of that.

    nico van niekerk (12/2/2011)

    Here is the solution that I came up with which runs in between 9 and 12 seconds without using OPTIMIZE FOR with the same statistics.

    DECLARE @Time datetime2

    Set @Time = '2009-01-01'

    declare @filesummaries table (starttime datetime null, [endtime] datetime null,mshtq_in_cml bit not null)

    insert into @filesummaries ([starttime],[endtime],mshtq_in_cml) select starttime, endtime,mshtq_in_cml from filesummaries where mshtq_in_cml & 1 = 1 and StartTime < @time

    select count(*) as Occurences,case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end as BinMean from rawimc r join (select [starttime],[endtime] from @filesummaries) f on r.devicetimestamp between f.[starttime] and f.[endtime]

    group by case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end

    order by BinMean

    What you have here my friend is the ONE single exception where a table variable in a join is beneficial. Assuming you only have pairs of begin/enddates that will <almost> NEVER overlap, then the BEST plan is to assume that the minimum # of rows will be returned, which is exactly what a table variable does. No amount of stats or unique constraints will let the optimizer know that you expect only 0 to 1 row in that join.

    And, as I said at the outset, I was looking for an exception that will avoid the statistics and it appears as if I have found it. I wouldn't go as far as saying that it is the "ONE single" exception. Smart people work on these things and there is always someone somewhere who will come up with something new or better. It's just a fact.

    It is also why I published my findings for the benefit of the SSC's members. Cutting the execution time down from 9 hours 40 minutes max to under 10 seconds without refreshing warped statistics, I'd say, is quite an important case study for earnest SQL developers.

    nico van niekerk (12/2/2011)

    My best guess is that the JOINs in the first query didn't give SS a clear indication of its intent and it reverted back to the statistics, while this query is less ambiguous.

    Thanks to everybody who contributed to this issue. It's much appreciated.

    Thanks

    Nico

    No the difference is that with any other table stats will be taken and the estimates will be wrong, causing a bad plan.

    I've wanted to toy with this for a while... maybe now's the time.

    What I had tried in the past is a cross/outer apply with TOP 1 + order by PK on a temp table. The estimates where correct however the sort came with a huge cost. Maybe you've hit a gem on the head with this one.

    Congrats!

    Just some friendly feedback: It appears as if you are lecturing the members, which sometimes leads you away from the essence of the topic. Some might appreciate that, but remember you don't know who you are dealing with; you might be dealing with folks who wrestled with these problems when the majority of the planet's population wasn't even born yet. Be humble, courteous and professional.

    Thanks for your contribution,

    Nico

  • I'm not snide, I have different POV in mind.

    People find these threads with google and they need to know the whole story.

    You got my info and best links. Happy hunting.

  • Ninja's_RGR'us (12/4/2011)


    I'm not snide, I have different POV in mind.

    People find these threads with google and they need to know the whole story.

    You got my info and best links. Happy hunting.

    I appreciate different points of view but not condescending phrases such as "So wrong it's not even funny" and "do yourself a huge favor" when you were the one that was wrong. It rather distracts from the whole story when responses veer off the topic and engage in lectures. The whole story is already in the thread, otherwise nobody would understand the question.

    I don't mean to beat this to death. Let's move on.

    Thanks for responding.

  • The Dixie Flatline (12/4/2011)


    nico van niekerk (12/2/2011)


    The temp table was not #tmp but @Tmp, a table definition and one cannot create indexes on table definitions. Its residence in memory makes it faster than an indexed table....

    This may not contribute to your solution, but the statements above can't remain unchallenged.

    Table variables live in temdb just like temporary tables.

    http://msdn.microsoft.com/en-us/library/ms190768.aspx

    Although indexes cannot be explicitly created for table variables (using CREATE INDEX), table variables can have indexes to support PRIMARY KEY and UNIQUE constraints.

    http://msdn.microsoft.com/en-us/library/ms175010.aspx

    I don't understand why you needed to challenge statements that I am in agreement with. I didn't comment nor based the solution on where table variables live. I merely said that a table variable (of the tiny size that this one was) will live in memory. How is it relevant that it is created in tempdb or not?

    Indexes: I was responding to someone else's question whether I created indexes on #tmp while I clearly said I created @Tmp. I responded by saying that it was a table variable and not a temp table and besides, one cannot create indexes on a table variable; just responding to a question, nothing else. If there never was any intent nor any mention of creating constraints, how does that become a matter for a challenge?

    I don't get it.

  • @Nico: Nobody is challenging you here (however many SSC guys here possess the knowledge / skills to do so). What my friends were attempting here you to know that you might not know already. If you think some portion of the thread is not per your need you can ignore it.

    For Your information, there are many students / IT professionals who don’t participate here actively but they are regular readers. So the suggestions here on SSC are always generic (assuming you are novice) and thus applicable to all.

  • nico van niekerk (12/4/2011)


    Ninja's_RGR'us (12/4/2011)


    I'm not snide, I have different POV in mind.

    People find these threads with google and they need to know the whole story.

    You got my info and best links. Happy hunting.

    I appreciate different points of view but not condescending phrases such as "So wrong it's not even funny" and "do yourself a huge favor" when you were the one that was wrong. It rather distracts from the whole story when responses veer off the topic and engage in lectures. The whole story is already in the thread, otherwise nobody would understand the question.

    I don't mean to beat this to death. Let's move on.

    Thanks for responding.

    I agree to disagree. You clearly know your stuff and that tone might not have been appropriate from you POV.

    In MY little world, I have daily cases of I've read x on a forum and now my db's dead or perf is abysmal. Ultimately I care more about the masses than 1 single individual since each thread gets way more traffic from google / links than that 1 original question.

  • nico van niekerk (12/4/2011)


    The Dixie Flatline (12/4/2011)


    nico van niekerk (12/2/2011)


    The temp table was not #tmp but @Tmp, a table definition and one cannot create indexes on table definitions. Its residence in memory makes it faster than an indexed table....

    This may not contribute to your solution, but the statements above can't remain unchallenged.

    Table variables live in temdb just like temporary tables.

    http://msdn.microsoft.com/en-us/library/ms190768.aspx

    Although indexes cannot be explicitly created for table variables (using CREATE INDEX), table variables can have indexes to support PRIMARY KEY and UNIQUE constraints.

    http://msdn.microsoft.com/en-us/library/ms175010.aspx

    I don't understand why you needed to challenge statements that I am in agreement with. I didn't comment nor based the solution on where table variables live. I merely said that a table variable (of the tiny size that this one was) will live in memory. How is it relevant that it is created in tempdb or not?

    Indexes: I was responding to someone else's question whether I created indexes on #tmp while I clearly said I created @Tmp. I responded by saying that it was a table variable and not a temp table and besides, one cannot create indexes on a table variable; just responding to a question, nothing else. If there never was any intent nor any mention of creating constraints, how does that become a matter for a challenge?

    I don't get it.

    Just to clear out the air about the indexes on a table variable. It can be done, including columns is a little less pretty tho =>

    DECLARE @tbl TABLE (id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, fname VARCHAR (50), lname VARCHAR(50), UNIQUE (fname, id), UNIQUE (lname, id))

    INSERT INTO @tbl (fname, lname) VALUES ('Ninja', 'RgRus')

    --see how this doesn't error out

    INSERT INTO @tbl (fname, lname) VALUES ('Ninja', 'RgRus')

    INSERT INTO @tbl (fname, lname) VALUES ('Jeff', 'Moden')

    INSERT INTO @tbl (fname, lname) VALUES ('Gail', 'Shaw')

    INSERT INTO @tbl (fname, lname) VALUES ('G', 'Squared')

    INSERT INTO @tbl (fname, lname) VALUES ('Grant', 'Fritchey')

    --Those will do UIX seeks

    SELECT fname, id FROM @tbl WHERE fname = 'Ninja'

    SELECT lname, id FROM @tbl WHERE lname = 'Shaw'

    --This one does CI scan

    SELECT * FROM @tbl WHERE lname = 'Shaw'

    --We can't use the include clause on the table variable but we can fake it as such

    GO

    DECLARE @tbl TABLE (id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, fname VARCHAR (50), lname VARCHAR(50), UNIQUE (fname, id, lname), UNIQUE (lname, id, fname))

    INSERT INTO @tbl (fname, lname) VALUES ('Ninja', 'RgRus')

    --see how this doesn't error out

    INSERT INTO @tbl (fname, lname) VALUES ('Ninja', 'RgRus')

    INSERT INTO @tbl (fname, lname) VALUES ('Jeff', 'Moden')

    INSERT INTO @tbl (fname, lname) VALUES ('Gail', 'Shaw')

    INSERT INTO @tbl (fname, lname) VALUES ('G', 'Squared')

    INSERT INTO @tbl (fname, lname) VALUES ('Grant', 'Fritchey')

    --Now this one does UIX seek

    SELECT * FROM @tbl WHERE lname = 'Shaw'

  • nico van niekerk (12/4/2011)


    ...Table variables are created in memory with the current batch as scope exclusively. #temp and other tables are explicitly created "on disk" as you put it, and they don't "spill" over onto disk as table variables do. That table variables can spill over onto disk is a true statement, but with a table of about a fixed 30K, one can safely assume that this table would not spill on to disk, which was a major consideration in this solution. [/b]

    Hey Nico,

    One point. Table variables get disk allocation out of the gate. Check this blog post[/url]. Use the check to see where the table variable is stored. I suspect you'll be surprised.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Fair enough.

  • There are probably 1000 myths about @t vs #t.

    The article I linked to list all the pros & cons of each.

    http://www.sqlservercentral.com/articles/Temporary+Tables/66720/

  • Nobody is challenging you here

    If nobody is challenging what I said, then what does Dixie Flatline's comment "But the above statements can't remain unchallenged" mean? Dixie also didn't challenge me, per se, as you stated. Dixie, correctly, challenged the statements, not the person. I responded by pointing out that my statements were, indeed, accurate, as they still are.

    however many SSC guys here possess the knowledge / skills to do so

    I am sure.

    What my friends were attempting here you to know that you might not know already

    I hope I interpreted your sentence correctly.

    If your friends were attempting to point out that which I might not know already, it is a very troubling admission of failure and I believe you speak for a very small minority of the members on this forum.

    So the suggestions here on SSC are always generic (assuming you are novice) and thus applicable to all.

    I disagree with your assessment. The issues on SSC have always been very specific. Not all generic statements are applicable to all, either. I introduced this case study with very specific parameters, stating clearly where I come from. Spinning this in a generic mold is to distract from the value that the solution could bring to everybody.

    If you think some portion of the thread is not per your need you can ignore it.

    I suppose I can, but what good will that do? If someone could point to an error on my part, it is always appreciated very much. That's how we learn.

    It's a pity that fringe issues, that had nothing to do with the solution of the case study, shielded the real issue at hand and the problem that was resolved. I guess true professionals would respond with, "that's the answer" or "I knew that" or even "I've learned something" as some have done.

    Your friends, however, chose to pick up on frivolous, fringe issues that had an inconsequential bearing on the case, believing they are talking to novices, as you apparently do too, and never saw the solution that just passed them by.

    If only one member of this forum, who read this, decided to in future look at the teachable moments that might exist, instead of pontificating about irrelevant issues, then this rant would have been worth it.

  • DUDE I'M SORRY. No offense intended.

    If your friends were attempting to point out that which I might not know already, it is a very troubling admission of failure and I believe you speak for a very small minority of the members on this forum.

    I'm sorry to say this is far from the reality. I must have seen at least 1000 bad posts on that topic over the years. Wrong reflex I guess.

  • Grant Fritchey (12/5/2011)


    nico van niekerk (12/4/2011)


    ...Table variables are created in memory with the current batch as scope exclusively. #temp and other tables are explicitly created "on disk" as you put it, and they don't "spill" over onto disk as table variables do. That table variables can spill over onto disk is a true statement, but with a table of about a fixed 30K, one can safely assume that this table would not spill on to disk, which was a major consideration in this solution. [/b]

    Hey Nico,

    One point. Table variables get disk allocation out of the gate. Check this blog post[/url]. Use the check to see where the table variable is stored. I suspect you'll be surprised.

    The blog you cited states "Other than that, [temporary tables and table variables] will reside completely in memory or will swap out to the disk through tempdb, depending on their size." In my particular case study that I posted on SSC, I was very clear on where the table variable would reside because of its very small size. In memory, as your reference clearly confirms. That was my only consideration. It had to be a table variable and not a temporary table, either, not because of memory, but overhead advantages that a table variable has, some of which are logging and statistics.

    Incidentally, the very small size of the table also makes indexes on it a burden rather than a benefit. Now there might be a lure in that statement for some to go off on a tangent again.

  • Ninja's_RGR'us (12/5/2011)


    There are probably 1000 myths about @t vs #t.

    The article I linked to list all the pros & cons of each.

    http://www.sqlservercentral.com/articles/Temporary+Tables/66720/

    Good article, but I would highlight a point that could easily be overlooked in the body of it: "SQL cannot build statistics against a table variable, like it can against temporary tables." This isn't a difference between the two, it's the main difference that should guide usage choices in a majority of cases. (Of course, there are always exceptions to every rule.)

    So the server inherently spends the overhead on the histogram for the temp table, which, if you need it, is great and otherwise is a waste. I've seen the choice of use be determined by so many irrelevant questions without this key issue being addressed. If you need the statistics then use a temp table. If you're going to scan the whole thing and use all the records in it anyway, save the overhead and stick to a table variable.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • nico van niekerk (12/2/2011)


    The original query's construct, somehow, allowed SS to rely on the statistics alone and not look at the query from a new perspective. The OPTIMIZE hint steered it in the right direction. The object of this exercise was to see if a query with a different construct would steer SS to look at the query rather than the statistics.

    Just some comments on the feedback: Pre-aggregation wasn't an option, although it's a good strategy of which I am very fond, but we were chasing an execution time of around 10 seconds and pre-aggregating 420 million rows would trash that expectation before we even got close to the resultset.

    The temp table was not #tmp but @Tmp, a table definition and one cannot create indexes on table definitions. Its residence in memory makes it faster than an indexed table, which is why I favored it in the final solution, and because it is very narrow and only contains about 2,500 rows. My previous use of a table definition to host FileSummaries, didn't change the query's construct and still left SS in the dark as to the intent of the query.

    Here is the solution that I came up with which runs in between 9 and 12 seconds without using OPTIMIZE FOR with the same statistics.

    DECLARE @Time datetime2

    Set @Time = '2009-01-01'

    declare @filesummaries table (starttime datetime null, [endtime] datetime null,mshtq_in_cml bit not null)

    insert into @filesummaries ([starttime],[endtime],mshtq_in_cml) select starttime, endtime,mshtq_in_cml from filesummaries where mshtq_in_cml & 1 = 1 and StartTime < @time

    select count(*) as Occurences,case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end as BinMean from rawimc r join (select [starttime],[endtime] from @filesummaries) f on r.devicetimestamp between f.[starttime] and f.[endtime]

    group by case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end

    order by BinMean

    My best guess is that the JOINs in the first query didn't give SS a clear indication of its intent and it reverted back to the statistics, while this query is less ambiguous.

    Thanks to everybody who contributed to this issue. It's much appreciated.

    Thanks

    Nico

    Hey Nico,

    I'm not trying to fight with you, at all. I've gone back and reread the solution (posted above so I'm not misstating things). I'm confused by a couple of things. You refer several times to the optimizer relying too much on statistics and not looking enough at the intent of the query. I'm confused by what you're trying to say. The optimizer always looks at the query. It has too. And then it looks at the statistics of the objects in the query to make the decisions it makes regarding the objects included, how they're joined, scans vs. seeks, etc. This seperation that you're placing between statistics and the query, as if one operates without the other doesn't make sense to me.

    I don't think you posted the new execution plan, but I suspect you still have tuning opportunities because of the functions in the GROUP BY clause are likely to be causing scans. From what I can see, reducing the data set by loading the table variable prior to running the main part of the query provides different filtering opportunities for the optimizer which resulted in faster execution. Just guessing without the exec plan. But, you might get even better performance with a temp table since it will have statistics. Because you're loading ~2500 rows and you're joining to the table variable, I think that it's posible that it's still hurting your performance.

    Sorry if you think my statements here are trivial or off topic, but, as I said, I'm confused by what your point of revelation is on the adjustments you made to the query. Sorry for my lack of understanding.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply