How integrated are execution plans with hardware?

  • I had a disastrous weekend while executing a one-time ETL move I've been working on for 3 months for the largest automotive manufacturer in the US.

    I developed a data transformation using SSIS on SQL Server 2005 SP1. It was developed on an HP blade server with 4 GB of RAM and four hyperthreaded 2.8 GHz CPUs. The server runs Windows Server 2003 Enterprise and recognized the CPUs as 4 physical and 4 virtual processors.

    The production box is similar except that the CPUs are not hyperthreaded and it runs Windows Server 2003 Datacenter Edition.

    I went to run my app at the beginning of the maintenance window in front of clients, God, and the whole world, and it just kept running and running and running. I sweated, crapped myself, and halted it. Something was wrong.

    Turns out 3 out of 45 SQL statements were using an execution plan that was so inefficient that they were never going to finish.

    I compared the execution plan on my development machine to the plan on the production machine only to find that they were quite different.

    I compared all tables, indexes, SPs, statistics, volume of data, index fragmentation, disk fragmentation, DBCC CHECKDB errors, server-level properties, EVERYTHING I could think of.

    The execution plan on the development machine showed a greater level of parallelism. My running hypothesis is that, due to the hyperthreading on the dev machine, the query engine was able to make up for an inefficient query, whereas the production machine could not.

    Does this sound logical, and just how much does the hardware setup affect a query execution plan?

  • The actual issue is that your statistics and IO subsystem may have been impacted by "other" factors.

    Example:

    1. On the production machine you could have been blocked by other processes.

    2. The IO subsystem could have been under heavy use by other queries.

    3. The database files were not large enough and autogrow events were in your way.

    4. Tempdb was not configured for maximum performance.
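
    On point 4: "tempdb configured for maximum performance" on a 4-CPU box of that era usually meant pre-sizing the files and spreading allocation across multiple data files. A minimal sketch, assuming hypothetical file names and paths:

    ```sql
    -- Pre-size the existing tempdb data file so autogrow doesn't fire
    -- mid-run, and add a second data file to reduce allocation
    -- contention (file names and the 'T:\' path are placeholders).
    ALTER DATABASE tempdb
        MODIFY FILE (NAME = tempdev, SIZE = 2048MB, FILEGROWTH = 512MB);

    ALTER DATABASE tempdb
        ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf',
                  SIZE = 2048MB, FILEGROWTH = 512MB);
    ```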

    I could go on and on about what the culprit might have been ...

    When you move from DEV to PROD, unless *all* is exactly the same, you are not guaranteed anything in terms of performance.

    Just my $0.02


    * Noel

  • How much of a difference in database/table size is there between the two environments?  One of the things that will absolutely cause your query plans to change is when the data distribution and volume between the two environments are radically different...

    Joe


  • I took snapshots of the production data every day for two weeks so my development environment was always working on the actual data volume that I was going to be up against at execution time.

    Noeld, those are all things that could affect performance, but none of them address the fact that the two machines were generating different query plans.

    The queries in question choked on the production machine because of a "distinct" keyword. Remove it and the plans were the same between machines. I even tried using "group by" instead, to no avail.

    The use of DISTINCT caused one machine to generate a different query plan even though all database objects and database-level properties were the same.
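
    For anyone trying to reproduce this kind of comparison: one way to capture the exact plan each server generates is SHOWPLAN_XML, which returns the estimated plan as XML that can be saved and diffed side by side. Table and column names below are placeholders, not the OP's actual schema:

    ```sql
    -- Capture the estimated plan without executing the query.
    SET SHOWPLAN_XML ON;
    GO
    SELECT DISTINCT CustomerID, OrderDate   -- hypothetical DISTINCT query
    FROM dbo.StagingOrders;
    GO
    SET SHOWPLAN_XML OFF;
    GO
    ```

    Running the same batch on both servers and comparing the two XML documents makes differences in operators (e.g. a parallel stream aggregate vs. a serial sort/distinct) explicit.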

  • Sounds like the difference in operating systems, not in the hardware, to me. It's a thought.

  • Hmm... how did you take your snapshots?  Is the production database set to auto-update statistics, or is there a scheduled reorg/reindex that might not have taken place on the development machine?

    As for hardware, differences in the number of CPUs in relation to storage (e.g. filegroups or multiple files in different locations) could very well make a difference in the query plan generated.  Also check and make sure that MAXDOP isn't set on the production machine...?
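
    Checking the server-wide MAXDOP setting is straightforward with sp_configure (it's an advanced option, so it has to be exposed first):

    ```sql
    -- Expose advanced options, then read the current MAXDOP value.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max degree of parallelism';  -- 0 = use all available CPUs
    ```

    Run on both boxes; a nonzero value on production only would explain a less-parallel plan there.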

    Joe


  • 1) You didn't explicitly mention if the SQL Server was the EXACT same version (sp and hotfixes).

    2) While doing your development, you should have made your test platform as similar to production as possible by disabling Hyperthreading at the bios level.

    3) The hardware configuration of the server will absolutely have an effect on the query optimization process.

    4) You didn't mention if the only difference was the number of parallel threads, or whether queries that were parallelized in dev were NOT parallelized in production. If the latter, check out using optimizer hints to force the issue.
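
    A hedged sketch of the hint Kevin is referring to, with a placeholder table name. One caveat: OPTION (MAXDOP n) only *caps* the degree of parallelism for the statement; it cannot force a serial plan to go parallel — for that, lowering the server-wide 'cost threshold for parallelism' option is the usual lever.

    ```sql
    -- Per-query parallelism hint (SQL Server 2005 syntax).
    SELECT DISTINCT CustomerID
    FROM dbo.StagingOrders        -- placeholder table name
    OPTION (MAXDOP 4);            -- allow up to 4 schedulers for this query
    ```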

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Try disabling the parameter sniffing done by SQL Server... copy the arguments passed to the stored procedure into local variables and use the local variables instead... hope this helps...

  • I wasn't aware that you could "disable parameter sniffing". Do you mean use WITH RECOMPILE on the sprocs perhaps?

    Also, the user stated that the exact same data was being used so that shouldn't matter.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • No, I do not mean the WITH RECOMPILE option... I am not sure why, but WITH RECOMPILE did not work for me... then I changed the stored procedure to use local variables... the SP was taking 15-20 mins and now it takes 1-2 mins... for more information: http://omnibuzz-sql.blogspot.com/2006/11/parameter-sniffing-stored-procedures.html
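
    The local-variable workaround looks roughly like this (procedure, table, and column names are made up for illustration):

    ```sql
    CREATE PROCEDURE dbo.GetOrdersByCustomer
        @CustomerID int
    AS
    BEGIN
        -- Copy the parameter into a local variable. The optimizer cannot
        -- "sniff" a local variable's value at compile time, so it builds
        -- the plan from average density statistics rather than from the
        -- first caller's specific value.
        DECLARE @LocalCustomerID int;
        SET @LocalCustomerID = @CustomerID;

        SELECT OrderID, OrderDate
        FROM dbo.Orders
        WHERE CustomerID = @LocalCustomerID;
    END
    ```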

  • I would certainly try recompile first. There are probably very few situations where it won't routinely result in the optimal plan. Plus, having a query plan built for the 'average' value will be just plain inefficient a very large percentage of the time.
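
    For reference, recompilation can be requested at either the procedure or the statement level in SQL Server 2005 (object names below are hypothetical):

    ```sql
    -- Procedure level: a fresh plan on every execution.
    CREATE PROCEDURE dbo.GetOrdersByCustomer
        @CustomerID int
    WITH RECOMPILE
    AS
        SELECT OrderID, OrderDate
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID;
    GO

    -- Statement level: recompile only the parameter-sensitive statement.
    -- SELECT ... WHERE CustomerID = @CustomerID OPTION (RECOMPILE);
    ```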

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • You might want to try executing:

    • DBCC UPDATEUSAGE
    • UPDATE STATISTICS
    • sp_recompile for all tables
    • sp_refreshview for all views

    just prior to your production run.
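
    Scripted out, those four steps might look like the following. Note that sp_MSforeachtable is an undocumented (but widely used) helper, and sp_refreshview has to be driven one view at a time:

    ```sql
    -- Run in the target database just before the production run.
    DBCC UPDATEUSAGE (0);                              -- fix page/row count inaccuracies
    EXEC sp_MSforeachtable 'UPDATE STATISTICS ?';      -- refresh stats on every table
    EXEC sp_MSforeachtable 'EXEC sp_recompile ''?''';  -- invalidate cached plans per table
    -- sp_refreshview takes a single view name, e.g.:
    -- EXEC sp_refreshview 'dbo.SomeView';
    ```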

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • I had a similar issue on a publisher and subscriber where the query execution plans were very different. Toning down the maintenance fixed it.

    We used to run ALTER INDEX REBUILD, then UPDATE STATISTICS with a full scan, on weekends. Getting rid of the UPDATE STATISTICS on the publisher changed the execution plan. We also dropped and rebuilt some indexes from scratch.
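
    The sequence described above, sketched with a placeholder table name. A possible reason dropping the second step helped: ALTER INDEX ... REBUILD already rebuilds the index statistics as part of the rebuild, so a separate UPDATE STATISTICS pass afterwards is at best redundant for those statistics and can leave them in a different state than the rebuild did.

    ```sql
    ALTER INDEX ALL ON dbo.StagingOrders REBUILD;       -- also rebuilds index statistics
    UPDATE STATISTICS dbo.StagingOrders WITH FULLSCAN;  -- the step later removed on the publisher
    ```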

  • If the plans are different then there are real differences somewhere. The amount of resources should not affect a query plan, and likewise hyperthreading is probably a red herring. However, that said, not all parallel plans are good, and if your prod server has more CPUs your query can turn into the real query from hell - I've seen this happen. Likely there are differences in SET options and configuration. Also, Datacenter and dev are not the same edition of SQL Server, I'd guess - I've not worked with Datacenter - but the real point is that your test and prod systems need to be identical if you want 100% validation; otherwise your testing is only valid to a point.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/
