Massive Performance Problems when using Snapshots

  • Hi Folks,

    we are doing a database Migration that involves a conversion of non-Unicode data to Unicode. The process involves the conversion of every non-Unicode table in the database totalling around 2TB of data.

    The first stage of the Migration was to restore the database to a new Server which went without a hitch. Because the restore took around 4 hours I decided to take a snapshot of the database to err on the side of caution. What happened next surprised me: The conversion, which had been tested several times over although without the snapshots was running between 10-15 times slower than the practice runs on the same Server.

    When I looked a Little deeper I discovered in the error logs that there was massive contention in the TEMPDB, something we hadn't seen during practice. Furthermore, the database has 10 files and trace flag 1118 is also turned on. The question at this Point was "why was only one tempdb being used and not all 10"? When I used the Task Manager to see what activity was occuring, there was around 4x as much activity as in the source database! I can't understand why....

    When I removed the snapshot the database went back to its normal very fast state with no further interaction needed from me....

    Can anyone please explain this phenomenon and why a snapshot can so adversely affect database Performance?

    Many thanks in advance...

    Regards,

    Kev

  • kevaburg (2/20/2016)


    Hi Folks,

    we are doing a database Migration that involves a conversion of non-Unicode data to Unicode. The process involves the conversion of every non-Unicode table in the database totalling around 2TB of data.

    The first stage of the Migration was to restore the database to a new Server which went without a hitch. Because the restore took around 4 hours I decided to take a snapshot of the database to err on the side of caution. What happened next surprised me: The conversion, which had been tested several times over although without the snapshots was running between 10-15 times slower than the practice runs on the same Server.

    When I looked a Little deeper I discovered in the error logs that there was massive contention in the TEMPDB, something we hadn't seen during practice. Furthermore, the database has 10 files and trace flag 1118 is also turned on. The question at this Point was "why was only one tempdb being used and not all 10"? When I used the Task Manager to see what activity was occuring, there was around 4x as much activity as in the source database! I can't understand why....

    When I removed the snapshot the database went back to its normal very fast state with no further interaction needed from me....

    Can anyone please explain this phenomenon and why a snapshot can so adversely affect database Performance?

    Many thanks in advance...

    Regards,

    Kev

    Did you actually read up on this feature before deciding it would be something you really wanted to do? Once you do that you will find out information about version store and copy-on-write and see why tempdb is getting it's teeth kicked in.

    As for 10 tempdbs not being used, I presume you meant 10 data files. What errors were you seeing in the error log related to tempdb activity? Perhaps the "15 seconds for IO" ones? That would not be surprising.

    How did you see tempdb performance in Task Manager?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/20/2016)


    kevaburg (2/20/2016)


    Hi Folks,

    we are doing a database Migration that involves a conversion of non-Unicode data to Unicode. The process involves the conversion of every non-Unicode table in the database totalling around 2TB of data.

    The first stage of the Migration was to restore the database to a new Server which went without a hitch. Because the restore took around 4 hours I decided to take a snapshot of the database to err on the side of caution. What happened next surprised me: The conversion, which had been tested several times over although without the snapshots was running between 10-15 times slower than the practice runs on the same Server.

    When I looked a Little deeper I discovered in the error logs that there was massive contention in the TEMPDB, something we hadn't seen during practice. Furthermore, the database has 10 files and trace flag 1118 is also turned on. The question at this Point was "why was only one tempdb being used and not all 10"? When I used the Task Manager to see what activity was occuring, there was around 4x as much activity as in the source database! I can't understand why....

    When I removed the snapshot the database went back to its normal very fast state with no further interaction needed from me....

    Can anyone please explain this phenomenon and why a snapshot can so adversely affect database Performance?

    Many thanks in advance...

    Regards,

    Kev

    Did you actually read up on this feature before deciding it would be something you really wanted to do? Once you do that you will find out information about version store and copy-on-write and see why tempdb is getting it's teeth kicked in.

    As for 10 tempdbs not being used, I presume you meant 10 data files. What errors were you seeing in the error log related to tempdb activity? Perhaps the "15 seconds for IO" ones? That would not be surprising.

    How did you see tempdb performance in Task Manager?

    **Hanging head in shame** I didnt read up enough to understand the potential Impact it would have especially on this System....

    You are right, I meant 10 TempDB datafiles. It was the typical "15 seconds for IO" message that I associate with contention. What bothered me the most is that within the Task Manager only one datafile was being used. Once the snapshot was removed the message disappeared and the spped we associate with thisinstance was resumed...

  • You are NOT the first to do something with SQL Server without really understanding what you are doing or why. As a SQL Server consultant I make a lot of money off of that, BTW. 😎

    How did you get Windows Task Manager to show per-file metrics? I wasn't aware it had that capability. In fact I didn't know it could even show sql server database level activity. Maybe you meant SSMS Activity Monitor (which I don't use BTW)?

    Also, you say 10 data files for tempdb. Question: how many spindles are backing up your tempdb volume (and how many OTHER things are on those same spindles)? If you can't answer that question by looking up an existing document or web page your DBA/Admin person/team needs to get that done.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/24/2016)


    You are NOT the first to do something with SQL Server without really understanding what you are doing or why. As a SQL Server consultant I make a lot of money off of that, BTW. 😎

    How did you get Windows Task Manager to show per-file metrics? I wasn't aware it had that capability. In fact I didn't know it could even show sql server database level activity. Maybe you meant SSMS Activity Monitor (which I don't use BTW)?

    Also, you say 10 data files for tempdb. Question: how many spindles are backing up your tempdb volume (and how many OTHER things are on those same spindles)? If you can't answer that question by looking up an existing document or web page your DBA/Admin person/team needs to get that done.

    I'm going to play the guessing game πŸ™‚

    If I had to guess, it probably technically wasn't Task Manger, but Resource Monitor arrived at via Task Manager (i.e., from Task Manager go to the Performance tab, select Resource Monitor, go to the Disk tab, and watch the Disk Activity portion).

    Cheers!

  • Jacob Wilkins (2/24/2016)


    TheSQLGuru (2/24/2016)


    You are NOT the first to do something with SQL Server without really understanding what you are doing or why. As a SQL Server consultant I make a lot of money off of that, BTW. 😎

    How did you get Windows Task Manager to show per-file metrics? I wasn't aware it had that capability. In fact I didn't know it could even show sql server database level activity. Maybe you meant SSMS Activity Monitor (which I don't use BTW)?

    Also, you say 10 data files for tempdb. Question: how many spindles are backing up your tempdb volume (and how many OTHER things are on those same spindles)? If you can't answer that question by looking up an existing document or web page your DBA/Admin person/team needs to get that done.

    I'm going to play the guessing game πŸ™‚

    If I had to guess, it probably technically wasn't Task Manger, but Resource Monitor arrived at via Task Manager (i.e., from Task Manager go to the Performance tab, select Resource Monitor, go to the Disk tab, and watch the Disk Activity portion).

    Cheers!

    Thanks for the comments....

    Firstly....

    @kevin: This from TechNet as a reason why a snapshot could be used and the reason why I used it:

    β€’Before doing major updates, such as a bulk update or a schema change, create a database snapshot on the database protects data. If you make a mistake, you can use the snapshot to recover by reverting the database to the snapshot. Reverting is potentially much faster for this purpose than restoring from a backup; however, you cannot roll forward afterward.

    We were performing a Non-Unicode to Unicode conversion of around 5TB of Data. 15TB of space was available on the LUN where this conversion was taking place and a staging instance had been Setup off this LUN to Support it. This was a process that very few with this particular product have done and we had no baseline Information upon which we could fall back on. The reason the snapshot was made was viable but the fact only one TempDB file was being used is what surprised me. This was the reason Performance was so badly affected and it is this Point that I am asking to be clarified.

    As to the spindle question: I have used the word LUN but Volumes would be more accurate. The SQL Server storage space is dedicated and nothing else runs in that area. 30 spindles Support the TempDB. This has been confirmed by the SAN admins....

    And yes.....I said Task Manager but as has been correctly guessed I opened the Activity Monitor to look at the individual files activity. There was no Need to Analyse database activity because it was very obviously an issue with contention because a lot was Happening and only one TempDB file was being used.The logs confirmed that.

  • kevaburg (2/24/2016)


    It was the typical "15 seconds for IO" message that I associate with contention. What bothered me the most is that within the Task Manager only one datafile was being used. Once the snapshot was removed the message disappeared and the spped we associate with thisinstance was resumed...

    Allocation contention is not responsible for this warning message, I\O contention may well be though

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" πŸ˜‰

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply