"tempdb" size -- 550GB!

  • We use a 3rd party ETL tool instead of SSIS (not my choice). I put a "cap" on tempdb growth at 450GB and the weekend ETL job failed. Last week I upped the cap to 550GB. While I haven't heard from the developer about the weekend job, I see that tempdb has grown up to the cap of 550GB.

    The 3rd party tool is GUI based and the developer has no notion of why tempdb is getting so large. What tools or approach can I use to get a better understanding of why this is happening?

    Barkingdog

  • Any additional logging options available in the 3rd party app?

    There may be an option somewhere to turn on more verbose logging.

    Failing that, I think SQL Profiler is going to be your best bet; it might give you some clue as to what kind of SQL statements are being fired at the server.
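
    If you can catch it while the job is running, the tempdb allocation DMVs (SQL Server 2005 and later) can also point at the offending session. A minimal sketch, assuming you can query the server mid-run:

    -- Which sessions hold the most tempdb space right now?
    -- Page counts are 8KB pages, so * 8 / 1024 converts to MB.
    SELECT  su.session_id,
            es.login_name,
            es.program_name,
            (su.user_objects_alloc_page_count
             - su.user_objects_dealloc_page_count) * 8 / 1024 AS user_obj_mb,
            (su.internal_objects_alloc_page_count
             - su.internal_objects_dealloc_page_count) * 8 / 1024 AS internal_obj_mb
    FROM    sys.dm_db_session_space_usage AS su
    JOIN    sys.dm_exec_sessions AS es
            ON es.session_id = su.session_id
    ORDER BY internal_obj_mb DESC;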

  • Barkingdog (5/4/2009)


    We use a 3rd party ETL tool instead of SSIS (not my choice). I put a "cap" on tempdb growth at 450GB and the weekend ETL job failed. Last week I upped the cap to 550GB. While I haven't heard from the developer about the weekend job, I see that tempdb has grown up to the cap of 550GB.

    The 3rd party tool is GUI based and the developer has no notion of why tempdb is getting so large. What tools or approach can I use to get a better understanding of why this is happening?

    Barkingdog

    There's only one thing that can cause TempDB to grow so large... a cross join of some sort in some code somewhere. There are a lot of things that will cause TempDB to grow pretty large, but only a full or partial cross join will blow it out of the water like that. You need to reset TempDB to a reasonable size and set up a trace for TempDB growth to help isolate the code that's causing this monstrosity.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I agree with what you wrote:

    >>

    You need to reset TempDB to a reasonable size and set up a trace for TempDB growth to help isolate the code that's causing this monstrosity.

    >>

    I'm just afraid that the Tracelog may be 100GB!

    Barkingdog

  • Barkingdog (5/4/2009)


    Jeff,

    I agree with what you wrote:

    >>

    You need to reset TempDB to a reasonable size and set up a trace for TempDB growth to help isolate the code that's causing this monstrosity.

    >>

    I'm just afraid that the Tracelog may be 100GB!

    Barkingdog

    The fear is understood... but you're only going to trace growth events. There will likely be only one thing that causes TempDB to grow beyond 400GB... Reset TempDB to 400GB with a growth increment of 500MB... your trace won't be more than a couple of lines that way.
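
    A sketch of that reset, assuming the default logical file name "tempdev" (check sys.master_files on your instance). Note that if the file is currently larger than the target, you'll need to shrink it or restart the instance first, since tempdb is recreated at its configured size on startup:

    USE master;
    -- Fixed 400GB starting size, with small growth increments so each
    -- autogrow event in the trace marks genuine additional demand.
    ALTER DATABASE tempdb
        MODIFY FILE (NAME = tempdev, SIZE = 400GB, FILEGROWTH = 500MB);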

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good idea, Jeff.

    I will use the SQL 2008 "Data File Auto Grow" and "Log File Auto Grow" trace events for the given database. (From what I understand, the 3rd party tool does not use stored procs -- essentially, in-line SQL only. So I'll use SQL:StmtCompleted to try to catch the offenders.)
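
    For reference, a rough sketch of what I'm planning as a server-side trace rather than the Profiler GUI (the output path is made up; event 92 = Data File Auto Grow, 93 = Log File Auto Grow, 41 = SQL:StmtCompleted):

    DECLARE @TraceID INT, @on BIT, @maxsize BIGINT;
    SELECT  @on = 1, @maxsize = 512;        -- MB per rollover file
    EXEC sp_trace_create @TraceID OUTPUT, 2, N'E:\Traces\tempdb_growth', @maxsize;

    -- Autogrow events: DatabaseName (column 35) and StartTime (14)
    EXEC sp_trace_setevent @TraceID, 92, 35, @on;
    EXEC sp_trace_setevent @TraceID, 92, 14, @on;
    EXEC sp_trace_setevent @TraceID, 93, 35, @on;
    EXEC sp_trace_setevent @TraceID, 93, 14, @on;

    -- SQL:StmtCompleted: TextData (1), SPID (12), Duration (13), RowCounts (48)
    EXEC sp_trace_setevent @TraceID, 41, 1, @on;
    EXEC sp_trace_setevent @TraceID, 41, 12, @on;
    EXEC sp_trace_setevent @TraceID, 41, 13, @on;
    EXEC sp_trace_setevent @TraceID, 41, 48, @on;

    EXEC sp_trace_setstatus @TraceID, 1;    -- start the trace
    SELECT @TraceID AS TraceID;             -- keep this handle to stop it later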

    Are there any other "counters" I should be tracing?

    Barkingdog

    Nah... that should do it. Just make sure that "TextData" is one of the columns returned so you can see the actual offending SQL. Since you're not interested in catching the SQL Server "heartbeat" routines or anything the system spawns, and you're really trying to catch what I believe is an accidental cross join, you might try adding a filter on the ROWCOUNT column of something like > 1,000,000 rows.

    It's been a while since I've had to do this type of thing... not 100% sure I remember ROWCOUNT as the correct thing to filter on for this... I think so, but no guarantee.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry... my bad... looking for a ROWCOUNT > 1,000,000 won't count the internal rows generated, as is so typical of a cross join. Perhaps just setting the duration to >= 60,000 (one minute in milliseconds) would do the trick for you.
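
    If you're running it as a server-side trace like the sketch above, the duration filter would look something like this (sketch only; @TraceID is the handle from sp_trace_create, column 13 = Duration, operator 4 = "greater than or equal"). One caveat: from SQL Server 2005 on, the trace itself stores Duration in microseconds even though Profiler displays milliseconds, so one minute is 60,000,000 there:

    DECLARE @oneminute BIGINT;
    SELECT  @oneminute = 60000000;          -- one minute, in microseconds
    EXEC sp_trace_setfilter @TraceID, 13, 0, 4, @oneminute;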

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • More info:

    1. TempDB at 550GB was not enough. The T-SQL job ran into the 550GB limit and failed.

    2. Found the statement causing the issue. The essence is shown below:

    insert into ...
    select distinct ...
    ...
    from tableA
         join tableB join tableC join tableD join tableE
    (with lots of ANDs, ORs, etc.)

    with the following rowcounts:

    TableA = 60 million rows (!)
    TableB = 700,000 rows
    TableC = 85,000 rows
    TableD = 157 rows
    TableE = 2.2 million rows

    This is one massive SELECT in a single transaction. No wonder the transaction log is getting full! Any suggestions on how to deal with this monster? (Part of the problem is also that I have no way of estimating how to size tempdb properly. 550GB? 1TB? 1PB?)

    Barkingdog

  • Barkingdog (5/9/2009)


    More info:

    1. TempDB at 550GB was not enough. The T-SQL job ran into the 550GB limit and failed.

    2. Found the statement causing the issue. The essence is shown below:

    insert into ...
    select distinct ...
    ...
    from tableA
         join tableB join tableC join tableD join tableE
    (with lots of ANDs, ORs, etc.)

    with the following rowcounts:

    TableA = 60 million rows (!)
    TableB = 700,000 rows
    TableC = 85,000 rows
    TableD = 157 rows
    TableE = 2.2 million rows

    This is one massive SELECT in a single transaction. No wonder the transaction log is getting full! Any suggestions on how to deal with this monster? (Part of the problem is also that I have no way of estimating how to size tempdb properly. 550GB? 1TB? 1PB?)

    Barkingdog

    Just because the tables have a lot of rows doesn't mean that TempDB should blow out like that. There's an accidental cross join somewhere in the code. Find the really big arrow (large number when you hover over it) in the estimated execution plan for the query... that will lead you to the cause of the problem. Then, some common sense after that will be the cure for the problem.

    I had a similar problem with similar rowcounts... I fixed it by moving the selection of data from one of the tables to a separate much smaller temp table. Then I joined on that. The query ran literally in seconds with nearly no TempDB usage instead of failing after hours of running because it blew out TempDB.
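
    A hypothetical sketch of that pattern (all table and column names invented): filter the big table down into a narrow temp table first, then drive the join from it.

    -- Pre-filter the 60M-row table to just the rows and columns needed.
    SELECT  a.OrderID, a.CustomerID, a.Amount
    INTO    #FilteredA
    FROM    TableA AS a
    WHERE   a.LoadDate >= '20090501';       -- whatever predicate applies

    -- Index the temp table on the join key.
    CREATE CLUSTERED INDEX IX_FilteredA ON #FilteredA (CustomerID);

    -- The big join now runs against the small staged set instead.
    INSERT INTO TargetTable (OrderID, CustomerName, Amount)
    SELECT  a.OrderID, b.CustomerName, a.Amount
    FROM    #FilteredA AS a
    JOIN    TableB AS b ON b.CustomerID = a.CustomerID;

    DROP TABLE #FilteredA;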

    On a slightly different problem, the criteria was simply incorrect. The developer had left out a rather critical join predicate and tried to use DISTINCT to make up for it. Sound familiar? Adding the correct criteria made the query run in seconds.

    No system should need half a terabyte of TempDB, especially with those relatively small rowcounts.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Barkingdog (5/9/2009)

    Found the statement causing the issue.

    Please consider asking yourself the questions below:

    1- Is this something that was working before and out of the blue started behaving this way?

    2- Did the behavior change after new code was added/altered?

    Either way, I would suggest checking the execution plan of the SELECT statement populating the INSERT. It wouldn't surprise me if there are missing/wrong WHERE conditions causing the issue.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are given. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Jeff,

    Per the Estimated Execution Plan, TableA (with 60 million rows) accounts for about 56% of the query cost with a table scan! (I will investigate the "predicate" shown in the display for that table for a hint on creating an index.) It also found another table scan (this time recommending a non-clustered index), but that would account for only about 15 percent of the query. Everything else is in the 1-5% range.

    While indexes can dramatically impact processing speed, I don't see the relationship between them and tempdb size. Maybe I'm looking in the wrong place still?

    Barkingdog

  • Barkingdog (5/9/2009)


    Jeff,

    Per the Estimated Execution Plan, TableA (with 60 million rows) accounts for about 56% of the query cost with a table scan! (I will investigate the "predicate" shown in the display for that table for a hint on creating an index.) It also found another table scan (this time recommending a non-clustered index), but that would account for only about 15 percent of the query. Everything else is in the 1-5% range.

    While indexes can dramatically impact processing speed, I don't see the relationship between them and tempdb size. Maybe I'm looking in the wrong place still?

    Barkingdog

    Yes, the wrong place... well, almost the wrong place. You're looking at all the "boxes". This is one time when, instead of reading between the lines, you should be reading the lines. Look for thick arrows, right click on them, and see what the actual rowcount is. You will likely be amazed. Then, follow those arrows back to their "source boxes" and that will help you isolate the problem (most likely a cross join or two).
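
    If the graphical plan gets unwieldy, the same actual-versus-estimated rowcounts can be captured in text form while running the statement:

    SET STATISTICS PROFILE ON;
    -- ... run the offending INSERT ... SELECT here ...
    SET STATISTICS PROFILE OFF;
    -- Compare the Rows and EstimateRows columns per operator; a join
    -- operator with a gigantic Rows value is the cross join at work.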

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • See if there is any kind of sort in the execution plan. A cross join can produce a huge result, but not necessarily anything in tempdb.

    If you have an "order by" or "union" or "distinct" and there is no index to use, it has to materialize the whole result set and sort it to order it and/or remove duplicates.
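
    A tiny made-up demo of the pattern (temp tables invented): a join predicate that references only one side degenerates into a cross join, and the DISTINCT then forces tempdb to materialize and sort the whole multiplied result.

    CREATE TABLE #a (id INT, val INT);
    CREATE TABLE #b (id INT, val INT);
    INSERT INTO #a SELECT 1, 10 UNION ALL SELECT 2, 20;
    INSERT INTO #b SELECT 1, 100 UNION ALL SELECT 2, 200;

    SELECT DISTINCT a.val, b.val
    FROM   #a AS a
    JOIN   #b AS b
      ON   a.id = a.id;     -- oops: only references #a, so every row of #b matches

    DROP TABLE #a;
    DROP TABLE #b;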

  • I'd recommend running a trace and looking for autogrow events.

    In parallel, start a script that runs every minute and writes out the input buffer of each spid > 49, plus the last batch start time.

    I expect you will see multiple autogrow events from the same command, which runs for a very long time, and you can catch what it is executing.

    Adding SQL/SP completed events to the trace could help further.
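
    A sketch of such a polling script using the DMVs instead of DBCC INPUTBUFFER (logging table name invented; run it from its own connection or an Agent job). Note it only catches sessions with an active request, which is what we want here:

    -- Log the current statement and batch start time of every active
    -- user session, once a minute.
    CREATE TABLE dbo.ActivityLog (
        capture_time DATETIME,
        session_id   SMALLINT,
        start_time   DATETIME,
        sql_text     NVARCHAR(MAX));

    WHILE 1 = 1
    BEGIN
        INSERT INTO dbo.ActivityLog
        SELECT  GETDATE(), r.session_id, r.start_time, t.text
        FROM    sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
        WHERE   r.session_id > 49;          -- skip system spids

        WAITFOR DELAY '00:01:00';
    END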
