Large Log size for TempDB in Stored Procedure run

  • Hi all,

    I am having trouble with the size of my tempdb log file on a SQL Server 2005 server.

    The situation is this:

    We have an application that is used by our clients to manage their financial data. One function of the application is to generate a financial report. This report uses a very complex and long stored procedure that relies on a large number of temp tables (over 100). For most of our clients the stored procedure performs well because the actual amount of data for those clients is small. One particularly large client, however, has enough data that some of the temp tables grow very large (50 million rows). I was surprised to see the tempdb.mdf file and the tempdb log file for this client each grow to around 23 GB. I suppose the growth of tempdb itself makes sense given the size of the temp tables, but I thought the log file would only grow if there were uncommitted transactions. There are no transaction statements in the stored procedure, so each DML statement should commit upon completion, allowing the log to be reused. At least that's my understanding; if someone can correct my interpretation, I'd appreciate it.

    I have two questions about this particular situation.

    1. Given that there are no transaction blocks in this stored procedure, why would the tempdb log file grow so large? Is it possible that 23 GB is the size of logging an insert into a single 50-million-row table (around 10 columns, all int/numeric/date fields)? If not, why is the log file not being reused?

    2. Currently, all data is handled with single statements. In terms of temp table size, is there a point at which it makes more sense to batch the inserts and updates? Right now, the inserts look like this:

    INSERT INTO #tbl1 (col1, col2, col3, col4, col5)
    SELECT c.col1, c.col2, c.col3, c.col4, c.col5 * a.fair_market_value AS col5
    FROM #tbl2 c
    INNER JOIN #tbl3 a ON c.col2 = a.col1
    ORDER BY c.col2, c.col3

    If the temp table is especially large, does it make more sense to break up the insert with some kind of dynamic batching? Something like this:

    Pseudo-Code:

    IF COUNT(*) #tbl1 > 10,000,000
    BEGIN
        WHILE @start < COUNT(*)
        BEGIN
            INSERT INTO #tbl1
            ...
            SELECT ...
            FROM #tbl2 INNER JOIN #tbl3
            WHERE pk BETWEEN @start AND @end
            ...
            SET @start = @start + 1,000,000
            ...
        END
    END
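
    In actual T-SQL, I imagine the batched version would look roughly like the sketch below. The key column pk, the range-based WHERE clause, and the 1,000,000-row batch size are just placeholders for whatever actually fits our tables:

    -- Sketch only: assumes pk is a reasonably dense integer key on #tbl2
    -- and that 1,000,000 rows per batch is a sensible starting point.
    DECLARE @start INT, @end INT, @max INT, @batch INT;

    SET @batch = 1000000;
    SELECT @start = MIN(pk), @max = MAX(pk) FROM #tbl2;

    WHILE @start <= @max
    BEGIN
        SET @end = @start + @batch - 1;

        -- Insert one range of key values per iteration
        INSERT INTO #tbl1 (col1, col2, col3, col4, col5)
        SELECT c.col1, c.col2, c.col3, c.col4, c.col5 * a.fair_market_value
        FROM #tbl2 c
        INNER JOIN #tbl3 a ON c.col2 = a.col1
        WHERE c.pk BETWEEN @start AND @end;

        SET @start = @end + 1;
    END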

    Would this prevent the excessive log file growth? At what magic number does it make sense to do this rather than letting SQL Server handle all the data at once? (I know that's a question no one can answer for this particular scenario, but is it a reasonable question?)

    Thanks in advance for any suggestions or advice.

  • "Working with tempdb in SQL Server 2005"

    http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx

    addresses several of your points.

    Regarding your #2 question, yes, there are times when it makes sense to break up large inserts, updates, and deletes. When to do it depends more on the amount of data involved than on the raw number of rows. A few hundred records stuffed with image data can be too much, while a hundred thousand ints can sail by. You have to run tests to find out what works for your app.
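
    One way to run those tests, for example, is to measure how much tempdb log and data space a run actually consumes. Something along these lines (just a rough sketch, run before and after a test, or between batches) will show it:

    -- Log size and percent of the log in use for every database, including tempdb
    DBCC SQLPERF(LOGSPACE);

    -- tempdb pages allocated per session (user objects = #temp tables,
    -- internal objects = sorts, hashes, spools); one page = 8 KB
    SELECT session_id,
           user_objects_alloc_page_count,
           internal_objects_alloc_page_count
    FROM sys.dm_db_session_space_usage
    ORDER BY user_objects_alloc_page_count DESC;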

    If the query is really long and it's creating and loading huge temporary tables, it won't complete the transaction and clear the log records for those temporary tables until the context within which they are running is cleared; meaning, until the proc itself finishes.
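
    If you want to confirm that's what is happening, a quick check (nothing definitive, just a starting point) is to ask tempdb what is holding its log up while the proc is running:

    -- Why can't tempdb reuse its log right now?
    SELECT name, log_reuse_wait_desc
    FROM sys.databases
    WHERE name = 'tempdb';

    -- Oldest active transaction in tempdb, if any
    DBCC OPENTRAN('tempdb');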

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
