Creating tables in tempdb

  • I've seen situations where people create a temporary "staging" table in tempdb before saving the final data into a user database, then drop the table from tempdb.

    Is this common practice and if so, what are the benefits of creating staging tables in tempdb?

    The only reason I can think of is to keep the user database files from growing too large. My understanding is that when SQL Server is restarted, tempdb is purged and recreated, so any space previously used by the staging tables is reclaimed by the OS (does that sound right?).

  • One big reason for doing this is concurrency.  If you have a large number of transactions where the same procedure can be called multiple times in a short period, there is a good chance that the procedure will be called a second time before the first call has completed.  With temp tables, the table name is decorated per session behind the scenes, so two tables with the same logical name can co-exist.

    To see this, run the following:

    CREATE TABLE #Test ( ID int )

    -- Temp table names are decorated internally so that sessions don't collide
    SELECT * FROM tempdb..SysObjects WHERE Name LIKE '%Test%'

    You should find a table that is not named exactly #Test, but rather #Test followed by a long run of underscores and a unique suffix.


    Other reasons for using temp tables:

    Lazy - Temp tables are automatically dropped when the session ends, so there's no need to check whether the table exists and drop it.  It's also easy to do SELECT * INTO #Temp FROM Table (see the sketch after this list).

    Performance - Usually tempdb resides on a different physical disk drive or array.  With tempdb on a separate drive, two sets of spindles are retrieving data for you instead of one.

    Habit\Old Code - With SQL 2000, table variables became available for holding data in memory.  Some programmers may not like that approach, or simply have the habit of using temp tables.


    I'm sure there are lots of other reasons to use temp tables as well.
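
    A minimal sketch of the "lazy" point above; the table and column names are invented for illustration:

    -- SELECT ... INTO creates and populates a temp table in one statement;
    -- its structure is inferred from the SELECT list.
    SELECT CustomerID, OrderDate
    INTO #RecentOrders
    FROM dbo.Orders                 -- hypothetical source table
    WHERE OrderDate >= '20070101'

    -- No explicit DROP needed: #RecentOrders disappears when the session ends.

    For comparison, the SQL 2000-era alternative mentioned above would be a table variable: DECLARE @RecentOrders TABLE (...) populated with INSERT ... SELECT.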


  • It's also useful for ad hoc results, as cleaning up databases is a pain when you have to determine whether a given table is really needed or not. It's sort of like a self-cleaning sandbox database.

  • Another big reason for using temp tables is to help prevent blocking and deadlock situations, and to hold data that you need to reference multiple times throughout the code.

    Since rows are locked while a select is running, if you need to step through the results of a query, the underlying data cannot be modified by anyone else until you finish.

    So save the read-only results to a temp table instead: the locks are released, and the source table is free to be updated again.

    Now you can either repeat the query multiple times, or store the results in a temp table that you can reference over and over until you are done.
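
    A rough sketch of that pattern (the table and column names here are made up):

    -- Read the busy table once and copy the rows of interest;
    -- the shared locks on dbo.Orders are released as soon as this completes.
    SELECT OrderID, Status
    INTO #WorkList
    FROM dbo.Orders                 -- hypothetical, heavily updated table
    WHERE Status = 'Pending'

    -- dbo.Orders is now free for other writers, while #WorkList can be
    -- queried over and over without blocking anyone.
    SELECT OrderID FROM #WorkList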

    As for the question of whether it is common practice: yes.

  • Last, but not least... security... everybody can make a table in TempDB... might not be true on the "given" DB for whatever user the sproc is logged in as...

    AND, TempDB is set to the SIMPLE recovery model, which can make staging table loads blazingly fast if done correctly...

    --Jeff Moden



  • Jeff Moden (21/09/2007)


    Last, but not least... security... everybody can make a table in TempDB... might not be true on the "given" DB for whatever user the sproc is logged in as...

    AND, TempDB is set to the SIMPLE recovery model, which can make staging table loads blazingly fast if done correctly...

    Thank you for all the replies. 🙂

    Jeff, could you give an example of how you would create a staging table? I know of a few different ways to create a table but I don't know which would be more efficient. Or are you referring to the entire process of creating and populating it being fast if done correctly?

  • Heh... I was mostly referring to the entire process.  For example, if headers are present in the data, they must have the same number of delimiters as the data or you'll miss some data and generate an error, even when you identify the "first" row to load.  If the data has mixed delimiters (especially the infamous "quoted CSV" format), then you must have a correctly set up BCP format file (instead of loading the whole row into a single column and parsing it with a split function).  And, if you're using dynamic SQL to BULK INSERT into a temp table, some additional permissions will be required in tempdb.
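
    As a minimal sketch of the simple case, loading a headered CSV into a temp staging table; the file path, table, and columns are hypothetical:

    CREATE TABLE #Staging (Col1 varchar(100), Col2 varchar(100))

    BULK INSERT #Staging
    FROM 'C:\Import\data.csv'       -- hypothetical file path
    WITH (
        FIELDTERMINATOR = ',',      -- plain delimiters only; "quoted CSV"
        ROWTERMINATOR   = '\n',     -- needs a BCP format file instead
        FIRSTROW        = 2,        -- skip the header row
        TABLOCK                     -- one of the minimal-logging conditions
    )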

    And, finally, I always add an IDENTITY column to the staging table and make it the clustered primary key so I can easily do things like "dupe checks"... I'm careful NOT to add any other indexes on the table until after I get the data loaded, so the logging is kept to an absolute minimum (a sketch of the whole pattern follows the excerpt below).  Here's what BOL 2k says about it...

    Logged and Minimally Logged Bulk Copy Operations

    When using the full recovery model, all row-insert operations performed by bcp are logged in the transaction log. For large data loads, this can cause the transaction log to fill rapidly. To help prevent the transaction log from running out of space, a minimally logged bulk copy can be performed if all of these conditions are met:

    • The recovery model is simple or bulk-logged.
    • The target table is not being replicated.
    • The target table does not have any triggers.
    • The target table has either 0 rows or no indexes.
    • The TABLOCK hint is specified. For more information, see Controlling the Locking Behavior.

    Any bulk copy into an instance of Microsoft® SQL Server™ that does not meet these conditions is logged.

    Before doing bulk copy operations, it is recommended that you set the recovery model to bulk-logged if you usually use full recovery. This will prevent the bulk copy operations from using excessive log space and possibly filling the log. However, even with bulk-logged recovery, some transaction log space will be used. You may want to create transaction log backups during the bulk copy operation to free up transaction log space.

    When bulk copying a large number of rows into a table with indexes, it can be faster to drop all the indexes, perform the bulk copy, and re-create the indexes. For more information, see Optimizing Bulk Copy Performance.

    Note  Although data insertions are not logged in the transaction log when a minimally logged bulk copy is performed, SQL Server still logs extent allocations each time a new extent is allocated to the table.
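
    Putting that together, a rough sketch of the staging pattern described above: an IDENTITY column as the clustered primary key, no other indexes until after the load, and a dupe check driven off the IDENTITY.  Table and column names are invented:

    -- Staging table: IDENTITY clustered PK, no other indexes yet.
    -- The PK is left unnamed so concurrent sessions' constraint names
    -- can't collide in tempdb.
    CREATE TABLE #Staging
    (
        RowID  int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
        AcctNo varchar(20),
        Amount money
    )

    -- ... bulk load the data here (see the BULK INSERT sketch above) ...

    -- "Dupe check": keep the first row per AcctNo, delete the rest
    DELETE s
    FROM #Staging s
    WHERE EXISTS
    (
        SELECT 1
        FROM #Staging d
        WHERE d.AcctNo = s.AcctNo
          AND d.RowID  < s.RowID
    )

    -- Only now add any other indexes needed for downstream queries
    CREATE INDEX IX_Staging_AcctNo ON #Staging (AcctNo)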

    --Jeff Moden



  • Note: Temp tables can also live in memory, not necessarily always on the hard drive. When a system uses replication heavily, you should try to minimize all changes to the replicated tables, and temp tables are very handy for that.


    * Noel

