September 21, 2007 at 9:16 am
I've seen situations where people create a temporary "staging" table in tempdb, save the final data into a user database, and then drop the table from tempdb.
Is this common practice and if so, what are the benefits of creating staging tables in tempdb?
The only reason I can think of is to keep the user database files from growing too large (my understanding is that if/when SQL Server is restarted, tempdb is purged and recreated, so any space previously used by the staging tables will be reclaimed - does that sound right?).
September 21, 2007 at 9:58 am
One big reason for doing this is concurrency. If you have a large number of transactions where the same procedure can be called multiple times in a short period of time, there is a good chance that the procedure will be called a second time before the first call has completed. With temp tables, the table name is actually modified for each session, so two tables with the same logical name can co-exist.
To see this, run the following:
CREATE TABLE #Test (ID int)
SELECT * FROM tempdb..sysobjects WHERE name LIKE '%Test%'
You should find a table that is not named exactly #Test, but #Test padded with a long run of underscores and ending in a unique, session-specific suffix.
Other reasons for using temp tables:
Lazy - Temp tables are automatically dropped when the session ends. There's no need to check whether the table exists and drop it if it does. It's also easy to SELECT * INTO #Temp FROM Table.
Performance - TempDB usually resides on a different physical disk drive or array. With the staging data on a separate drive, the I/O is spread across two sets of disks instead of one.
Habit/Old Code - With SQL 2000, table variables could be used to hold data in memory. Some programmers either don't like that approach or simply have the habit of using temp tables (see the sketch below).
I'm sure there are lots of other reasons to use temp tables as well.
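For example, here's a rough sketch of the two approaches side by side (the table and column names are made up for illustration):

-- Table variable: scoped to the current batch or procedure; no DROP needed or possible
DECLARE @Orders TABLE (OrderID int, Amount money)
INSERT INTO @Orders (OrderID, Amount) VALUES (1, 10.00)

-- Temp table: scoped to the session; dropped automatically when the session ends
CREATE TABLE #Orders (OrderID int, Amount money)
INSERT INTO #Orders (OrderID, Amount) VALUES (1, 10.00)
DROP TABLE #Orders -- optional; the end of the session would clean it up anyway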
September 21, 2007 at 11:46 am
It's also useful for ad hoc results, as cleaning up databases is a pain when you have to determine whether a given table is really needed or not. It's sort of like a self-cleaning sandbox database.
September 21, 2007 at 12:12 pm
Another big reason for using temp tables is to help prevent blocking and deadlock situations, and to hold data that you need to reference multiple times throughout the code.
Since rows are locked while a SELECT is running, if you need to manipulate data while you step through the results of a query, the data in that SELECT cannot be modified by other sessions.
So save the read-only results to a temp table. That releases the locks, and the source table is free to be updated again.
Now you can either repeat the query multiple times, or store the results in a temp table that you can reference over and over until you are done.
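For example (a sketch only - dbo.Orders and the column names are made up):

-- Snapshot the rows of interest once; the locks on the source table
-- are released as soon as this SELECT completes
SELECT OrderID, CustomerID, Amount
INTO #PendingOrders
FROM dbo.Orders
WHERE OrderStatus = 'Pending'

-- dbo.Orders is now free to be updated by other sessions while we
-- reference the snapshot as many times as we like
SELECT CustomerID, SUM(Amount) AS Total
FROM #PendingOrders
GROUP BY CustomerID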
As for the question of whether it is a common practice: yes.
September 21, 2007 at 5:47 pm
Last, but not least... security... everybody can make a table in TempDB... might not be true on the "given" DB for whatever user the sproc is logged in as...
AND, TempDB is set to the SIMPLE recovery model, which can be blazingly fast for staging table creation if done correctly...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2007 at 3:37 am
Jeff Moden (21/09/2007)
Last, but not least... security... everybody can make a table in TempDB... might not be true on the "given" DB for whatever user the sproc is logged in as...
AND, TempDB is set to the SIMPLE recovery model, which can be blazingly fast for staging table creation if done correctly...
Thank you for all the replies. 🙂
Jeff, could you give an example of how you would create a staging table? I know of a few different ways to create a table but I don't know which would be more efficient. Or are you referring to the entire process of creating and populating it being fast if done correctly?
September 24, 2007 at 8:10 am
Heh... I was mostly referring to the entire process. For example, if headers are present in the data, they must have the same number of delimiters as the data, or you'll miss some data and generate an error even when you identify the "first" row to load. If the data has mixed delimiters (especially the infamous "quoted CSV" format), then you must have a correctly set up BCP format file (instead of loading the whole row into a single column and parsing it with a split function). And if you're using dynamic SQL to BULK INSERT into a temp table, some additional permissions will be required in tempdb.
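For a simple file (no mixed delimiters), a basic load might look like this - the file path and column layout below are invented for illustration:

CREATE TABLE #Staging (CustomerName varchar(100), OrderDate datetime, Amount money)

BULK INSERT #Staging
FROM 'C:\Imports\orders.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2) -- FIRSTROW = 2 skips the header row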
And, finally, I always add an IDENTITY column to the staging table and make it the clustered primary key so I can easily do things like "dupe checks"... I'm careful NOT to add any other indexes on the table until after I get the data loaded, so the logging is kept to an absolute minimum. Here's what BOL 2k says about it...
When using the full recovery model, all row-insert operations performed by bcp are logged in the transaction log. For large data loads, this can cause the transaction log to fill rapidly. To help prevent the transaction log from running out of space, a minimally logged bulk copy can be performed if all of these conditions are met:
- The recovery model is simple or bulk-logged.
- The target table is not being replicated.
- The target table does not have any triggers.
- The target table either has no indexes or is empty when the bulk copy starts.
- The TABLOCK hint is specified.
Any bulk copy into an instance of Microsoft® SQL Server™ that does not meet these conditions is logged.
Before doing bulk copy operations, it is recommended that you set the recovery model to bulk-logged if you usually use full recovery. This will prevent the bulk copy operations from using excessive log space and possibly filling the log. However, even with bulk-logged recovery, some transaction log space will be used. You may want to create transaction log backups during the bulk copy operation to free up transaction log space.
When bulk copying a large number of rows into a table with indexes, it can be faster to drop all the indexes, perform the bulk copy, and re-create the indexes. For more information, see Optimizing Bulk Copy Performance.
Note: Although data insertions are not logged in the transaction log when a minimally logged bulk copy is performed, SQL Server still logs extent allocations each time a new extent is allocated to the table.
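To make that concrete, here's a rough sketch of that kind of staging table - the table name, columns, and the columns used for the dupe check are just made up:

-- IDENTITY column as the clustered primary key; no other indexes until after the load
CREATE TABLE #Staging
(
    RowID        int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    CustomerName varchar(100),
    OrderDate    datetime,
    Amount       money
)

-- ...BULK INSERT / bcp the data in here, before adding any other indexes...

-- "Dupe check": delete every copy of a duplicate row except the one with the lowest RowID
DELETE s
FROM #Staging s
WHERE s.RowID > (SELECT MIN(s2.RowID)
                 FROM #Staging s2
                 WHERE s2.CustomerName = s.CustomerName
                   AND s2.OrderDate    = s.OrderDate
                   AND s2.Amount       = s.Amount)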
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2007 at 8:34 am
Note: Temp tables can also live in memory, not necessarily always on the hard drive. When a system uses replication heavily, you should try to minimize all changes to the "replicated" tables, and temp tables are very handy for that.
* Noel