December 14, 2005 at 9:05 am
Hello,
I have been attempting to move a large amount of data from a view into a permanent table. The permanent table has NO indexes on it currently. I have been attempting the following:
insert into perm_table
select * from myView
My problem is that I run a job that does this overnight, and after 11 hours, the tempDB eats up all available space on the server (FYI... the table is about 60 GB, and the tempDB file grows to about 180 GB). Nothing else is happening overnight on the server to increase the size of tempDB.
So, I have 2 questions:
1) Why does this insert cause tempDB to grow so big?
2) Based on my space constraints, is BCP really the only way to go?
My end result is to get this data moved to the permanent table in the quickest time possible, and without running out of space on the server.
Any help that you all can provide would be GREATLY appreciated.
December 14, 2005 at 9:33 am
Here are my suggestions:
1. Put an index on your original table.
2. I would advise a divide-and-conquer strategy, so you don't use up all of tempdb on the sort.
3. With an index in place, the query should return data faster.
4. Sometimes all it takes is a little patience to get it done. (You could write a loop or cursor to go through batches of 500,000 rows at a time, for example.)
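A batching loop along those lines might look like this. This is only a minimal sketch: perm_table, myView, and the [id] column are placeholders for your own names, and it assumes the view exposes a column you can use to tell which rows are already copied.

```sql
-- Batching sketch: copy 500,000 rows per pass until nothing is left.
-- [id], perm_table, and myView are placeholder names.
DECLARE @rows int
SET @rows = 1
WHILE @rows > 0
BEGIN
    INSERT INTO perm_table
    SELECT TOP 500000 v.*
    FROM myView v
    WHERE NOT EXISTS (SELECT 1 FROM perm_table p WHERE p.id = v.id)
    SET @rows = @@ROWCOUNT
END
```

The NOT EXISTS check keeps the loop restartable, but it gets slower as perm_table fills; a key-range loop (as shown later in this thread) scales better.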
Good Luck
mom
December 14, 2005 at 10:07 am
Thank you for the feedback. The data I am trying to move is not in a table, but a view. I am looking into the possibility of indexing the view, but that is something new to me.
Even if I use the divide and conquer strategy, do you agree BCP would be faster and have the added benefit of not using any tempDB space?
December 14, 2005 at 2:20 pm
I am not sure. If you BCP out, you have to BCP in unless you do a bulk insert. I have never tried bulk insert because I haven't had a situation in which I needed one.
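For reference, the bcp out / BULK INSERT pairing looks roughly like this. A sketch only: the server, database, path, and object names are placeholders, and -T assumes a trusted connection.

```sql
-- From a command prompt: export the view in native format (placeholder names).
-- bcp "SELECT * FROM myDB.dbo.myView" queryout C:\data\myview.bcp -n -T -S myServer

-- Then load the file; TABLOCK lets the load be minimally logged
-- when the database recovery model allows it.
BULK INSERT dbo.perm_table
FROM 'C:\data\myview.bcp'
WITH (DATAFILETYPE = 'native', TABLOCK)
```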
mom
December 15, 2005 at 7:22 am
One simple solution is to create indexes on your view, so the view is materialized (a physical copy of the data). SQL Server then maintains the view's data as rows are inserted/deleted/updated in its base tables. Another benefit is that you do not need to change your SPs and queries that use the view (the view name does not change).
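Roughly, materializing a view works like this. A sketch with placeholder names: the view must be created WITH SCHEMABINDING, and the first index on it must be unique and clustered.

```sql
-- Placeholder example: the view must be schema-bound before it can be indexed.
CREATE VIEW dbo.myView
WITH SCHEMABINDING
AS
SELECT s.SaleID, s.ProdID, s.Qty
FROM dbo.SalesHistory s
GO

-- The first index on a view must be a unique clustered index;
-- after this, the view's result set is physically stored and maintained.
CREATE UNIQUE CLUSTERED INDEX IX_myView ON dbo.myView (SaleID)
GO
```

Note the long list of restrictions on indexed views (no outer joins, deterministic expressions only, required SET options, etc.), which is worth checking before going this route.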
December 15, 2005 at 7:54 am
You need to break the insert into chunks. Is there something in the view that could be used like a key (doesn't have to be unique), preferably a field that is a clustered key or index on some of the underlying tables? You could use something like this:
declare @minkey int, @maxkey int
set @minkey = -1 -- assuming -1 is smaller than any possible value
while 1 = 1 begin
    -- find the upper bound of the next 100,000-row chunk
    select @maxkey = max([key]) from (
        select top 100000 [key] from myview
        where [key] > @minkey order by [key]) x
    if @maxkey is null break
    -- copy just that chunk
    insert into perm_table
    select * from myview
    where [key] > @minkey and [key] <= @maxkey
    set @minkey = @maxkey
end
An index on perm_table won't help here; it would only slow down the insert. You might look at the execution plan for the view, however. The growth in tempdb suggests it is doing a lot of work building hash tables and intermediate sorts.
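A quick way to see where that work goes (hash joins, sorts, spools) without actually running the 60 GB insert is to ask for the estimated plan. A sketch, with myview as a placeholder name:

```sql
-- Show the estimated plan for the view's query without executing it.
SET SHOWPLAN_TEXT ON
GO
SELECT * FROM myview
GO
SET SHOWPLAN_TEXT OFF
GO
```

Hash Match and Sort operators in the output are the usual suspects for tempdb growth.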
December 15, 2005 at 8:56 am
Thanks for all of the feedback. I could not index the view because it broke a few of the "rules" surrounding indexed views (some columns couldn't be schema-bound, there were LEFT OUTER JOINs, etc.).
I decided to create a number of views that "chunk" the data, and then I created a .bat file that performed the BCP out, BCP in, and deletion of the .bcp file. It worked well, and I believe it was faster, and tempDB took much less of a hit.
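For anyone doing the same, the .bat file can be as simple as the following per chunk. A sketch only: the server, database, view, and path names are placeholders, and -T assumes a trusted connection.

```
rem Export one chunk view, load it into the target table, then delete the file.
bcp "SELECT * FROM myDB.dbo.myView_chunk1" queryout C:\data\chunk1.bcp -n -T -S myServer
bcp myDB.dbo.perm_table in C:\data\chunk1.bcp -n -T -S myServer
del C:\data\chunk1.bcp
```

Repeat the three lines for each chunk view; native format (-n) avoids character-conversion overhead when both ends are SQL Server.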
December 15, 2005 at 9:06 am
Your view still references real tables in its SELECT statement. Look at the JOINs in your view and be sure to have an index on both sides of each JOIN, i.e.,
SELECT sls.ProdID, prd.ProdName, sls.Qty, ..., ..., and so on..
FROM dbo.SalesHistory sls
JOIN dbo.Products prd
ON prd.ProdID = sls.ProdID
Make sure you have a ProdID Index on Products and SalesHistory. You also need to look at your WHERE clause. If you're filtering for a certain SalesDate, then Index that column as well.
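Concretely, that means something like the following (placeholder table and column names matching the example above):

```sql
-- Support the join from both sides (placeholder names).
CREATE INDEX IX_SalesHistory_ProdID ON dbo.SalesHistory (ProdID)
CREATE INDEX IX_Products_ProdID ON dbo.Products (ProdID)

-- And cover the filter column if the view restricts by date.
CREATE INDEX IX_SalesHistory_SalesDate ON dbo.SalesHistory (SalesDate)
```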
I'd also look for and avoid any inline functions in your SELECT statement - I'd opt for rewriting the code to get the same results as part of the SELECT statement itself. For example, avoid this pattern, where the function is called once per row:
SELECT ProdName = fnGetProdNameEvenIfItTakesLonger(sls.ProdID), sls.Qty, ..., ...,
FROM dbo.SalesHistory sls
If you STILL have problems, then I'd try batching the data as suggested, or perhaps running the query multiple times for different date ranges, maybe 6 or 12 months at a time. But as suggested already, do not index the target table until you're done populating it. Too much overhead will be spent updating the indices during each INSERT. Wait till you're done and suffer thru it once.
Anyone else? Comments, questions or snide remarks?
- Mark