June 8, 2011 at 1:26 pm
I have a fairly simple query that inserts distinct records from table A into table B. There are approximately 700 million records to be inserted. The query has been running for a while now and, when I look at the table properties, under storage, I see that the row count is pretty much where I would guess that it should be upon completion and, likewise, has not increased for a couple of hours now.
I'm a bit confused as to why the query is still running and why it seems to be making many reads/writes from TempDB. I'm also confused as to why there's a GHOST CLEANUP process that is running against my database, as I was under the impression that this was only for cleaning records that were marked for deletion.
I do have a multi-column index on the destination table but I've noticed that, at least according to the table stats, the index space has remained largely unchanged, which is not what I would've expected.
My own intuition tells me that, more than likely, the table properties (as seen through the SSMS GUI) are misleading and that it just shows an estimate of the rows to be inserted, not the number of rows that actually exist in the table right now. However, when I run
SELECT rows
FROM sysindexes
WHERE id = OBJECT_ID('table_name') AND indid < 2
I am presented with my expected row count. I ran a similar query with a slightly smaller source table (around 500 million rows to be inserted) and had no problems. I'm sure this is something stupid that I've overlooked, but I would appreciate any hints.
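As an aside, on SQL Server 2005 and later the `sys.partitions` catalog view is the more current way to get this count than the `sysindexes` compatibility view (the `table_name` here is just the placeholder from the query above):

```sql
-- Committed row count for the heap or clustered index (index_id 0 or 1)
SELECT SUM(p.rows) AS row_count
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('table_name')
  AND p.index_id IN (0, 1);
```

Note that both views report counts maintained by the storage engine, which can reflect rows written by a transaction that has not yet finished.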
June 8, 2011 at 6:14 pm
My initial response is that if you are doing a SELECT DISTINCT on a large table, then you are forcing an expensive sort that has probably used a great deal of tempdb space. The insert may have committed, but the query is not complete because it still has to clean up all the work it did in tempdb.
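If you want to confirm that tempdb is where the activity is going, one way (assuming SQL Server 2005 or later) is to look at per-session tempdb allocations; internal objects include the work tables used by sorts and hash operations:

```sql
-- tempdb pages allocated per session, in KB (8 KB per page).
-- Internal objects = sort/hash work tables; user objects = temp tables etc.
SELECT session_id,
       SUM(internal_objects_alloc_page_count) * 8 AS internal_kb,
       SUM(user_objects_alloc_page_count)     * 8 AS user_kb
FROM sys.dm_db_task_space_usage
GROUP BY session_id
ORDER BY internal_kb DESC;
```

A session with a very large internal-object allocation is a good sign that a sort or hash spill is behind the long runtime.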
June 8, 2011 at 6:19 pm
Makes sense. Thanks.