July 6, 2005 at 5:31 pm
Table A has 75 million records of raw data. Certain fields were pulled from Table A for other processing and put into Table B. After that processing, 62 million records survived. I now have to join tables A and B so the raw data from A sits alongside the processed data in B.
I am creating a table (table C) of about 65 million records using a join for this purpose.
First I created the shell for Table C.
Then I am using an INSERT statement with the fields I need from the two tables.
I am joining them on a field (rec_id) that is a unique clustered index on each of the two tables. The rec_ids are the same in A and B.
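As a rough sketch, the statement is shaped like this (every column name other than rec_id is a placeholder):

-- Stack the raw columns from A alongside the processed columns from B
INSERT INTO TableC (rec_id, raw_field1, raw_field2, processed_field1)
SELECT a.rec_id, a.raw_field1, a.raw_field2, b.processed_field1
FROM TableA a
INNER JOIN TableB b
    ON a.rec_id = b.rec_id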
The join has been running for quite some time so I am starting to get concerned.
How long should a table join/insert using source tables of this size take (within reason)?
Is there a way to check on the status of how many records have been inserted into table C that won't disrupt the process? (Using Enterprise Manager, the table/properties shows a rowcount of 0)
How can I increase performance on this type of situation?
Your thoughts are appreciated.
Thanks.
July 7, 2005 at 3:28 am
Try doing:
SELECT * FROM TableC WITH (NOLOCK)
That sometimes works. Let me know if it works or not - I'd be interested to know.
Jamie Thomson
http://sqlblog.com/blogs/jamie_thomson
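If the full SELECT is too heavy while the load is running, a count-only variant, or a peek at the system tables (assuming SQL Server 2000-era metadata; the sysindexes figure is only approximate during an active load), should do the same job:

-- Dirty read of the rows inserted so far; does not block the running INSERT
SELECT COUNT(*) FROM TableC WITH (NOLOCK)

-- Approximate row count straight from the system tables, no locks taken on TableC
SELECT rows
FROM sysindexes
WHERE id = OBJECT_ID('TableC')
  AND indid IN (0, 1)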
July 7, 2005 at 4:01 am
The delay is possibly related to the size of the single transaction into TableC.
Can this process be broken into smaller sub-steps with smaller commit points, where each successive execution loads a separate, previously unprocessed batch of data?
Have you also got the best possible execution plan for the 'select into TableC' statement?
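One way to do the batching, assuming rec_id is an integer key and using placeholder column names (the batch size is arbitrary):

DECLARE @lo int, @hi int, @batch int
SET @batch = 500000

-- Work out the key range to cover
SELECT @lo = MIN(rec_id), @hi = MAX(rec_id) FROM TableB

WHILE @lo <= @hi
BEGIN
    -- Each pass commits its own, much smaller, transaction
    INSERT INTO TableC (rec_id, raw_field1, raw_field2, processed_field1)
    SELECT a.rec_id, a.raw_field1, a.raw_field2, b.processed_field1
    FROM TableA a
    INNER JOIN TableB b ON a.rec_id = b.rec_id
    WHERE b.rec_id BETWEEN @lo AND @lo + @batch - 1

    SET @lo = @lo + @batch
END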
July 7, 2005 at 1:04 pm
Thanks...
I tried
SELECT * FROM TableC WITH (NOLOCK)
but that appeared to hang, and in one instance it rendered both the original process and the SELECT ... WITH (NOLOCK) sleeping in Enterprise Manager.
One problem we discovered is that Table A had the rec_id field as an integer, while Table B had it as a varchar. We copied Table B into a new table with rec_id as an integer and then created a clustered index on it.
Result: with SET ROWCOUNT 10000, the complex join to select records returned a recordset in 20 seconds. However, when next trying to select that resultset into Table C (SELECT ... INTO C FROM ... JOIN ...), the process ran 20 minutes before I killed it. (The SELECT COUNT with NOLOCK produced 0.)
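For the record, the Table B rebuild described above amounts to something like this (column names beyond rec_id are placeholders):

-- Copy Table B with rec_id converted from varchar to int
SELECT CAST(rec_id AS int) AS rec_id, processed_field1, processed_field2
INTO TableB_int
FROM TableB

-- Match Table A: a unique clustered index on the join key
CREATE UNIQUE CLUSTERED INDEX IX_TableB_int_rec_id ON TableB_int (rec_id)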
July 27, 2005 at 3:25 am
"One problem we discovered is that Table A had the rec_id field as integer, while table B had it as varchar"...DATA conversion issues will ALWAYS cause performance (and data quality) problems
That's why implementing Referential Integrity/Constraints in the database must always be done... it will prevent this sort of problem.
Re solving your overall problem, have a look at the "estimated execution plans" and/or use "Profiler" to determine what actions are taking place under the covers. There must be a solution. If necessary, reduce that SET ROWCOUNT to 100 or 10.
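For example, in Query Analyzer the estimated plan can be pulled without running the statement, and a live test can be capped, along these lines (the SELECT shown is just the placeholder join from earlier):

-- Show the estimated plan without executing the query
SET SHOWPLAN_TEXT ON
GO
SELECT a.rec_id, a.raw_field1, b.processed_field1
FROM TableA a
INNER JOIN TableB b ON a.rec_id = b.rec_id
GO
SET SHOWPLAN_TEXT OFF
GO

-- Or cap a live test run at a small number of rows
SET ROWCOUNT 100
-- ...run the SELECT here...
SET ROWCOUNT 0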
July 27, 2005 at 3:40 pm
Thank you for your reply.
We made several modifications, including:
1) making sure datatypes on the join fields were consistent between tables
2) breaking it up into smaller chunks
3) viewing execution plans and optimizing based on that
4) banging head against wall...
wait, I did that before, and it didn't help much.
Seriously, though, we took all the advice into consideration and it helped immensely. This file and dataset were larger than, and beyond the scope of, what the present systems were designed to handle. Basically, the flaws had never come up before because the file sizes were small enough that they were masked or simply ignored, and the results of processing were within accepted norms.