January 12, 2006 at 2:41 am
Hello everybody!
I'm using the SQL script below to refresh one file from another.
ukmf_final_complete is being refreshed from ukmfjan06 using a multiple-column SET statement (which I assume is allowed).
There are about 1.7 million records involved.
I've run the query but had to stop it because it had not completed after 19 hours.
Can anyone see where it could be going wrong?
UPDATE [ukmf_final_complete]
SET
[ukmf_final_complete].[Company] = [ukmfjan06].[Business Name],
[ukmf_final_complete].[Address1] = [ukmfjan06].[Address Line 1],
[ukmf_final_complete].[Address2] = [ukmfjan06].[Address Line 2],
[ukmf_final_complete].[Address3] = [ukmfjan06].[Address Line 3],
[ukmf_final_complete].[Address4] = [ukmfjan06].[Address Line 4],
[ukmf_final_complete].[Town] = [ukmfjan06].[Town],
[ukmf_final_complete].[County] = [ukmfjan06].[County],
[ukmf_final_complete].[Postcode] = [ukmfjan06].[Postcode],
[ukmf_final_complete].[Tel] = [ukmfjan06].[Telephone Number],
[ukmf_final_complete].[No_of_Employees] = [ukmfjan06].[Nr of Employees (Company)],
[ukmf_final_complete].[updated] = 'true'
FROM [ukmf_final_complete], [ukmfjan06]
WHERE
[ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]
Thanks in advance
Ben
January 12, 2006 at 4:28 am
Having browsed the web, I think my problem may be related to having two tables in the FROM clause, so I'm attempting the same thing using an EXISTS clause.
UPDATE [ukmf_final_complete]
SET
[ukmf_final_complete].[Company] =
(SELECT [ukmfjan06].[Business Name] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[Address1] =
(SELECT [ukmfjan06].[Address Line 1] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[Address2] =
(SELECT [ukmfjan06].[Address Line 2] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[Address3] =
(SELECT [ukmfjan06].[Address Line 3] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[Address4] =
(SELECT [ukmfjan06].[Address Line 4] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[Town] =
(SELECT [ukmfjan06].[Town] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[County] =
(SELECT [ukmfjan06].[County] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[Postcode] =
(SELECT [ukmfjan06].[Postcode] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[Tel] =
(SELECT [ukmfjan06].[Telephone Number] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[No_of_Employees] =
(SELECT [ukmfjan06].[Nr of Employees (Company)] FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]),
[ukmf_final_complete].[updated] = 'True'
WHERE EXISTS
(SELECT * FROM [ukmfjan06]
WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number])
I'll let you know how it goes
January 12, 2006 at 5:27 am
Hi Ben,
I think your first UPDATE looked better from a set-based approach. The second one has too many subqueries for me.
I would probably look at ...
1. Indexes and the estimated Execution Plan.
2. LOG space & placement and LOG growth.
Allen
January 12, 2006 at 7:37 am
Thanks Allen,
I ran a test on both and they both worked.
I agree with you the first looks much nicer.
I'm going to look into indexing to see if I can improve things.
What do you mean by Log space/growth?
Thanks for your help
Ben
January 12, 2006 at 7:50 am
I've got the people who want the data on my back, telling me that the bloke I replaced used to do it in 'a couple of hours'.
It's a bit of a nightmare!
The second script I posted above has now been going 4 hours.
Is there a maximum number of records SQL Server can go through in one query?
Or does anyone know of a way to see how far it has got, or how many records it's processed?
Ta
Ben
January 12, 2006 at 8:56 am
Having looked at it, there are about 1.6 million matches.
So that's 1.6M x 10 = 16 million fields to update.
I'm now thinking that's going to take at least a day.
I don't think I should have stopped it after 19 hours.
Am I right?
You live and learn.
Now I've got another 20 hours of being hassled!
ta
Ben
January 12, 2006 at 9:47 am
When you run a large update like this, it requires log space. If your log file isn't large enough, it may auto-grow. Auto-growing takes time and, depending on your auto-grow parameters, may be an issue.
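To see whether the log is the problem, you can check its usage and pre-size it before the update so it never has to grow mid-run. A rough sketch (the database and logical file names here are placeholders; check your own with sp_helpfile):

```sql
-- How full is each database's log right now?
DBCC SQLPERF(LOGSPACE);

-- Pre-grow the log in one step instead of many small auto-grows.
-- 'MyDb' and 'MyDb_Log' are illustrative names only.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Log, SIZE = 2000MB);
```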
Also, large updates can sometimes be performed faster using "batches" in combination with SET ROWCOUNT. Since your table has a flag to indicate Updated, you could do a million at a time by using SET ROWCOUNT 1000000 and adding a:
WHERE [ukmf_final_complete].[updated] <> 'true'
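Something along these lines, using your table names (the batch size is illustrative, and this assumes [updated] starts out as something other than 'true' for every row you want to refresh):

```sql
-- Batched version of the original UPDATE (SQL 2000 style, using SET ROWCOUNT).
SET ROWCOUNT 1000000  -- limit each pass to 1M rows

WHILE 1 = 1
BEGIN
    UPDATE [ukmf_final_complete]
    SET [Company] = [ukmfjan06].[Business Name],
        -- ...the other columns exactly as in the original statement...
        [updated] = 'true'
    FROM [ukmf_final_complete], [ukmfjan06]
    WHERE [ukmf_final_complete].[Duns_no] = [ukmfjan06].[D-U-N-S Number]
      AND [ukmf_final_complete].[updated] <> 'true'

    -- Stop once a pass touches no rows; each pass commits its own work,
    -- so the log only has to hold one batch at a time.
    IF @@ROWCOUNT = 0 BREAK
END

SET ROWCOUNT 0  -- reset to unlimited
```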
January 12, 2006 at 12:19 pm
You may try dropping indexes on the table that includes any of the updated fields, then add it back and rebuild it after the update is complete.
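Sketch of the idea (the index name here is made up; use sp_helpindex to find yours):

```sql
-- Drop a nonclustered index that covers one of the updated columns...
DROP INDEX [ukmf_final_complete].[ix_company]

-- ...run the big UPDATE here...

-- ...then recreate the index afterwards.
CREATE INDEX [ix_company] ON [ukmf_final_complete] ([Company])
```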
January 13, 2006 at 3:50 am
Hi Ben - hope you are OK! Does the execution plan look OK? Is it using indexes on the Duns number?
January 13, 2006 at 4:26 am
I've now realised (thanks to your help) that I should have indexed the Duns column.
I imported both files from CSV and, being a fool, didn't index!
I can't stop it now though, because it's been running 24 hours and I've been given a drop-dead date of Monday.
If I don't get the expected results, or it's still running on Monday, I'll be looking for another job!
I'm reading up on how to use the execution plan now.
Thanks for your help
ben
January 13, 2006 at 4:32 am
Having looked at the predicted execution plan, I have 12 table scans, and having seen this comment:
"Index or table scans: May indicate a need for better or additional indexes."
at http://www.sql-server-performance.com/query_execution_plan_analysis.asp
I think that shows an index or two wouldn't have gone amiss.
You live and learn, as I find myself saying on a regular basis at the moment (that's a good thing, I think).
Ben
January 13, 2006 at 4:37 am
Yep - Estimated execution plan is a great tool.
I know you have a drop-dead date of Monday, but I would consider killing the job and putting a CLUSTERED INDEX on the Duns number in both tables. It can probably be a UNIQUE CLUSTERED index.
You have 60+ hours.
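Roughly like this, assuming the D-U-N-S number really is unique in each table (index names are illustrative):

```sql
-- Clustered index on the join column in both tables.
CREATE UNIQUE CLUSTERED INDEX [ix_duns]
    ON [ukmf_final_complete] ([Duns_no])

CREATE UNIQUE CLUSTERED INDEX [ix_duns_jan06]
    ON [ukmfjan06] ([D-U-N-S Number])
```

If duplicates turn out to be possible, drop the UNIQUE keyword or the CREATE will fail.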
January 13, 2006 at 4:41 am
Thanks
I agree; I'm concerned that the unindexed one could run forever.
ben
January 13, 2006 at 8:22 am
Update: 28 hours and counting for my query.
But we have another shed of a server I call Oldboy.
I copied the tables in question across to Oldboy,
then reduced the field sizes from varchar(8000) to varchar(250),
added indexes to the Duns number column, and changed it to an int (as it is in other tables) from varchar(8000).
I then ran my original script and it did it in 1 min 27 seconds!
I've run checks that the columns have been updated and are matching, by looking for non-matching columns.
I even changed a column value to make sure the checks picked up mismatches, and they did.
It looks like it worked. But I'm suspicious; it's Friday and maybe I'm missing something.
Could it possibly have updated 16M fields in 1 min 27 secs???
Does clustered indexing really make that much difference??
January 13, 2006 at 8:36 am
Great news. The clustered index physically orders the data on disk. Updating one table from another probably does a join, and joining two tables where the data is read off disk in the same (or close to the same) order should make the join easy. From what I have read, there is so little overhead in having a clustered index that it's almost always worth putting one in.
I'm guessing you are doing mail file prep.
Looks like you just got your weekend back