October 17, 2008 at 6:31 am
Consider this statement:
update hl7_patienten
set woonplaats = woonplaatsnaam
--SELECT t1.patientnummer, woonplaats, woonplaatsnaam
from hl7_patienten t1
inner join ptr_init_patienten_z_woonp t2
on t1.patientnummer = t2.patientnummer
Table hl7_patienten has 1500000 records, no index on WOONPLAATS, table ptr_init_patienten_z_woonpl has 13000 records, no index on woonplaatsnaam. The related records in hl&-patienten have value NULL in woonplaats. Both tables have patientnummer as clustered primary key.
I have tried placing indices here, but it makes no difference, the update takes about an hour to run, which seems excessive to me. If I do a SELECT I have the result in a few seconds. When I look with SHOWPLAN I see the proces starts with a clustered index seek on both tables.
Any hints?
Greetz,
Hans Brouwer
October 17, 2008 at 8:34 am
I don't think adding indexes to the updated column or the column used for the update will help. In fact adding an index to the updated column will likely slow down the update as it will cause the data in the index to be re-arranged. You should look at locking and blocking. What are the number of rows actually being updated?
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
October 17, 2008 at 8:45 am
It's a long shot, but is there any difference if you use the alias in the UPDATE? I know it can, under some circumstances, make a substantial difference to performance.
update t1 set woonplaats = t2.woonplaatsnaam
--SELECT t1.patientnummer, woonplaats, woonplaatsnaam
from hl7_patienten t1
inner join ptr_init_patienten_z_woonp t2
on t1.patientnummer = t2.patientnummer
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 17, 2008 at 9:01 am
FreeHansje (10/17/2008)
Consider this statement:update hl7_patienten
set woonplaats = woonplaatsnaam
--SELECT t1.patientnummer, woonplaats, woonplaatsnaam
from hl7_patienten t1
inner join ptr_init_patienten_z_woonp t2
on t1.patientnummer = t2.patientnummer
Table hl7_patienten has 1500000 records, no index on WOONPLAATS, table ptr_init_patienten_z_woonpl has 13000 records, no index on woonplaatsnaam. The related records in hl&-patienten have value NULL in woonplaats. Both tables have patientnummer as clustered primary key.
I have tried placing indices here, but it makes no difference, the update takes about an hour to run, which seems excessive to me. If I do a SELECT I have the result in a few seconds. When I look with SHOWPLAN I see the proces starts with a clustered index seek on both tables.
Any hints?
How many records in hl7_patienten have a null value in woonplaats?
As this is posted in a SQL Server 7/2000 forum, I'd give this a try:
set rowcount 5000 -- modify this to increase the batch size if needed
while exists(select 1 from hl7_patienten where woonplaats is null)
begin
update hl7_patienten set
woonplaats = t2.woonplaatsnaam
--SELECT t1.patientnummer, woonplaats, woonplaatsnaam
from
hl7_patienten t1
inner join ptr_init_patienten_z_woonp t2
on (t1.patientnummer = t2.patientnummer)
where
woonplaats is null
end
set rowcount 0
π
October 18, 2008 at 1:28 am
Tnx for answering all.
I will try the suggestions given here.
The number of records having woonplaats IS NULL is exactly the number of records in the table T2: almost 13000. I found out, that adding
AND WHERE woonplaats IS NULL
increases performance tremendously: from 60 minutes to 22 minutes! Still a ridiculous time, but definitely an improvement.
Tnx again.
Greetz,
Hans Brouwer
October 18, 2008 at 8:18 am
t1.patientnummer = t2.patientnummer
Is this a 1 to 1 relationship? It's hard to imagine that an update like this could take so long with a proper join.
October 18, 2008 at 10:07 am
Heh... not difficult to imagine at all if there are a dozen indexes on the table and no maintenance has been done either on the indexes or the statistics. π
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2008 at 10:09 am
FreeHansje (10/18/2008)
Tnx for answering all.I will try the suggestions given here.
The number of records having woonplaats IS NULL is exactly the number of records in the table T2: almost 13000. I found out, that adding
AND WHERE woonplaats IS NULL
increases performance tremendously: from 60 minutes to 22 minutes! Still a ridiculous time, but definitely an improvement.
Tnx again.
When you actually run that statement, how many rows does it tell you got updated after you added the criteria? And, while the update is running, is there anything else happening to the table from somewhere else?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 20, 2008 at 3:27 am
Again,
Tnx for answering all.
There is a 1-on-1 relation between the 2 tables. There are 2 indices, 1 of them the PrimKey on t1, only 1 PrimKey on t2. Statistics are not bad, although there is no regular maintenance on this database, since we are working in a test environment. That means no1 else is busy with this database, something I checked when running this statement.
I'm not sure how many actual updates have been done, since I get an update result for each individual update; a guestimate makes me believe there are about 13000 updates, the right/expected amount.
However, since this is a 1-time process 22 minutes is acceptable, but it vexes me WHY it takes so long. The CLUSTERD INDEX SCAN in SHOWPLAN on table t1 tells me it goes through all the 1.5 mil records in that table; I would expect a much lower number.
Greetz,
Hans Brouwer
October 20, 2008 at 9:02 am
For S & G, put an index on woonplaats. If there are 1.5 million rows, but only 13,000 rows with a null value in woonplaats, you might see a marked improvement in the update query.
π
October 20, 2008 at 7:04 pm
FreeHansje (10/20/2008)
I'm not sure how many actual updates have been done, since I get an update result for each individual update; a guestimate makes me believe there are about 13000 updates, the right/expected amount.
I just want to know what the rowcount returns as when you run you're update code... you're not suggesting that you run one update statement per record are you???
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2008 at 2:55 am
Something funny is going on, I have to go deeper into this.
Jeff, the statement is as given in my first post. I get about 13000 1-row-effected messages in my resultpane.
However, I'm beginning to think I have a paralellism problem here. My initial testing was on a server with 2 processors. The final run was on a server with 1 processor, and here it ran in seconds...
Strangely tho SHOWPLAN mentioned nothing about paralellism.
I have to do more research on this, tnx for answering all.
Greetz,
Hans Brouwer
October 21, 2008 at 3:10 am
Sounds to me as if there is a trigger on the updated table firing some other action per updated row.
This might as well slow down everything trmendously.
devloping robust and performant databaseapplications with Microsoft SQL-Server
October 21, 2008 at 5:10 am
FreeHansje (10/21/2008)
Something funny is going on, I have to go deeper into this.Jeff, the statement is as given in my first post. I get about 13000 1-row-effected messages in my resultpane.
However, I'm beginning to think I have a paralellism problem here. My initial testing was on a server with 2 processors. The final run was on a server with 1 processor, and here it ran in seconds...
Strangely tho SHOWPLAN mentioned nothing about paralellism.
I have to do more research on this, tnx for answering all.
Parallelism could be the problem... one way to find out is to use OPTION(MAXDOP 1).
The reason I keep asking about the rowcount is that I wanted to make sure that you weren't experiencing some form of runaway code like what happened to me during the not so famous IS NULL bug and I wanted to make sure you didn't have any triggers happening. Also, as Horst just mentioned, if you get more than just one row count, there's a trigger in play and it needs to be repaired for performance.
Of course, there could be about a zillion other differences between the two boxes you tested on both in disk/hardware, operating system preferences, and the SQL Server setup.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply