June 1, 2006 at 12:05 pm
I understand what you mean now - I just wasn't sure. I am currently sending all of the records via a SELECT statement into a staging table, which will allow me to simply drop one of the other tables and work around the problem. But I am still wondering why in the world a SELECT statement using the same inner join performs night-and-day better than an UPDATE query, even after all extra indexes were removed from the host tables.
June 1, 2006 at 12:11 pm
Yes, that is basically it - there may be something about the complexity of the query, or the repeated instances of the table, that is contributing to the performance problems. So separating the SELECT from the UPDATE would be worth trying.
If that doesn't work, dropping the clustered index then trying the insert from the staging table would test whether the clustered index rebuild is causing the problem. Once the cause is established, a solution will be easier to find.
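As a rough sketch (the table, index and column names below are just placeholders - substitute your own):

-- sketch only: drop the clustered index, load from the staging table, then rebuild
DROP INDEX TargetTable.IX_Clustered

INSERT INTO TargetTable (Phone, LC_Adsa)
SELECT Phone, LC_Adsa
FROM dbo.StagingTable

CREATE CLUSTERED INDEX IX_Clustered ON TargetTable (Phone)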
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 1, 2006 at 12:14 pm
16 is fine (ANSI_PADDING ON). They just have to be the same.
June 1, 2006 at 12:23 pm
OK, so the staging table sorts it out - correct? That means it's not the performance of the joins alone, and it's not the UPDATE itself or its side-effects. It seems the problem is having the UPDATE and the joins in the same statement. Can you post execution plans for the staged version - both the SELECT and the UPDATE?
Also, you might try putting the SELECT into an inline view (derived table, joined subselect), in the hope of getting the optimiser to perform that first before joining the resulting small recordset to the target table. There are other possibilities for achieving this which we could go into if necessary...
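For example, something along these lines - untested, and I'm guessing at the table and column names:

-- sketch only: do the join in a derived table, then join its (small) result back to the target
UPDATE c
SET c.LC_Adsa = d.LC_Adsa
FROM Cold3 c
JOIN (SELECT C3.Phone, C2.LC_Adsa
      FROM Cold3 C3
      JOIN Cold2 C2 ON C3.Phone = C2.Phone) d
  ON c.Phone = d.Phone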
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 1, 2006 at 12:27 pm
Didn't see this asked, and it was probably assumed based on the join syntax... but are both clustered indexes unique on the phone number columns?
If the SELECT is fast and the UPDATE painfully slow, one key difference between the two operations is the transaction log writes for the UPDATE. Any disk IO issues with the device where the transaction log resides?
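As a quick check, DBCC SQLPERF will show you how much log is being used; for actual IO stalls you'd need to watch the PhysicalDisk counters in Perfmon:

-- shows log file size and percent used for every database
DBCC SQLPERF(LOGSPACE)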
June 1, 2006 at 12:34 pm
Plan for the SELECT query (where I inserted the records into a new table):
  |--Compute Scalar(DEFINE:([Expr1002]=right([C3].[PhoneC3], 2)))
       |--Nested Loops(Inner Join, OUTER REFERENCES:([C2].[Phonec2]) WITH PREFETCH)
            |--Clustered Index Scan(OBJECT:([Cold_Leads].[dbo].[Cold2].[IX_Phone2] AS [C2]), ORDERED FORWARD)
            |--Clustered Index Seek(OBJECT:([Cold_Leads].[dbo].[Cold3].[IX_Phone] AS [C3]), SEEK:([C3].[PhoneC3]=[C2].[Phonec2]) ORDERED FORWARD)
(Note: there were 13 million matches, so it's not that small a subset.)
As for PW's questions: both clustered indexes are NOT created as unique; however, all of the values in those columns are unique at this time (it's customer information keyed on phone numbers, etc.).
Also, the transaction log and the data drives are both on SATA drives managed by separate RAID controller cards, so I don't believe it's an IO issue.
June 1, 2006 at 12:37 pm
Sorry - I forgot to ask for the plan for the non-parallelised (MAXDOP 1) version of the problem update, too.
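That is, the plan you get from something like this - assuming the update looks roughly the way I think it does (column names are my guesses):

-- force a serial plan for the problem update
UPDATE C3
SET C3.LC_Adsa = C2.LC_Adsa
FROM Cold3 C3
JOIN Cold2 C2 ON C3.Phone = C2.Phone
OPTION (MAXDOP 1)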
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 1, 2006 at 12:58 pm
Also notice that the plan for the (parallelised) slow update doesn't use the clustered index from the target table in performing the join. I've never heard of this, but could it be that the optimiser tries to avoid using the clustered index if it has to update it in the same statement? You also update the index it does use. Has anyone heard of this kind of interference between joins and updates? Seems very odd, but the optimiser is a strange beast.
You could try one of the functions below to run the update statement (example at the bottom), again as an experiment:
-- ???_PK stands for whatever the primary key column of Cold3 is
create function dbo.fn_GetColdUpdateValues()
returns @retval table (???_PK int, LC_Adsa varchar(8000), Attempts int, LastDistributed datetime)
as
begin
   insert @retval
   select C3.???_PK, C2.LC_Adsa, C2.Attempts, C2.LastDistributed
   from Cold3 C3
   join Cold2 C2 on C3.Phone = C2.Phone
   return
end
go

-- inline (single-statement) version of the same function
create function dbo.fn_GetColdUpdateValuesIL()
returns table
as
return
(
   select C3.???_PK, C2.LC_Adsa, C2.Attempts, C2.LastDistributed
   from Cold3 C3
   join Cold2 C2 on C3.Phone = C2.Phone
)
go

-- example: run the update via the function
update c
set LC_Adsa = f.LC_Adsa, Attempts = f.Attempts, LastDistributed = f.LastDistributed, Available = 1, Updated = 1
from Cold3 c
join dbo.fn_GetColdUpdateValues() f
  on c.???_PK = f.???_PK
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 2, 2006 at 3:03 pm
What about
UPDATE T2
SET T2.Field1 = T1.Field1, T2.Field2 = T1.Field2
FROM Table2 T2
INNER JOIN Table1 T1 WITH (NOLOCK) ON T2.Phone = T1.Phone
Maybe try some ROWLOCK hints depending on your needs.
Erik
June 2, 2006 at 5:14 pm
I wouldn't recommend using NOLOCK, but certainly locking might fit the outline of the problem. Finish diagnostics then look at the solution.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 5, 2006 at 7:29 am
I've tried it with (NOLOCK) and without, and everything I did still took a ridiculous amount of time to do the inner join update.
Therefore, I took a hybrid of the replies from this thread and did the following:
I separated my 17M-row table into smaller tables of 3-4M rows each (broke out each state separately).
I then wrote a WHILE loop that went through the entire 17M rows, parsed out which state table to insert each record into, and, while at it, also checked against my 22M row table for any necessary updates using a nested select statement.
All of this took roughly 2 hours, after I had spent nearly 2 days trying to find the root of the problem and beating my head against the wall. I still don't understand why it behaves this way, and I will probably just change the manner in which I do updates against tables like this, since with my knowledge of SQL this shouldn't ever be an issue (updating a single table using an inner join on another table whose clustered index matches exactly).
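In rough outline, the loop looked something like this (heavily simplified, and the table and column names here are made up for illustration):

-- heavily simplified sketch; table and column names are invented
DECLARE @State char(2)

WHILE EXISTS (SELECT * FROM dbo.StatesToProcess)
BEGIN
    SELECT TOP 1 @State = StateCode FROM dbo.StatesToProcess

    -- move this state's rows into its own smaller table, picking up
    -- updated values from the 22M row table via a nested select
    INSERT INTO dbo.ColdByState (Phone, LC_Adsa)
    SELECT c.Phone,
           (SELECT b.LC_Adsa FROM dbo.BigTable b WHERE b.Phone = c.Phone)
    FROM dbo.Cold3 c
    WHERE c.State = @State

    DELETE dbo.StatesToProcess WHERE StateCode = @State
END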
I appreciate all of the help on this though.
June 5, 2006 at 7:42 am
I understand that you are not some sort of latter-day Sherlock Holmes with nothing to do but solve interesting mysteries (and in his case, hard drugs and fiddling), but you must admit it is pretty intriguing. Did you try using the UDFs to 'insulate' the UPDATE from the join? Have you run sp_who2 and sp_lock? Once again, no worries if you are too busy.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 6, 2006 at 7:09 am
Honestly Tim, I'd love to solve this problem, because the routine I'm using takes literally 8 minutes to update 25k rows against a 4M row database (it's horribly written and I'll be the first to admit it, but unfortunately, for whatever reason, it's still faster than an inner join on two clustered indexes, which is what's so perplexing).
As for the UDFs and the sp_who-type stuff, I'm what you might call, at best, a middling SQL guy who's doing his best to learn the ins and outs while still getting the job done. So in all honesty, I've never used either of those before - everything I've done has been pretty vanilla so far, which means there is obviously a long road ahead to make things run faster and more smoothly.
June 6, 2006 at 7:27 am
The ideas are suggestions to help you diagnose the problem. If you're not sure how to do them, ask.
First thing to try:
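(Presumably the blocking checks mentioned above) - from a second connection while the slow update is running:

EXEC sp_who2   -- look at the BlkBy column to see whether your update's SPID is blocked
EXEC sp_lock   -- see what locks the update is holding or waiting on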
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant