July 7, 2005 at 2:16 pm
Sounds like you're finally agreeing with me.
July 7, 2005 at 2:27 pm
Well, if you read my FIRST reply:
"... To check that you have to take a Profiler trace and deal with the prepare commands to see if that assumption is correct. ..."
YOU are agreeing with me.
* Noel
July 7, 2005 at 2:30 pm
I implied that in my answers on other threads.
July 7, 2005 at 2:32 pm
Ah, Scott, one more thing!
I noticed you said that ADO was using a SERVER-SIDE cursor type in the ADO recordset, while the .NET version is using a client-side one. The good thing about that is that the server-side cursor brings down only the rows you scroll to, while the client-side cursor brings them ALL down; so if you pull a massive amount of data down, there is your performance bottleneck.
* Noel
July 7, 2005 at 2:34 pm
Never said I didn't agree with your answers either.
* Noel
July 7, 2005 at 2:35 pm
Maybe that's the case, but what if he fetches only a range of rows instead of all rows... wouldn't it all come back to the same thing in the end?
July 7, 2005 at 2:46 pm
Again, it is difficult to prescribe anything with the facts available at this point. (NO MORE GUESSING!)
If you ask for a large number of records, lock escalation comes into play and there will be a fight to get to the table. "Smart" paging is the solution for client-side disconnected recordsets.
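Something like this is all I mean by "smart" paging: walk the clustered key in chunks instead of dragging the whole set into a client-side recordset. This is only a sketch; the table, column names, and page size are made up:

-- Hypothetical names; the point is to seek on the clustered primary key per page.
DECLARE @LastKey int, @PageSize int
SET @LastKey = 0        -- highest pk returned by the previous page
SET @PageSize = 500     -- rows per round trip

SET ROWCOUNT @PageSize
SELECT pk, f1, f2
FROM tbl
WHERE pk > @LastKey     -- seek, not scan
ORDER BY pk
SET ROWCOUNT 0          -- turn the limit back off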
* Noel
July 7, 2005 at 2:51 pm
"(NO MORE GUESSING!)"
Heard that one before.
Anyway, we'll have to see what he has to say about it.
July 7, 2005 at 5:01 pm
The server-side cursor is on the old program, and if it is asking for a large chunk of data based on a field other than the clustered index it takes a long time to open the recordset. I think I have the CommandTimeout property set to 3000 seconds. That's part of the reason for the rewrite. But once multiple copies of the program have all gotten their recordsets populated, they can go at an aggregate rate of over 1,000,000 updates per hour. (The fact that we measure it in records per hour gives you some idea of the size of the jobs.)
In both programs, all of the records pulled are processed sequentially, and every record gets processed. Only the necessary columns are included in the recordset. The testing I am doing on the new program uses batches of non-overlapping ranges of the clustered primary key, so there should be minimal locking interference between multiple clients.
I profiled both programs. The ADO recordset update generates an sp_executesql call with an UPDATE statement that looks like "UPDATE tbl SET f1=@p1, f2=@p2, ... WHERE pk=@p14 and f1=@p15 and f2=@p16 ...". (In addition to specifying the primary key, it is making sure that none of the updated fields have been modified since the recordset was opened.) Each command required 381 reads, 0 writes (?!!), and 0-16 ms (or less than 1 tick).
The new program generates commands like "UPDATE tbl SET f1=a, f2=b, ... WHERE pk=x". They require 3 reads, 0 writes, and 0-16 ms (but a lot more 0's than the first one).
July 8, 2005 at 10:01 am
Scott, even though I can't see the trace and you posted only what you believed was relevant, I can confirm my suspicions.
1. The @Px are PARAMETERS! Therefore the execution plan is cached.
2. The second case (.NET) is using literals! The execution plan has to be recompiled for every one of them!
3. If you set up a parameterized update or a stored procedure you will get the benefit of plan reuse (see the sketch after this list).
4. Even though you said that you asked for a large number of records in the server-side case, behind the scenes the cursor is retrieving one record at a time (that's what I meant). Eventually they all flow to the client, but the cursor is doing round trips!
5. PLEASE, try the temp table "play" and you will be surprised.
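Here is a rough sketch of what I mean in item 3. The procedure and column names are placeholders, not your real schema:

-- Hypothetical stored procedure: compiled once, plan reused for every record.
CREATE PROCEDURE dbo.upd_tbl
    @pk int,
    @f1 varchar(50),
    @f2 varchar(50)
AS
    UPDATE tbl
    SET f1 = @f1,
        f2 = @f2
    WHERE pk = @pk      -- seek on the clustered primary key
GO

-- Then the client just sends:  EXEC dbo.upd_tbl @pk = 12345, @f1 = 'a', @f2 = 'b'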
* Noel
July 8, 2005 at 12:50 pm
I realized the ADO method was getting the benefit of a cached execution plan when I saw it was using sp_executesql. I will probably try that. It also cuts down on the string manipulation, because the UPDATE command and parameter list are fixed and you just have to concatenate the argument values to the end.
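For reference, the call I'm planning to build looks roughly like this (the column names are placeholders in the style of the trace excerpt above, not the real schema):

-- The statement text and the parameter-definition string never change, so the
-- plan is cached; only the argument values concatenated on the end vary per record.
EXEC sp_executesql
    N'UPDATE tbl SET f1 = @p1, f2 = @p2 WHERE pk = @p3',
    N'@p1 varchar(50), @p2 varchar(50), @p3 int',
    @p1 = 'a', @p2 = 'b', @p3 = 12345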
I will probably try the temp table version at some point also, but I have a concern about the sequence. With the UPDATE approach (whether with a stored proc or sp_executeSQL, one command at a time or spooled batches), I can use async mode to send a command and then immediately start on the next record. It doesn't check for execution to complete until the next command is ready. With the temp table method the joined UPDATE must be synchronous; the program can't delete or truncate the temp table to go on to the next batch until it finishes. The only way around this that I can think of is to save the connection somewhere and open a new connection (which would then have a separate temp table) for the next batch. It could periodically check the status of the saved connections and close them when they are finished. In the worst-case scenario (slow updates due to a non-clustered primary key and/or lots of page splits, with a dozen clients each holding a collection of connections) there could be a huge number of connections.
I did try a version with an updatable cursor, mostly because I've never used one before. I'm still using a disconnected recordset to read the entire batch of input columns in advance; the cursor just has the columns being updated. So the cursor data is not making row-by-row round trips on the network (just using FETCH NEXT with no INTO)*. I didn't use any of the other changes discussed (spooling or sp_executeSQL). The only difference is changing "UPDATE ... WHERE <pk>=" to "UPDATE ... WHERE CURRENT OF". The aggregate performance from multiple instances on 2 client PCs more than tripled (6,000 rows/min to over 20,000). This is not the final version, but at least I've surpassed the original ADO version (maybe 18,000 rows/min from two PCs).
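The cursor version is roughly this shape. The table, key range, and values are stand-ins, the cursor options may need adjusting to keep it updatable, and in the real program the FETCH/UPDATE pairs are sent from the client:

-- Sketch of one batch with an updatable cursor and a positioned update.
DECLARE upd_cur CURSOR LOCAL
FOR SELECT f1, f2
    FROM tbl
    WHERE pk BETWEEN 1000 AND 1999
    ORDER BY pk
FOR UPDATE OF f1, f2

OPEN upd_cur
FETCH NEXT FROM upd_cur              -- no INTO; the update columns are not read back
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE tbl
    SET f1 = 'new value 1',          -- real values come from the disconnected recordset
        f2 = 'new value 2'
    WHERE CURRENT OF upd_cur
    FETCH NEXT FROM upd_cur
END
CLOSE upd_cur
DEALLOCATE upd_cur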
I will probably try adding sp_executeSQL to both the cursor and non-cursor versions and see what happens. I have to test with various numbers of instances on 1-n PCs. Eventually I'll test the temp table version. If you get any bright ideas for other variations, let me know.
*I realize that there is a huge assumption here that the detached recordset and the cursor will have exactly the same set of records in the same order, but I think it's safe because nothing else is touching these tables. Both queries use the same "WHERE <pk> BETWEEN ... ORDER BY <pk>" clause. Our work is mostly large batch processes with a single user (not the same guy, but only one person at a time is doing any given job step). If someone were modifying a table I'm working with, I could just get out the Big Foam Cluebat (see http://ars.userfriendly.org/cartoons/?id=20030311 and http://ars.userfriendly.org/cartoons/?id=20030211), which should be standard issue for all DBAs.
July 8, 2005 at 1:31 pm
Scott, glad you got the cached execution plan.
Now, if all you care about is the total time it takes, I don't see why you need to delete or truncate the temp table at all, or why the process needs to be asynchronous.
1. The temp table is created ONCE, for ALL the records in your set (think set-based!).
2. Then the update is done ONCE for all those records (assuming no chunking).
3. Then drop the table.
What is the problem with that?
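Something like this is all I'm talking about (hypothetical names; in your program step 1 would really be a bulk load of the batch from the client):

-- 1. Create the temp table once and load the new values for the whole batch.
CREATE TABLE #new_values (pk int PRIMARY KEY, f1 varchar(50), f2 varchar(50))
INSERT INTO #new_values (pk, f1, f2) VALUES (12345, 'a', 'b')
-- ... one INSERT (or a bulk copy) per record in the batch ...

-- 2. One set-based UPDATE for the whole batch.
UPDATE t
SET t.f1 = n.f1,
    t.f2 = n.f2
FROM tbl t
JOIN #new_values n ON n.pk = t.pk

-- 3. Drop the temp table when the batch is done.
DROP TABLE #new_values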
Oh, and yes, I have used my ClubBat with a LOT of developers before.
* Noel
July 8, 2005 at 1:33 pm
What's that really? ("ClubBat")
July 8, 2005 at 1:35 pm
Sorry "ClueBat"
* Noel
July 8, 2005 at 1:40 pm
Still wanna know what it really means (have a clue but not too sure).