December 21, 2010 at 11:00 pm
Hi,
In my project I am in a critical situation. I have two scenarios. In a single application, multiple users update and retrieve data from the same table. In the first scenario, one user inserts data while, at the same time, another user retrieves data from the same table from a different machine using the same application. For example, in a healthcare product one user inserts information into a table while another user retrieves information he has already stored in that same table. I am using a transaction for the insert, and the insert includes a file upload. If the file is large, the transaction has begun but is not yet committed while the user is still inserting into the table. At that moment another user tries to retrieve data from the same table. The problem: even with NOLOCK, the retrieving process still waits behind the inserting query until it is committed, so the user has to wait to get his data. My clients feel this is a performance issue.
The second scenario: when I use READPAST it works, in that the locked rows are skipped and the user can still retrieve data. But in another case, if a user has updated a row, I cannot show that row until the update is committed. Please give a clear solution for performance tuning, and explain what problems can arise when using NOLOCK and READPAST.
It is very urgent! Hoping for your reply!
Thanks
December 23, 2010 at 12:42 pm
How large is the upload file? You may be better off loading the file into a staging table and then doing your inserts in small batches from a stored procedure.
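Something along these lines, for example (a rough sketch only; the staging and target table names are made up). Each batch runs in its own short transaction, so readers are only ever blocked briefly:

-- Sketch only: dbo.OrderStaging, dbo.Orders and the columns are invented names.
DECLARE @BatchSize int = 5000,
        @Rows      int = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Copy one small batch of rows that have not been copied yet.
    INSERT INTO dbo.Orders (OrderID, OrderData)
    SELECT TOP (@BatchSize) s.OrderID, s.OrderData
    FROM dbo.OrderStaging AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.OrderID = s.OrderID)
    ORDER BY s.OrderID;

    SET @Rows = @@ROWCOUNT;  -- capture before COMMIT resets it

    COMMIT TRANSACTION;
END;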
Todd Fifield
December 23, 2010 at 1:23 pm
A similar question would be about the current design of the upload process. If it's RBAR based (any kind of loop), that might be the issue right away.
We'd need to see the upload process for a more detailed answer.
December 25, 2010 at 8:10 pm
This one is simply answered: don't use either NOLOCK or READPAST in data entry applications. If your client values the correctness of the data she puts into and gets out of the database, she needs to accept that in rare situations a single call may be delayed for a few milliseconds while some other user's actions on the same data finish.
It is your responsibility as the (technical) designer/developer to ensure that transactions never span any user interaction, i.e. that all transactions are short (measured in milliseconds, not seconds). Never, ever start a transaction, let the user do some action(s) and then commit the transaction. Typically a data entry application reads its data from the database to show the entry form, lets the end user do his or her work on the data in that form, and only when the end user chooses to save the form is a transaction started and the data saved. Only if the save is successful is a commit performed. One condition for an unsuccessful save could be that another user has modified the data since it was read. This can, for example, be implemented using a timestamp (rowversion) column: adding a WHERE clause that compares the current timestamp value with the one that was initially read is a safe way of knowing the data wasn't altered since it was read.
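A minimal sketch of that last point (all table, column and parameter names are just examples; in SQL Server the timestamp type is the same as rowversion, and its value can be passed around as binary(8)):

-- Sketch only: dbo.Patient, PatientID, PatientName and RowVer are invented names.
CREATE PROCEDURE dbo.SavePatient
    @PatientID      int,
    @PatientName    nvarchar(100),
    @OriginalRowVer binary(8)        -- rowversion value the app read when it loaded the form
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.Patient
    SET    PatientName = @PatientName
    WHERE  PatientID   = @PatientID
      AND  RowVer      = @OriginalRowVer;  -- matches 0 rows if someone else changed the row

    IF @@ROWCOUNT = 0
        RAISERROR('The record was modified by another user since it was read.', 16, 1);
END;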
December 26, 2010 at 8:59 am
R.P.Rozema (12/25/2010)
This one is simply answered: don't use either NOLOCK or READPAST in data entry applications. If your client values the correctness of the data she puts into and gets out of the database, she needs to accept that in rare situations a single call may be delayed for a few milliseconds while some other user's actions on the same data finish.
It is your responsibility as the (technical) designer/developer to ensure that transactions never span any user interaction, i.e. that all transactions are short (measured in milliseconds, not seconds). Never, ever start a transaction, let the user do some action(s) and then commit the transaction. Typically a data entry application reads its data from the database to show the entry form, lets the end user do his or her work on the data in that form, and only when the end user chooses to save the form is a transaction started and the data saved. Only if the save is successful is a commit performed. One condition for an unsuccessful save could be that another user has modified the data since it was read. This can, for example, be implemented using a timestamp (rowversion) column: adding a WHERE clause that compares the current timestamp value with the one that was initially read is a safe way of knowing the data wasn't altered since it was read.
I'll add to the good thoughts Rozema posted above. The reason why many people use hints such as NOLOCK and READPAST is that the code doing an INSERT, UPDATE, or DELETE is just too bloody slow and it interferes with the supposed performance of SELECTs. Instead of taking the time to fix those problems, they cheat with NOLOCK and READPAST... and they are frequently punished for such cheating, because the system continues to get slower and the likelihood of duplicate rows appearing in a query because of the hints continues to increase.
If a query is causing lots of "waits", don't shoot yourself in the mouth by simply ignoring the real problem of performance-challenged code and using query hints as "the fix", because it's no fix. Instead, spend the time to fix the performance-challenged code. You'll be amazed at what a difference it makes across the board.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2010 at 8:39 am
+1 on never having a transaction that waits on user input; collect all user input first, _then_ start transactional DML.
In response to "(nolock) and READPAST are always bad": on a pure OLTP system I would agree, but I'd note that there are situations, particularly when one starts combining OLTP- and OLAP-style work in the same database, where (nolock) and similarly relaxed transaction isolation levels become acceptable (to me).
Example: OLTP DML runs on tightly coupled, small sets of multi-million-row tables at a time, only affecting a few rows each transaction, continuously throughout the day, at a high rate, with a requirement for low latency and essentially arbitrary row selection. Simultaneously, large OLAP-style, multi-million-row-per-table historical queries need to be run across several sets of tables joined together for non-financial reporting purposes.
Sample rules for this type of scenario:
OLTP must always work, must always be perfect or fail, and must have low latency (subsecond).
OLAP should return results to within X% of correctness (where X is nontrivially greater than 0), is allowed to take significant time (minutes; and the pure I/O may take many seconds even with covering indexes), but must return results during the business day.
OLAP, therefore, cannot be allowed to interfere with OLTP, and can accept small amounts of bad data; as such, a very low transaction isolation level is an acceptable compromise.
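To put that last rule in concrete terms, the relaxed isolation level can be set once for the reporting batch instead of sprinkling (nolock) on every table (a sketch only; the history table names are invented):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;   -- same dirty-read behaviour as WITH (NOLOCK) on every table

SELECT   oh.OrderDate,
         SUM(olh.Quantity)    AS TotalUnits,
         SUM(olh.ExtendedAmt) AS TotalAmount
FROM     dbo.OrderHistory     AS oh
JOIN     dbo.OrderLineHistory AS olh ON olh.OrderID = oh.OrderID
GROUP BY oh.OrderDate;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;     -- back to the default when done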
Edited: clarified that OLTP/OLAP rules above are simply a sample scenario to illustrate my point.
December 28, 2010 at 12:33 pm
Nadrek,
I disagree that OLAP can allow for a small percentage of bad data. If I were to show the owners of companies that I consult for 2 different results for the same query run a few minutes apart, I would no longer be consulting at those companies. It makes us look stupid or incompetent.
In data warehouse type applications, where the data remains static until it is time to refresh it (once a day at night, or something like that), NOLOCK wouldn't hurt at all, as long as nothing is run while the data is being loaded.
Any other use would not be advisable.
Todd Fifield
December 28, 2010 at 7:25 pm
tfifield (12/28/2010)
Nadrek,
I disagree that OLAP can allow for a small percentage of bad data. If I were to show the owners of companies that I consult for 2 different results for the same query run a few minutes apart I would no longer be consulting at those companies. It makes us look stupid or incompetent.
In data warehouse type applications, when that data remains static up until it is time to refresh it (once a day at night or something like that) NOLOCK wouldn't hurt at all, as long as nothing is run at the same time the data is being loaded.
Any other use would not be advisable.
Todd Fifield
Heh... I agree. Anyone for a pancaked Martian Lander or some nice healthy SOX violations? How about a room full of financial analysts trying to find out where 2 cents that should have been there went, only to find out that someone allowed for a small percentage of bad data? 😛
It's strange how some things come full circle... I've always believed that absolute accuracy is a must, followed very closely by performance. For the last 7 years on this forum, a lot of folks have been telling me that performance doesn't matter so long as the accuracy is there. Now we have someone saying that so long as the performance is there, it's OK to have a small percentage of inaccuracy.
To be absolutely straight about this, I agree 100% with Todd. Unless you have to allow for uneven selvage in cloth, accuracy is the paramount concern in all things having to do with a database or data in general. All else is secondary.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 29, 2010 at 8:35 am
For many things, absolute, repeatable, verifiable accuracy is indeed a must, I agree.
The general requirement should default to accuracy first, yes. Finance, auditing, anything that gets reconciled to anything else, anything that needs to generate the same results on a second run as it did on the first, all require high accuracy to meet the business requirement.
That said, I must maintain that in certain actual business cases, absolute accuracy is not required by the business use; simplified dashboards for management are an example of this (if you're translating precise dollar-and-cent amounts from tens of thousands of transactions into "RED/YELLOW/GREEN", for example).
Some database schemas do not actually allow for repeatable runs to provide the same value in the first place; if individual rows change second by second (often with retroactive valid times), the report takes minutes to run, and there is no transaction time stored (much less history of transaction time). Essentially, you have a "This is as things look right now" database with no history. If you run your report twice, you're almost certain to get different results, barring a statistical fluke; changes happen faster than the report can run. In many cases (financial, etc.) you still need to provide perfectly accurate results as of the time it was run. In some other cases (management reporting for dashboards), there is no such requirement, nor are there significant consequences for being slightly off. There may be significant consequences for a shared lock being acquired for the runtime of the report, however.
This is essentially the same argument used to support comparison-by-hash vs. comparison-by-matching-everything. If you do it (hash, or nolock/READ UNCOMMITTED), then you're deliberately accepting inaccurate results at least some of the time.
I would also note that READ COMMITTED is vulnerable to nonrepeatable reads, phantom reads and lost updates. Even REPEATABLE READ is vulnerable to phantom reads; how many of us use stricter levels than this regularly for statements that result in an UPDATE?
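For reference, here is a sketch of what "stricter than that" can look like for a read-then-update pattern (the table, columns and values are invented; UPDLOCK plus HOLDLOCK gives serializable behaviour for just the range being read):

-- Sketch only: dbo.Inventory and the sample values are invented.
DECLARE @ProductID int = 42,
        @Requested int = 5,
        @OnHand    int;

BEGIN TRANSACTION;

-- HOLDLOCK = serializable behaviour for this statement: no phantom rows can appear
-- in the range just read before the commit; UPDLOCK avoids the classic
-- shared-to-exclusive lock conversion deadlock.
SELECT @OnHand = SUM(Quantity)
FROM   dbo.Inventory WITH (UPDLOCK, HOLDLOCK)
WHERE  ProductID = @ProductID;

IF @OnHand >= @Requested
    UPDATE dbo.Inventory
    SET    Quantity = Quantity - @Requested
    WHERE  ProductID = @ProductID;

COMMIT TRANSACTION;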
Valid time vs. transaction time is used as defined in Developing Time-Oriented Database Applications in SQL (Snodgrass).
Transaction isolation information primarily from SQL Lion and Brad McGehee.
December 29, 2010 at 3:04 pm
Nadrek,
That said, I must maintain that in certain actual business cases, absolute accuracy is not required by the business use; simplified dashboards for management are an example of this (if you're translating precise dollar-and-cent amounts from tens of thousands of transactions into "RED/YELLOW/GREEN", for example).
This probably holds true if the business is very dynamic with constantly changing data and you want the most recent data.
However, I'm going to give you a real example that actually happened to me.
I got called in to a client using an Order Entry system that I was familiar with. They complained that reports run for data up to yesterday would sometimes show more and sometimes show less for orders during the last few days. These reports would be sales totals by salesman, product, etc.
I found that the stored procedures for the reports always used the NOLOCK hint. I was told this was for performance reasons. The primary key and clustered index for the Orders table was just a sequential number that came from a sequence table. The order line records used the same key plus a line number as their primary and clustering key. The order numbers were actually character data rather than numeric, generated with a CONVERT(VARCHAR, SequenceNo) on the number from the sequence table. The numbers looked like '10001', '10002', etc. A nice, sequential, ever-increasing clustered index.
The data for yesterday and other days prior in the week never changed. Once it was entered and printed it was locked - never changed. You would think that NOLOCK wouldn't affect the reports run for the week up through yesterday.
What I found was that when they were out of stock on popular items the reports would act strangely sometimes. When the shipping department was out of something they would have to enter a back order. The back order number was the original order number plus '-01' or even '-02' if they had to do more than 1 back order.
This resulted in order rows being inserted into the same clustered index pages (data pages) as the original order from earlier in the week, which caused page splits. If reports were run during these page splits, sometimes orders would disappear and sometimes they would double up.
I took the NOLOCK hints out of the report procedures and the problem never happened again.
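To show how simple the fix really was, it amounted to nothing more than this (table, column and parameter names invented to illustrate the pattern):

DECLARE @ReportDate datetime = '20101228';   -- stand-in for the report procedure's parameter

-- Before: dirty reads; rows could be missed or read twice during page splits.
SELECT   o.SalesmanID, SUM(ol.ExtendedPrice) AS SalesTotal
FROM     dbo.Orders     AS o  WITH (NOLOCK)
JOIN     dbo.OrderLines AS ol WITH (NOLOCK) ON ol.OrderNo = o.OrderNo
WHERE    o.OrderDate < @ReportDate
GROUP BY o.SalesmanID;

-- After: the same query with the hints removed, running at the default READ COMMITTED level.
SELECT   o.SalesmanID, SUM(ol.ExtendedPrice) AS SalesTotal
FROM     dbo.Orders     AS o
JOIN     dbo.OrderLines AS ol ON ol.OrderNo = o.OrderNo
WHERE    o.OrderDate < @ReportDate
GROUP BY o.SalesmanID;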
So it is a very bad practice to use dirty reads in an OLTP system. A warehouse that is loaded or updated once a day is OK, since you control when that occurs.
If you want ballpark numbers rather than exact figures up to right now, then I guess dirty reads would be acceptable. Most managers, though, don't like ballpark numbers when managing. They like exact statistical data.
Todd Fifield
December 29, 2010 at 3:11 pm
tfifield (12/29/2010)
Nadrek,
That said, I must maintain that in certain actual business cases, absolute accuracy is not required by the business use; simplified dashboards for management are an example of this (if you're translating precise dollar-and-cent amounts from tens of thousands of transactions into "RED/YELLOW/GREEN", for example).
This probably holds true if the business is very dynamic with constantly changing data and you want the most recent data.
...
If you want ballpark numbers rather than exact up to right now, well I guess dirty reads would be acceptable. Most managers don't like ballpark numbers when managing. They like exact statistical data.
Todd Fifield
Agreed; if they want exact data, a higher isolation level is required. If, as my example indicated, they're merely interested in RED (<5 units), YELLOW (5-15 units), or GREEN (>15 units), and those numbers vary continuously, it's unlikely that a 1% or less error is going to be important, particularly as we get away from talking dollars or items and move towards softer metrics like call times.
When we have a case like I'm describing, particularly in combination with a higher isolation level producing unacceptable blocking, then (nolock) is something to consider alongside other options. It's the cheapest... as long as everyone understands, and accepts, the risks.
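For what it's worth, the kind of dashboard query I have in mind looks roughly like this (the table and column names are invented; the thresholds match my example above):

SELECT   ProductID,
         CASE WHEN SUM(UnitsOnHand) < 5   THEN 'RED'
              WHEN SUM(UnitsOnHand) <= 15 THEN 'YELLOW'
              ELSE 'GREEN'
         END AS StockStatus
FROM     dbo.Inventory WITH (NOLOCK)   -- dirty read accepted for this dashboard only
GROUP BY ProductID;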
December 31, 2010 at 1:06 pm
Nadrek (12/29/2010)
tfifield (12/29/2010)
Nadrek,
That said, I must maintain that in certain actual business cases, absolute accuracy is not required by the business use; simplified dashboards for management are an example of this (if you're translating precise dollar-and-cent amounts from tens of thousands of transactions into "RED/YELLOW/GREEN", for example).
This probably holds true if the business is very dynamic with constantly changing data and you want the most recent data.
...
If you want ballpark numbers rather than exact up to right now, well I guess dirty reads would be acceptable. Most managers don't like ballpark numbers when managing. They like exact statistical data.
Todd Fifield
Agreed; if they want exact data, a higher isolation level is required. If, as my example indicated, they're merely interested in RED (<5 units), YELLOW (5-15 units), or GREEN (>15 units), and those numbers vary continuously, it's unlikely that a 1% or less error is going to be important, particularly as we get away from talking dollars or items and move towards softer metrics like call times.
When we have a case like I'm describing, particularly in combination with a higher isolation level producing unacceptable blocking, then (nolock) is something to consider alongside other options. It's the cheapest... as long as everyone understands, and accepts, the risks.
I guess that really depends. If immediate action is required for RED and NOLOCK allows an extra count through, driving it to YELLOW, you could be in deep Kimchi. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.