March 10, 2008 at 9:11 pm
Well, unfortunately, I do know of a case where a cursor was the best option (at least at the time). At my last employer we had a restrictions table for documents (800+ rows) that needed to be applied to a table of documents that, at my last recollection, was around 750,000+ records. The cursor-based solution runs nightly and takes about 45 minutes.
My early attempts at a set-based solution were horrendously longer and unacceptable for daily updates to the web front end.
Since leaving that company I have learned new tricks that could possibly be used to create a faster, set-based process, but I'm in no position now to work on that process. (Oh, well.)
And it wasn't that I gave up too early; we had a workable solution, and there were many other things I needed to work on that were considered of greater importance than continually trying to improve that process. Whenever I had some free time, I would revisit the process. I really wish I could work on it now, after some of the things I have learned from this site in the last couple of years. I truly think I could finally develop a fast, working set-based solution.
😎
March 10, 2008 at 9:23 pm
Lynn Pettis (3/10/2008)
And it wasn't that I gave up too early; we had a workable solution, and there were many other things I needed to work on that were considered of greater importance than trying to continually work on improving that process.
And that, my friend, is the primary reason why people "give up too early"... lots of folks are told "Good enough, ship it". It doesn't ever come up again until things like your 750,000 rows become 7,500,000 rows and your 45-minute process is taking 7.5 hours. It's also really embarrassing when it runs for 7 hours and 15 minutes and gets bumped by a deadlock or a timeout. And it always happens at critical times like month-end processing or that really big report that the head kahuna likes so much... 😉
It's why I do code reviews and, thankfully, have the power to reject most production code even though someone wants it real bad (little play on words there...).
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2008 at 9:54 pm
There is a side note to the cursor solution: each pass (a single record from the restrictions table) is applied to the inventory records in a set-based fashion. Each pass only affected a subset of the data, and as restrictions were applied (most restrictive first), those records would be excluded from processing by less restrictive restrictions. So even if the inventory grows quite large, there may not be a big impact on processing. The reverse, however, would not be true; but knowing the nature of the restrictions table, I don't think you would see it grow from 800+ to 8000+. It may grow 10 or 20 records in a year, if lucky. That table is fairly static.
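Roughly, the shape of the nightly process was something like this (a sketch with made-up table and column names, not the actual code):
--Sketch: one set-based UPDATE per restriction record, most restrictive first.
--dbo.Restrictions, dbo.DocumentInventory and their columns are hypothetical names.
DECLARE @RestrictionId int, @DocPattern varchar(100), @RestrictionLevel int
DECLARE restriction_cur CURSOR LOCAL FAST_FORWARD FOR
SELECT RestrictionId, DocPattern, RestrictionLevel
FROM dbo.Restrictions
ORDER BY RestrictionLevel --most restrictive first
OPEN restriction_cur
FETCH NEXT FROM restriction_cur INTO @RestrictionId, @DocPattern, @RestrictionLevel
WHILE @@FETCH_STATUS = 0
BEGIN
    --Rows already flagged by a more restrictive pass are skipped by later, less restrictive passes.
    UPDATE d
    SET    d.RestrictionId = @RestrictionId
    FROM   dbo.DocumentInventory d
    WHERE  d.RestrictionId IS NULL
      AND  d.DocumentName LIKE @DocPattern --the wildcard match that caused the table scans
    FETCH NEXT FROM restriction_cur INTO @RestrictionId, @DocPattern, @RestrictionLevel
END
CLOSE restriction_cur
DEALLOCATE restriction_cur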
😎
March 11, 2008 at 5:05 am
Hi,
If you are fetching data from a table and want to change a field on each row, then you can use a cursor. It runs once for each row.
Below is an example of a cursor:
--@tempTable is assumed to be a table variable declared earlier;
--@tbl_field must be declared to hold the value fetched from each row.
DECLARE @tbl_field varchar(100)
DECLARE my_cursor2 CURSOR FOR
SELECT tbl_field FROM @tempTable
OPEN my_cursor2
FETCH NEXT FROM my_cursor2 INTO @tbl_field
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Write the query to execute for each row here, using @tbl_field
    FETCH NEXT FROM my_cursor2 INTO @tbl_field
END
CLOSE my_cursor2
DEALLOCATE my_cursor2
Regards
Sarvesh Kumar Gupta
March 11, 2008 at 5:13 am
Sarvesh Kumar Gupta (3/11/2008)
If you are fetching data from a table and want to change a field on each row, then you can use a cursor. It runs once for each row.
Yep... you can do that... but why even have a database, then? Just load up the ol' reel-to-reel tapes and have a go at it. Using a cursor is not a good thing to do in 99.999% of all code. It's slow, it's resource hungry, it causes unnecessary locks in all but one type, and it uses RBAR instead of the real power of an RDBMS.
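For example, the per-row field change above is a single set-based statement (a minimal sketch with made-up table and column names):
--Set-based equivalent of cursoring through every row to change a field.
--dbo.SomeTable and SomeField are hypothetical names for illustration.
UPDATE dbo.SomeTable
SET    SomeField = UPPER(SomeField) --whatever per-row change is needed
WHERE  SomeField IS NOT NULL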
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2008 at 5:27 am
Hi,
You are right that a cursor is not a good idea for this, but consider when you are using both a frontend language and the backend.
If there is work of that type, where we have to change something in each row, and it could be done in either the frontend or the backend, then the backend is the better idea, because the backend runs faster than the frontend.
Regards
Sarvesh Kumar Gupta
March 11, 2008 at 5:59 am
Yes... for example, see the following... it solves one of the worst performance problems there is in SQL Server... Running Totals... it will do a running total on a million rows in 7 seconds on my poor ol' 1.8 GHz desktop...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
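The heart of it is a variable-accumulation ("quirky") update, very roughly this shape (a sketch with hypothetical table and column names; the article covers the conditions that have to hold for it to be reliable):
--Rough sketch: accumulate a running total in a variable during an UPDATE.
--dbo.TransactionDetail, Amount, and RunningTotal are hypothetical names;
--the clustered index must dictate the order, per the article's rules.
DECLARE @RunningTotal money
SET @RunningTotal = 0
UPDATE dbo.TransactionDetail WITH (TABLOCKX)
SET    @RunningTotal = RunningTotal = @RunningTotal + Amount
OPTION (MAXDOP 1)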
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2008 at 6:31 am
Jeff, the real killer in that process at my previous employer wasn't the cursor, but the data. About half of the processing time was the result of about 20 to 30 records in the restrictions table that I know resulted in table scans, due to the nature of the restrictions and how wildcards used in a WHERE clause can prevent indexes from being used.
Having learned MORE from SSC and people like you, I now think I could truly improve that process if I were in a position to do so.
Cursors may be evil, I agree, and their use should be carefully evaluated. There are still instances where they can perform well. In this case, it did extremely well considering the alternatives I had tried (which probably contained hidden RBAR, a term I was not familiar with until the last couple of years, that was worse than the cursor).
Each record from the restriction table was applied in a set-based fashion against the inventory table. Many of the records only affected a very small number of rows, sometimes as few as 1 to maybe a few hundred. There were about 30 or 40 records that affected between 100,000 and 200,000 rows. These usually resulted in the table scans; but also, as restrictions were applied, the number of records affected by subsequent updates was reduced, lowering the number of records considered for processing.
Some of the tricks I have learned from you could quite possibly make enormous improvements, but as I keep saying, it isn't my problem anymore, even though it bothers me that I know I could make improvements to the process today. (Aside: is it wrong to feel that way?)
😎
March 11, 2008 at 6:45 am
it isn't my problem anymore even though it bothers me that I know I could make improvements to the process (Aside, is it wrong to feel that way?) today.
No... not a bad thing to feel that way... Like anyone else, I've learned some tricks over the years that would have served the code well in years previous. Although I realize it's not my problem anymore, it would be satisfying to be able to go back and fix some mistakes that I now know I made. Some were real doozies, too! 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2008 at 3:43 am
An actual answer to the question could be:
If there is no other way than to use a cursor, here is a simple example you should be able to translate to your situation.
--You need to store some temporary information in variables.
DECLARE
    @ID int,
    @Field2 varchar(2)
--Then you define a cursor and load it with information from a select statement
DECLARE MyCursor CURSOR FOR
SELECT
    ID, Field2 --Any number of fields you need from the select
FROM
    Table1
--When opening the cursor, it will be empty. So before you start the loop, load the first record
OPEN MyCursor --Cursor starts here
FETCH NEXT FROM MyCursor INTO @ID, @Field2 --Read the initial information from the first record
WHILE @@FETCH_STATUS <> -1 --NOT an empty select statement
BEGIN
    IF @@FETCH_STATUS <> -2 --NOT past the last row
    BEGIN
        --Your code goes here; PRINT is just a placeholder so the block compiles
        PRINT @Field2
    END
    FETCH NEXT FROM MyCursor INTO @ID, @Field2 --Read the next record
END
--Terminate the cursor and free memory
CLOSE MyCursor
DEALLOCATE MyCursor
Good luck and use it if needed.
Marcel
March 12, 2008 at 8:35 am
What about migrating data from an old database into a new database? If a cursor is not the best solution, then please, by all means, let me know what is. I have been working on that over the past month, off and on, because a developer I work with said that was just what you do. Not that we had certain options, but that it was the business standard, and since I am still learning I just took that as gospel. I have only migrated about half of the tables from the old database into the new database. We were going from an old Access application to Microsoft CRM. If there is a faster, more effective way of doing this, let me know. I have run into some snags, and had to write and re-write and write again! Not to mention there are only a bajillion columns in each of these tables, so it is very tedious.
March 12, 2008 at 8:44 am
A cursor would be about the last thing I would use to import old data. I would usually use a staging table (something with no major constraints, strings attached, etc.) to get the data in. It will import the data in a fraction of the time (I've seen load times hundreds and thousands of times faster that way than with a cursor). Once it's in, I then cleanse the data (look for inconsistent stuff, confirm the data types, etc.), and then import it into the "real" data tables.
Assuming I have the time, I'd also take a moment to see if there's any cleanup on the data model that can be done. Your "bajillions of columns" tells me you'd need to do that too. Badly normalized tables are perf killers IMO, unless you plan your denormalization carefully.
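Roughly, the flow looks like this (all table names, column names, and the file path below are made up - adapt them to your own export):
--1. A loose staging table: plain varchar columns, no constraints, just get the data in.
CREATE TABLE dbo.Staging_Customers
(
    CustomerName varchar(200) NULL,
    Address1     varchar(200) NULL,
    City         varchar(100) NULL,
    Phone        varchar(50)  NULL
)
--2. Bulk load the exported file (one of several ways to do it).
BULK INSERT dbo.Staging_Customers
FROM 'C:\exports\customers.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)
--3. Cleanse and convert in one set-based pass into the real table.
INSERT INTO dbo.Customers (CustomerName, Address1, City, Phone)
SELECT LTRIM(RTRIM(CustomerName)),
       NULLIF(LTRIM(RTRIM(Address1)), ''),
       LTRIM(RTRIM(City)),
       LTRIM(RTRIM(Phone))
FROM dbo.Staging_Customers
WHERE CustomerName IS NOT NULL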
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 12, 2008 at 8:59 am
Ok... before I continue with this post, please keep an open mind and again know that I am 24, and it takes a lifetime to really understand SQL completely, so although I really love what I have learned and done so far, I have a lot to learn. It also seems that until you actually encounter something in your everyday life, you can read it in a book a million times, but there is so much TO SQL that it is very easy to not retain all of the data all of the time. Now that I am done with that tangent, hopefully you will keep that in mind when you read my next question. 😛
It is my understanding that using a cursor walks each record into the table one by one. This allows all the data in the columns to stay where it needs to stay, and a record to stay intact.
How can I move the data into a staging table while keeping the records completely intact (all columns linked to the uid remain with that uid), and then move that data into the new table? If it isn't too much trouble to explain... (I hope I worded that properly)
March 12, 2008 at 10:22 am
No problem - we all have things to learn. I learn stuff every day, and I've been playing with various database formats for nearly 20 years now.
Whether you import using a cursor or set-based, the columns will stay with the original row. What you might not see is the physical order of the rows being maintained. Meaning, due to various optimizations SQL Server does, it assumes that the physical order in an external file is irrelevant, and makes no guarantee that lines 1, 2, 3 will end up in the same physical order once they're brought in.
In this case - the staging table I'm talking about would have the right number of columns needed to import each row. Depending on how clean the data from that particular source has been in the past - I might even make the staging table have all of the correct data types up front. If the data has NOT been clean previously, or if I just plain don't know - I might default a lot of the fields to bigger varchar (variable-length character) fields, so that I can get it in FIRST, and then worry about data quality/cleanliness.
That would also mean that the staging would have data in a de-normalized form if it was sent that way. For example - a "person demographics file" might have three addresses next to each person (with repeating columns of address1, address2, city, state, zip, phone, fax). I can then take that opportunity to push that data into a normalized, separate table (person_addresses, for example).
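For instance (all names hypothetical), the three repeating address blocks could be split out with a UNION ALL insert:
--Sketch: move three repeating address blocks from the staging row into a normalized table.
--dbo.Staging_Person, dbo.person_addresses and their columns are made-up names.
INSERT INTO dbo.person_addresses (person_id, address_seq, address1, city, state, zip)
SELECT person_id, 1, address1_1, city_1, state_1, zip_1
FROM dbo.Staging_Person
WHERE address1_1 IS NOT NULL
UNION ALL
SELECT person_id, 2, address1_2, city_2, state_2, zip_2
FROM dbo.Staging_Person
WHERE address1_2 IS NOT NULL
UNION ALL
SELECT person_id, 3, address1_3, city_3, state_3, zip_3
FROM dbo.Staging_Person
WHERE address1_3 IS NOT NULL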
As to creating the keys - well - they can sometimes be tricky. One of the tricks available in 2005 is the OUTPUT clause available during an insert (giving you "back" rows that have been inserted, with possible new fields like an identity column added to it, so that you have access to the uid).
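A rough sketch of that, with hypothetical names again:
--Capture the new identity values as the rows go in, so child rows can be keyed off them.
--dbo.Person, dbo.Staging_Person, person_id (identity) and legacy_id are made-up names.
DECLARE @NewKeys TABLE (person_id int, legacy_id varchar(50))
INSERT INTO dbo.Person (person_name, legacy_id)
OUTPUT inserted.person_id, inserted.legacy_id INTO @NewKeys (person_id, legacy_id)
SELECT person_name, legacy_id
FROM dbo.Staging_Person
--@NewKeys now maps each old uid (legacy_id) to its new identity value.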
Bottom line though - although it's a different way to tackle the issue - it tends to be MUCH faster. As in - cutting a 2 hour process down to a 45 second process, for example.
It's going to be impossible to give you specifics without starting with specifics, but let's just say that by starting out your import and cleanup process by saving yourself 2 hours on the import process, you have a LOT of time to spend on cleaning up and normalizing the data.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 12, 2008 at 10:32 am
Should I create the staging table in the database where the original data is, or create a new database for the migration and then create the tables under that? What you are saying is: insert the data from the original tables into staging tables in order to allow me to clean it up, then insert the data directly into the new database, and that is it????? I mean, I have to be missing something because that sounds way too simple. Row order has no relevance because sorting is all handled by the external application.