March 13, 2008 at 9:17 pm
I would like to keep an open mind on this, but until someone can demonstrate a straightforward example of cursors being necessary (except for the case that I cited above), or even just better than other options, I cannot see the point any longer.
I can think of one real world reason.
Note: this is a simplified example but it covers the point.
You have a flat file which contains
UniqueID, Lastname, Firstname, Address line 1, Address line 2, City, Postcode, Country
You need to load this into a vendor's database which contains three tables:
Entity
(
EntityRef int identity(1,1) PK
, LastName varchar(200)
, FirstName varchar(200)
)
EntityCustomAttribute
(
EntityRef int FK
, UniqueID varchar(10)
)
Address
(
AddressRef int identity(1,1)
, EntityRef int FK
, Line1 varchar(200)
, Line2 varchar(200)
, City varchar(200)
, Postcode varchar(200)
, ISO2CharCountryCode char(2)
)
Now, you can do anything you like to your staging table, but you are not allowed to modify the vendor database structure or utilise columns for anything other than their intended purpose.
Considerations:
1. There are duplicate data values in the file with different UniqueIDs. These duplicates must be loaded into the vendor database as individual records.
2. Each Address is distinct to an Entity, so if five different people live at the same address you will have that address recorded five times (yes it breaks some normalisation theory, but if one person changed their address, you don't want it to apply to all five - well in some cases you might, but we're not allowed to assume that).
4. You can't Identity Insert, as that will interfere with the 24x7 application that sits on top and doesn't expect to generate EntityRef values.
5. Minimal locking is required on the vendor tables to avoid blocking external service channels.
6. Large data set manipulation is to be avoided due to log shipping over a strangled WAN link (the Entity and EntityCustomAttribute tables contain approx 700,000 records, and the Address table contains approx 2,000,000 records).
So ideas on how to avoid cursors (or a while loop, which I see as the same thing, i.e. RBAR) and use set-based processing are welcome. 🙂
BTW, this load process deals with approx 100 - 200 records daily in this manner, so time is not a large consideration.
March 13, 2008 at 9:31 pm
So tell me... why do you think any of that requires a cursor? Or am I misunderstanding your post?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 13, 2008 at 9:59 pm
The need to return the EntityRef identity value from the Entity table to insert into the Address and EntityCustomAttribute tables. As FKs they cannot be null, and the key must exist when this data is inserted.
I did simplify the example a little in that there are about 50 columns in the file which are mapped to the data structure.
I'm keen to hear your feedback though, as I would like to learn something new today 🙂
March 13, 2008 at 10:20 pm
Nick M. (3/13/2008)
The need to return the EntityRef identity value from the Entity table to insert into the Address and EntityCustomAttribute table. As FK's they cannot be null, and the Key must exist during the insert of this data. I did simplify the example a little in that there are about 50 columns in the file which are mapped data structure.
I'm keen to hear your feedback though, as I would like to learn something new today 🙂
If you want to go for new - take a look at the OUTPUT clause for an insert statement. You shove the records in without the unique identifier, and the output "hands them back" WITH the identifiers. You can then reuse them for inserting the next level.
There were other tricks you might use in 2000, but this one is REALLY cool. And - it costs you no extra scan of the database to get them.
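To make the mechanics concrete, here is a minimal sketch of the OUTPUT approach Matt describes. The `Staging` table name, its columns, and the `@NewEntities` variable are illustrative assumptions, not part of the thread's schema:

```sql
-- Sketch only: Staging and @NewEntities are assumed names.
DECLARE @NewEntities TABLE
(
    EntityRef int
  , LastName  varchar(200)
  , FirstName varchar(200)
);

-- Insert the parent rows; OUTPUT hands back the generated identity values.
INSERT INTO Entity (LastName, FirstName)
OUTPUT inserted.EntityRef, inserted.LastName, inserted.FirstName
INTO @NewEntities (EntityRef, LastName, FirstName)
SELECT LastName, FirstName
FROM Staging;

-- Reuse the captured EntityRef values for a child table.
-- NB: joining back on the name columns alone breaks down when the
-- file contains duplicate names; in practice you would need an extra
-- tiebreaker (e.g. ROW_NUMBER() over matching rows on both sides).
INSERT INTO EntityCustomAttribute (EntityRef, UniqueID)
SELECT n.EntityRef, s.UniqueID
FROM @NewEntities AS n
JOIN Staging      AS s
  ON  s.LastName  = n.LastName
  AND s.FirstName = n.FirstName;
```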
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 13, 2008 at 10:33 pm
Matt beat me to it (more often than not, lately 😉 ). The OUTPUT clause in 2k5 will do it just fine.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 13, 2008 at 10:49 pm
Matt Miller (3/13/2008)
If you want to go for new - take a look at the OUTPUT clause for an insert statement. You shove the records in without the unique identifier, and the output "hands them back" WITH the identifiers. You can then reuse them for insert the next level. There were other tricks you might use in 2000, but this one is REALLY cool. And - it costs you no extra scan of the database to get them.
Okay, so it's the wrong forum, but then the question didn't limit itself to a specific version (plus I see I wasn't clear about that either).
I've inherited this process under SQL2K rather than 2K5, so I'd need to learn the other tricks.
I do love 2K5, as it solves a lot of these kinds of things, but alas that's only in our dev environment at the moment.
There are a few other things that I didn't mention either, such as the secondary load processes that require the UniqueID to link other records to the Entity (more data from other systems), and the fact that the UniqueID is used as an anchor to maintain the Entity records - the load process maintains approx 500,000 Entity records.
I can think of a way of doing this process using sets, but I do get left in a hole if there is a failure during processing. The current process (DTS package) can fail at any point and can be rerun and will self recover.
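For what it's worth, a set-based load can be made rerunnable in the same spirit, by keying every step off what has already been loaded. A sketch, assuming a staging table named `Staging` (an illustrative name) and using UniqueID as the anchor, as the thread describes:

```sql
-- Rerunnable: drop staging rows whose UniqueID was already loaded
-- on a previous (possibly failed) run, then rerun the normal steps.
DELETE s
FROM Staging AS s
WHERE EXISTS
(
    SELECT 1
    FROM EntityCustomAttribute AS eca
    WHERE eca.UniqueID = s.UniqueID
);

-- ...the usual Entity / EntityCustomAttribute / Address inserts
-- then only ever see rows that have not yet been loaded.
```

Because each rerun filters on the same anchor, a failure partway through leaves the batch in a state the next run simply picks up from.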
Perhaps I should ask in a 2K forum....
March 14, 2008 at 3:52 am
Yes, you can do it as the posts above say, or use a ## global temp table in the procedure.
March 14, 2008 at 12:38 pm
Nick M. (3/13/2008)
I can think of one real world reason.Note: this is a simplified example but it covers the point.
...
So ideas on how avoid cursors (or a while loop which I see as the same thing i.e. RBAR) and use set based processing are welcome. 🙂
...
Sure, I can do this without cursors. Even on SQL2000 I can do it. I can do it even if you keep this obviously arbitrary constraint:
4. You can't Identity Insert, as that will interfere with the 24x7 application that sits on top and doesn't expect to generate EntityRef values.
Why arbitrary? Because you assume that IDENTITY_INSERT needs to be on for more than a couple of seconds. I am certain that with IDENTITY_INSERT I would be done long before your app's lock timeouts expired.
Without IDENTITY_INSERT it is harder, slower and more complex, but still definitely doable.
Let me know which one you want me to demonstrate.
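One way the IDENTITY_INSERT variant might look (a sketch under assumptions, not necessarily Barry's method; `Staging` and `StageID` are invented names, and the thread's constraints allow altering the staging table):

```sql
-- Give the staging table its own identity so each row maps to a
-- deterministic new key, even when names are duplicated.
ALTER TABLE Staging ADD StageID int IDENTITY(1,1);

DECLARE @base int;

BEGIN TRAN;
    -- Hold a brief exclusive lock while reserving a contiguous
    -- range of EntityRef values above the current maximum.
    SELECT @base = ISNULL(MAX(EntityRef), 0)
    FROM Entity WITH (TABLOCKX, HOLDLOCK);

    SET IDENTITY_INSERT Entity ON;

    INSERT INTO Entity (EntityRef, LastName, FirstName)
    SELECT @base + StageID, LastName, FirstName
    FROM Staging;

    SET IDENTITY_INSERT Entity OFF;
COMMIT;

-- Child rows now key off @base + StageID: no join-back needed,
-- and no correlation problem with duplicate names.
INSERT INTO EntityCustomAttribute (EntityRef, UniqueID)
SELECT @base + StageID, UniqueID
FROM Staging;
```

The lock is held only for the duration of the parent insert, which is the point of Barry's "couple of seconds" remark.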
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 14, 2008 at 12:53 pm
shamas saeed (3/14/2008)
Yes, you can do it as the posts above say, or use a ## global temp table in the procedure.
Global temp tables are a bad idea... if more than one instance of the proc happens to light off... BOOM!
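Jeff's warning in a nutshell, with illustrative table names. A global temp table (##) is shared by every connection, so two concurrent runs of the same proc collide; a local temp table (#) is private to each session:

```sql
-- Session A:
CREATE TABLE ##Work (ID int);   -- global: one shared instance

-- Session B, running the same proc at the same time:
CREATE TABLE ##Work (ID int);   -- fails: ##Work already exists

-- The local form avoids the collision:
CREATE TABLE #Work (ID int);    -- each session gets its own copy
```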
--Jeff Moden
Change is inevitable... Change for the better is not.