February 19, 2014 at 5:31 pm
I have a SQL server table RealEstate with columns - Id, Property, Property_Value. This table has about 5-10 million rows and can increase even more in the future. I want to insert a row only if a combination of Id, Property, Property_Value does not exist in this table.
Example Table -
1,Rooms,5
1,Bath,2
1,Address,New York
2,Rooms,2
2,Bath,1
2,Address,Miami
Inserting 2,Address,Miami should NOT be allowed. But, 2,Price,2billion is okay. I am curious to know which is the "best" way to do this and why. The why part is most important to me.
Check if a row exists before you insert it.
Set unique constraints on all 3 columns and let the database do the checking for you.
Is there any scenario where one would be better than the other ?
Thanks.
February 19, 2014 at 5:41 pm
I would let the database do the work for me. Just to be clear, a single constraint that involves all 3 columns should be fine instead of 3 unique constraints.
February 19, 2014 at 5:47 pm
Really it depends on how fine you want your error controls, and if you're doing bulk or singleton inserts.
If your inserts come down one at a time, that's fine. If it's bulk, you'll reject the batch on a single failure (the entire insert is one transaction). You have to determine if that's valid.
If you're going to test yourself, you have to test twice. First you have to test the batch coming in to make sure it doesn't contain duplicates within itself, then check all the rows in the batch against the target.
Regarding error controls, either you just work with the error code that SQL barfs up at you (which can change between versions), or you check yourself for a custom check and finer controls as to what happens on a failure.
Also... EAV table. Ewwwwwww.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 19, 2014 at 9:50 pm
blasto_max (2/19/2014)
I have a SQL server table RealEstate with columns - Id, Property, Property_Value. This table has about 5-10 million rows and can increase even more in the future. I want to insert a row only if a combination of Id, Property, Property_Value does not exist in this table.Example Table -
1,Rooms,5
1,Bath,2
1,Address,New York
2,Rooms,2
2,Bath,1
2,Address,Miami
Inserting 2,Address,Miami should NOT be allowed. But, 2,Price,2billion is okay. I am curious to know which is the "best" way to do this and why. The why part is most important to me.
Check if a row exists before you insert it.
Set unique constraints on all 3 columns and let the database do the checking for you.
Is there any scenario where one would be better than the other ?
Thanks.
I normally check rows before I insert them even if the proper constraints are in place. Rollbacks are expensive and I'd rather reserve things like TRY/CATCH for real errors instead of "see if it sticks to the wall" types of code.
You could also create a UNIQUE index on the columns with the "IGNORE DUPES" option. The problem with that is it returns an informational message about dupes being ignored that GUIs can misinterpret as errors. It also doesn't allow me to identify the duplicates for troubleshooting or data repair or just for keeping track of "bad rows" sent by a 3rd party.
That's the big reason why I prequalify the data before doing an insert. Of course, something could change between the time you do such validation and the actual insert. And, of course, I'll do a rollback in such an instance so that I can get the correct reporting when I tray again.
The other thing that you might want to consider (and I haven't because I'm still stuck on 2005 at work) is the MERGE command which has the best of all of those thing especially when combined with the OUTPUT operator.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply