Merge to upsert when target table might have legitimate duplicate tuples

Question

Merge to upsert when target table might have legitimate duplicate tuples

isaac_654135

Default port

Points: 1414
More actions
October 3, 2017 at 2:47 pm

#340580

Am I correct in saying that it's no use to try to use the Merge statement for an Upsert (meaning, in this case, try to figure out how to perfectly SYNC both tables) to a target table, if the target (and source) have rows that are 100% the same, all values, all columns?
(Please, diatribes regarding why such a table might exist are unneeded; it's not my table, it wasn't my design, and for all I know, it's a staging table for incoming garbage where there is nothing we can do but tolerate rows with the same values; perhaps they are being usefully aggregated somewhere else).

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

sgmunson SSC Guru Points: 110640 More actions · Answer 1

Without the addition of some kind of unique key to such a table, you are correct. You would be far better served by finding out whether or not duplicate records in the table should exist or not.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

sgmunson SSC Guru Points: 110640 More actions · Answer 2

And there's no possible way to sync two such nightmares into a single table. It simply cannot be done.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

isaac_654135 Default port Points: 1414 More actions · Answer 3

Thank you.
I will do that.
I also wonder sometimes if the popularity of Merge hasn't perhaps overdone, just a bit, its value over truncate / insert, drop table / select into, etc. etc.

I mean this table with almost 3 million records seems to do any of the above in a couple seconds. I realize that cost is cost and time is money, but for a single event occurring in an overnight job, I'm trying to make sure I remain balanced here too. (but mostly using Merge for the practice of it, when possible).

Any feedback on that topic appreciated. Thanks again.

isaac_654135 Default port Points: 1414 More actions · Answer 4

Yeah...I think in reality, there is a column that can be used as a natural key, given the fact that out of 3 million records, it never duplicates from what I can see.
If business users confirm, then I can maybe just do a merge based on that.

sgmunson SSC Guru Points: 110640 More actions · Answer 5

pisorsisaac - Tuesday, October 3, 2017 2:55 PM
Thank you.
I will do that.
I also wonder sometimes if the popularity of Merge hasn't perhaps overdone, just a bit, its value over truncate / insert, drop table / select into, etc. etc.
I mean this table with almost 3 million records seems to do any of the above in a couple seconds. I realize that cost is cost and time is money, but for a single event occurring in an overnight job, I'm trying to make sure I remain balanced here too. (but mostly using Merge for the practice of it, when possible).
Any feedback on that topic appreciated. Thanks again.

Merge isn't actually all that popular, in part because it doesn't always seem to work. I hesitate to use it for anything but drop dead simple stuff. Where 3 million records are concerned, just getting rows in from another database table or even a scripted insert can often happen in the relative blink of an eye if the process being used is; once again; drop dead simple. It's mostly a matter of row size at that point. Where things slow down is when JOINs are more complex or there aren't good indexes.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

isaac_654135 Default port Points: 1414 More actions · Answer 6

Maybe my impression of it has come from a few roles or individuals I rubbed shoulders with in the past year or two, folks deemed to be my developer-betters, who always raved about MERGE and wanted to make sure I realized that "truncate & insert was so 10 years ago" kind of thing.
Absent any sophisticated tests (which likely wouldn't change my overall opinion, unless efficiency was extremely critical in the context), it always did seem to me just intuitively that it wasn't so great compared to the traditional ways.

Thanks for your comments and insight.

jcelko212 32090 SSCrazy Eights Points: 9303 More actions · Answer 7

pisorsisaac - Tuesday, October 3, 2017 2:47 PM

>> (Please, diatribes regarding why such a table might exist are unneeded; it's not my table, it wasn't my design, and for all I know, it's a staging table for incoming garbage where there is nothing we can do but tolerate rows with the same values; perhaps they are being usefully aggregated somewhere else). <<

You really need to hear the diatribe, so you know what a table is. This doesn't qualify at all. We also need to go back and read some of Dr. Codd's old stuff. He introduced the concept of a "degree of duplication" column in tables; a count of the number of times a particular row appears in the multiset. It has to be equal to or greater than one. Every time you find a duplicate, increment that column..

Please post DDL and follow ANSI/ISO standards when asking for help.

jcelko212 32090 SSCrazy Eights Points: 9303 More actions · Answer 8

sgmunson - Tuesday, October 3, 2017 3:02 PM

pisorsisaac - Tuesday, October 3, 2017 2:55 PM

>> Merge isn't actually all that popular, in part because it doesn't always seem to work. <<

The real problem is that Microsoft didn't do a good job of implementing MERGE If you were using DB2, PostGres, Oracle, etc. it would be far more dependable, faster and work the way it's supposed to. This is probably why this guys developer friends sing its praises. Hell, we don't even get a warning about the quirky update and it's cardinality errors 🙁

Please post DDL and follow ANSI/ISO standards when asking for help.

sgmunson SSC Guru Points: 110640 More actions · Answer 9

jcelko212 32090 - Wednesday, October 4, 2017 7:46 AM

pisorsisaac - Tuesday, October 3, 2017 2:47 PM
>> (Please, diatribes regarding why such a table might exist are unneeded; it's not my table, it wasn't my design, and for all I know, it's a staging table for incoming garbage where there is nothing we can do but tolerate rows with the same values; perhaps they are being usefully aggregated somewhere else). <<
You really need to hear the diatribe, so you know what a table is. This doesn't qualify at all. We also need to go back and read some of Dr. Codd's old stuff. He introduced the concept of a "degree of duplication" column in tables; a count of the number of times a particular row appears in the multiset. It has to be equal to or greater than one. Every time you find a duplicate, increment that column..

Don't waste any of your time on this utterly unnecessary commentary. Joe tends to bloviate at random, whenever one of his hot-button issues gets it's panties in a wad from something he reads. While you could do the numbering with ROW_NUMBER(), that wouldn't do you any favors at all if there's a second similar table to be updated that is in the same condition. As you already have no unique identifiers, you can't necessarily know exactly how many dupes you are actually supposed to have of some given record if both tables contain a different number of dupes of that record. And of course, if one record in a duplicate pair changes on one side, but not on the other, then what? Might as well toss the whole lot of data and forget about it at that point. Joe tends to not think that far through these things.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)