Update using Join to multiple Records - which one wins?

  • Hello there, it feels like a surprisingly basic question.

    In the update statement below we have a few examples where alt_code2 is not unique in table b and description differs (it is also not unique in table a)

    My question is could we be certain that we would retrieve the same b record for each alt_code2? (i.e I accept we wouldn't know which matching record (b) would take precedence but would it be consistent?)

    UPDATE a

    SET --a.DIAGNOSIS_CODE = b.CODE ,

    a.DIAGNOSIS_DESC = b.DESCRIPTION

    FROM #DimDiagnosisCodes a

    INNER JOIN dbo.Ref_ICD_10_edition4 b ON a.ALT_CODE2 = b.ALT_CODE2;

    (Of course, in reality the join is incorrect and our fix will be to correct the join - this is about improving my SQL knowledge)

     

  • it doesn't sound like the join is incorrect, it seems like the problem is the data, and that you have a many-to-many relationship. Seems like you need to fix the data; seems odd that a code can appear multiple times so seems you need to fix that and have one definition for a single code.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A wrote:

    it doesn't sound like the join is incorrect, it seems like the problem is the data, and that you have a many-to-many relationship. Seems like you need to fix the data; seems odd that a code can appear multiple times so seems you need to fix that and have one definition for a single code.

    the M:N Part is correct, but that is only because the join is "wrong"

    the join column is what makes it M:N in the first place

     

    The question is if the results are consistent in:

    • what happens on the next time the update is executed, is there a possibility that one of the other possible "options" now replaced a previous one just based on a new exection of the update

    I just tested the following:

    created a inner join that has M:N aswell and ran an update on it, making a backup of the updated column and reran the update, the updated columns atleast were identical with the backuped column values for all rows

    i added more columns to the 2nd table i was joining in, the number of rows became even higher but after the update it was still the same result, back uped column = updated column

    • do the columns with the same ALT_CODE2 have the same update or can it differ

    when looking at the test table i updated, i have no different values for the join column i decided to use for the M:N join

    so it seems they all got the same values

     

    atleast it seems like there is some consistency on the updated in that test, cant garuantee though that would someone have to answer that actually knows how execution plans work

     

     

    EDIT:

    BIG UFF ON MY PART

    i coincidentally have a issue of this topic, i get different results using differect SSMS Windows (connections), in my tests i always used the same holy sh1t thats tough

    • This reply was modified 5 years, 4 months ago by  ktflash.
    • This reply was modified 5 years, 4 months ago by  ktflash.
  • As I understand it, if multiple matches are possible, SQL can match any row.  That is, the results are unknown and you should assume they are effectively random.

    Similarly, if you specify "SELECT TOP (1) *" without an ORDER BY, SQL can return any row in the table.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • This was removed by the editor as SPAM

  • As Scott says, the result is highly unreliable and as far as I know, undocumented. This is actually a significant flaw with SQL Server's unconventional and non-standard method of doing joins in an UPDATE statement. The problem is arguably Sybase's fault not Microsoft's because it dates from pre-Microsoft SQL Server times. The fix is to modify your join clause but I would argue that SQL Server ought to fail-safe in this case and return a warning or error message rather than silently updating your data with unpredictable results.

    The equivalent MERGE statement does the right thing in this case:

    MERGE INTO #DimDiagnosisCodes a
    USING dbo.Ref_ICD_10_edition4 b
    ON a.ALT_CODE2 = b.ALT_CODE2
    WHEN MATCHED THEN UPDATE
    SET DIAGNOSIS_CODE = b.CODE,
    DIAGNOSIS_DESC = b.DESCRIPTION;

    If ALT_CODE2 is not unique in table b then when attempting to use MERGE you should get the following error message: "The MERGE statement attempted to UPDATE or DELETE the same row more than once."

    The MERGE version being standard SQL it perhaps also has the advantage of being more likely to be understood by more people.

  • Thanks for all answers - Nova yes a really thought-provoking answer

  • If you know which row you want to use from table dbo.Ref_ICD_10_edition4 then you can change the update to use a cross apply with select top(1) and order by.

    UPDATE a
    SET a.DIAGNOSIS_DESC = b.DESCRIPTION
    FROM #DimDiagnosisCodes a
    CROSS APPLY(SELECT TOP(1) DESCRIPTION
    FROM dbo.Ref_ICD_10_edition4 b
    WHERE a.ALT_CODE2 = b.ALT_CODE2
    ORDER BY b.ColumnToOrderOn) b;

     

  • I usually solve the problem by doing a JOIN first to get the set of records with a "SortSeq" column that calculates "1" for my preferred row, the JOIN that back to the update table.  In this case, all the necessary data is contained in the second table, so that can be used directly.

    UPDATEa
    SETa.DIAGNOSIS_DESC = b.DESCRIPTION
    FROM#DimDiagnosisCodes a
    INNER JOIN (
    SELECTALT_CODE2
    DESCRIPTION,
    SortSeq= ROW_NUMBER() OVER (PARTITION BY ALT_CODE2 ORDER BY {{sorting fields}})
    FROMdbo.Ref_ICD_10_edition4
    ) b
    ONa.ALT_CODE2 = b.ALT_CODE2
    WHEREb.SortSeq = 1;

    Make sure that your sorting fields calculate "1" for the desired row.

    If your main table is large and the temp table is small, you may get better performance by joining the temp table to the base table first, then  joining the result back to the temp table to do the update.

    UPDATEu
    SETu.DIAGNOSIS_DESC = x.DESCRIPTION
    FROM#DimDiagnosisCodes u
    INNER JOIN (
    SELECTa.ALT_CODE2,
    b.DESCRIPTION,
    SortSeq= ROW_NUMBER() OVER (PARTITION BY a.ALT_CODE2 ORDER BY {{sorting fields}})
    FROM#DimDiagnosisCodes a
    INNER JOIN dbo.Ref_ICD_10_edition4 b
    ONa.ALT_CODE2 = b.ALT_CODE2
    ) x
    ONu.ALT_CODE2 = x.ALT_CODE2;
    WHEREx.SortSeq = 1

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply