Auto update table with correlated subquery

  • I have the following query that is supposed to update a date column in a table, based on the lowest value of another date from the same table, for every userid:

    UPDATE dbo.S_TABLE
    SET LOAD_END_DTS = GETDATE()
    WHERE LOAD_END_DTS IS NULL
    AND LOAD_DTS = (SELECT MIN(X.LOAD_DTS) FROM dbo.S_TABLE X
                    WHERE USERID = X.USERID)
    AND USERID IN (SELECT Y.USERID FROM dbo.S_TABLE Y
                   WHERE Y.LOAD_END_DTS IS NULL
                   GROUP BY Y.USERID
                   HAVING COUNT(Y.LOAD_DTS)>1)

    The first subquery (the one with MIN(X.LOAD_DTS)) is the correlated subquery that should make sure that the action is performed for EACH USERID.

    The second part of the query is just there to avoid updating rows for a USERID that only have one row with LOAD_END_DTS as null.

    However, the correlation does not seem to work, as the MIN(X.LOAD_DTS) is established for the entire table and not for each USERID.

    What am I missing? This used to work perfectly in, for instance, Oracle..
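    A minimal sketch of the scoping pitfall, using Python with an in-memory SQLite table and made-up data (not the OP's actual table): because the outer table has no alias, the unqualified USERID inside the subquery binds to the subquery's own alias X, so the predicate becomes X.USERID = X.USERID, which is always true, and MIN() ends up computed over the whole table.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE s_table (userid INT, load_dts TEXT, load_end_dts TEXT);
    INSERT INTO s_table VALUES
      (1, '2008-07-01', NULL),
      (1, '2008-07-15', NULL),
      (2, '2008-08-01', NULL),
      (2, '2008-08-20', NULL);
    """)

    # Unqualified USERID inside the subquery resolves to X, not to the outer
    # row, so every row "matches" and MIN() covers the entire table.
    broken = con.execute("""
      SELECT userid, load_dts FROM s_table
      WHERE load_dts = (SELECT MIN(x.load_dts) FROM s_table x
                        WHERE userid = x.userid)
    """).fetchall()

    # Aliasing the outer table makes the correlation explicit.
    correlated = con.execute("""
      SELECT s.userid, s.load_dts FROM s_table s
      WHERE s.load_dts = (SELECT MIN(x.load_dts) FROM s_table x
                          WHERE x.userid = s.userid)
    """).fetchall()

    print(sorted(broken))      # only the single global-minimum row
    print(sorted(correlated))  # one minimum row per userid
    ```

    The same name-resolution rule (innermost scope wins) applies in SQL Server, which is why aliasing the outer table, as the answers below do, is the fix.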

  • If I understand it correctly,

    you want to update the load_end_dts date

    where load_dts is the lowest value and where users have more than one entry in the table?

    I think you should write the query this way:

    Update s
    Set s.load_end_dts = GetDate()
    From dbo.s_table as s
    Where s.load_end_dts is NULL
    AND Load_DTS = (SELECT MIN(x.Load_dts) from dbo.s_table x
                    where x.userid = s.userid)
    and s.userid IN (SELECT Y.USERID FROM dbo.S_TABLE Y
                     WHERE Y.LOAD_END_DTS IS NULL
                     GROUP BY Y.USERID
                     HAVING COUNT(Y.LOAD_DTS)>1)

    that should do it...

  • How about using a derived table?

    Update s
    Set s.load_end_dts = GetDate()
    From dbo.s_table as s
    Join (Select UserId, Min(Load_Dts) as MinDts From dbo.s_table Group By UserId) as z
      On s.UserId = z.UserId And s.Load_Dts = z.MinDts
    Where s.load_end_dts is NULL
    and s.userid IN (SELECT Y.USERID FROM dbo.S_TABLE Y
                     WHERE Y.LOAD_END_DTS IS NULL
                     GROUP BY Y.USERID
                     HAVING COUNT(Y.LOAD_DTS)>1)

    It might be more efficient than a correlated subquery.
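    A small sketch of the derived-table idea, again in Python with an in-memory SQLite table and made-up data: the grouped subquery computes each userid's minimum once, in a single pass, and the join then picks out exactly those rows, the same rows the correlated subquery selects.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE s_table (userid INT, load_dts TEXT, load_end_dts TEXT);
    INSERT INTO s_table VALUES
      (1, '2008-07-01', NULL),
      (1, '2008-07-15', NULL),
      (2, '2008-08-01', NULL),
      (2, '2008-08-20', NULL);
    """)

    # Derived table: one MIN per userid, computed in a single grouped pass,
    # then joined back to the base table to pick out the matching rows.
    rows = con.execute("""
      SELECT s.userid, s.load_dts
      FROM s_table s
      JOIN (SELECT userid, MIN(load_dts) AS min_dts
            FROM s_table GROUP BY userid) z
        ON s.userid = z.userid AND s.load_dts = z.min_dts
    """).fetchall()

    print(sorted(rows))  # one earliest row per userid
    ```

    Whether the optimizer actually evaluates this faster than the correlated form depends on indexes and statistics, as the later posts in this thread point out.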

  • Exactly. Go for the derived table; it is much, much more performant than a subquery.

  • I don't think that I agree.

    More readable? Maybe, but more performant? I don't think so.

    If you can show where the performance gains are, then I will be more than happy to agree with you.

  • I said it might be more efficient.

    You can never tell for sure without testing. Comparative performance depends on data volumes, statistics, indexes, etc.

  • Richard, my reply was not directed to you. It was directed towards arjun. Testing would be necessary to determine which method is faster and indeed it all depends.

    My previous post was intended to say that testing would be the only way of being certain, but I now see that it did not come out that way.

    In the end, we have both solved the problem, albeit through different routes; it is now up to the OP to decide which way to go.

    Maybe if I have some time when I am home tonight, I will recreate the problem, fill the table up to 200,000 rows and do some performance testing (because I am curious).

    I will let you know my results.

  • I have only had one chance to test the first suggestion by Hans, but it did not have the proper result. It looks like it only gave the proper result for the lowest key (one that matches the requirement that 2 records must exist for a given key with load_end_dts = null).

    Strange..

  • Blom, can you give an example? I am not sure I am following you.

    When you say key, what is the key? The subquery as I wrote it down gets the min value for Load_dts and not the key...

    I am more than happy to see where it goes wrong, but I need a small example. Thanks!

  • I have the following table:

    GEBID  LOAD_DTS           LOAD_END_DTS
    1001   7-7-2008 0:00:00   NULL
    1002   8-7-2008 0:00:00   14-7-2008 0:00:00
    1002   15-7-2008 0:00:00  NULL
    1002   20-8-2008 0:00:00  NULL
    1003   1-8-2008 0:00:00   NULL
    1003   20-8-2008 0:00:00  NULL

    I use the script:

    Update s
    Set s.load_end_dts = dateadd(ss,-1,(GetDate()))
    From dbo.s_gebruikers2 as s
    Where s.load_end_dts is NULL
    AND Load_DTS = (SELECT MIN(x.Load_dts) from dbo.s_gebruikers2 x
                    where x.gebid = s.gebid)
    and s.gebid IN (SELECT Y.GEBID FROM dbo.s_gebruikers2 Y
                    WHERE Y.LOAD_END_DTS IS NULL
                    GROUP BY Y.GEBID
                    HAVING COUNT(Y.LOAD_DTS)>1)

    Which results in:

    GEBID  LOAD_DTS           LOAD_END_DTS
    1001   7-7-2008 0:00:00   NULL
    1002   8-7-2008 0:00:00   14-7-2008 0:00:00
    1002   15-7-2008 0:00:00  NULL
    1002   20-8-2008 0:00:00  NULL
    1003   1-8-2008 0:00:00   26-8-2008 11:17:05
    1003   20-8-2008 0:00:00  NULL

    Where I would expect the 1002 record with LOAD_DTS 15-7-2008 to be updated to 26-8-2008 11:17:05 as well.

    I looked very hard, but have not found the cause yet..

  • Blom,

    This should do what you need:

    Update s
    Set s.load_end_dts = dateadd(ss,-1,(GetDate()))
    From dbo.s_gebruikers2 as s
    Where s.Load_DTS = (SELECT MIN(x.Load_dts) from dbo.s_gebruikers2 x
                        where x.gebid = s.gebid and x.load_end_dts is NULL)
    and s.gebid IN (SELECT Y.GEBID FROM dbo.s_gebruikers2 Y
                    WHERE Y.LOAD_END_DTS IS NULL
                    GROUP BY Y.GEBID
                    HAVING COUNT(Y.LOAD_DTS)>1)

    It was excluding the 1002 row before because the MIN in the subquery included the 8-7-2008 row, which already has a load_end_dts value; the main query only looked at rows where load_end_dts is null, so no 1002 row matched that minimum. Moving the null check into the subquery fixes that.

    Try this and let me know!

    Hans..
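    A reproduction of Hans's fix on Blom's sample data, sketched in Python with an in-memory SQLite table (SQLite used here as a stand-in for SQL Server; the fixed timestamp replaces dateadd/GetDate for determinism):

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE s_gebruikers2 (gebid INT, load_dts TEXT, load_end_dts TEXT);
    INSERT INTO s_gebruikers2 VALUES
      (1001, '2008-07-07', NULL),
      (1002, '2008-07-08', '2008-07-14'),
      (1002, '2008-07-15', NULL),
      (1002, '2008-08-20', NULL),
      (1003, '2008-08-01', NULL),
      (1003, '2008-08-20', NULL);
    """)

    # The corrected version: the load_end_dts IS NULL filter lives inside the
    # MIN() subquery, so a gebid's already-closed rows no longer drag the
    # minimum down to a row the outer filter can never match.
    con.execute("""
      UPDATE s_gebruikers2
      SET load_end_dts = '2008-08-26'
      WHERE load_dts = (SELECT MIN(x.load_dts) FROM s_gebruikers2 x
                        WHERE x.gebid = s_gebruikers2.gebid
                          AND x.load_end_dts IS NULL)
        AND gebid IN (SELECT y.gebid FROM s_gebruikers2 y
                      WHERE y.load_end_dts IS NULL
                      GROUP BY y.gebid
                      HAVING COUNT(y.load_dts) > 1)
    """)

    updated = con.execute("""
      SELECT gebid, load_dts FROM s_gebruikers2
      WHERE load_end_dts = '2008-08-26'
    """).fetchall()
    print(sorted(updated))  # earliest open rows of both 1002 and 1003 updated
    ```

    The earliest open 1002 row (15-7-2008) is now updated alongside the 1003 row, matching the result Blom expected; 1001 stays untouched because it has only one open row.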

  • Yep,

    Hans, this seems to work out right. Pretty obvious, but the more one stares, the more one misses such details.

    Thanks again for your input!

  • Heh... one person says it might be more performant, another says, with some vigor I may add, probably not. Neither has done a test and posted it. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • From my point of view the performance issue is secondary, since we're talking in the range of 5,000 - 100,000 records for a table. With SQL Server 2005 and a decent server I expect this to be a breeze for either solution.

    However, your point is still valid.

    For those interested in the outcome, let us create a 100,000-record sample table and test performance 🙂
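    Nobody in the thread ever posted that test, but a rough harness along these lines would do it. This sketch uses Python with an in-memory SQLite table of 100,000 synthetic rows as a stand-in for the proposed SQL Server test; only relative behavior on a real server with real data would settle the argument, and here the test asserts correctness (both forms return the same rows), not speed.

    ```python
    import random
    import sqlite3
    import time

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE s_table (userid INT, load_dts INT, load_end_dts TEXT)")
    random.seed(42)
    # 10,000 userids x 10 rows each = 100,000 rows of synthetic data
    data = [(uid, random.randrange(10**6), None)
            for uid in range(10_000) for _ in range(10)]
    con.executemany("INSERT INTO s_table VALUES (?,?,?)", data)
    con.execute("CREATE INDEX ix_user_dts ON s_table (userid, load_dts)")

    correlated_sql = """
      SELECT s.userid, s.load_dts FROM s_table s
      WHERE s.load_dts = (SELECT MIN(x.load_dts) FROM s_table x
                          WHERE x.userid = s.userid)
    """
    derived_sql = """
      SELECT s.userid, s.load_dts FROM s_table s
      JOIN (SELECT userid, MIN(load_dts) AS m FROM s_table GROUP BY userid) z
        ON s.userid = z.userid AND s.load_dts = z.m
    """

    def timed(sql):
        # Run one query, returning elapsed wall time and its sorted result set.
        t0 = time.perf_counter()
        result = con.execute(sql).fetchall()
        return time.perf_counter() - t0, sorted(result)

    t_corr, r_corr = timed(correlated_sql)
    t_der, r_der = timed(derived_sql)
    print(f"correlated: {t_corr:.3f}s  derived: {t_der:.3f}s")
    ```

    On SQL Server one would compare the two UPDATE statements directly with SET STATISTICS TIME ON and the actual execution plans, rather than wall-clock SELECTs.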

  • blom0344 (9/4/2008)


    From my point of view the performance issue is secondary, since we're talking in the range of 5,000 - 100,000 records for a table. With SQL Server 2005 and a decent server I expect this to be a breeze for either solution.

    However, your point is still valid.

    For those interested in the outcome, let us create a 100,000-record sample table and test performance 🙂

    So... anybody gonna test this or what? 🙂

    --Jeff Moden


