August 20, 2008 at 5:58 am
I have the following query that is supposed to update a date column in a table, based on the lowest value of another date column in the same table, for every userid:
UPDATE dbo.S_TABLE
SET LOAD_END_DTS = GETDATE()
WHERE LOAD_END_DTS IS NULL
AND LOAD_DTS = (SELECT MIN(X.LOAD_DTS)
                FROM dbo.S_TABLE X
                WHERE USERID = X.USERID)
AND USERID IN (SELECT Y.USERID
               FROM dbo.S_TABLE Y
               WHERE Y.LOAD_END_DTS IS NULL
               GROUP BY Y.USERID
               HAVING COUNT(Y.LOAD_DTS) > 1)
The first subquery (the one computing MIN(X.LOAD_DTS)) is the correlated subquery that should make sure that the action is performed for EACH USERID.
The second subquery is just there to avoid updating rows for a USERID that has only one row with LOAD_END_DTS as null.
However, the correlation does not seem to work: the MIN(X.LOAD_DTS) is established for the entire table and not for each USERID.
What am I missing? This used to work perfectly in, for instance, Oracle.
August 20, 2008 at 9:47 am
If I understand it correctly, you want to update the load_end_dts date where load_dts is the lowest value and where a user has more than one entry in the table?
The catch is that your outer table has no alias, so the unqualified USERID inside the subquery binds to X (the nearest table in scope) and the comparison is always true; the MIN is then taken over the whole table. Alias the outer table and correlate explicitly. I think you should write the query this way:
Update s
Set s.load_end_dts = GetDate()
From dbo.s_table as s
Where s.load_end_dts is NULL
AND Load_DTS = (SELECT MIN(x.Load_dts) from dbo.s_table x
where x.userid = s.userid)
and s.userid IN (SELECT Y.USERID FROM dbo.S_TABLE Y
WHERE Y.LOAD_END_DTS IS NULL
GROUP BY Y.USERID
HAVING COUNT(Y.LOAD_DTS)>1)
that should do it...
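To make the scoping concrete: without an alias on the outer table, both sides of the comparison in your subquery resolve to X, so it is effectively this, and the MIN covers the whole table:
SELECT MIN(X.LOAD_DTS)
FROM dbo.S_TABLE X
WHERE X.USERID = X.USERID -- always true (for non-null USERID), so no correlation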
August 20, 2008 at 5:32 pm
How about using a derived table?
Update s
Set s.load_end_dts = GetDate()
From dbo.s_table as s
Join (Select UserId, Min(Load_Dts) as MinDts From dbo.s_table Group By UserId) as z
On s.UserId = z.UserId And s.Load_Dts = z.MinDts
Where s.load_end_dts is NULL
and s.userid IN (SELECT Y.USERID FROM dbo.S_TABLE Y
WHERE Y.LOAD_END_DTS IS NULL
GROUP BY Y.USERID
HAVING COUNT(Y.LOAD_DTS)>1)
It might be more efficient than a correlated subquery.
August 20, 2008 at 11:34 pm
Exactly. Go for the derived table; it is much, much more performant than a subquery.
August 21, 2008 at 5:25 am
I don't think that I agree.
More readable? Maybe. But more performant? I don't think so.
If you can show where the performance gains are, then I will be more than happy to agree with you.
August 21, 2008 at 10:07 am
I said it might be more efficient.
You can never tell for sure without testing. Comparative performance depends on data volumes, statistics, indexes, etc.
August 21, 2008 at 11:01 am
Richard, my reply was not directed at you; it was directed at arjun. Testing would be necessary to determine which method is faster, and indeed it all depends.
My previous post was intended to say that testing would be the only way of being certain, but I now see that it did not come out that way.
In the end, we have both solved the problem, albeit through different routes; it is now up to the OP to decide which way to go.
Maybe if I have some time when I am home tonight, I will recreate the problem, fill the table up to 200.000 rows, and do some performance testing (because I am curious).
I will let you know my results.
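In case anyone wants to play along at home, something like this should build the test set. Just a sketch, untested; the table name and column types are assumptions based on the OP's description:
-- throwaway copy of the OP's table (assumed shape)
CREATE TABLE dbo.s_table_test
(
    userid       INT,
    load_dts     DATETIME,
    load_end_dts DATETIME NULL
);
-- 200.000 rows, roughly 10 per userid, about a third of them closed
INSERT INTO dbo.s_table_test (userid, load_dts, load_end_dts)
SELECT n % 20000,
       DATEADD(dd, n % 365, '20080101'),
       CASE WHEN n % 3 = 0
            THEN DATEADD(dd, (n % 365) + 30, '20080101')
       END
FROM (SELECT TOP 200000
             ROW_NUMBER() OVER (ORDER BY a.object_id) AS n
      FROM sys.all_columns a
      CROSS JOIN sys.all_columns b) AS t;
Then run each UPDATE against a fresh copy of the data with SET STATISTICS TIME ON and compare.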
August 23, 2008 at 4:18 am
I have only had one chance to test the first suggestion by Hans, but it did not have the proper result. It looks like it only gave the proper result for the lowest key (the one that matches the requirement that two records must exist for a given key with load_end_dts = null).
Strange..
August 23, 2008 at 5:36 am
Blom, can you give an example? I am not sure I am following you.
When you say key, what is the key? The subquery as I wrote it gets the min value for Load_dts, not the key...
I am more than happy to see where it goes wrong, but I need a small example. Thanks!
August 26, 2008 at 3:34 am
I have the following table:
GEBID   LOAD_DTS            LOAD_END_DTS
1001    7-7-2008 0:00:00    NULL
1002    8-7-2008 0:00:00    14-7-2008 0:00:00
1002    15-7-2008 0:00:00   NULL
1002    20-8-2008 0:00:00   NULL
1003    1-8-2008 0:00:00    NULL
1003    20-8-2008 0:00:00   NULL
I use the script:
Update s
Set s.load_end_dts = dateadd(ss,-1,(GetDate()))
From dbo.s_gebruikers2 as s
Where s.load_end_dts is NULL
AND Load_DTS = (SELECT MIN(x.Load_dts) from dbo.s_gebruikers2 x
where x.gebid = s.gebid)
and s.gebid IN (SELECT Y.GEBID FROM dbo.s_gebruikers2 Y
WHERE Y.LOAD_END_DTS IS NULL
GROUP BY Y.GEBID
HAVING COUNT(Y.LOAD_DTS)>1)
Which results in:
GEBID   LOAD_DTS            LOAD_END_DTS
1001    7-7-2008 0:00:00    NULL
1002    8-7-2008 0:00:00    14-7-2008 0:00:00
1002    15-7-2008 0:00:00   NULL
1002    20-8-2008 0:00:00   NULL
1003    1-8-2008 0:00:00    26-8-2008 11:17:05
1003    20-8-2008 0:00:00   NULL
Where I would expect the 1002 row with LOAD_DTS 15-7-2008 to be updated to 26-8-2008 11:17:05 as well.
I have looked very hard, but have not found the cause yet..
August 27, 2008 at 8:57 am
Blom,
This should do what you need:
Update s
Set s.load_end_dts = dateadd(ss,-1,(GetDate()))
From dbo.s_gebruikers2 as s
Where s.Load_DTS = (SELECT MIN(x.Load_dts) from dbo.s_gebruikers2 x
where x.gebid = s.gebid and x.load_end_dts is NULL)
and s.gebid IN (SELECT Y.GEBID FROM dbo.s_gebruikers2 Y
WHERE Y.LOAD_END_DTS IS NULL
GROUP BY Y.GEBID
HAVING COUNT(Y.LOAD_DTS)>1)
It was excluding the 1002 rows before because MIN(x.Load_dts) for 1002 is the 8-7-2008 row, which already has a load_end_dts value; since the query only updates rows where load_end_dts is null, no open 1002 row could ever match that minimum. Restricting the MIN subquery to open rows (x.load_end_dts is NULL) fixes that.
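You can see it against your sample data with a quick check of the uncorrected minimum per gebid:
SELECT x.gebid, MIN(x.load_dts) AS min_dts
FROM dbo.s_gebruikers2 x
GROUP BY x.gebid
-- 1001 -> 7-7-2008, 1002 -> 8-7-2008, 1003 -> 1-8-2008
-- for 1002 the minimum is the row that was already closed on 14-7-2008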
Try this and let me know!
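For completeness: on SQL 2005 you could also do it with ROW_NUMBER through an updatable CTE. Just a sketch, untested, same table as above:
;WITH open_rows AS
(
    SELECT load_end_dts,
           ROW_NUMBER() OVER (PARTITION BY gebid ORDER BY load_dts) AS rn,
           COUNT(*)     OVER (PARTITION BY gebid) AS cnt
    FROM dbo.s_gebruikers2
    WHERE load_end_dts IS NULL        -- only the open rows count
)
UPDATE open_rows
SET load_end_dts = dateadd(ss, -1, GetDate())
WHERE rn = 1    -- the earliest open row per gebid
  AND cnt > 1;  -- only when there is more than one open row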
Hans..
September 3, 2008 at 1:22 am
Yep,
Hans, this seems to work out right. Pretty obvious in hindsight, but the more one stares, the more one misses such details.
Thanks again for your input!
September 3, 2008 at 7:36 pm
Heh... one person says it might be more performant, another says, with some vigor I may add, probably not. Neither has done a test and posted it. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
September 4, 2008 at 1:31 am
From my point of view the performance issue is secondary, since we're talking in the range of 5000 - 100000 records for a table. With SQL Server 2005 and a decent server, I expect this to be a breeze for either solution.
However, your point is still valid.
For those interested in the outcome, let us create a 100000 record sample table and test performance 🙂
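If we do, wrapping each version like this should give comparable numbers (run each UPDATE against a fresh copy of the data so they do the same work):
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
-- correlated-subquery version of the UPDATE here
-- derived-table version of the UPDATE here
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;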
September 4, 2008 at 8:38 pm
blom0344 (9/4/2008)
From my point of view the performance issue is secondary, since we're talking in the range of 5000 - 100000 records for a table. With SQL Server 2005 and a decent server, I expect this to be a breeze for either solution. However, your point is still valid.
For those interested in the outcome, let us create a 100000 record sample table and test performance 🙂
So... anybody gonna test this or what? 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.