June 17, 2009 at 10:32 am
I have an imported table with group names that are sharing a special code among rows. The business rule states that the most recently entered group should have the special code and that all others should be blank. The example will probably make more sense. My question is this: is there a more efficient way to do my update query?
The scenario requires that the table be loaded as is seen in the example. The only option to change the data is through an UPDATE statement.
Thanks!
IF OBJECT_ID('tempdb..#Test') IS NOT NULL DROP TABLE #Test
--
CREATE TABLE #Test (
GroupName varchar(50) NOT NULL,
SpecCode varchar(10) NULL,
DateEntered datetime )
--
INSERT #Test (GroupName, SpecCode, DateEntered)
SELECT 'Test Group 1', '1111-1111', '1/1/2009'
UNION ALL
SELECT 'Test Group 2', '1111-1111', '1/2/2009'
UNION ALL
SELECT 'Test Group 3', '1111-1111', '2/1/2009'
UNION ALL
SELECT 'Test Group 4', '2222-2222', '1/1/2009'
UNION ALL
SELECT 'Test Group 5', '3333-3333', '1/1/2009'
UNION ALL
SELECT 'Test Group 6', '3333-3333', '1/2/2009'
--
/*
Table looks like this:
--
GroupName SpecCode DateEntered
Test Group 11111-11112009-01-01
Test Group 21111-11112009-01-02
Test Group 31111-11112009-02-01
Test Group 42222-22222009-01-01
Test Group 53333-33332009-01-01
Test Group 63333-33332009-01-02
--
Needs to look like this after UPDATE:
--
GroupName SpecCode DateEntered
Test Group 1 2009-01-01
Test Group 2 2009-01-02
Test Group 3 1111-1111 2009-02-01
Test Group 4 2222-2222 2009-01-01
Test Group 5 2009-01-01
Test Group 6 3333-3333 2009-01-02
*/
--
-- Update table so that for all rows that are
-- sharing a SpecCode, only the most recently
-- entered row keeps the code. The other rows
-- have a blank SpecCode. But this only applies
-- to rows which are sharing the SpecCode
--
SELECT *
FROM #Test
--
UPDATE t
SET SpecCode = ''
FROM #Test t
JOIN (
SELECT GroupName, DateEntered, SpecCode, ROW_NUMBER() OVER (PARTITION BY SpecCode ORDER BY DateEntered DESC) AS RowRank
FROM #Test
WHERE SpecCode IN (
SELECT SpecCode
FROM #Test
WHERE SpecCode IS NOT NULL
GROUP BY SpecCode
HAVING COUNT(*) > 1 )
) x
ON t.GroupName = x.GroupName
AND t.DateEntered = x.DateEntered
AND t.SpecCode = x.SpecCode
AND x.RowRank <> 1
--
SELECT *
FROM #Test
June 17, 2009 at 11:27 am
Here's how I'd write the update. You'd have to test on a volume of data to see whether it's enough better to be worth it.
;with CTE as
(select SpecCode,
row_number() over (partition by SpecCode order by DateEntered desc) as Row
from #Test)
update CTE
set SpecCode = ''
where Row > 1;
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 17, 2009 at 11:32 am
How about this:
update a
set speccode = ''
from #test a
where not exists
(select 1
from #test b
where a.speccode = b.speccode
group by speccode
having max(dateentered) = a.dateentered)
Randy
June 17, 2009 at 12:15 pm
The requirement is "the most recently entered group should have the special code and that all others should be blank."
GSquared's solution needs a minor correction to partition by GroupName instead of SpecCode but is much faster than an exists solution.
GSquared solution:
Table 'Groups'. Scan count 1, logical reads 5817
SQL Server Execution Times: CPU time = 60 ms, elapsed time = 107 ms.
Exists solution:
Table 'Groups'. Scan count 2, logical reads 5834
Table 'Worktable'. Scan count 101, logical reads 18313
SQL Server Execution Times: CPU time = 291 ms, elapsed time = 349 ms.
CREATE TABLE Groups (
GroupName varchar(50) NOT NULL,
DateEntered datetime NOT NULL,
SpecCode varchar(10) NULL,
Constraint Groups_P primary key (GroupName, DateEntered))
/*
The master.dbo.Tally table is the conventional "numbers" or "sequences" table and the SQL to create that table and populate is in Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/62867/
*/
INSERT Groups
(GroupName, SpecCode, DateEntered)
SELECT'TEST GROUP ' + cast( Groups.N as varchar(8) )
,cast( Groups.N as varchar(8) )
,DATEADD(dd, Days.N,'2009-06-01')
FROMmaster.dbo.Tally as Groups
CROSS JOIN
master.dbo.Tally as Days
WHEREGroups.N between 1 and 100
ANDDays.N between 1 and 30
go
update statistics Groups with fullscan
go
set statistics io on
set statistics time on
go
begin tran
;with CTE as
(select SpecCode,
row_number() over (partition by GroupName order by DateEntered desc) as Row
from Groups)
update CTE
set SpecCode = ''
where Row > 1;
select * fromGroups
rollback
begin tran
UPDATEGroups
setSpecCode = ''
whereEXISTS
(Select 1
fromgroups as GroupLater
whereGroupLater.GroupName = Groups.GroupName
andGroupLater.DateEntered> Groups.DateEntered
)
select * fromGroups
rollback
SQL = Scarcely Qualifies as a Language
June 17, 2009 at 3:12 pm
Here is another way:UPDATE
T
SET
SpecCode = ''
FROM
(
SELECT SpecCode, ROW_NUMBER() OVER (PARTITION BY SpecCode ORDER BY DateEntered DESC) AS RowRank
FROM #Test
) AS T
WHERE T.RowRank > 1
EDIT: I'm getting lazy and didn't read all of Carl's solution. I just saw that he was using a CTE which is not needed so I came up with this solution, which I see is bascially the same as Carls CTE.
June 17, 2009 at 4:20 pm
Actually, GSquared proposed the CTE solution but I was interested in comparing the performance to the "legacy" exists solution, which is needed when the RDBMS does not support window functions such as RANK.
Here is another solution that uses except and also works better than the EXISTS solution and shows the need for the "corresponding" option on EXCEPT.
Table 'Groups'. Scan count 3, logical reads 6051
Table 'Worktable'. Scan count 1, logical reads 6114
SQL Server Execution Times: CPU time = 80 ms, elapsed time = 136 ms.
begin tran
UPDATEGroups
setSpecCode = ''
FROMGROUPS
JOIN(select GroupName, DateEntered
from GROUPS
EXCEPT
select GroupName, MAX(DateEntered)
from GROUPS
GROUP BY GroupName
) as GroupLatest
on GroupLatest.GroupName = Groups.GroupName
rollback
SQL = Scarcely Qualifies as a Language
June 17, 2009 at 4:24 pm
OOPS - For the EXCEPT, missed a condition in the ON, so please add:
and GroupLatest.DateEntered = Groups.DateEntered
Table 'Groups'. Scan count 3, logical reads 5851
Table 'Worktable'. Scan count 1, logical reads 5914
SQL Server Execution Times: CPU time = 110 ms, elapsed time = 149 ms.
SQL = Scarcely Qualifies as a Language
June 17, 2009 at 4:35 pm
Carl Federl (6/17/2009)
Actually, GSquared proposed the CTE solution but I was interested in comparing the performance to the "legacy" exists solution, which is needed when the RDBMS does not support window functions such as RANK.
Ahh yes.. indeed GSquared did. *sigh* I need to get my head in the game.. 🙂
June 17, 2009 at 4:50 pm
For fun here is a one more way using a LEFT OUTER JOIN:UPDATE
T
SET
speccode = ''
FROM
#Test AS T
LEFT OUTER JOIN
(
select speccode, MAX(dateentered) AS MaxDate
from #test
group by speccode
) AS D
ON T.SpecCode = D.SpecCode
AND T.dateentered = d.MaxDate
WHERE
D.SpecCode IS NULL
June 18, 2009 at 7:31 am
No, if you change mine to partition by GroupName, it doesn't get the desired end result. It needs to be the way I wrote it to comply with the specification.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply