June 25, 2008 at 11:30 am
The follow is the data I have in a single table:
createdateIDcolumnCustnumber
12/1/20052100
5/1/2007 1100
5/29/20072100
5/30/20073100
6/1/120074100
6/28/20075100
2/1/2006 2200
6/1/200071200
6/5/2007 2200
6/7/2007 3200
2/5/2005 4300
6/10/20081300
For the above data, if the first row's createdate is less than second row's createdate for the same custnumber 100,
then the IDcolumn of the first row should be updated to 1, and accordingly if the second row's createdate is less than
3rd row, then second row's IDcolumn should be 2 etc., the following is the output I am expecting:
createdateIDcolumnCustnumber
12/1/20051100
5/1/2007 2100
5/29/20073100
5/30/20074100
6/1/120075100
6/28/20076100
2/1/2006 1200
6/1/200072200
6/5/2007 3200
6/7/2007 4200
2/5/2005 1300
6/10/20082300
Any ideas are greatly appreciated, Thanks!!
June 25, 2008 at 2:03 pm
I happened to have one: :w00t:
declare @temp table
(
Createdate varchar(10),
IDColumn smallint,
CustNumber int
)
-- INSERT DATA for testing
insert into @temp values('12/1/2005', 2, 100)
insert into @temp values('5/1/2007', 1, 100)
insert into @temp values('5/29/2007', 2, 100)
insert into @temp values('5/30/2007', 3, 100)
insert into @temp values('6/1/12007', 4, 100)
insert into @temp values('6/28/2007', 5, 100)
insert into @temp values('2/1/2006', 2, 200)
insert into @temp values('6/1/20007', 1, 200)
insert into @temp values('6/5/2007', 2, 200)
insert into @temp values('6/7/2007', 3, 200)
insert into @temp values('2/5/2005', 4, 300)
insert into @temp values('6/10/2008', 1, 300)
-- CREATE A TEMP TABLE holding the sorted data
create table #temp
(
ID int identity(1, 1),
Createdate varchar(10),
IDColumn smallint,
CustNumber int
)
INSERT INTO #temp(CREATEDATE, IDColumn, CustNumber)
SELECT *
FROM @temp
ORDER BY CustNumber, CreateDate, IDColumn
UPDATE A -- UPDATE every first record of CustNumber
SET IDColumn = 1
FROM #temp A
INNER JOIN
(SELECT CustNumber, MIN(CreateDate) AS CreateDAte
FROM @temp
GROUP BY CustNumber
) B
ON A.CustNumber = B.CustNumber AND A.CreateDate = B.CreateDate
declare @row int
SELECT @row = MAX(X.CT)
FROM (SELECT CustNumber, COUNT(IDColumn) AS CT
FROM @temp
GROUP BY CustNumber) X
WHILE @row > 0
BEGIN
UPDATE A
SET A.IDColumn = B.IDColumn + 1
FROM #temp A
INNER JOIN #temp B
ON A.ID = B.ID + 1 AND A.CustNumber = B.CustNumber
SET @row = @row - 1
END
UPDATE A
SET A.IDColumn = B.IDColumn
FROM @temp A
INNER JOIN #temp B
ON A.CustNumber = B.CustNumber AND A.CreateDate = B.CreateDate
---- CHECKING
select * from @temp
drop table #temp
go
--******** RESULT*********--
Createdate IDColumn CustNumber
---------- -------- -----------
12/1/2005 1 100
5/1/2007 2 100
5/29/2007 3 100
5/30/2007 4 100
6/1/12007 5 100
6/28/2007 6 100
2/1/2006 1 200
6/1/20007 2 200
6/5/2007 3 200
6/7/2007 4 200
2/5/2005 1 300
6/10/2008 2 300
(12 row(s) affected)
June 25, 2008 at 2:44 pm
Try this:
UPDATE#temp
setIDColumn
= (Select count(*)
from #temp as CustPrior
whereCustPrior.CustNumber = #temp.CustNumber
and (CustPrior.Createdate < #temp.Createdate
or ( CustPrior.Createdate = #temp.Createdate
andCustPrior.IDColumn<= #temp.IDColumn
)))
SQL = Scarcely Qualifies as a Language
June 25, 2008 at 3:48 pm
Good code... but be careful when you get a lot of rows or the groups of rows become large. Take a look at the following article...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
--Jeff Moden
Change is inevitable... Change for the better is not.
June 27, 2008 at 3:28 pm
Nice article, Jeff.:D
June 27, 2008 at 6:36 pm
Aye. Thanks for the feedback... I sometimes wonder if posts like that do any good. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
June 28, 2008 at 6:04 am
Jeff is quite correct on the performance affect of a triangular join so I decided to compare using the RANK window function available under SQL Server 2005 and 2008.
For 100 Customers and 1,000 rows for each customer, the resoures used are:
[font="System"]
TableScanslogical reads
TriangleRankTriangleRank
#temp1000012 805064 574
WorkTable1 2203266405932
MillisecondsCPU msElapsed
Triangle35,95136,734
rank 1,462 1,508
[/font]
In short, using RANK is 36 times better than using a Triangle Join.
set statistics io off
set statistics time off
go
IF object_id('tempdb..#temp') is not null drop table #temp
create table #temp
(CustNumberintegernot null
,CreateDatedatetimenot null
,IDColumnsmallintnot null
, constraint temp_p primary key (CustNumber,Createdate))
INSERT into #temp
(CustNumber,CreateDate, IDColumn )
selectCustomer.N
,CAST( '2000-01-01' as datetime) + DayAdd.N
,0 as IDColumn
frommaster.dbo.TALLY as Customer
cross join
master.dbo.TALLY as DayAdd
whereCustomer.N between 1 and 100
andDayAdd.N between 1 and 1000
update statistics #temp
go
set statistics io on
set statistics time on
-- SQL Server 2000
UPDATE #temp
set IDColumn
= (Select count(*)
from #temp as CustPrior
where CustPrior.CustNumber = #temp.CustNumber
and ( CustPrior.Createdate < #temp.Createdate
or ( CustPrior.Createdate = #temp.Createdate
and CustPrior.IDColumn <= #temp.IDColumn
)))
-- SQL Server 2005 or 2008
UPDATE#temp
SETIDColumn = New.NewRank
from#tempas Old
JOIN(select Ranker.CustNumber
,Ranker.CreateDate
,RANK ( ) OVER (PARTITION BY Ranker.CustNumber order by Ranker.CreateDate) as NewRank
from#temp as Ranker
) as New
onNew.CustNumber= Old.CustNumber
andNew.CreateDate= Old.CreateDate
andNew.NewRank<> Old.IDColumn
SQL = Scarcely Qualifies as a Language
June 28, 2008 at 9:46 am
Just as a follow-up, here are the resource usages when performing a SELECT instead of an UPDATE
[font="System"]
ResourceScanlogical readsCPU MSElapsed MS
Triangle Join:100001 455,648 23,103 24,798
Rank 1 287 100 899
[/font]
set statistics io on
set statistics time on
-- Triangle Join
SelectCustomer.CustNumber
,Customer.Createdate
,count(*) as NewRank
from#temp as Customer
join#temp as CustomerPrior
on CustomerPrior.CustNumber = Customer.CustNumber
and CustomerPrior.Createdate <= Customer.Createdate
group by Customer.CustNumber
,Customer.Createdate
-- Rank Window for SQL Server 2005 or 2008
selectCustomer.CustNumber
,Customer.CreateDate
,RANK ( ) OVER (PARTITION BY Customer.CustNumber order by Customer.CreateDate) as NewRank
from#temp as Customer
SQL = Scarcely Qualifies as a Language
June 28, 2008 at 10:36 am
If you change Carl's code to 2000 per customer, you see where Triangular Joins really start to fall apart... time goes up by a factor of 4 for the Triangular join... time for the RANK stays just about the same. And that's the way it is with Triangular joins... double the number of rows, the amount of time goes up by a factor of about 4.
If you need to do ranking type functions in SQL Server 2000, please see the following article. The neat thing is that the code in the article is nearly as fast as that of the Rank function and is directly portable to SQL Server 2005...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply