August 28, 2008 at 10:41 pm
rbarryyoung (8/28/2008)
Jeff Moden (8/28/2008)
It's also part of the reason I wrote that forum ettiquette article on how to provide actual test data so that good folks like yourself don't have to spend so much time creating data and can get right to trying to solve the problem, instead.Man, I think that I linked to that article like 20 times today. Now I know why you and Jack put it into your signature!
Heh... yeah... I got tired of looking it up. Sure wish Steve could find a way to make it a "sticky" in some of the forums.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2008 at 10:59 pm
GSquared (8/28/2008)
The first query breaks them up into sets, based on the SetStart and SetEnd columns and the Date column.The second query then takes each set and gives it a sequence number within the set.
Is that what you're looking for? Is it clear enough?
Unfortunately this type of problem is not so clear to most users even in its recognition let alone a solution. Sql server doesn't help matters by offering no construct(s) to directly solve the problem. And what is the clearest, simplest way to characterize a problem like this so it becomes easy to communicate about it? This problem, along with relational division, appears to be sqls version of porn. You can't nail down the definition but you know it when you see it:)
These are two pretty good threads about approaching this type of problem. They do contain spoons but forks as well. FWIW I'm dassin, steve.
'searching for the longest subsequence of ones'
'Roll Up IP-Location Database'
August 29, 2008 at 8:08 am
I finally had time to test this, and I found something I hadn't noticed before. Your test data (first post) has duplicate IDs and all the dates and times are the same. Is that an accident, or does your real data also have duplicate ID values? Also, the test table has no primary key. Is that true of the real table or just an oversight on the test?
In the hope that your data doesn't actually have those problems, I created the following test and code:
CREATE TABLE #mytable
( ID INT IDENTITY(1,1) ,
DateValue DATETIME, Prd nvarchar(250), Statusid INT )
INSERT INTO #mytable
(DateValue,Prd, Statusid)
SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','B','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','B','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','B','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','B','2' UNION ALL
SELECT 'Oct 17 2007 12:00AM','B','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','B','2' UNION ALL
SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','A','1' UNION ALL
SELECT 'Oct 17 2007 12:00AM','A','2'
update #MyTable
set DateValue = dateadd(minute, ID, DateValue)
alter table #MyTable
add SetFirst bit not null default(0), SetLast bit not null default(0)
;with CTE (Row, Date, Prd, StatusID, SetFirst, SetLast) as
(select row_number() over (partition by Prd order by DateValue),
DateValue, Prd, StatusID, SetFirst, SetLast
from #MyTable)
update C1
set
SetFirst =
case
when C2.row is not null then 1
when not exists
(select *
from #MyTable mt
where mt.prd = c1.prd
and mt.datevalue < c1.date) then 1
else 0
end,
SetLast =
case
when C3.row is not null then 1
when not exists
(select *
from #MyTable mt
where mt.prd = c1.prd
and mt.datevalue > c1.date) then 1
else 0
end
from CTE C1
left outer join CTE C2 -- The prior row is a 2
on C1.Prd = C2.Prd
and C1.Row = C2.Row+1
and C1.StatusID = 1
and C2.StatusID = 2
left outer join CTE C3 -- Status 2 and next row is a 1
on C1.Prd = C3.Prd
and C1.Row = C3.Row-1
and C1.StatusID = 2
and C3.StatusID = 1
select *
from #MyTable
Because of the Not Exists tests, it has at least that much row-by-row. There's almost certainly a better way to do that particular test, but I don't have time to mess around with that right now. This way works, at least on this data.
Another thing I noticed about your test data is that you had two StatusID 2 in a row for Prd "B". That seems to indicate a close without an open. I got rid of that for my test, but if it really exists in your database, the code will have to correct for that.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 29, 2008 at 2:44 pm
Hi
GSquared
After runnig your both the query, for set and sequece, this is what i got
ID DateValue Prd statusid setfirst setlast setid setsequence
12007-10-17 00:01:00.000A11011
22007-10-17 00:02:00.000A10021
32007-10-17 00:03:00.000A10001
42007-10-17 00:04:00.000A10002
52007-10-17 00:05:00.000B11003
62007-10-17 00:06:00.000B10004
72007-10-17 00:07:00.000B10005
82007-10-17 00:08:00.000B20106
92007-10-17 00:09:00.000B11007
102007-10-17 00:10:00.000B20108
112007-10-17 00:11:00.000A10009
122007-10-17 00:12:00.000A100010
132007-10-17 00:13:00.000A100011
142007-10-17 00:14:00.000A100012
152007-10-17 00:15:00.000A201013
but i was looking for something like this! and setfirst was good escept for the last Prd A
ID DateValue Prd statusid setfirst setlast setid setsequence
12007-10-17 00:01:00.000A11011
22007-10-17 00:02:00.000A10021
32007-10-17 00:03:00.000A10001
42007-10-17 00:04:00.000A10001
52007-10-17 00:05:00.000B11001
62007-10-17 00:06:00.000B10001
72007-10-17 00:07:00.000B10001
82007-10-17 00:08:00.000B20101
92007-10-17 00:09:00.000B11002
102007-10-17 00:10:00.000B20102
112007-10-17 00:11:00.000A11003
122007-10-17 00:12:00.000A10003
132007-10-17 00:13:00.000A10003
142007-10-17 00:14:00.000A10003
152007-10-17 00:15:00.000A20103
So sequence 1 is one block , Sequence two is another block and Sequence 3 is one more block
and after acheiving this i can trap the first and last record of each sequence which is my ultimate goal
I guess i am not messing up your labor day weekend!
I am really excited and thrilled from your unconditional help for new guy in SQL
Thanks
Simon
August 29, 2008 at 4:33 pm
Hi
Gsquared
i copied the previous data from sql pane and they are not aligining to each other, i think you can
estimate its order. Please let me know if its not readable
Thanks
Simon
August 29, 2008 at 10:06 pm
The common problem of dividing data into groups boils down to creating a dense rank. Here we want a dense rank on Statusid but ordered by ID (or DateValue). This problem should be easily solved using the Dense_Rank() function in S2005. Unfortunately all the ranking functions assume the target of the rank is also its order. The order cannot be independent of the target. This dependency was intended to be solved by the full implementation of the OVER statement in sql-99. This is known as the sql window. With the full OVER the target of a rank and its order can be independently specified. The Dense_Rank() function is actually a short hand for a generalized dense rank (using OVER). This rank is based on a 'running sum' over the target column (or more generally an expression involving the column) and can be ordered by another column.The running sum is
equivalent to a dense rank.
To get this type of dense rank you simply have to state the rule that increments the run. This is the same as deciding what rule bounds each rank (group). In this problem each rank is bounded by a value of 1 followed by a 2. The run is incremented whenever there is a Statusid change from a some value other than 1 (2,null) to 1. Here all you have to do is check the prior row value of Statusid. If the prior row (based in the order of ID) is a 2 the current row starts a new group (the running sum is incremented by 1).
Sample data.
CREATE TABLE #mytable
(ID INT PRIMARY KEY,
DateValue DATETIME, Prd VARCHAR(10), Statusid INT )
INSERT INTO #mytable
(ID, DateValue, Prd, Statusid)
SELECT 1, '2007-10-17 00:01:00.000','A',1 UNION ALL
SELECT 2, '2007-10-17 00:02:00.000','A',1 UNION ALL
SELECT 3, '2007-10-17 00:03:00.000','A',1 UNION ALL
SELECT 4, '2007-10-17 00:04:00.000','A',1 UNION ALL
SELECT 5, '2007-10-17 00:05:00.000','B',1 UNION ALL
SELECT 6, '2007-10-17 00:06:00.000','B',1 UNION ALL
SELECT 7, '2007-10-17 00:07:00.000','B',1 UNION ALL
SELECT 8, '2007-10-17 00:08:00.000','B',2 UNION ALL
SELECT 9, '2007-10-17 00:09:00.000','B',1 UNION ALL
SELECT 10,'2007-10-17 00:10:00.000','B',2 UNION ALL
SELECT 11,'2007-10-17 00:11:00.000','A',1 UNION ALL
SELECT 12,'2007-10-17 00:12:00.000','A',1 UNION ALL
SELECT 13,'2007-10-17 00:13:00.000','A',1 UNION ALL
SELECT 14,'2007-10-17 00:14:00.000','A',1 UNION ALL
SELECT 15,'2007-10-17 00:15:00.000','A',2 UNION ALL
An outer join would make it easy to check the prior Statusid.
SELECT A.ID,A.DateValue,A.Prd,A.Statusid,B.ID,B.Statusid
FROM #mytable AS A LEFT JOIN #mytable AS B
ON A.ID=B.ID+1
ORDER BY A.ID
Using the full implemention of OVER (sql-99 window) and the Sum aggregate
you could form the groups, get a dense rank, with one pass thru the join:
SELECT A.ID,A.DateValue,A.Prd,A.Statusid,
Sum(CASE WHEN B.Statusid=2 THEN 1 ELSE 0 END)
OVER(ORDER BY A.ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS Setsequence
FROM #mytable AS A LEFT JOIN #mytable AS B
ON A.ID=B.ID+1
Even the join could be eliminated by using a LAG function of OVER (gets the
previous ID value)
SELECT ID,DateValue,Prd,Statusid,
Sum(CASE WHEN LAG(ID,1)=2 THEN 1 ELSE 0 END)
OVER(ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS Setsequence
FROM #mytable
Well all this is pie in the sky for sql server because MS for some other worldy
reason(s) decided the full implementation of OVER could wait. The are various ways to simulate OVER such as combinations of ranking functions/counts, updates, subqueries etc. None are as simple and straightforward as OVER. Of course for sql server I prefer to use Rac:)
Exec Rac
@TRANSFORM='_dummy_', -- Plays no part in computations.
@ROWS='ID & DateValue(d) & Prd & Statusid',
@PVTCOL='report mode',
@FROM='#mytable',
@DEFAULTS1='y',@RACHECK='y',@SHELL='n',
-- @ROWRUNS is a running sum that does not involve a @TRANSFORM, hense Dumy:)
@ROWRUNS='Dumy^CASE WHEN PRIOR.Statusid=2 THEN 1 ELSE 0 END^(Dumy)',
@ROWRUNSLABEL='DenseRank',
-- Add 1 to the rank to start at 1 (instead of 0).
@SELECT='SELECT _ROWS_,(1*DenseRank)+1 AS Setsequence
FROM rac
ORDER BY rd'
ID DateValue Prd Statusid Setsequence
------ ------------------- ---- -------- -----------
1 Oct 17 2007 12:01AM A 1 1
2 Oct 17 2007 12:02AM A 1 1
3 Oct 17 2007 12:03AM A 1 1
4 Oct 17 2007 12:04AM A 1 1
5 Oct 17 2007 12:05AM B 1 1
6 Oct 17 2007 12:06AM B 1 1
7 Oct 17 2007 12:07AM B 1 1
8 Oct 17 2007 12:08AM B 2 1
9 Oct 17 2007 12:09AM B 1 2
10 Oct 17 2007 12:10AM B 2 2
11 Oct 17 2007 12:11AM A 1 3
12 Oct 17 2007 12:12AM A 1 3
13 Oct 17 2007 12:13AM A 1 3
14 Oct 17 2007 12:14AM A 1 3
15 Oct 17 2007 12:15AM A 2 3
You could flag 1st and last rows of each group or whatever. But I think I'll stop here:)
August 30, 2008 at 3:54 pm
Hi
Rog
Thanks for your input on this!
Is there any other way to get this result without using RAC? thats the exact result that i was looking
for . I am using sql2k5 , that is also a big problem to use RAC
Thanks
simon
August 30, 2008 at 4:50 pm
Ummm... I thought Rog gave a solution that didn't include RAC... take a look at his post again. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 30, 2008 at 5:24 pm
Hi
Rog, Jeff
I have problem running this statement
SELECT A.ID,A.DateValue,A.Prd,A.Statusid,
Sum(CASE WHEN B.Statusid=2 THEN 1 ELSE 0 END)
OVER(ORDER BY A.ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)-- this line throws error
AS Setsequence
FROM #mytable AS A LEFT JOIN #mytable AS B
ON A.ID=B.ID+1
Msg 102, Level 15, State 1, Line 4
Incorrect syntax near 'ROWS'.
Thanks
Simon
August 30, 2008 at 7:19 pm
simon phoenix (8/30/2008)
HiRog, Jeff
I have problem running this statement
SELECT A.ID,A.DateValue,A.Prd,A.Statusid,
Sum(CASE WHEN B.Statusid=2 THEN 1 ELSE 0 END)
OVER(ORDER BY A.ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)-- this line throws error
AS Setsequence
FROM #mytable AS A LEFT JOIN #mytable AS B
ON A.ID=B.ID+1
Msg 102, Level 15, State 1, Line 4
Incorrect syntax near 'ROWS'.
Thanks
Simon
Sorry for any confusion. The above form of OVER is NOT supported in S2k5. But RAC does run in S2k5 🙂 Anyway here is a t-sql solution that will simulate the running sum of OVER. Actually it's not as bad as I perhaps made it out to be but its no substitute for the real thing. I'm using the same sample data used in RAC.
WITH C AS
(
SELECT A.ID,A.DateValue,A.Prd,A.Statusid,B.Statusid AS PriorStatusid
FROM #mytable AS A LEFT JOIN #mytable AS B
ON A.ID=B.ID+1
)
select D.ID,D.DateValue,D.Prd,D.Statusid,
Sum(CASE WHEN E.PriorStatusid=2 THEN 1 ELSE 0 END)+1 AS Setsequence
FROM C AS D JOIN C AS E
ON E.ID BETWEEN 0 AND D.ID
GROUP BY D.ID,D.DateValue,D.Prd,D.Statusid
ORDER BY D.ID
Note there are 2 joins, one to access the prior row Statusid and one to form the rows for grouping. It's these joins that OVER (and RAC) eliminate. For large data sets performance could be a problem here. But that's another issue 🙂
August 30, 2008 at 7:25 pm
Sorry... My bad... Roger's too. That syntax, starting at ROWS, isn't part of T-SQL in SQL Server 2005.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 30, 2008 at 8:05 pm
Hi
Rog
You are the SQl Guru, Thanks a lot , i really appreciat it, it was just bugging my head since last week. Now only job is to trap the begining and end of those distinct sequence and i think i can do it.
If i ran into trouble , i will post it again, please look into this thread for few more days, i am really close in getting my task done with the help of great guys like you, Gsquared and jeff. i can't explain
how greatful i am to find good people like you guys exist
God bless you Rog, Jeff and Gsquared
Thanks
Simon
----------Happy Labor Day----------
August 30, 2008 at 8:27 pm
Heh... ok... my turn. 🙂 And sorry it took me so long to get to this problem...
First, in order to be fast and short, this fun problem does actually require some procedureal code, but we won't be using a Cursor or a While loop. And, we don't need extra columns nor do we need to buy a special product to get it done. AND, we don't need to self join to the same table 4 times to do it either. All we need is what appears to be the correct logical clustered index which you probably already have anyway.
First, the test data... do read the comments please... this little trick requires a clustered index on the ID column...
drop table #yourtable
go
--===== Setup the demonstration table...
-- I'm assuming that the ID column is the CLUSTERED PRIMARY KEY
-- and that it's named PK_YourTable_ID. Please adjust as necessary.
-- I'm also assuming that it has a "SetSequence" column.
CREATE TABLE #YourTable
(
ID INT IDENTITY(1,1),
DateValue DATETIME,
Prd CHAR(1),
StatusID TINYINT,
SetSequence INT,
CONSTRAINT PK_YourTable_ID PRIMARY KEY CLUSTERED (ID)
)
INSERT INTO #YourTable
(DateValue,Prd,StatusID)
SELECT '2007-10-17 00:01:00.000','A','1' UNION ALL
SELECT '2007-10-17 00:02:00.000','A','1' UNION ALL
SELECT '2007-10-17 00:03:00.000','A','1' UNION ALL
SELECT '2007-10-17 00:04:00.000','A','1' UNION ALL
SELECT '2007-10-17 00:05:00.000','B','1' UNION ALL
SELECT '2007-10-17 00:06:00.000','B','1' UNION ALL
SELECT '2007-10-17 00:07:00.000','B','1' UNION ALL
SELECT '2007-10-17 00:08:00.000','B','2' UNION ALL
SELECT '2007-10-17 00:09:00.000','B','1' UNION ALL
SELECT '2007-10-17 00:10:00.000','B','2' UNION ALL
SELECT '2007-10-17 00:11:00.000','A','1' UNION ALL
SELECT '2007-10-17 00:12:00.000','A','1' UNION ALL
SELECT '2007-10-17 00:13:00.000','A','1' UNION ALL
SELECT '2007-10-17 00:14:00.000','A','1' UNION ALL
SELECT '2007-10-17 00:15:00.000','A','2'
GO
--===== Show the content of the test table
SELECT * FROM #YourTable
The key to this problem is very much like "trailing edge digital logic". We don't actually care where the 1's and 2's are... we only care when the StatusID drops in value. To translate into pseudo code, when the StatusID of the previous row is LESS THAN OR EQUAL TO the current row, the SetSequence will remain the same, otherwise, we need to add 1 to the SetSequence.
Now, the clustered index makes it easy using the very same technique in the following article...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
... here's the code that does it... at a rate of a million rows every 7 seconds... 😛
--===== Declare some necessary variables
DECLARE @Sequence INT, --Remembers the current sequence
@PrevStatusID TINYINT --Remembers the StatusID from the previous row
SELECT @Sequence = 1,
@PrevStatusID = 1
--===== Do a procedural update using SQL Server's proprietary update
UPDATE #YourTable
SET @Sequence = SetSequence = CASE WHEN @PrevStatusID <= StatusID THEN @Sequence ELSE @Sequence+1 END,
@PrevStatusID = StatusID
FROM #YourTable WITH(INDEX(PK_YourTable_ID))
--===== Show the results
SELECT * FROM #YourTable
Questions? 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
August 30, 2008 at 8:30 pm
simon phoenix (8/30/2008)
HiRog
You are the SQl Guru, Thanks a lot , i really appreciat it, it was just bugging my head since last week. Now only job is to trap the begining and end of those distinct sequence and i think i can do it.
If i ran into trouble , i will post it again, please look into this thread for few more days, i am really close in getting my task done with the help of great guys like you, Gsquared and jeff. i can't explain
how greatful i am to find good people like you guys exist
God bless you Rog, Jeff and Gsquared
Thanks
Simon
----------Happy Labor Day----------
Simon, "you da man!" Thanks for the awesome thought. Happy Labor Day!
--Jeff Moden
Change is inevitable... Change for the better is not.
August 30, 2008 at 8:50 pm
simon phoenix (8/30/2008)
Now only job is to trap the begining and end of those distinct sequence and i think i can do it.
Missed that... sorry. Easy to do now that the SetSequence is done...
--===== Show first and last date of each SetSequence
SELECT SetSequence, MIN(DateValue) AS StartDateTime, MAX(DateValue) AS EndDateTime
FROM #YourTable
GROUP BY SetSequence
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 48 total)
You must be logged in to reply to this topic. Login to reply