March 13, 2007 at 12:49 pm
Good Afternoon;
Any chance I can do the following in T-SQL. I've done something similar, and as usual, can't find the source.
Table:
Code Status
1 S
2 T
3 T
4 T
5 S
6 S
7 S
10 A
11 T
12 T
.
.
.
Results desired:
min max status
1 1 S
2 4 T
5 7 S
10 10 A
11 14 T
.
.
appreciate any thoughts...
Randy.
March 13, 2007 at 1:01 pm
Does column Code contain unique ascending sequence of values ?
March 13, 2007 at 1:05 pm
Yes. There may be skips (1,2,3,5,6,7,11,12,13), but they are unique, ascending.
March 14, 2007 at 8:52 am
This one is a little ugly but it should work. Depending on your real purpose this might be a job for a cursor. Otherwise, if you are using SQL 2005 you can do it with a PARTITION statement (Google for "SQL count consecutive rows same value"). Otherwise, here is a shot:
ISNULL(
(SELECT TOP 1 PriorTable.Code AS PriorCodeStatus
FROM MyTable AS PriorTable
WHERE PriorTable.Code <= MyTable.Code
AND PriorTable.Status = MyTable.Status
AND PriorTable.Code >=
ISNULL(
(SELECT TOP 1 PrevTable.Code FROM MyTable AS PrevTable
WHERE PrevTable.Code < MyTable.Code
AND PrevTable.Status <> MyTable.Status
ORDER BY Code Desc),0)
ORDER BY Code),0)
AS MinCode,
ISNULL(
(SELECT TOP 1 MaxTable.Code AS MaxCodeStatus
FROM MyTable AS MaxTable
WHERE MaxTable.Code >= MyTable.Code
AND MaxTable.Status = MyTable.Status
AND MaxTable.Code <=
ISNULL(
(SELECT TOP 1 NxtTable.Code FROM MyTable AS NxtTable
WHERE NxtTable.Code > MyTable.Code
AND NxtTable.Status <> MyTable.Status
ORDER BY Code),Code)
ORDER BY Code DESC),0)
AS MaxCode, Status
FROM MyTable
ORDER BY MinCode
Basically it is a series of nested TOP 1 queries which determine the first, last, previous and next Code for a given Status grouping.
Hope that helps.
Brian
March 16, 2007 at 6:01 am
Brian -
Sorry for the long pause, but thanks, that was the trick! I'm breaking it down now to 'burn -in' the logic.
March 16, 2007 at 7:11 am
The following is similar to Brian's code but should be more efficient. If you have more than 10000 rows you may be better off using cursors/temp tables.
SELECT MIN(D.MinCode) AS MinCode
,D.MaxCode
,D.status
FROM (
SELECT T1.code AS MinCode
,MAX(T2.code) AS MaxCode
,T1.status
FROM YourTable T1
JOIN YourTable T2
ON T1.status = T2.status
AND T1.code <= T2.code
WHERE T1.status = ALL (
SELECT T3.status
FROM YourTable T3
WHERE T3.code BETWEEN T1.code AND T2.code
)
GROUP BY T1.code, T1.status
) D
GROUP BY D.MaxCode, D.status
ORDER BY MinCode
March 16, 2007 at 7:24 am
The logic for working out the above is:
1. Self join the table on status where one code is less than or equal to the other code:
SELECT T1.code AS MinCode
,T2.code AS MaxCode
,T1.status
FROM YourTable T1
JOIN YourTable T2
ON T1.status = T2.status
AND T1.code <= T2.code
2. Filter out rows where intermediate statuses are not the same.
SELECT T1.code AS MinCode
,T2.code AS MaxCode
,T1.status
FROM YourTable T1
JOIN YourTable T2
ON T1.status = T2.status
AND T1.code <= T2.code
WHERE T1.status = ALL (
SELECT T3.status
FROM YourTable T3
WHERE T3.code BETWEEN T1.code AND T2.code
)
3. Then do MIN and MAX to get the start and end codes of the range.
March 16, 2007 at 7:41 am
Ken, Brian-
Thanks for the replys. Now off to do validation as both answers returned different #rows. Let me compare the two and I'll get back.
March 16, 2007 at 7:43 am
Thanks to David le Quesne for showing me the above approach. Here is the thread:
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=303210&p=2
March 16, 2007 at 8:48 am
After review -
Original table contained non-unique entries, in error. Obtained correct list and refreshed. Table contains 848 rows.
Both solutions now retrive identical count & valued rows (157). Only difference is time. Brians solution returns in less than 1 second , while Kens took 11 seconds.
March 16, 2007 at 10:45 am
In case someone wanted Brian's solution broken down into its process and components I went back through and did that.
Process:
1. Select all the rows (MyTable)
2. SubQuery: Get the highest Code (PrevTable) where the Status doesn't match the current row and Code < current row (Top 1 DESC)
3. SubQuery: Get the lowest Code (PriorTable) where the Status is equal to the the current row and Code > #2 result. (Top 1 ASC) That should be the first occurrence of this Status in the current group.
4. Wrap #3 result in an IsNull->0 and that is the Minimum Code for the group
5. SubQuery: Get the lowest Code (NxtTable) where the Status doesn't match the current row and Code is greater than current row, return current code for Null (Top 1 ASC)
6. SubQuery: Get the highest Code (MaxTable) where the Status is equal to the the current row and Code <= #5 result. (Top 1 DESC) That should be the last occurrence of this Status in the current group.
7. I think there is an unnecessary IsNull->0 there.
8. Return only the DISTINCT records to eliminate the dupes since the recordset is not grouped.
#1 and 8:
SELECT DISTINCT
(#2-4) AS MinCode,
(5-7) AS MaxCode,
Status
FROM MyTable ORDER BY MinCode
#2:
ISNULL(
(SELECT TOP 1 PrevTable.Code FROM MyTable AS PrevTable
WHERE PrevTable.Code < MyTable.Code
AND PrevTable.Status <> MyTable.Status
ORDER BY Code Desc),0)
#3-4:
ISNULL(
(SELECT TOP 1 PriorTable.Code AS PriorCodeStatus
FROM MyTable AS PriorTable
WHERE PriorTable.Code <= MyTable.Code
AND PriorTable.Status = MyTable.Status
AND PriorTable.Code >=
//***** #1 Subquery Here
ORDER BY Code),0)
#5:
ISNULL(
(SELECT TOP 1 NxtTable.Code FROM MyTable AS NxtTable
WHERE NxtTable.Code > MyTable.Code
AND NxtTable.Status <> MyTable.Status
ORDER BY Code),Code)
#6-7:
ISNULL(
(SELECT TOP 1 MaxTable.Code AS MaxCodeStatus
FROM MyTable AS MaxTable
WHERE MaxTable.Code >= MyTable.Code
AND MaxTable.Status = MyTable.Status
AND MaxTable.Code <=
//***** #5 Subquery Here
ORDER BY Code DESC),0)
I hope that is helpful. I think you can eliminate the IsNull in #6-7 and also the one that wraps #3. Beyond that it is just a brute force solution that avoids the restrictions that Group By imposes on subqueries. To really do this properly I would probably attempt to do this with a cursor and with a temp table or a function that returns a table and compare the execution plans and statistics based on the actual data. Of course, like lots of us, once I have a solution it is on to the next crisis...
March 16, 2007 at 11:34 am
Thanks for this.
As a matter of interest, how quick is my style of solution if there is an index on status,code?
March 17, 2007 at 12:45 pm
Try this too!
-- Prepare sample data
DECLARE @Sample TABLE (ID INT, Status VARCHAR(9))
INSERT @Sample
SELECT 1, 'S' UNION ALL
SELECT 2, 'T' UNION ALL
SELECT 3, 'T' UNION ALL
SELECT 5, 'T' UNION ALL
SELECT 10, 'S' UNION ALL
SELECT 16, 'S' UNION ALL
SELECT 17, 'S' UNION ALL
SELECT 20, 'A' UNION ALL
SELECT 31, 'T' UNION ALL
SELECT 32, 'T'
-- Stage the data
DECLARE @Stage TABLE (RecID INT IDENTITY(1, 1), ID INT, Status VARCHAR(9), First INT, Last INT)
INSERT @Stage (ID, Status)
SELECT ID,
Status
FROM @Sample
ORDER BY ID
UPDATE s
SET s.First = (SELECT MAX(x.RecID) FROM @Stage AS x WHERE x.Status <> s.Status AND x.RecID < s.RecID) + 1,
s.Last = (SELECT MIN(x.RecID) FROM @Stage AS x WHERE x.Status <> s.Status AND x.RecID > s.RecID) - 1
FROM @Stage AS s
UPDATE @Stage
SET First = 1
WHERE First IS NULL
DECLARE @Max INT
SELECT @Max = MAX(RecID)
FROM @Stage
WHERE Last IS NULL
UPDATE @Stage
SET Last = @max-2
WHERE Last IS NULL
-- Show the expected result
SELECT DISTINCT t1.ID AS [Min],
t2.ID AS [Max],
s.Status
FROM @Stage AS s
INNER JOIN @Stage AS t1 ON t1.RecID = s.First
INNER JOIN @Stage AS t2 ON t2.RecID = s.Last
ORDER BY t1.ID
N 56°04'39.16"
E 12°55'05.25"
March 18, 2007 at 3:12 pm
As some have suggested, both methods get pretty slow, even in the presence of proper indexing, simply because of the triangular joins involved. As is typical of large triangular joins, they spawn so many "internal" rows trying to come up with an answer that they become pretty much unusable for anything more than about 10,000 rows. Here's a chart of the performance of Brian's, Ken's, and a "new" query to do this… All times are in seconds...
Rows |
500
884
1000
2000
4000
8000
16000
100000
1000000
Jeff
0.123
0.190
0.250
0.436
0.483
0.843
1.390
6.716
41.860
Brian
0.576
1.483
1.830
7.080
29.373
118.720
466.576
Ken
0.376
1.296
1.640
6.483
30.283
145.938
762.016
Before we get to the "Jeff" query, lemme just say that the idea of using either a CURSOR or a WHILE Loop would probably get the job done but I'm not so sure it would get it done as quickly. Yeah, a cursor would probably do it... but not as fast as what I'm going to show you... and, don't balk at the temp table... where do you think a cursor is stored? Thats right... TempDB... just like the internals for triangular joins or a temp table.
So, if you have a million row table that looks like this...
SET NOCOUNT ON
USE TempDB
--===== If the test table already exists, drop it
IF OBJECT_ID('MyTable') IS NOT NULL
DROP TABLE MyTable
--===== Create the test table
CREATE TABLE MyTable (Code INT IDENTITY(1,1),Status CHAR(1) NOT NULL)
--===== Populate the test table with required data
INSERT INTO MyTable (Status)
SELECT TOP 1000000
CAST(RAND(CAST(NEWID() AS VARBINARY))*3.0 AS INT)+1 AS Status
FROM Master.dbo.SYSCOLUMNS sc1,
Master.dbo.SYSCOLUMNS sc2
UPDATE MyTable
SET Status = CASE STATUS WHEN '1' THEN 'A' WHEN '2' THEN 'S' WHEN '3' THEN 'T' END
--===== Add some indexes just to be fair
ALTER TABLE MyTable
ADD CONSTRAINT PK_MyTable_Code_Status PRIMARY KEY CLUSTERED (Code ASC,Status ASC)
CREATE INDEX IX_MyTable2 ON MyTable (Code DESC,Status DESC)
--===== Display the first 100 rows
SELECT top 100 * FROM MyTable ORDER BY Code
... the following code will work faster than a cursor (just can't bring myself to use one ) and just about any other method...
--===== Variable for speed measurement only
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
--===== Move the data into a temp table where we can work on it
-- and add a couple of extra columns (takes about 3 seconds_
SELECT Code,Status,CAST(NULL AS INT) AS MyMin, CAST(NULL AS INT) AS MyMax
INTO #MyHead
FROM MyTable
--===== Add some required indexing (takes about 5 seconds)
ALTER TABLE #MyHead
ADD CONSTRAINT PK_MyHead_Code_Status PRIMARY KEY CLUSTERED (Code ASC,Status ASC)
CREATE INDEX IX_MyHead2 ON #MyHead (Code DESC,Status DESC)
--===== Create an update statement using dynamic SQL so the new indexes can be used
-- in this same batch without an error (takes 0 seconds)
DECLARE @sql VARCHAR(8000)
SET @sql = '
DECLARE @PrevStatus CHAR(1)
DECLARE @PrevCode INT
SET @PrevStatus = '' ''
SET @PrevCode = 0
--===== "Smears" the Mins across all rows
UPDATE #MyHead
SET @PrevCode = MyMin = CASE WHEN Status = @PrevStatus THEN @PrevCode ELSE Code END,
@PrevStatus = Status
FROM #MyHead WITH (INDEX(PK_MyHead_Code_Status))
SET @PrevStatus = '' ''
SET @PrevCode = 0
--===== "Smears" the Maxs across all rows
UPDATE #MyHead
SET @PrevCode = MyMax = CASE WHEN Status = @PrevStatus THEN @PrevCode ELSE Code END,
@PrevStatus = Status
FROM #MyHead WITH (INDEX(IX_MyHead2))
'
--===== Execute the dynamic SQL to do the min/max "smears" (takes about 19 seconds)
EXEC (@SQL)
--===== Display the desired result (takes about 10 seconds)
SELECT DISTINCT
MyMin AS Min, MyMax AS Max, Status
FROM #MyHead
ORDER BY Min
DROP TABLE #MyHead
PRINT DATEDIFF(ms,@StartTime,GETDATE())
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2007 at 3:32 pm
Peter, not sure what your's would do if you added a primary key to the table but it currently take about 15 seconds to do 4,000 rows, 60 seconds to do 8,000 rows, and 240 seconds to do 16,000 rows (double the rows takes 4 times longer). Like I said, those nasty little triangular joins just eat your face off
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply