Select And Count Consecutive Numbers

Question

Post reply

Select And Count Consecutive Numbers

David-250683

Hall of Fame

Points: 3452
More actions
May 19, 2007 at 2:11 am

#116813

1) How can I select only the first record in a series of certain consecutive numbers?
2) How can I obtain the count for the number of times a set of those certain consecutive numbers appear?
Example: (this assumes an un-altered Northwind database)
USE Northwind
GO
WITH Peter AS
(SELECT EmployeeID, Freight,
ROW_NUMBER() OVER(PARTITION BY EmployeeID ORDER BY Freight) AS [RecordNumber]
FROM Orders)
SELECT * FROM Peter
Partial Result Set
EmployeeID Freight RecordNumber
-------------- -------- ------------------
1 0.21 1
1 0.45 2
1 0.93 3
1 1.27 4
1 1.35 5
1 1.36 6
1 1.51 7
1 1.66 8
1 2.50 9
1 3.94 10
1 4.03 11
1 4.27 12
1 4.41 13
1 4.98 14
1 4.99 15
1 4.99 16
1 7.46 17
From the list, records 5 and 6 show consecutive Freight figures of 1.35 and 1.36 for EmployeeID 1.
From the list, records 14 and 15 show consecutive Freight figures of 4.98 and 4.99 for EmployeeID 1.
In this example how could I get a result set that only showed records, 1.35 and 4.98 - that is the first record in a series of consecutive Freight figures?
How could I get a count of the number of consecutive records for each set?
A partial result set should look like this.
EmployeeID 1stFreightRecord Count
1 1.35 2
1 4.98 2

Viewing 15 posts - 1 through 15 (of 45 total)

You must be logged in to reply to this topic. Login to reply

Jan-155192 SSCrazy Points: 2613 More actions · Answer 1

David,

Is it always going to be incrementals steps of 0.01?

The DISTICT was necessary because there where 2 values of 4.99, namely 16 and 17.

I hope this is what you needed?

WITH Peter AS

(SELECT EmployeeID, Freight,

ROW_NUMBER() OVER(PARTITION BY EmployeeID ORDER BY Freight) AS [RecordNumber]

FROM Orders)

SELECT DISTINCT p2.RecordNumber FROM Peter p1

INNER JOIN Peter P2

ON p1.Freight = (p2.Freight + 0.01)

Jan

David-250683 Hall of Fame Points: 3452 More actions · Answer 2

Jan,

Thank you for your efforts. In this example, yes, the next value would be in an increment of .01 because that would be the next consecutive number. The RecordNumber field is not necessary. It was only used to make it convenient for the reader to easily identify rows 5 and 14 without having to count down the rows on the screen.

Your response only gives the RecordNumber. I swapped out p2.RecordNumber for p2.Freight and received numbers that were not the first number in a series. The first output was .15. When running the original query to then check, I did not see .16 for any EmployeeID. I also need the result set grouped by the EmployeedID for each series and the count of each consecutive numbers in the series. Since there may be more than one series per EmployeeID, I have not had success using GROUP BY since that groups the entire result set for each EmployeeID. That's one of the challenges I am facing. The other challenges are to just find numbers in a sequence and return the first record in the set. Lastly to get a count of the consecutive numbers in each set.

David

Jan-155192 SSCrazy Points: 2613 More actions · Answer 3

use

tempdb;

CREATE

TABLE test( EmployeeId int, Freight decimal(9,2))

-------------- -------- ------------------

INSERT

test(EmployeeID, Freight) VALUES (1,0.21)

INSERT

test(EmployeeID, Freight) VALUES (1,0.45)

INSERT

test(EmployeeID, Freight) VALUES (1,0.93)

INSERT

test(EmployeeID, Freight) VALUES (1,1.27)

INSERT

test(EmployeeID, Freight) VALUES (1,1.35)

INSERT

test(EmployeeID, Freight) VALUES (1,1.36)

INSERT

test(EmployeeID, Freight) VALUES (1,1.51)

INSERT

test(EmployeeID, Freight) VALUES (1,1.66)

INSERT

test(EmployeeID, Freight) VALUES (1,2.50)

INSERT

test(EmployeeID, Freight) VALUES (1,3.94)

INSERT

test(EmployeeID, Freight) VALUES (1,4.03)

INSERT

test(EmployeeID, Freight) VALUES (1,4.27)

INSERT

test(EmployeeID, Freight) VALUES (1,4.41)

INSERT

test(EmployeeID, Freight) VALUES (1,4.98)

INSERT

test(EmployeeID, Freight) VALUES (1,4.99)

INSERT

test(EmployeeID, Freight) VALUES (1,4.99)

INSERT

test(EmployeeID, Freight) VALUES (1,7.46)

-- Add an employee 2

INSERT

test(EmployeeID, Freight) VALUES (2,4.41)

INSERT

test(EmployeeID, Freight) VALUES (2,4.98)

INSERT

test(EmployeeID, Freight) VALUES (2,4.99)

INSERT

test(EmployeeID, Freight) VALUES (2,4.99)

INSERT

test(EmployeeID, Freight) VALUES (2,7.46)

-- Query

SELECT

DISTINCT p2.EmployeeID, p2.Freight FROM Test p1

INNER JOIN Test P2

ON

p1.Freight = (p2.Freight + 0.01)

ORDER BY 1

-- Result: the result seems to be ok to me:

EmployeeID Freight

1 1.35

1 4.98

2 4.98

I would only use a group by if you use an aggregate function. Surely you're not summing or adding id's or anthing.

Jan

David-250683 Hall of Fame Points: 3452 More actions · Answer 4

Jan,

Thanks again for another suggestion. However, this does not provide the count of the number of freight records contained in a sequence. It also shows a duplicate. I could easily create a test table with a SELECT INTO statement; however, it would be to cumbersome to handpick records to employee2. There are 830 records in the Orders table.

The data from Northwind is used only as an example so others would have a data set to test with.

Jan-155192 SSCrazy Points: 2613 More actions · Answer 5

David,

My solution did not show a duplicate. I just added a second employee because you were mentioning that you needed a group by employeeId; but in your test data there was only one EmployeeId. I needed it to test if the query is valid.

Indeed, I overlooked the Count, which is also the reason why I wondered in my last post why you needed the GROUP BY.

Anyway, it is easily added to the solution:

SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) as CountFreight

FROM

(SELECT p2.EmployeeID, p2.Freight

FROM test p1

INNER JOIN test P2

ON p1.EmployeeID = p2.EmployeeID

AND p1.Freight = (p2.Freight + 0.01)) r

GROUP BY r.EmployeeID, r.Freight

Mind that in the testcase you provided there where doubles in the data: freight 4.99. So thisis the count of 2 successors that are provided in the query. I hope I understood this time correctly what you want.

I don't immedeately see where the SELECT INTO would help. Anyway this would involve more IO; the query here doesn't explicitely create temporary data.

Hope we are getting nearer.

Jan

David-250683 Hall of Fame Points: 3452 More actions · Answer 6

SOLUTION

SELECT

EmployeeID

,LowFreight

,(

SELECT count(*) FROM Orders PS

WHERE PS.EmployeeID = T.EmployeeID AND (PS.Freight BETWEEN T.LowFreight AND T.HighFreight)

) as [SeriesRowCount]

FROM

(

SELECT EmployeeID, Freight as LowFreight

,(

SELECT min(Freight) FROM Orders PS3

WHERE EXISTS

(

SELECT * FROM Orders PS2

WHERE ( PS2.EmployeeID = PS3.EmployeeID AND PS2.Freight = (PS3.Freight - .01) )

)

AND NOT EXISTS

(

SELECT * FROM Orders PS2

WHERE ( PS2.EmployeeID = PS3.EmployeeID AND PS2.Freight = (PS3.Freight + .01) )

)

AND PS3.EmployeeID = PS1.EmployeeID

AND PS3.Freight > PS1.Freight

) as HighFreight

FROM Orders PS1

WHERE EXISTS

(

SELECT * FROM Orders PS2

WHERE ( PS2.EmployeeID = PS1.EmployeeID AND PS2.Freight = (PS1.Freight + .01) )

)

AND NOT EXISTS

(

SELECT * FROM Orders PS2

WHERE ( PS2.EmployeeID = PS1.EmployeeID AND PS2.Freight = (PS1.Freight - .01) )

)

) T

ORDER BY EmployeeID, LowFreight

Jeff Moden SSC Guru Points: 1004313 More actions · Answer 7

David,

Seems a bit complex in light of Jan's second solution. What was wrong with it?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

David-250683 Hall of Fame Points: 3452 More actions · Answer 8

Hi Jeff,

Jan's solution did not work. I appreciate her effort.

Jan-155192 SSCrazy Points: 2613 More actions · Answer 9

Hi Jeff,

Sure my code works! I never post anything without testing it first. It is of course possible that we misunderstand what is needed, but even that is not the case here. I was unaware that these were samples of the Northwind database; that's why I created my test table. In the following query I only changed the table name from "test" into "Orders" in order to run it in Northwind and added the ORDER BY EmployeeID, Freight in order to compare it with Davids query; the rest is exactly as I posted it before:

SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) as CountFreight

FROM

(SELECT p2.EmployeeID, p2.Freight

FROM Orders p1

INNER JOIN Orders P2

ON p1.EmployeeID = p2.EmployeeID

AND p1.Freight = (p2.Freight + 0.01)) r

GROUP BY r.EmployeeID, r.Freight

ORDER BY EmployeeID, Freight

Running this, we see that the only difference in result I have with David is that my CountFreight is always one less then Davids. This is how I understood the question: I just counted the "followers". If David wants to count one more, fine, all we need to add is "+1" and we get exactly the same result as David, but with much simpler code and a much better query plan than his. I invite everyone who has a SQL and Northwind running to compare both solutions and their query plan.

SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) + 1 as CountFreight

FROM

(SELECT p2.EmployeeID, p2.Freight

FROM Orders p1

INNER JOIN Orders P2

ON p1.EmployeeID = p2.EmployeeID

AND p1.Freight = (p2.Freight + 0.01)) r

GROUP BY r.EmployeeID, r.Freight

ORDER BY EmployeeID, Freight

And I still don't think it is a good idea to post exactly the same question in 2 threads.

Regards,

Jan

David-250683 Hall of Fame Points: 3452 More actions · Answer 10

Jan,

I appreciate your effort and glowing positive attitude; however, your suggestion does not work. This is apparent without the need of examining the data in Northwind. Here is your result set:

EmployeeID Freight CountFreight

1 1.35 1

1 4.98 3

4 1.26 1

4 3.04 1

4 29.6 1

4 59.13 1

7 41.89 1

8 4.41 1

1) The result is supposed to, in part, obtain the count for the number of times a set of consecutive numbers appear. If the count is 1, then there are no consecutive numbers.

2) You state you did not know this question was posted with data from the Northwind database. "Northwind" was stated twice in the first posting and repeated on my 2nd reply.

3) Your suggestion involved adding non-existent data.

4) The question is not posted in two threads. It is posted in two different forums.

I welcome a shorter suggestion; but need a solution.

Jan-155192 SSCrazy Points: 2613 More actions · Answer 11

Jeff and all others,

I have helped a lot of people here, but this game is becoming too anoying. Why should I continue this child play?

1) Would you be so kind to READ what I said? I fist understood that only the followers needed counting; so that's why I gave you the version with +1 which gives EXACTLY the same result as yours. Is putting this +1 hard to imagine?

SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) + 1 as CountFreight

FROM

(SELECT p2.EmployeeID, p2.Freight

FROM Orders p1

INNER JOIN Orders P2

ON p1.EmployeeID = p2.EmployeeID

AND p1.Freight = (p2.Freight + 0.01)) r

GROUP BY r.EmployeeID, r.Freight

ORDER BY EmployeeID, Freight

Result:

1 1,35 2

1 4,98 3

4 1,26 2

4 3,04 2

4 29,60 2

4 59,13 2

7 41,89 2

8 4,41 2

2) Ok, I overlooked this and I used your "partial result set" as data in a test table. My test table took exactly the same structure so it didn't even matter. Was it so hard to imangine that the name "test" needed replacing by "orders"?

3) My suggestion did not involve adding non-existing data. It was a TEST. I wanted to add duplicates and employees to group by, exactly in the way that they exist in Northwind. Of course you don't NEED this data, it works with any data. Proof of this: run this query in Northwind and you get EXACTLY the same as yours.

4) Clearly in 2 forums means in 2 threads as well. The point is that we, who are trying to help others, see the same question over and over again. And we don't see the suggestions already offered before.

I have helped a lot of people here. I like problems because they are intelectually stimulating to look into. I do very much welcome somebody improving my code and can accept that I make errors (you can check my posting history on that). But this is an objective thing, it can be run by everyone. I see 272 views, so lots of people are spending time reading this nonsence. I invite everyone to run Davids query and mine (and yes, please, the version with the "+1" as posted here) the will get the EXACT SAME result with less code and a SIMPLER query plan.

Jan

David-250683 Hall of Fame Points: 3452 More actions · Answer 12

Jan,

As angry as you are I still appreciate your efforts; however, your suggestion still does not work.

The data used for this question was Northwind because it is a database we all have access to. A solution must work fundamentally, not just for Northwind. The solution must work whether there are two, three, or a hundred consecutive numbers. It turns out that the Orders table has at most three consecutive numbers according to the criteria in the original posting. Although, your suggestion with a +1 to the count works for Orders, it fails to work fundamentally. To prove this, simply insert more consecutive numbers:

INSERT INTO Orders (EmployeeID, Freight) VALUES (1, 1.37)

INSERT INTO Orders (EmployeeID, Freight) VALUES (1, 1.38)

Now look at your result set.

Jan-155192 SSCrazy Points: 2613 More actions · Answer 13

Jan-155192

SSCrazy

Points: 2613

May 26, 2007 at 8:00 am

#709122

Ok, I was wrong.

Jan

Jeff Moden SSC Guru Points: 1004313 More actions · Answer 14

Someone could probably do this in SQL Server 2005 using row numbers and correlated subqueries... but I don't have access to 2005 and I hate correlated subqueries just 'cause...

First... Here's the test data I used... notice that I've added a dupe (4.99 like in the original test data and another "multi-dupe at 9.01) and a couple of places where there are more than 2 items that qualify for David's request... I also did like Jan did... I included more than 1 EmployeeID so that we can see that this works across multiple employee's without additional code... to keep everyone safe, I did this in a temp table... please read all comments in all code that follows...

--=======================================================================================

-- Create some test data. This is NOT part of the solution... it's just to give us

-- all something common to work with for testing.

--=======================================================================================

--===== If the table that holds the test data already exists drop it

IF OBJECT_ID('TempDB..#MyHead') IS NOT NULL

DROP TABLE #MyHead

--===== Recreate the test table and populate it with test data

CREATE TABLE #MyHead (EmployeeID INT,Freight DECIMAL(6,2))

INSERT INTO #MyHead (EmployeeID,Freight)

SELECT 1,0.21 UNION ALL

SELECT 1,0.45 UNION ALL

SELECT 1,0.93 UNION ALL

SELECT 1,1.27 UNION ALL

SELECT 1,1.35 UNION ALL --should be a "hit" with count of "2"

SELECT 1,1.36 UNION ALL

SELECT 1,1.51 UNION ALL

SELECT 1,1.66 UNION ALL

SELECT 1,2.50 UNION ALL

SELECT 1,3.94 UNION ALL

SELECT 1,4.03 UNION ALL

SELECT 1,4.27 UNION ALL

SELECT 1,4.41 UNION ALL

SELECT 1,4.98 UNION ALL --should be a "hit" with count of "2"

SELECT 1,4.99 UNION ALL

SELECT 1,4.99 UNION ALL --dupe (not included according to David's specs)

SELECT 1,7.46 UNION ALL

SELECT 1,8.00 UNION ALL --should be a "hit" with count of "2"

SELECT 1,8.01 UNION ALL

SELECT 1,9.01 UNION ALL --should be a "hit" with count of "4"

SELECT 1,9.02 UNION ALL

SELECT 1,9.03 UNION ALL

SELECT 1,9.04 UNION ALL

SELECT 2,4.27 UNION ALL --different employee id

SELECT 2,4.41 UNION ALL

SELECT 2,4.98 UNION ALL --should be a "hit" with count of "3"

SELECT 2,4.99 UNION ALL

SELECT 2,4.99 UNION ALL --dupe (not included according to David's specs)

SELECT 2,5.00 UNION ALL

SELECT 2,7.46 UNION ALL

SELECT 2,8.00 UNION ALL --should be a "hit" with count of "3"

SELECT 2,8.01 UNION ALL

SELECT 2,8.02 UNION ALL

SELECT 2,9.01 UNION ALL --should be a "hit" with count of "4"

SELECT 2,9.02 UNION ALL

SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)

SELECT 2,9.03 UNION ALL

SELECT 2,9.04 UNION ALL

SELECT 2,9.07 UNION ALL

SELECT 3,9.07 UNION ALL --just to show one EmployeeID won't bleed into the next

SELECT 3,9.08

Ok... because I find it handy to "peel on potato at a time" (also known as "Divide and Conquer"), let's first isolate the data that qualifies for David's problem so we don't have to worry about rows that don't ...

--=======================================================================================

-- First, let's start by isolating only the rows in the data the qualify for David's

-- problem... those are rows that are .01 apart in the Freight column for any given

-- customer... to "play" this against the Orders table (or any other table), just

-- change #MyHead to the correct table name.

--=======================================================================================

--===== If the table that holds the isolated data already exists drop it

IF OBJECT_ID('TempDB..#Work') IS NOT NULL

DROP TABLE #Work

--===== Create and populate the work table with a clustered primary key and a special

-- column to support some "trick" code later on

CREATE TABLE #Work

(

EmployeeID INT NOT NULL,

Freight DECIMAL(6,2) NOT NULL,

MyGroup INT, --This is the "special" column

PRIMARY KEY CLUSTERED (EmployeeID,Freight)

)

INSERT INTO #Work (EmployeeID,Freight)

SELECT DISTINCT

t1.EmployeeID,t1.Freight

FROM #MyHead t1,

#MyHead t2

WHERE (t1.Freight+.01 = t2.Freight)

AND t1.EmployeeID = t2.EmployeeID

UNION ----------------------------------------------

SELECT DISTINCT

t1.EmployeeID,t1.Freight

FROM #MyHead t1,

#MyHead t2

WHERE (t1.Freight-.01 = t2.Freight)

AND t1.EmployeeID = t2.EmployeeID

--===== Here's what's in the work table right now

SELECT * FROM #WORK

... A bit more potato peeling... let's break contiguous/sequential rows (.01 apart) into groups ...

--=======================================================================================

-- This bit of "trick" code relies on the order of the data which is

-- forced by the Clustered Primary Key

=======================================================================================

--===== Declare a couple of local variables to support the "trick" code

DECLARE @MyGroup INT

SET @MyGroup = 0

DECLARE @LastFreight DECIMAL(6,2)

SET @LastFreight = -1

--===== Here's the "trick"... this "groups" rows that are .01 apart in Freight using the

-- proprietary UPDATE clause of SQL Server... magic... and blows cursor speed away!!!

UPDATE w

SET @MyGroup = MyGroup = CASE

WHEN Freight = @LastFreight + .01

THEN @MyGroup

ELSE @MyGroup+1

END,

@LastFreight = Freight

FROM #Work w --Order of processing is by the Clustered Primary key

--===== Here's what's in the work table right now... see where we're going with this????

SELECT * FROM #WORK

... and, finally, let's dump the peeled potato's into a basket...

--=======================================================================================

-- So, with that having been done, it's now easy to get the results you want.

--=======================================================================================

SELECT EmployeeID, MIN(Freight) AS Freight,COUNT(*) AS Items

FROM #Work

GROUP BY EmployeeID,MyGroup

Like I said, there's lot's of folks that could probably do this with a nice correlated sub-query and a handful of self joins... and in 2k5, undoubtably some form of CTE with an "Over Partion" would probably play nicely... but I've found that most correlated sub-queries of the nature necessary to support this solution end up having slower triangular joins and I just don't have 2k5 to experiment with.

David... let me know if that does it for you, please...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)