May 19, 2007 at 2:11 am
1) How can I select only the first record in a series of certain consecutive numbers?
2) How can I obtain the count for the number of times a set of those certain consecutive numbers appear?
Example: (this assumes an un-altered Northwind database)
USE Northwind
GO
WITH Peter AS
(SELECT EmployeeID, Freight,
ROW_NUMBER() OVER(PARTITION BY EmployeeID ORDER BY Freight) AS [RecordNumber]
FROM Orders)
SELECT * FROM Peter
Partial Result Set
EmployeeID Freight RecordNumber
-------------- -------- ------------------
1 0.21 1
1 0.45 2
1 0.93 3
1 1.27 4
1 1.35 5
1 1.36 6
1 1.51 7
1 1.66 8
1 2.50 9
1 3.94 10
1 4.03 11
1 4.27 12
1 4.41 13
1 4.98 14
1 4.99 15
1 4.99 16
1 7.46 17
From the list, records 5 and 6 show consecutive Freight figures of 1.35 and 1.36 for EmployeeID 1.
From the list, records 14 and 15 show consecutive Freight figures of 4.98 and 4.99 for EmployeeID 1.
In this example how could I get a result set that only showed records, 1.35 and 4.98 - that is the first record in a series of consecutive Freight figures?
How could I get a count of the number of consecutive records for each set?
A partial result set should look like this.
EmployeeID 1stFreightRecord Count
1 1.35 2
1 4.98 2
May 19, 2007 at 6:28 am
David,
Is it always going to be incrementals steps of 0.01?
The DISTICT was necessary because there where 2 values of 4.99, namely 16 and 17.
I hope this is what you needed?
WITH Peter AS
(SELECT EmployeeID, Freight,
ROW_NUMBER() OVER(PARTITION BY EmployeeID ORDER BY Freight) AS [RecordNumber]
FROM Orders)
SELECT DISTINCT p2.RecordNumber FROM Peter p1
INNER JOIN Peter P2
ON p1.Freight = (p2.Freight + 0.01)
Jan
May 19, 2007 at 8:42 am
Jan,
Thank you for your efforts. In this example, yes, the next value would be in an increment of .01 because that would be the next consecutive number. The RecordNumber field is not necessary. It was only used to make it convenient for the reader to easily identify rows 5 and 14 without having to count down the rows on the screen.
Your response only gives the RecordNumber. I swapped out p2.RecordNumber for p2.Freight and received numbers that were not the first number in a series. The first output was .15. When running the original query to then check, I did not see .16 for any EmployeeID. I also need the result set grouped by the EmployeedID for each series and the count of each consecutive numbers in the series. Since there may be more than one series per EmployeeID, I have not had success using GROUP BY since that groups the entire result set for each EmployeeID. That's one of the challenges I am facing. The other challenges are to just find numbers in a sequence and return the first record in the set. Lastly to get a count of the consecutive numbers in each set.
David
May 20, 2007 at 6:37 am
use
tempdb;
CREATE
TABLE test( EmployeeId int, Freight decimal(9,2))
-------------- -------- ------------------
INSERT
test(EmployeeID, Freight) VALUES (1,0.21)
INSERT
test(EmployeeID, Freight) VALUES (1,0.45)
INSERT
test(EmployeeID, Freight) VALUES (1,0.93)
INSERT
test(EmployeeID, Freight) VALUES (1,1.27)
INSERT
test(EmployeeID, Freight) VALUES (1,1.35)
INSERT
test(EmployeeID, Freight) VALUES (1,1.36)
INSERT
test(EmployeeID, Freight) VALUES (1,1.51)
INSERT
test(EmployeeID, Freight) VALUES (1,1.66)
INSERT
test(EmployeeID, Freight) VALUES (1,2.50)
INSERT
test(EmployeeID, Freight) VALUES (1,3.94)
INSERT
test(EmployeeID, Freight) VALUES (1,4.03)
INSERT
test(EmployeeID, Freight) VALUES (1,4.27)
INSERT
test(EmployeeID, Freight) VALUES (1,4.41)
INSERT
test(EmployeeID, Freight) VALUES (1,4.98)
INSERT
test(EmployeeID, Freight) VALUES (1,4.99)
INSERT
test(EmployeeID, Freight) VALUES (1,4.99)
INSERT
test(EmployeeID, Freight) VALUES (1,7.46)
-- Add an employee 2
INSERT
test(EmployeeID, Freight) VALUES (2,4.41)
INSERT
test(EmployeeID, Freight) VALUES (2,4.98)
INSERT
test(EmployeeID, Freight) VALUES (2,4.99)
INSERT
test(EmployeeID, Freight) VALUES (2,4.99)
INSERT
test(EmployeeID, Freight) VALUES (2,7.46)
-- Query
SELECT
DISTINCT p2.EmployeeID, p2.Freight FROM Test p1
INNER JOIN Test P2
ON
p1.Freight = (p2.Freight + 0.01)
ORDER BY 1
-- Result: the result seems to be ok to me:
EmployeeID Freight
1 1.35
1 4.98
2 4.98
I would only use a group by if you use an aggregate function. Surely you're not summing or adding id's or anthing.
Jan
May 20, 2007 at 11:05 am
Jan,
Thanks again for another suggestion. However, this does not provide the count of the number of freight records contained in a sequence. It also shows a duplicate. I could easily create a test table with a SELECT INTO statement; however, it would be to cumbersome to handpick records to employee2. There are 830 records in the Orders table.
The data from Northwind is used only as an example so others would have a data set to test with.
May 20, 2007 at 12:10 pm
David,
My solution did not show a duplicate. I just added a second employee because you were mentioning that you needed a group by employeeId; but in your test data there was only one EmployeeId. I needed it to test if the query is valid.
Indeed, I overlooked the Count, which is also the reason why I wondered in my last post why you needed the GROUP BY.
Anyway, it is easily added to the solution:
SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) as CountFreight
FROM
(SELECT p2.EmployeeID, p2.Freight
FROM test p1
INNER JOIN test P2
ON p1.EmployeeID = p2.EmployeeID
AND p1.Freight = (p2.Freight + 0.01)) r
GROUP BY r.EmployeeID, r.Freight
Mind that in the testcase you provided there where doubles in the data: freight 4.99. So thisis the count of 2 successors that are provided in the query. I hope I understood this time correctly what you want.
I don't immedeately see where the SELECT INTO would help. Anyway this would involve more IO; the query here doesn't explicitely create temporary data.
Hope we are getting nearer.
Jan
May 24, 2007 at 6:29 pm
SOLUTION
SELECT
EmployeeID
,LowFreight
,(
SELECT count(*) FROM Orders PS
WHERE PS.EmployeeID = T.EmployeeID AND (PS.Freight BETWEEN T.LowFreight AND T.HighFreight)
) as [SeriesRowCount]
FROM
(
SELECT EmployeeID, Freight as LowFreight
,(
SELECT min(Freight) FROM Orders PS3
WHERE EXISTS
(
SELECT * FROM Orders PS2
WHERE ( PS2.EmployeeID = PS3.EmployeeID AND PS2.Freight = (PS3.Freight - .01) )
)
AND NOT EXISTS
(
SELECT * FROM Orders PS2
WHERE ( PS2.EmployeeID = PS3.EmployeeID AND PS2.Freight = (PS3.Freight + .01) )
)
AND PS3.EmployeeID = PS1.EmployeeID
AND PS3.Freight > PS1.Freight
) as HighFreight
FROM Orders PS1
WHERE EXISTS
(
SELECT * FROM Orders PS2
WHERE ( PS2.EmployeeID = PS1.EmployeeID AND PS2.Freight = (PS1.Freight + .01) )
)
AND NOT EXISTS
(
SELECT * FROM Orders PS2
WHERE ( PS2.EmployeeID = PS1.EmployeeID AND PS2.Freight = (PS1.Freight - .01) )
)
) T
ORDER BY EmployeeID, LowFreight
May 24, 2007 at 11:09 pm
David,
Seems a bit complex in light of Jan's second solution. What was wrong with it?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 25, 2007 at 1:36 pm
Hi Jeff,
Jan's solution did not work. I appreciate her effort.
May 25, 2007 at 4:51 pm
Hi Jeff,
Sure my code works! I never post anything without testing it first. It is of course possible that we misunderstand what is needed, but even that is not the case here. I was unaware that these were samples of the Northwind database; that's why I created my test table. In the following query I only changed the table name from "test" into "Orders" in order to run it in Northwind and added the ORDER BY EmployeeID, Freight in order to compare it with Davids query; the rest is exactly as I posted it before:
SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) as CountFreight
FROM
(SELECT p2.EmployeeID, p2.Freight
FROM Orders p1
INNER JOIN Orders P2
ON p1.EmployeeID = p2.EmployeeID
AND p1.Freight = (p2.Freight + 0.01)) r
GROUP BY r.EmployeeID, r.Freight
ORDER BY EmployeeID, Freight
Running this, we see that the only difference in result I have with David is that my CountFreight is always one less then Davids. This is how I understood the question: I just counted the "followers". If David wants to count one more, fine, all we need to add is "+1" and we get exactly the same result as David, but with much simpler code and a much better query plan than his. I invite everyone who has a SQL and Northwind running to compare both solutions and their query plan.
SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) + 1 as CountFreight
FROM
(SELECT p2.EmployeeID, p2.Freight
FROM Orders p1
INNER JOIN Orders P2
ON p1.EmployeeID = p2.EmployeeID
AND p1.Freight = (p2.Freight + 0.01)) r
GROUP BY r.EmployeeID, r.Freight
ORDER BY EmployeeID, Freight
And I still don't think it is a good idea to post exactly the same question in 2 threads.
Regards,
Jan
May 25, 2007 at 6:11 pm
Jan,
I appreciate your effort and glowing positive attitude; however, your suggestion does not work. This is apparent without the need of examining the data in Northwind. Here is your result set:
EmployeeID Freight CountFreight
1 1.35 1
1 4.98 3
4 1.26 1
4 3.04 1
4 29.6 1
4 59.13 1
7 41.89 1
8 4.41 1
1) The result is supposed to, in part, obtain the count for the number of times a set of consecutive numbers appear. If the count is 1, then there are no consecutive numbers.
2) You state you did not know this question was posted with data from the Northwind database. "Northwind" was stated twice in the first posting and repeated on my 2nd reply.
3) Your suggestion involved adding non-existent data.
4) The question is not posted in two threads. It is posted in two different forums.
I welcome a shorter suggestion; but need a solution.
May 26, 2007 at 2:45 am
Jeff and all others,
I have helped a lot of people here, but this game is becoming too anoying. Why should I continue this child play?
1) Would you be so kind to READ what I said? I fist understood that only the followers needed counting; so that's why I gave you the version with +1 which gives EXACTLY the same result as yours. Is putting this +1 hard to imagine?
SELECT r.EmployeeID, r.Freight, COUNT(r.Freight) + 1 as CountFreight
FROM
(SELECT p2.EmployeeID, p2.Freight
FROM Orders p1
INNER JOIN Orders P2
ON p1.EmployeeID = p2.EmployeeID
AND p1.Freight = (p2.Freight + 0.01)) r
GROUP BY r.EmployeeID, r.Freight
ORDER BY EmployeeID, Freight
Result:
1 1,35 2
1 4,98 3
4 1,26 2
4 3,04 2
4 29,60 2
4 59,13 2
7 41,89 2
8 4,41 2
2) Ok, I overlooked this and I used your "partial result set" as data in a test table. My test table took exactly the same structure so it didn't even matter. Was it so hard to imangine that the name "test" needed replacing by "orders"?
3) My suggestion did not involve adding non-existing data. It was a TEST. I wanted to add duplicates and employees to group by, exactly in the way that they exist in Northwind. Of course you don't NEED this data, it works with any data. Proof of this: run this query in Northwind and you get EXACTLY the same as yours.
4) Clearly in 2 forums means in 2 threads as well. The point is that we, who are trying to help others, see the same question over and over again. And we don't see the suggestions already offered before.
I have helped a lot of people here. I like problems because they are intelectually stimulating to look into. I do very much welcome somebody improving my code and can accept that I make errors (you can check my posting history on that). But this is an objective thing, it can be run by everyone. I see 272 views, so lots of people are spending time reading this nonsence. I invite everyone to run Davids query and mine (and yes, please, the version with the "+1" as posted here) the will get the EXACT SAME result with less code and a SIMPLER query plan.
Jan
May 26, 2007 at 2:54 am
Jan,
As angry as you are I still appreciate your efforts; however, your suggestion still does not work.
The data used for this question was Northwind because it is a database we all have access to. A solution must work fundamentally, not just for Northwind. The solution must work whether there are two, three, or a hundred consecutive numbers. It turns out that the Orders table has at most three consecutive numbers according to the criteria in the original posting. Although, your suggestion with a +1 to the count works for Orders, it fails to work fundamentally. To prove this, simply insert more consecutive numbers:
INSERT INTO Orders (EmployeeID, Freight) VALUES (1, 1.37)
INSERT INTO Orders (EmployeeID, Freight) VALUES (1, 1.38)
Now look at your result set.
May 26, 2007 at 8:00 am
Ok, I was wrong.
Jan
May 26, 2007 at 12:54 pm
Someone could probably do this in SQL Server 2005 using row numbers and correlated subqueries... but I don't have access to 2005 and I hate correlated subqueries just 'cause...
First... Here's the test data I used... notice that I've added a dupe (4.99 like in the original test data and another "multi-dupe at 9.01) and a couple of places where there are more than 2 items that qualify for David's request... I also did like Jan did... I included more than 1 EmployeeID so that we can see that this works across multiple employee's without additional code... to keep everyone safe, I did this in a temp table... please read all comments in all code that follows...
--=======================================================================================
-- Create some test data. This is NOT part of the solution... it's just to give us
-- all something common to work with for testing.
--=======================================================================================
--===== If the table that holds the test data already exists drop it
IF OBJECT_ID('TempDB..#MyHead') IS NOT NULL
DROP TABLE #MyHead
--===== Recreate the test table and populate it with test data
CREATE TABLE #MyHead (EmployeeID INT,Freight DECIMAL(6,2))
INSERT INTO #MyHead (EmployeeID,Freight)
SELECT 1,0.21 UNION ALL
SELECT 1,0.45 UNION ALL
SELECT 1,0.93 UNION ALL
SELECT 1,1.27 UNION ALL
SELECT 1,1.35 UNION ALL --should be a "hit" with count of "2"
SELECT 1,1.36 UNION ALL
SELECT 1,1.51 UNION ALL
SELECT 1,1.66 UNION ALL
SELECT 1,2.50 UNION ALL
SELECT 1,3.94 UNION ALL
SELECT 1,4.03 UNION ALL
SELECT 1,4.27 UNION ALL
SELECT 1,4.41 UNION ALL
SELECT 1,4.98 UNION ALL --should be a "hit" with count of "2"
SELECT 1,4.99 UNION ALL
SELECT 1,4.99 UNION ALL --dupe (not included according to David's specs)
SELECT 1,7.46 UNION ALL
SELECT 1,8.00 UNION ALL --should be a "hit" with count of "2"
SELECT 1,8.01 UNION ALL
SELECT 1,9.01 UNION ALL --should be a "hit" with count of "4"
SELECT 1,9.02 UNION ALL
SELECT 1,9.03 UNION ALL
SELECT 1,9.04 UNION ALL
SELECT 2,4.27 UNION ALL --different employee id
SELECT 2,4.41 UNION ALL
SELECT 2,4.98 UNION ALL --should be a "hit" with count of "3"
SELECT 2,4.99 UNION ALL
SELECT 2,4.99 UNION ALL --dupe (not included according to David's specs)
SELECT 2,5.00 UNION ALL
SELECT 2,7.46 UNION ALL
SELECT 2,8.00 UNION ALL --should be a "hit" with count of "3"
SELECT 2,8.01 UNION ALL
SELECT 2,8.02 UNION ALL
SELECT 2,9.01 UNION ALL --should be a "hit" with count of "4"
SELECT 2,9.02 UNION ALL
SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)
SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)
SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)
SELECT 2,9.03 UNION ALL
SELECT 2,9.04 UNION ALL
SELECT 2,9.07 UNION ALL
SELECT 3,9.07 UNION ALL --just to show one EmployeeID won't bleed into the next
SELECT 3,9.08
Ok... because I find it handy to "peel on potato at a time" (also known as "Divide and Conquer"), let's first isolate the data that qualifies for David's problem so we don't have to worry about rows that don't ...
--=======================================================================================
-- First, let's start by isolating only the rows in the data the qualify for David's
-- problem... those are rows that are .01 apart in the Freight column for any given
-- customer... to "play" this against the Orders table (or any other table), just
-- change #MyHead to the correct table name.
--=======================================================================================
--===== If the table that holds the isolated data already exists drop it
IF OBJECT_ID('TempDB..#Work') IS NOT NULL
DROP TABLE #Work
--===== Create and populate the work table with a clustered primary key and a special
-- column to support some "trick" code later on
CREATE TABLE #Work
(
EmployeeID INT NOT NULL,
Freight DECIMAL(6,2) NOT NULL,
MyGroup INT, --This is the "special" column
PRIMARY KEY CLUSTERED (EmployeeID,Freight)
)
INSERT INTO #Work (EmployeeID,Freight)
SELECT DISTINCT
t1.EmployeeID,t1.Freight
FROM #MyHead t1,
#MyHead t2
WHERE (t1.Freight+.01 = t2.Freight)
AND t1.EmployeeID = t2.EmployeeID
UNION ----------------------------------------------
SELECT DISTINCT
t1.EmployeeID,t1.Freight
FROM #MyHead t1,
#MyHead t2
WHERE (t1.Freight-.01 = t2.Freight)
AND t1.EmployeeID = t2.EmployeeID
--===== Here's what's in the work table right now
SELECT * FROM #WORK
... A bit more potato peeling... let's break contiguous/sequential rows (.01 apart) into groups ...
--=======================================================================================
-- This bit of "trick" code relies on the order of the data which is
-- forced by the Clustered Primary Key
=======================================================================================
--===== Declare a couple of local variables to support the "trick" code
DECLARE @MyGroup INT
SET @MyGroup = 0
DECLARE @LastFreight DECIMAL(6,2)
SET @LastFreight = -1
--===== Here's the "trick"... this "groups" rows that are .01 apart in Freight using the
-- proprietary UPDATE clause of SQL Server... magic... and blows cursor speed away!!!
UPDATE w
SET @MyGroup = MyGroup = CASE
WHEN Freight = @LastFreight + .01
THEN @MyGroup
ELSE @MyGroup+1
END,
@LastFreight = Freight
FROM #Work w --Order of processing is by the Clustered Primary key
--===== Here's what's in the work table right now... see where we're going with this????
SELECT * FROM #WORK
... and, finally, let's dump the peeled potato's into a basket...
--=======================================================================================
-- So, with that having been done, it's now easy to get the results you want.
--=======================================================================================
SELECT EmployeeID, MIN(Freight) AS Freight,COUNT(*) AS Items
FROM #Work
GROUP BY EmployeeID,MyGroup
Like I said, there's lot's of folks that could probably do this with a nice correlated sub-query and a handful of self joins... and in 2k5, undoubtably some form of CTE with an "Over Partion" would probably play nicely... but I've found that most correlated sub-queries of the nature necessary to support this solution end up having slower triangular joins and I just don't have 2k5 to experiment with.
David... let me know if that does it for you, please...
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 45 total)
You must be logged in to reply to this topic. Login to reply