how to group and rank (with reset)

Question

how to group and rank (with reset)

sjsubscribe

SSCrazy

Points: 2080
More actions
November 20, 2011 at 3:30 pm

#250784

How do I calculate the ranking and grouping column as shown below based on the following:
acd drv led p ranking and grouping
-----+-----+-----+---+----------------------
208 912 199 1 0
208 912 200 0 1
208 912 201 0 2
208 912 202 1 0
208 912 203 0 1
208 312 199 1 0
208 312 200 0 1
208 312 201 0 2
208 312 202 1 0
208 312 203 0 1
208 912 204 0 2
208 312 204 0 2
208 312 205 0 3
208 312 206 0 4
208 912 205 0 3
(acd, drv, led) are the primary keys,
led is simple ascending integers (1 to 400), and
p is 1 or 0
for every acd-drv-led combination, I want ranking and grouping based on p
value. When P=1, then rank is 0. Subsequent P=zeros until the next P=1
get ranked incrementally higher for that acd-drv combination. The ranking
gets reset at P=0 and does not carry over to the next group even for
the same acd-drv combination.
This is unlike row_number() over (partition by drv, acd order by p, led), which
does not reset the ranks at each P=1. It continues increasing the rank values
through the entire table.
Are there any other tricks. I'm weary of any cursor solution as I have
to sift through 13 million records. I also don't have the luxury of
using any CLRs.
Thanks,
S.J.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 1

Does your table an ID column to know the order of the rows?? Without that it is highly impossible to code this requirement. With ID column, this can achieved without cursor or CLR..

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 2

2 ways to do this,if there is an unique ID column.

1. Recursive CTE - not a performant one; will be similar to the performance numbers of cursors or while loops.

2. Quirky Update[/url] - lightning fast; but has a set of rules u MUST adhere to; else ur table will become corrupt.

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 3

For starters 🙂

here is the sample data..

IF OBJECT_ID ('#TempDB..#TempOne') IS NOT NULL

DROP TABLE #TempOne

GO

CREATE TABLE #TempOne

(

ID INT PRIMARY KEY CLUSTERED

, acd INT

, drv INT

, led INT

, p INT

, RankingAndGrouping INT

);

INSERT #TempOne (ID ,acd ,drv ,led ,p,RankingAndGrouping)

SELECT 1, 208, 912 ,199 ,1 ,0

UNION ALL SELECT 2, 208, 912 ,200 ,0 ,1

UNION ALL SELECT 3, 208, 912 ,201 ,0 ,2

UNION ALL SELECT 4, 208, 912 ,202 ,1 ,0

UNION ALL SELECT 5, 208, 912 ,203 ,0 ,1

UNION ALL SELECT 6, 208, 312 ,199 ,1 ,0

UNION ALL SELECT 7, 208, 312 ,200 ,0 ,1

UNION ALL SELECT 8, 208, 312 ,201 ,0 ,2

UNION ALL SELECT 9, 208, 312 ,202 ,1 ,0

UNION ALL SELECT 10, 208, 312 ,203 ,0 ,1

UNION ALL SELECT 11, 208, 912 ,204 ,0 ,2

UNION ALL SELECT 12, 208, 312 ,204 ,0 ,2

UNION ALL SELECT 13, 208, 312 ,205 ,0 ,3

UNION ALL SELECT 14, 208, 312 ,206 ,0 ,4

UNION ALL SELECT 15, 208, 912 ,205 ,0 ,3 ;

CREATE NONCLUSTERED INDEX NIX_acd_drv_led

ON #TempOne (acd ,drv ,led) INCLUDE (p);

here is the partially requirement satisfying code :hehe:

-- Recursive CTE

; WITH RankedCTE AS

(

SELECT ID ,acd ,drv ,led ,p , CASE WHEN p = 1 THEN 0 ELSE 1 END AS RankingAndGrouping

FROM #TempOne

WHERE ID = 1

UNION ALL

SELECT BaseTable.ID , BaseTable.acd , BaseTable.drv , BaseTable.led , BaseTable.p

, CASE WHEN BaseTable.p = 1 THEN 0 ELSE CTE.RankingAndGrouping + 1

END AS RankingAndGrouping

FROM RankedCTE CTE

INNER JOIN #TempOne BaseTable

ON CTE.ID + 1 = BaseTable.ID

)

SELECT OuterTable.ID , OuterTable.acd , OuterTable.drv , OuterTable.led , OuterTable.p

, CrsApp.RankingAndGrouping

FROM RankedCTE OuterTable

CROSS APPLY

(

SELECT TOP 1 RankingAndGrouping

FROM RankedCTE InnerTable

WHERE InnerTable.acd = OuterTable.acd AND InnerTable.led = OuterTable.led

ORDER BY InnerTable.ID ASC

) CrsApp

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 4

i guess a cute written cursor will smoke recursive cte here. But it will be interesting to see how quirky update works. should be stunning fast here...

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 5

@sjsubscribe,

Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sjsubscribe SSCrazy Points: 2080 More actions · Answer 6

ColdCoffee (11/20/2011)
Does your table an ID column to know the order of the rows?? Without that it is highly impossible to code this requirement. With ID column, this can achieved without cursor or CLR..

The order of the rows is the "led" column. There's another one that has a timestamp but that shows only chronological order.

sjsubscribe SSCrazy Points: 2080 More actions · Answer 7

ColdCoffee (11/20/2011)

For starters 🙂

thanks ColdCoffee, I think the solution is in Cross Apply. I forgot about that. I may have to tweak it a bit to get the order right for led column. Otherwise your solution is a great hint.

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 8

I guess my simple request for a bit more information is being ignored. 😉

I'm not asking what I asked to make anyone jump through any type of hoop. I need the information to determine whether I should do an "insitu" Quirky Update or not.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sjsubscribe SSCrazy Points: 2080 More actions · Answer 9

Jeff Moden (11/20/2011)
@sjsubscribe,
Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.

Thanks Jeff for the offer. Sorry for the delay in responding as I had to wait until I got the original table with data with the DDL from the production server.

Instead of putting the whole thing here, I've uploaded a zip file as an attachment to this msg. It has the actual data minus some redundant fields that I removed. I am able to add other indexes if needed but cannot change the primary keys.

A note about the fields (the data's from remote sensors):

acd is the location code (INT)

R is record number for that location (auto incremented from 1 to about 15K, which translates to about 15K records per location)

drv is the deviceType (about 40 distinct types at each location, also as INT)

P=1 is when the power reset, P=0 no power interruption

Led is the loop cycle number. Starts at 1 and may go unto 600 per device. (also INT)

LoadTime is the timing value I'm most interested in for stats.

RankingAndGrouping shows values I entered manually for this exercise to show the grouping and ranking.

Grouping should be by:

each drv at each acd for each power cycle

And ranking by:

Led, where it's zero when P=1 and then increases by 1 for each subsequent Led

The grouping and rankings are for stats on LoadTime by various RankingAndGrouping values. Best, worst, best avg, worst avg, etc, by the various groups/devices/locations.

When this query is run on the sample dataset (attached), the correct RankingAndGrouping values are as shown:

select * from robot.tP

where RankingAndGrouping is not null

order by acd, drv, R, Led

the above query should show this result below:

acdRdrvPLedLoadTimeRankingAndGrouping

19918390100:00:53.15400001

19922590200:00:47.99600002

19926790300:00:47.87700003

19930991400:00:47.99900000

19935090500:00:48.09600001

19939490600:00:48.58900002

19943790700:00:48.23700003

19947990800:00:48.61200004

19952390900:00:48.37100005

199565901000:00:48.06600006

199180120100:00:53.45300001

199224120200:00:48.03800002

199268120300:00:48.05900003

199310120400:00:48.01500004

199358120500:00:48.49300005

199405120600:00:48.80000006

199446120700:00:48.44000007

199492121800:00:48.49000000

199535120900:00:48.47400001

1995781201000:00:48.55500002

20040890100:00:42.79600001

20044990200:00:41.05900002

20049290300:00:41.14500003

20053491400:00:41.18400000

20057690500:00:41.12400001

20061890600:00:41.42700002

20066090700:00:41.80900003

20070290800:00:41.45100004

20074590900:00:41.35200005

200787901000:00:41.59700006

200403120100:00:43.26700001

200445120200:00:40.94100002

200487121300:00:40.95800000

200531120400:00:41.09100001

200574120500:00:41.63400002

200617120600:00:41.75700003

200661120700:00:42.04700004

200704120800:00:41.82500005

200746120900:00:41.92600006

2007881201000:00:41.91800007

Lowell SSC Guru Points: 323518 More actions · Answer 10

a CTE of the posted data:

WITH MySampleData(acd,R,drv,P,Led,LoadTime,RankingAndGrouping)

AS

(

SELECT 199,183,9,0,1,00:00:53.1540000,1 UNION ALL

SELECT 199,225,9,0,2,00:00:47.9960000,2 UNION ALL

SELECT 199,267,9,0,3,00:00:47.8770000,3 UNION ALL

SELECT 199,309,9,1,4,00:00:47.9990000,0 UNION ALL

SELECT 199,350,9,0,5,00:00:48.0960000,1 UNION ALL

SELECT 199,394,9,0,6,00:00:48.5890000,2 UNION ALL

SELECT 199,437,9,0,7,00:00:48.2370000,3 UNION ALL

SELECT 199,479,9,0,8,00:00:48.6120000,4 UNION ALL

SELECT 199,523,9,0,9,00:00:48.3710000,5 UNION ALL

SELECT 199,565,9,0,10,00:00:48.0660000,6 UNION ALL

SELECT 199,180,12,0,1,00:00:53.4530000,1 UNION ALL

SELECT 199,224,12,0,2,00:00:48.0380000,2 UNION ALL

SELECT 199,268,12,0,3,00:00:48.0590000,3 UNION ALL

SELECT 199,310,12,0,4,00:00:48.0150000,4 UNION ALL

SELECT 199,358,12,0,5,00:00:48.4930000,5 UNION ALL

SELECT 199,405,12,0,6,00:00:48.8000000,6 UNION ALL

SELECT 199,446,12,0,7,00:00:48.4400000,7 UNION ALL

SELECT 199,492,12,1,8,00:00:48.4900000,0 UNION ALL

SELECT 199,535,12,0,9,00:00:48.4740000,1 UNION ALL

SELECT 199,578,12,0,10,00:00:48.5550000,2 UNION ALL

SELECT 200,408,9,0,1,00:00:42.7960000,1 UNION ALL

SELECT 200,449,9,0,2,00:00:41.0590000,2 UNION ALL

SELECT 200,492,9,0,3,00:00:41.1450000,3 UNION ALL

SELECT 200,534,9,1,4,00:00:41.1840000,0 UNION ALL

SELECT 200,576,9,0,5,00:00:41.1240000,1 UNION ALL

SELECT 200,618,9,0,6,00:00:41.4270000,2 UNION ALL

SELECT 200,660,9,0,7,00:00:41.8090000,3 UNION ALL

SELECT 200,702,9,0,8,00:00:41.4510000,4 UNION ALL

SELECT 200,745,9,0,9,00:00:41.3520000,5 UNION ALL

SELECT 200,787,9,0,10,00:00:41.5970000,6 UNION ALL

SELECT 200,403,12,0,1,00:00:43.2670000,1 UNION ALL

SELECT 200,445,12,0,2,00:00:40.9410000,2 UNION ALL

SELECT 200,487,12,1,3,00:00:40.9580000,0 UNION ALL

SELECT 200,531,12,0,4,00:00:41.0910000,1 UNION ALL

SELECT 200,574,12,0,5,00:00:41.6340000,2 UNION ALL

SELECT 200,617,12,0,6,00:00:41.7570000,3 UNION ALL

SELECT 200,661,12,0,7,00:00:42.0470000,4 UNION ALL

SELECT 200,704,12,0,8,00:00:41.8250000,5 UNION ALL

SELECT 200,746,12,0,9,00:00:41.9260000,6 UNION ALL

SELECT 200,788,12,0,10,00:00:41.9180000,7

)

Select * from MySampleData

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Jeff Moden SSC Guru Points: 1004392 More actions · Answer 11

sjsubscribe (11/22/2011)
Jeff Moden (11/20/2011)
@sjsubscribe,
Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.
Jeff Moden (11/20/2011)
@sjsubscribe,
Post the CREATE TABLE statement for you table and any/all the indexes you may have on the table and I'll show you the "Quirky Update" method Cold Coffee is talking about. It'll do a million rows in about 3 seconds.
Thanks Jeff for the offer. Sorry for the delay in responding as I had to wait until I got the original table with data with the DDL from the production server.
Instead of putting the whole thing here, I've uploaded a zip file as an attachment to this msg. It has the actual data minus some redundant fields that I removed. I am able to add other indexes if needed but cannot change the primary keys.
A note about the fields (the data's from remote sensors):
acd is the location code (INT)
R is record number for that location (auto incremented from 1 to about 15K, which translates to about 15K records per location)
drv is the deviceType (about 40 distinct types at each location, also as INT)
P=1 is when the power reset, P=0 no power interruption
Led is the loop cycle number. Starts at 1 and may go unto 600 per device. (also INT)
LoadTime is the timing value I'm most interested in for stats.
RankingAndGrouping shows values I entered manually for this exercise to show the grouping and ranking.
Grouping should be by:
each drv at each acd for each power cycle
And ranking by:
Led, where it's zero when P=1 and then increases by 1 for each subsequent Led
The grouping and rankings are for stats on LoadTime by various RankingAndGrouping values. Best, worst, best avg, worst avg, etc, by the various groups/devices/locations.
When this query is run on the sample dataset (attached), the correct RankingAndGrouping values are as shown:
select * from robot.tP
where RankingAndGrouping is not null
order by acd, drv, R, Led
the above query should show this result below:
acdRdrvPLedLoadTimeRankingAndGrouping
19918390100:00:53.15400001
19922590200:00:47.99600002
19926790300:00:47.87700003
19930991400:00:47.99900000
19935090500:00:48.09600001
19939490600:00:48.58900002
19943790700:00:48.23700003
19947990800:00:48.61200004
19952390900:00:48.37100005
199565901000:00:48.06600006
199180120100:00:53.45300001
199224120200:00:48.03800002
199268120300:00:48.05900003
199310120400:00:48.01500004
199358120500:00:48.49300005
199405120600:00:48.80000006
199446120700:00:48.44000007
199492121800:00:48.49000000
199535120900:00:48.47400001
1995781201000:00:48.55500002
20040890100:00:42.79600001
20044990200:00:41.05900002
20049290300:00:41.14500003
20053491400:00:41.18400000
20057690500:00:41.12400001
20061890600:00:41.42700002
20066090700:00:41.80900003
20070290800:00:41.45100004
20074590900:00:41.35200005
200787901000:00:41.59700006
200403120100:00:43.26700001
200445120200:00:40.94100002
200487121300:00:40.95800000
200531120400:00:41.09100001
200574120500:00:41.63400002
200617120600:00:41.75700003
200661120700:00:42.04700004
200704120800:00:41.82500005
200746120900:00:41.92600006
2007881201000:00:41.91800007

Appologies for the delay... I was trying to salvage your zip file. The table in your zip file doesn't have the same primary key as what you described in your original post. The data in the file violates at least one not null constraint and the data for the LED column isn't incremental in the file.

If you could provide the correct data, I could give this a whirl. I'd use Lowell's data because it looks correct but I'd rather use the data you actually have.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)