How to get start and end value from a sequence of numbers - Grouped by a different field.

Question

Post reply

How to get start and end value from a sequence of numbers - Grouped by a different field.

gvsriramakrishna

Old Hand

Points: 357
More actions
January 4, 2009 at 3:56 pm

#217167

Hi All,
I have a requirement to obtain the start value and end value of a sequence based on a particular column.
Ex:
Serial Number - Reseller Name
===============================
1001 RA01
1002 RA01
1003 RA01
1004 RA01
1005 RA02 --> This value changes in middle of sequence.
1006 RA02
1007 RA02
1008 RA02
1009 RA01 --> Again this value changes
1010 RA01
1011 RA01
The requirement is to obtain the output for the reseller RA01 as:
Start Val - End Val - Reseller Name
===========================
1001 1004 RA01
1009 1011 RA01

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Mike Osborne SSC Veteran Points: 265 More actions · Answer 1

Here is a version that works but I am not sure a cursor solution might be faster.

It also misses the special case of one row for a Name entry (no max value for the range)

Assume table test1 has columns id, name (similar to your example data)

select d.minid, min(d1.maxid) as maxid, d.name

from (select c.id as minid, c.name from test1 c

where c.id not in (SELECT a.id

from test1 a inner join test1 b

on a.name = b.name

where a.id > b.id and (a.id - b.id) = 1)) d

inner join

(select c1.id as maxid, c1.name from test1 c1

where c1.id not in (SELECT a1.id

from test1 a1 inner join test1 b1

on a1.name = b1.name

where a1.id < b1.id and (a1.id - b1.id) = -1)) d1

on d1.name = d.name where d.minid < d1.maxid

group by d.minid, d.name

mike

gvsriramakrishna Old Hand Points: 357 More actions · Answer 2

Hi Mike,

Seems the query is taking a while to execute. I tried it for 20minutes.. not have seen any output.

Does this have any performance dependancy on the size of the table ?

My table is substantially big for the query to execute.

Anyways thanks for the query.

do u have a better solution ?

Mike Osborne SSC Veteran Points: 265 More actions · Answer 3

yes, because of the joins and the use of "not in" it should get much worse with a large data table.

you might try it looking at one name (add to where clause) rather than the version I gave which will generate a table for all names.

Again, I think a cursor solution might be faster on a large table as it would just take one pass through the table and the proc could be set up to catch single line entries as well.

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 4

gvsriramakrishna (1/7/2009)
Hi Mike,
Seems the query is taking a while to execute. I tried it for 20minutes.. not have seen any output.
Does this have any performance dependancy on the size of the table ?
My table is substantially big for the query to execute.
Anyways thanks for the query.
do u have a better solution ?

How many rows in the real table?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

gvsriramakrishna Old Hand Points: 357 More actions · Answer 5

gvsriramakrishna

Old Hand

Points: 357

January 8, 2009 at 6:27 pm

#923844

480002 rows

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 6

gvsriramakrishna (1/8/2009)
480002 rows

I'll be right back...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

gvsriramakrishna Old Hand Points: 357 More actions · Answer 7

Mike,

Your solution doesn't seem to be working.

It ran for 1 1/2 hrs.

Seems too much as a solution.

Thanks.

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 8

You'll like this... takes just seconds to run on a half million rows... of course, you'll need to change #YourTable to your actual table name. For future posts, please see the following article... you'll get answers much more quickly...

http://www.sqlservercentral.com/articles/Best+Practices/61537/

--===== Create and populate a test table.

-- This is NOT part of the solution.

CREATE TABLE #YourTable

(

[Serial Number] INT,

[Reseller Name] VARCHAR(100)

)

INSERT INTO #YourTable

([Serial Number],[Reseller Name])

SELECT '1001','RA01' UNION ALL

SELECT '1002','RA01' UNION ALL

SELECT '1003','RA01' UNION ALL

SELECT '1004','RA01' UNION ALL

SELECT '1005','RA02' UNION ALL

SELECT '1006','RA02' UNION ALL

SELECT '1007','RA02' UNION ALL

SELECT '1008','RA02' UNION ALL

SELECT '1009','RA01' UNION ALL

SELECT '1010','RA01' UNION ALL

SELECT '1011','RA01'

--================================================================

-- Solution starts here

--================================================================

--===== Copy the data from your table to a place we can work on it

SELECT ISNULL([Serial Number],0) AS [Serial Number],

[Reseller Name],

CAST(0 AS INT) AS Section

INTO #WorkArea

FROM #YourTable

--===== This clustered index is absolutely necessary to maintain the

-- order of the "pseudo cursor" or "quirky" update

ALTER TABLE #WorkArea

ADD PRIMARY KEY CLUSTERED ([Serial Number]) WITH FILLFACTOR = 100

--===== Declare a couple of obviously named variables...

DECLARE @PrevSN INT,

@PrevRN VARCHAR(100),

@PrevSection INT

--===== ... and preset them to known values.

SELECT @PrevSN = 0,

@PrevRN = '',

@PrevSection = 0

--===== Do the "pseudo cursor" update using the index scan to force

-- the correct order of the update (NOT GUARANTEED FOR SELECTS).

-- This should only take about 3 seconds to run on 480K rows.

UPDATE #WorkArea

SET @PrevSection = Section = CASE WHEN @PrevRN = [Reseller Name]

THEN @PrevSection

ELSE @PrevSection + 1

END,

@PrevRN = [Reseller Name],

@PrevSN = [Serial Number] --"anchor"

FROM #WorkArea WITH(INDEX(0),TABLOCKX)

--===== Display the desired results.

SELECT MIN([Serial Number]) AS [Start Val],

MAX([Serial Number]) AS [End Val],

[Reseller Name]

FROM #WorkArea

GROUP BY Section,[Reseller Name]

ORDER BY [Reseller Name], Section

I strongly recommend that you carefully read the following article so you know how that code works...

http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

This type of "data smear" is very similar to a running total. 😉

Be very careful when using this method... leave ANYTHING out and the data will be bad. Do it right and it's up to hundreds of times faster than a cursor and thousands of times faster than some improperly rendered set based code.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 9

So, gvsriramakrishna... any feedback on this?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

steve dassin SSCrazy Points: 2396 More actions · Answer 10

Jeff Moden (1/10/2009)
So, gvsriramakrishna... any feedback on this?

I think you deserve some:)

I guess Update is ok:) It's conceptually similar to using the rowindicator parameter in Rac which computes a rank over ResellerName in the order of SerialNumber. This is a dense rank problem but unfortuneatly it's never referred to by that description. How the problem is described (differently by all who post it) is itself part of the problem:)

Exec Rac

@transform='_dummy_',

@rows='ResellerName & SerialNumber',

@rowsort='SerialNumber',

@pvtcol='Report Mode',

@from='Sellers',

@rowindicators='ResellerName{Grp}',

@rowbreak='n',@defaults1='y',@racheck='y',@shell='n',

@select='select Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End],ResellerName

from rac

where ResellerName=~RA01~

group by ResellerName,Grp

order by Grp'

Start Val End ResellerName

--------- ---- ------------

1001 1004 RA01

1009 1011 RA01

The similarity with what you've done is even clearer if you use a running sum (@rowruns) with the built in prior row value(s).

Exec Rac

@transform='_dummy_',

@rows='SerialNumber & ResellerName ',

@rowsort='SerialNumber',

@pvtcol='Report Mode',

@from='Sellers',

@rowbreak='n',@defaults1='y',@racheck='y',@shell='n',

@rowruns='^case when ResellerName=prior.ResellerName then 0 else 1 end^(dumy)',

@rowrunslabel='Grp',

@select='select Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End Val],ResellerName

from rac

where ResellerName=~RA01~

group by ResellerName,Grp

order by Grp'

Or use a running sumwith a join to compare prior and current ResellerName values.

Exec Rac

@transform='_dummy_',

@rows='A.SerialNumber as ASerial & A.ResellerName as AReseller &

B.SerialNumber as BSerial & B.ResellerName as BReseller ',

@rowsort='A.SerialNumber',

@pvtcol='Report Mode',

@from='Sellers as A left join Sellers as B on A.SerialNumber-1=B.SerialNumber',

@where='1=1',

@rowbreak='n',@defaults1='y',@racheck='y',@shell='n',

@rowruns='^case when AReseller=BReseller then 0 else 1 end^(dumy)',

@rowrunslabel='Grp',

@select='select Min(ASerial) as [Start Val],Max(ASerial) as [End Val],AReseller as ResellerName

from rac

where AReseller=~RA01~

group by AReseller,Grp

order by Grp'

Finally here's the sql-99 standard OLAP solution using an window function. This is possible to do in DB2, Oracle and Teradata databases. MS obviously thinks it's not significant enough to add to sql server. They're had over 5 years to mull it OVER:)

SELECT Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End Val],ResellerName

FROM

(SELECT A.SerialNumber,A.ResellerName,

-- This is a running sum, it amounts to a dense rank over A.ResellerName in the

-- direction of A.SerialNumber. It's an sql window defined in the standard.

Sum(CASE WHEN A.ResellerName=B.ResellerName THEN 0 ELSE 1 END) OVER

(ORDER BY A.SerialNumber

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp

FROM Sellers AS A LEFT JOIN Sellers AS B ON A.SerialNumber-1=B.SerialNumber

WHERE ResellerName='RA01') AS C

GROUP BY ResellerName,Grp

ORDER BY Grp

Is this a better solution than writing code for the Update statement? The whole point of OLAP in the standards was to make problems like this 'easier' to solve than existing sql permits/permitted. I guess Rac is still the easiest solution in sql server:) When problems get more involved and/ or more columns are involved, writing code for Update becomes messy:) I suppose some sql mavens embrace it but there's a lot of developers out there that will scratch their head, or lower, over it:)

I suspect that using Update makes some users feel uncomfortable as it brings into question some fundamental ideas about what a relational database can and cannot do. I would suggest that instead of knocking the technique they would simply give up the ghost that sql server is, in fact, a relational database. It's no such thing. So dig around for things like this with a clear conscience:)

http://www.rac4sql.net

http://www.beyondsql.blogspot.com

gvsriramakrishna Old Hand Points: 357 More actions · Answer 11

Oops.. there seems to be an error in th SQL query at the over keyword.

Incorrect syntax near the keyword 'OVER'.

steve dassin SSCrazy Points: 2396 More actions · Answer 12

gvsriramakrishna (1/11/2009)
Oops.. there seems to be an error in th SQL query at the over keyword.
Incorrect syntax near the keyword 'OVER'.

You should read the words, not just sql code:) I listed the systems that it will execute on. MS has not implemented enough of OLAP for the query to work.

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 13

Here is another way of getting the beginning and end of a sequence. The downside of this way is that it assumes that there are no gaps in the numbers inside the sequence. If there is one, it will consider it as a different sequence. It also runs fast on a table with half a million rows.

;with MyCTE as (

select [Serial Number] as SR, [Reseller Name] as RN, row_number() over (order by [Reseller Name], [Serial number]) as RowNumber

from #YourTable

)

select min(SR), max(SR), RN

from MyCTE

where RN = 'RA01'

group by RN, SR-RowNumber

order by min(sr)

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 14

gvsriramakrishna (1/11/2009)
Oops.. there seems to be an error in th SQL query at the over keyword.
Incorrect syntax near the keyword 'OVER'.

No... I wanted to know how the UPDATE solution works. The OVER solution that Rog Pike posted won't work in SQL Server and he said so.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)