January 4, 2009 at 3:56 pm
Hi All,
I have a requirement to obtain the start value and end value of a sequence based on a particular column.
Ex:
Serial Number - Reseller Name
===============================
1001 RA01
1002 RA01
1003 RA01
1004 RA01
1005 RA02 --> This value changes in middle of sequence.
1006 RA02
1007 RA02
1008 RA02
1009 RA01 --> Again this value changes
1010 RA01
1011 RA01
The requirement is to obtain the output for the reseller RA01 as:
Start Val - End Val - Reseller Name
===========================
1001 1004 RA01
1009 1011 RA01
January 7, 2009 at 7:05 pm
Here is a version that works but I am not sure a cursor solution might be faster.
It also misses the special case of one row for a Name entry (no max value for the range)
Assume table test1 has columns id, name (similar to your example data)
select d.minid, min(d1.maxid) as maxid, d.name
from (select c.id as minid, c.name from test1 c
where c.id not in (SELECT a.id
from test1 a inner join test1 b
on a.name = b.name
where a.id > b.id and (a.id - b.id) = 1)) d
inner join
(select c1.id as maxid, c1.name from test1 c1
where c1.id not in (SELECT a1.id
from test1 a1 inner join test1 b1
on a1.name = b1.name
where a1.id < b1.id and (a1.id - b1.id) = -1)) d1
on d1.name = d.name where d.minid < d1.maxid
group by d.minid, d.name
mike
January 7, 2009 at 10:07 pm
Hi Mike,
Seems the query is taking a while to execute. I tried it for 20minutes.. not have seen any output.
Does this have any performance dependancy on the size of the table ?
My table is substantially big for the query to execute.
Anyways thanks for the query.
do u have a better solution ?
January 8, 2009 at 4:46 pm
yes, because of the joins and the use of "not in" it should get much worse with a large data table.
you might try it looking at one name (add to where clause) rather than the version I gave which will generate a table for all names.
Again, I think a cursor solution might be faster on a large table as it would just take one pass through the table and the proc could be set up to catch single line entries as well.
January 8, 2009 at 6:25 pm
gvsriramakrishna (1/7/2009)
Hi Mike,Seems the query is taking a while to execute. I tried it for 20minutes.. not have seen any output.
Does this have any performance dependancy on the size of the table ?
My table is substantially big for the query to execute.
Anyways thanks for the query.
do u have a better solution ?
How many rows in the real table?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 8, 2009 at 6:27 pm
480002 rows
January 8, 2009 at 8:46 pm
gvsriramakrishna (1/8/2009)
480002 rows
I'll be right back...
--Jeff Moden
Change is inevitable... Change for the better is not.
January 8, 2009 at 8:49 pm
Mike,
Your solution doesn't seem to be working.
It ran for 1 1/2 hrs.
Seems too much as a solution.
Thanks.
January 8, 2009 at 9:06 pm
You'll like this... takes just seconds to run on a half million rows... of course, you'll need to change #YourTable to your actual table name. For future posts, please see the following article... you'll get answers much more quickly...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
--===== Create and populate a test table.
-- This is NOT part of the solution.
CREATE TABLE #YourTable
(
[Serial Number] INT,
[Reseller Name] VARCHAR(100)
)
INSERT INTO #YourTable
([Serial Number],[Reseller Name])
SELECT '1001','RA01' UNION ALL
SELECT '1002','RA01' UNION ALL
SELECT '1003','RA01' UNION ALL
SELECT '1004','RA01' UNION ALL
SELECT '1005','RA02' UNION ALL
SELECT '1006','RA02' UNION ALL
SELECT '1007','RA02' UNION ALL
SELECT '1008','RA02' UNION ALL
SELECT '1009','RA01' UNION ALL
SELECT '1010','RA01' UNION ALL
SELECT '1011','RA01'
--================================================================
-- Solution starts here
--================================================================
--===== Copy the data from your table to a place we can work on it
SELECT ISNULL([Serial Number],0) AS [Serial Number],
[Reseller Name],
CAST(0 AS INT) AS Section
INTO #WorkArea
FROM #YourTable
--===== This clustered index is absolutely necessary to maintain the
-- order of the "pseudo cursor" or "quirky" update
ALTER TABLE #WorkArea
ADD PRIMARY KEY CLUSTERED ([Serial Number]) WITH FILLFACTOR = 100
--===== Declare a couple of obviously named variables...
DECLARE @PrevSN INT,
@PrevRN VARCHAR(100),
@PrevSection INT
--===== ... and preset them to known values.
SELECT @PrevSN = 0,
@PrevRN = '',
@PrevSection = 0
--===== Do the "pseudo cursor" update using the index scan to force
-- the correct order of the update (NOT GUARANTEED FOR SELECTS).
-- This should only take about 3 seconds to run on 480K rows.
UPDATE #WorkArea
SET @PrevSection = Section = CASE WHEN @PrevRN = [Reseller Name]
THEN @PrevSection
ELSE @PrevSection + 1
END,
@PrevRN = [Reseller Name],
@PrevSN = [Serial Number] --"anchor"
FROM #WorkArea WITH(INDEX(0),TABLOCKX)
--===== Display the desired results.
SELECT MIN([Serial Number]) AS [Start Val],
MAX([Serial Number]) AS [End Val],
[Reseller Name]
FROM #WorkArea
GROUP BY Section,[Reseller Name]
ORDER BY [Reseller Name], Section
I strongly recommend that you carefully read the following article so you know how that code works...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
This type of "data smear" is very similar to a running total. 😉
Be very careful when using this method... leave ANYTHING out and the data will be bad. Do it right and it's up to hundreds of times faster than a cursor and thousands of times faster than some improperly rendered set based code.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 10, 2009 at 10:26 pm
So, gvsriramakrishna... any feedback on this?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 11, 2009 at 6:34 pm
Jeff Moden (1/10/2009)
So, gvsriramakrishna... any feedback on this?
I think you deserve some:)
I guess Update is ok:) It's conceptually similar to using the rowindicator parameter in Rac which computes a rank over ResellerName in the order of SerialNumber. This is a dense rank problem but unfortuneatly it's never referred to by that description. How the problem is described (differently by all who post it) is itself part of the problem:)
Exec Rac
@transform='_dummy_',
@rows='ResellerName & SerialNumber',
@rowsort='SerialNumber',
@pvtcol='Report Mode',
@from='Sellers',
@rowindicators='ResellerName{Grp}',
@rowbreak='n',@defaults1='y',@racheck='y',@shell='n',
@select='select Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End],ResellerName
from rac
where ResellerName=~RA01~
group by ResellerName,Grp
order by Grp'
Start Val End ResellerName
--------- ---- ------------
1001 1004 RA01
1009 1011 RA01
The similarity with what you've done is even clearer if you use a running sum (@rowruns) with the built in prior row value(s).
Exec Rac
@transform='_dummy_',
@rows='SerialNumber & ResellerName ',
@rowsort='SerialNumber',
@pvtcol='Report Mode',
@from='Sellers',
@rowbreak='n',@defaults1='y',@racheck='y',@shell='n',
@rowruns='^case when ResellerName=prior.ResellerName then 0 else 1 end^(dumy)',
@rowrunslabel='Grp',
@select='select Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End Val],ResellerName
from rac
where ResellerName=~RA01~
group by ResellerName,Grp
order by Grp'
Or use a running sumwith a join to compare prior and current ResellerName values.
Exec Rac
@transform='_dummy_',
@rows='A.SerialNumber as ASerial & A.ResellerName as AReseller &
B.SerialNumber as BSerial & B.ResellerName as BReseller ',
@rowsort='A.SerialNumber',
@pvtcol='Report Mode',
@from='Sellers as A left join Sellers as B on A.SerialNumber-1=B.SerialNumber',
@where='1=1',
@rowbreak='n',@defaults1='y',@racheck='y',@shell='n',
@rowruns='^case when AReseller=BReseller then 0 else 1 end^(dumy)',
@rowrunslabel='Grp',
@select='select Min(ASerial) as [Start Val],Max(ASerial) as [End Val],AReseller as ResellerName
from rac
where AReseller=~RA01~
group by AReseller,Grp
order by Grp'
Finally here's the sql-99 standard OLAP solution using an window function. This is possible to do in DB2, Oracle and Teradata databases. MS obviously thinks it's not significant enough to add to sql server. They're had over 5 years to mull it OVER:)
SELECT Min(SerialNumber) as [Start Val],Max(SerialNumber) as [End Val],ResellerName
FROM
(SELECT A.SerialNumber,A.ResellerName,
-- This is a running sum, it amounts to a dense rank over A.ResellerName in the
-- direction of A.SerialNumber. It's an sql window defined in the standard.
Sum(CASE WHEN A.ResellerName=B.ResellerName THEN 0 ELSE 1 END) OVER
(ORDER BY A.SerialNumber
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM Sellers AS A LEFT JOIN Sellers AS B ON A.SerialNumber-1=B.SerialNumber
WHERE ResellerName='RA01') AS C
GROUP BY ResellerName,Grp
ORDER BY Grp
Is this a better solution than writing code for the Update statement? The whole point of OLAP in the standards was to make problems like this 'easier' to solve than existing sql permits/permitted. I guess Rac is still the easiest solution in sql server:) When problems get more involved and/ or more columns are involved, writing code for Update becomes messy:) I suppose some sql mavens embrace it but there's a lot of developers out there that will scratch their head, or lower, over it:)
I suspect that using Update makes some users feel uncomfortable as it brings into question some fundamental ideas about what a relational database can and cannot do. I would suggest that instead of knocking the technique they would simply give up the ghost that sql server is, in fact, a relational database. It's no such thing. So dig around for things like this with a clear conscience:)
January 11, 2009 at 6:46 pm
Oops.. there seems to be an error in th SQL query at the over keyword.
Incorrect syntax near the keyword 'OVER'.
January 11, 2009 at 6:51 pm
gvsriramakrishna (1/11/2009)
Oops.. there seems to be an error in th SQL query at the over keyword.Incorrect syntax near the keyword 'OVER'.
You should read the words, not just sql code:) I listed the systems that it will execute on. MS has not implemented enough of OLAP for the query to work.
January 12, 2009 at 3:02 am
Here is another way of getting the beginning and end of a sequence. The downside of this way is that it assumes that there are no gaps in the numbers inside the sequence. If there is one, it will consider it as a different sequence. It also runs fast on a table with half a million rows.
;with MyCTE as (
select [Serial Number] as SR, [Reseller Name] as RN, row_number() over (order by [Reseller Name], [Serial number]) as RowNumber
from #YourTable
)
select min(SR), max(SR), RN
from MyCTE
where RN = 'RA01'
group by RN, SR-RowNumber
order by min(sr)
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
January 12, 2009 at 6:32 am
gvsriramakrishna (1/11/2009)
Oops.. there seems to be an error in th SQL query at the over keyword.Incorrect syntax near the keyword 'OVER'.
No... I wanted to know how the UPDATE solution works. The OVER solution that Rog Pike posted won't work in SQL Server and he said so.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply