Removing Overlapping periods

Question

Post reply

Removing Overlapping periods

SQL_Loser

SSC Eights!

Points: 829
More actions
August 27, 2013 at 11:57 am

#282552

Hi,
I have a problem and I have created a solution using loops. I would like it to be more efficient if possible. The explanation of the problem is in the code below. If any one can come up with a better solution I would be very appreciative!
/****Create sample table of Intervals********************/
IF Object_ID('tempdb.dbo.#Intervals') is not null
Drop table #Intervals
Create table #Intervals
(ID Int Not null,
PeriodBegin INT not null,
PeriodEnd INT not null)
Declare @id int = 1
Declare @PeriodBegin INT = 1
Declare @i int = 0
While @i <= 2000
Begin
INSert into #Intervals
Select id = @id, PeriodBegin = @PeriodBegin+ @i,PeriodEnd = @PeriodBegin+ @i+ 62
Set @i = @i + 1
END
Select * FROM #Intervals
/****************************
For the above table I need to retain only the records that do not overlap a previous period starting with periodBegin = 1.
The next 63 rows have a period begin less than the 1st row's period end, so those rows will be removed.
The next valid row would be row 64 as the period begin is less than the first row's period end.
I have a method of doing this below but it uses a loop and the tables that I am doing this for have millions of rows in some cases so it is not as efficient as I would like.
I cannot just choose the 1st, 64th,127th...etc. rows because not every ordinal value will be represented in the period begin column, the length of the period will always remain consistent however.
Can anyone come up with a faster method for doing this???
*********************************************************************************************************/
IF Object_ID('tempdb.dbo.#UniqueIntervals') is not null
Drop table#UniqueIntervals
Create table #UniqueIntervals
(ID Int Not null,
PeriodBegin INT not null,
PeriodEnd Int Not null)
Declare @loop int = 1
While @loop <= 32 ---32 possible unique periods
Begin
Insert into #UniqueIntervals (id, PeriodBegin,PeriodEnd)
Select id, PeriodBegin = Min(PeriodBegin), PeriodENd = Min(PeriodEnd)
FROM #Intervals
Group by id
order by Min(PeriodBegin)
Delete b FROM #UniqueIntervals a
INNER JOIN #Intervals b
ON a.id = b.id
and b.PeriodBegin <= a.PeriodEnd
Set @loop = @loop + 1
END
Select * FROM #UniqueIntervals
/************************************************************************/

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Hunchback SSCrazy Points: 2649 More actions · Answer 1

It will be helpful if you attach also sample data, expected result and the version of SQL Server that you are using.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 2

And while you are gen'ing up sample data for us search the web for Itzik Ben-Gan's various posts and articles (SQLMag has some iirc) on interval processing. There is some AMAZINGLY efficient stuff that can be done with SQL 2005+ using new mechanisms!!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

SQL_Loser SSC Eights! Points: 829 More actions · Answer 3

The first query is there to create a sample table. Does it not work?

SQL_Loser SSC Eights! Points: 829 More actions · Answer 4

Oh, and I have access to Sql Server 2012. The second part of the script is my solution and gives the expected result. I'm just checking to see if there is a better solution.

Hunchback SSCrazy Points: 2649 More actions · Answer 5

If I understood it correctly, you want groups of 63 rows supposing they are consecutives and then select the first row for each group.

WITH T AS (

SELECT

ID,

PeriodBegin,

PeriodEnd,

ROW_NUMBER() OVER(PARTITION BY ID, ((PeriodBegin - 1) / 63) + 1 ORDER BY PeriodBegin) AS rn

FROM

#Intervals

)

SELECT

ID,

PeriodBegin,

PeriodEnd

FROM

T

WHERE

rn = 1

ORDER BY

ID,

PeriodBegin;

GO

This expression ((PeriodBegin - 1) / 63) + 1 assign an incremental group number (63 rows each group) and the ROW_NUMBER enumerate the rows partition by (ID, grpnum). The rest is to select all rows where rn = 1.

I am afraid I am taking it too simple.