October 9, 2010 at 1:19 pm
Comments posted to this topic are about the item How to Compare Rows within Partitioned Sets to Find Overlapping Dates
October 11, 2010 at 12:38 am
I didn't get the use of this query here. Could you give a real-world example of it, please?
October 11, 2010 at 4:47 am
I think this would be of use in planning/supply systems where you want to know about future availability of resources.
However, it is slightly misleading in that I would have thought PersonID should have three rows, not two, as they will be resourced up until the end of December 2010.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
October 11, 2010 at 5:12 am
Shouldn't you have a LEFT JOIN rather than an INNER JOIN?
October 11, 2010 at 5:54 am
ta.bu.shi.da.yu (10/11/2010)
If I read this correctly, you just implemented the LAG(...) OVER (...) analytic function. However, I'm interested... if you do this on a large dataset, what does the execution plan look like?
Got a BOL clicky-link for this, by any chance? :-)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 11, 2010 at 6:45 am
I don't think you're going to find that in BOL, Chris; unless I am mistaken, LEAD and LAG are not supported by SQL Server. Oracle uses them to access values in a previous or next row - see this article. Let's hope Microsoft adds them to a future version of SQL Server.
Kevin, I too am curious about the performance of your method on large data sets. Doing a self-join with a WHERE a < b condition generally leads to very slow queries as the table gets large - a triangular join has O(N²) performance. But using a partition ought to be much more efficient.
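To make the triangular-join point concrete, here is a small sketch in Python against an in-memory SQLite database standing in for SQL Server (the table `t` and its contents are invented for illustration): a `b.n < a.n` join produces N(N-1)/2 row pairs, while an equality join on adjacent row numbers - the pattern the article relies on - produces only N-1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 101)])

# Triangular join: every row pairs with every earlier row -> N*(N-1)/2 pairs.
tri = conn.execute(
    "SELECT COUNT(*) FROM t a JOIN t b ON b.n < a.n").fetchone()[0]

# Equality join on the "next" row: N-1 pairs, one index seek per row.
adj = conn.execute(
    "SELECT COUNT(*) FROM t a JOIN t b ON b.n = a.n + 1").fetchone()[0]

print(tri, adj)  # 4950 99
```

At 100 rows the triangular join already materialises 4,950 pairs against 99; the gap widens quadratically as the table grows, which is why the partitioned equality join scales so much better.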
October 11, 2010 at 7:56 am
We SOOOOOOO need full Windowing Function support in SQL Server. And yes, performance on this type of query will currently be exceptionally poor and approaching non-functional on increasingly large datasets.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 11, 2010 at 8:15 am
TheSQLGuru (10/11/2010)
We SOOOOOOO need full Windowing Function support in SQL Server. And yes, performance on this type of query will currently be exceptionally poor and approaching non-functional on increasingly large datasets.
Not so. Always check.
-----------------------------------------------------------
-- Create a working table to play with.
-----------------------------------------------------------
IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers;
SELECT TOP 1000000
n = ROW_NUMBER() OVER (ORDER BY a.name),
CalcValue = CAST(NULL AS BIGINT)
INTO #Numbers
FROM master.dbo.syscolumns a, master.dbo.syscolumns b
CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)
-----------------------------------------------------------
-- run a test against the table
-----------------------------------------------------------
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT a.*, B.n AS Nextrow
INTO #junk
FROM #Numbers a
INNER JOIN #Numbers b ON b.n = a.n + 1
-- (999999 row(s) affected) / CPU time = 3516 ms, elapsed time = 3538 ms.
-- Table 'Worktable'. Scan count 2, logical reads 6224, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SET STATISTICS IO Off
SET STATISTICS TIME Off
DROP TABLE #junk
-----------------------------------------------------------
-- run a functionally similar test against a CTE of the table with ROW_NUMBER() generating "row IDs"
-----------------------------------------------------------
SET STATISTICS IO ON
SET STATISTICS TIME ON
;WITH CTE AS (SELECT NewRowNumber = ROW_NUMBER() OVER (ORDER BY n DESC) FROM #Numbers)
SELECT a.*, B.NewRowNumber AS Nextrow
INTO #junk
FROM CTE a
INNER JOIN CTE b ON b.NewRowNumber = a.NewRowNumber + 1
-- (999999 row(s) affected) / CPU time = 7781 ms, elapsed time = 7808 ms.
-- Table 'Worktable'. Scan count 2, logical reads 6224, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SET STATISTICS IO Off
SET STATISTICS TIME Off
October 11, 2010 at 9:35 am
Real use of this query: Hospitals in the UK are penalised if patients have to wait too long for their operations. If the patient cancels an appointment, the timer on the waiting time is reset; if the hospital cancels, the waiting time is not reset. (It's actually a lot more complex, but that's the basic principle.)
Last year I was asked to sort out some ETL stored procedures used in waiting time calculations that were taking over 20 hours to run. These procs were using cursors. I replaced the stored procs with ones using very similar code to that shown here. The run time dropped from 20+ hours to less than 10 minutes.
Hope that's real enough!
Glyn
October 11, 2010 at 10:13 am
David McKinney (10/11/2010)
Shouldn't you have a LEFT JOIN rather than an INNER JOIN?
You're right! In haste, I ran an INNER JOIN so that I could just see the compared rows, but in hindsight, I probably should have kept my base table pure and run a LEFT JOIN instead. This at least would have kept my table counts the same while also singling out the last row in each partitioned set.
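The difference is easy to see with a toy example. Here is a sketch in Python/SQLite (the `Schedules` rows are invented): with a LEFT JOIN the last row of each PersonID partition survives with a NULL next-row, whereas an INNER JOIN drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schedules (ScheduleID INTEGER, PersonID INTEGER, startDate TEXT);
INSERT INTO Schedules VALUES
  (1, 1, '2010-10-01'),
  (2, 1, '2010-10-11'),
  (3, 2, '2010-11-01');
""")

# Same self-join on adjacent row numbers; only the join type varies.
query = """
WITH Numbered AS (
    SELECT ScheduleID, PersonID, startDate,
           ROW_NUMBER() OVER (PARTITION BY PersonID
                              ORDER BY startDate) AS rn
    FROM Schedules
)
SELECT a.ScheduleID, a.PersonID, b.startDate AS nextStart
FROM Numbered a
{join} JOIN Numbered b
       ON b.PersonID = a.PersonID AND b.rn = a.rn + 1
ORDER BY a.PersonID, a.rn;
"""

left_rows  = conn.execute(query.format(join="LEFT")).fetchall()
inner_rows = conn.execute(query.format(join="INNER")).fetchall()

print(left_rows)   # [(1, 1, '2010-10-11'), (2, 1, None), (3, 2, None)]
print(inner_rows)  # [(1, 1, '2010-10-11')]
```

The LEFT JOIN keeps all three base rows, so the row count of the result matches the table, and the NULL `nextStart` flags the last schedule per person.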
October 11, 2010 at 10:25 am
ta.bu.shi.da.yu (10/11/2010)
If I read this correctly, you just implemented the LAG(...) Over(...) analytic function.
Cool, I didn't even know this existed. But it looks like it's only available in Oracle databases. Microsoft, please bring this over!
However, I'm interested... if you do this on a large dataset, what does the execution plan look like?
Let me try to get my execution plan and I'll get back to you. I do remember that our initial process ran painfully slow and we had just 45K rows. It ran for something like 45 seconds (almost 1 sec/1K rows) using a scalar function. This updated process now takes less than one second to complete.
October 12, 2010 at 3:53 pm
Real World Example: Wage and Hour Class Action Lawsuits. I am always working with dates on employment cases and many times the individual has many start and end dates. Part of the scrubbing process is to check for date overlaps or gaps. This is a good start to check those types of things.
Thanks for the post!
December 24, 2013 at 11:02 pm
SQL 2012 makes pretty short work of this type of problem:
WITH PartitionedSchedules AS
(
SELECT ScheduleID, PersonID, startDate, durationDays
,CalculatedEndDate=DATEADD(day, durationDays, startDate)
,row2startDate=LEAD(startDate, 1) OVER (PARTITION BY PersonID ORDER BY startDate)
FROM Schedules
)
SELECT ScheduleID, PersonID, startDate, durationDays, CalculatedEndDate
,row2startDate
,datedifference
,analysis=CASE SIGN(datedifference)
WHEN 0 THEN 'contiguous'
WHEN 1 THEN CAST(ABS(datedifference) AS VARCHAR) + ' days overlap'
ELSE CAST(ABS(datedifference) AS VARCHAR) + ' days gap'
END
FROM PartitionedSchedules a
CROSS APPLY
(
SELECT DATEDIFF(day, row2startDate, CalculatedEndDate)
) b (datedifference)
WHERE datedifference IS NOT NULL;
No more need for a self-join.
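For anyone without a SQL Server 2012 instance handy, the same LEAD-based approach can be tried from Python against SQLite (3.25+ for window functions). This is only a sketch: the `Schedules` rows are invented, and SQLite's `date`/`julianday` functions stand in for DATEADD and DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schedules (ScheduleID INTEGER, PersonID INTEGER,
                        startDate TEXT, durationDays INTEGER);
INSERT INTO Schedules VALUES
  (1, 1, '2010-10-01', 10),  -- ends 2010-10-11
  (2, 1, '2010-10-11', 5),   -- contiguous with row 1, ends 2010-10-16
  (3, 1, '2010-10-14', 5);   -- starts before row 2 ends
""")

rows = conn.execute("""
WITH PartitionedSchedules AS (
    SELECT ScheduleID, PersonID, startDate, durationDays,
           date(startDate, '+' || durationDays || ' days') AS CalculatedEndDate,
           LEAD(startDate, 1) OVER (PARTITION BY PersonID
                                    ORDER BY startDate) AS row2startDate
    FROM Schedules
),
Diffs AS (
    SELECT ScheduleID,
           CAST(julianday(CalculatedEndDate)
              - julianday(row2startDate) AS INTEGER) AS datedifference
    FROM PartitionedSchedules
    WHERE row2startDate IS NOT NULL   -- last row of each partition has no successor
)
SELECT ScheduleID, datedifference,
       CASE WHEN datedifference = 0 THEN 'contiguous'
            WHEN datedifference > 0
                 THEN CAST(datedifference AS TEXT) || ' days overlap'
            ELSE CAST(-datedifference AS TEXT) || ' days gap'
       END AS analysis
FROM Diffs
ORDER BY ScheduleID;
""").fetchall()

print(rows)  # [(1, 0, 'contiguous'), (2, 2, '2 days overlap')]
```

The logic mirrors the T-SQL above: a positive difference between a row's calculated end date and the next row's start date means an overlap, zero means contiguous, negative means a gap.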
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
December 25, 2013 at 12:08 am
And here is (I think) another way to do it in SQL 2005 that avoids the self-join:
SELECT ScheduleID, PersonID, startDate, durationDays
,row2StartDate, CalculatedEndDate, datedifference
,analysis=CASE SIGN(datedifference)
WHEN 0 THEN 'contiguous'
WHEN 1 THEN CAST(ABS(datedifference) AS VARCHAR) + ' days overlap'
ELSE CAST(ABS(datedifference) AS VARCHAR) + ' days gap'
END
FROM
(
SELECT ScheduleID=MAX(CASE WHEN rn2 = 2 THEN ScheduleID END)
,PersonID
,startDate=MIN(startDate)
,durationDays=MAX(CASE WHEN rn2 = 2 THEN durationDays END)
,row2StartDate=MAX(CASE rn2 WHEN 2 THEN CalculatedEndDate ELSE [Date] END)
,CalculatedEndDate=MAX(CASE rn2 WHEN 2 THEN [Date] END)
,datedifference=DATEDIFF(day
,MAX(CASE rn2 WHEN 2 THEN CalculatedEndDate ELSE [Date] END)
,MAX(CASE rn2 WHEN 2 THEN [Date] END))
FROM
(
SELECT ScheduleID
,PersonID
,startDate
,durationDays
,CalculatedEndDate=CASE WHEN rn2=1 THEN DATEADD(day, durationDays, [Date]) END
,[Date]
,rn=ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY startDate, rn2)/2
,rn2
FROM Schedules a
CROSS APPLY
(
SELECT 1, startDate UNION ALL
SELECT 2, DATEADD(day, durationDays, startDate)
) b (rn2, [Date])
) a
GROUP BY PersonID, rn
HAVING COUNT(*) = 2
) a
ORDER BY PersonID;
This one assumes though that a row does not overlap two or more following rows.
The benefit of course of not doing a self-join is that the query does a single table or index scan (depending on indexing) instead of two.
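A compact rendition of the same unpivot-and-regroup idea can be run in Python against SQLite as a sanity check. This is a simplified sketch, not the full query: the sample data is invented, only the day difference is reported, and a comma join against a two-row derived table replaces CROSS APPLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schedules (ScheduleID INTEGER, PersonID INTEGER,
                        startDate TEXT, durationDays INTEGER);
INSERT INTO Schedules VALUES
  (1, 1, '2010-10-01', 10),  -- ends 2010-10-11
  (2, 1, '2010-10-11', 5),   -- contiguous with row 1, ends 2010-10-16
  (3, 1, '2010-10-14', 5);   -- overlaps row 2 by 2 days
""")

rows = conn.execute("""
WITH Events AS (
    -- unpivot each schedule into a start event (rn2 = 1) and an end
    -- event (rn2 = 2); integer-dividing the row number by 2 then groups
    -- each schedule's end with the next schedule's start
    SELECT a.PersonID, b.rn2,
           CASE b.rn2 WHEN 1 THEN a.startDate
                ELSE date(a.startDate, '+' || a.durationDays || ' days')
           END AS ev,
           ROW_NUMBER() OVER (PARTITION BY a.PersonID
                              ORDER BY a.startDate, b.rn2) / 2 AS grp
    FROM Schedules a, (SELECT 1 AS rn2 UNION ALL SELECT 2 AS rn2) b
)
SELECT PersonID,
       CAST(julianday(MAX(CASE rn2 WHEN 2 THEN ev END))
          - julianday(MAX(CASE rn2 WHEN 1 THEN ev END)) AS INTEGER)
           AS datedifference
FROM Events
GROUP BY PersonID, grp
HAVING COUNT(*) = 2        -- discard the unpaired first start and last end
ORDER BY PersonID, grp;
""").fetchall()

print(rows)  # [(1, 0), (1, 2)] -> contiguous, then a 2-day overlap
```

As in the T-SQL, the single scan plus GROUP BY does the work of the self-join: each group of two events holds one schedule's end and its successor's start, and their difference classifies the boundary.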