May 11, 2012 at 11:40 pm
I have spend the best part of a few hours trying to figure out how to structure a query that looks to the previous record number to identify whether there is a gap in the sequence. I need to know all of the records that are out of sequence. For example here is the dataset:
Record
1
2
3
5
9
10
11
12
I need to know that record #4, #6, #7 and #8 is missing.
I have started with the following code but am not getting anywhere with it.
USE DataBase
GO
SELECT Record, ROW_NUMBER() OVER(ORDER BY Record) AS RowNumber
FROM Table
May 12, 2012 at 2:27 am
The way to use ROW_NUMBER to find the gaps in a sequence is by first finding your groups of day were the number is sequential. This is done by subtracting the generated ROW_NUMBER from your ID column; you'll see that this value remains the same until there is a gap, so 0 for those IDs up to 4 then 1 for 5 and so on.
Once you have that, you'll be able to find the minimum and maximum values within that group by using MIN and MAX with a group by on the column we've just derived, and it's a small step then to find the minimum and maximum values where there is a gap.
I'm not at my desk at the moment, so don't want to stab at the actual code without testing, but I'll post back later if I can.
Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]
May 12, 2012 at 2:31 am
There are lots of ways to solve this, most incredibly slow.
Google "Gaps and Islands" Itzik Ben Gan
He's done an article recently comparing several methods of identifying gaps in a sequence.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 12, 2012 at 9:20 am
Matthew Darwin & Gila Monster,
Thank you for your replies. Taking a solution as you recommended Gila Monster in SQL Server MVP Deep Dives, I have found a solution by Itzik Ben-Ganin in chapter 5 to work.
http://www.manning.com/nielsen/SampleChapter5.pdf
Thank you for your help.
May 12, 2012 at 9:29 am
This is the code for what I was referring to earlier; but I agree that Itzik Ben-Gan's gaps and islands techniques are well worth checking out.
CREATE TABLE #tmp
(ID int primary key)
INSERT INTO #tmp
VALUES
(1)
INSERT INTO #tmp
VALUES
(2)
INSERT INTO #tmp
VALUES
(3)
INSERT INTO #tmp
VALUES
(5)
INSERT INTO #tmp
VALUES
(9)
INSERT INTO #tmp
VALUES
(10)
INSERT INTO #tmp
VALUES
(11)
INSERT INTO #tmp
VALUES
(12)
;WITH cte AS (
SELECT ID
, ID - ROW_NUMBER() OVER (ORDER BY ID) AS GroupingFactor
FROM #tmp)
, cte2 AS (SELECT MIN(ID) MinimumIDInGroup
, MAX(ID) MaxIDInGroup
, GroupingFactor
, ROW_NUMBER() OVER (ORDER BY GroupingFactor) AS IslandID
FROM cte
GROUP BY GroupingFactor
)
SELECT mn.MaxIDInGroup + 1 AS MinMissingIDInGroup
, mn.MaxIDInGroup + mx.GroupingFactor AS MaxMissingIdInGroup
, mn.IslandID AS MissingRange
FROM cte2 mn
INNER JOIN cte2 mx
ON mn.IslandID = mx.IslandID - 1
Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]
May 12, 2012 at 12:18 pm
It wasn't my original idea but here's a method that I've changed a bit over time. It turns out to be similar to Ben-Gan's and is nasty fast when compared to many row numbering methods especially when there are very large gaps.
SELECT GapStart = ISNULL((SELECT MAX(N+1) FROM SomeTable m WHERE m.N < lo.N),1),
GapEnd = lo.N-1
FROM SomeTable lo
WHERE lo.N NOT IN (SELECT N+1 FROM SomeTable )
AND lo.N > 1
"N" is the identity column, what ever it is actually called.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2012 at 1:03 pm
Thanks Jeff and Matthew,
I am using extremely large data sets (ie. 66,000,000 records). I will have to test both of these to see how fast they process through the data.
Appreciate the input.
Mike
May 12, 2012 at 2:02 pm
mcertini (5/12/2012)
Thanks Jeff and Matthew,I am using extremely large data sets (ie. 66,000,000 records). I will have to test both of these to see how fast they process through the data.
Appreciate the input.
Mike
Thanks. Looking forward to your feedback. I guess my question would be, why are you worried about such gaps?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2012 at 12:43 pm
To ensure all of my data is complete.
May 14, 2012 at 12:49 pm
mcertini (5/14/2012)
To ensure all of my data is complete.
If you are using an identity column you will get gaps in the incrementing numbers. any time a record is deleted it will leave a gap. also any time a transaction is rolled back the inserts will increment the identity but on roll back the identity remains the same.
as an example you have 1 through 10 then delete 5. you have a gap at 5 but its perfectly valid to have a gap there. the second you have 1 through 10 then have a failed insert of 5 records. the next insert will be an ID of 16. unless you manually insert the number keeping track of it some how (either MAX(ID) + 1 or in a counters table) and you never delete from the table, there will be gaps that develop over time.
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
May 14, 2012 at 12:49 pm
The problem there is if that's an identity column, there can be gaps even with the data complete. If any insert is ever rolled back and there's an identity column, then the identity column is incremented even though the insert never completes.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 14, 2012 at 1:03 pm
mcertini (5/12/2012)
...Taking a solution as you recommended Gila Monster in SQL Server MVP Deep Dives, I have found a solution by Itzik Ben-Ganin in chapter 5 to work.
Heh. You know, one of the solutions in that chapter (Islands, #3) is actually based off my answer to one of Itzik's puzzler articles several years ago that he himself admittedly could not solve. (OK, yes, two other guys posted similar answers, but mine was first ... 😀 )
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 14, 2012 at 1:04 pm
Using the example provided previously something like this would work too.
CREATE TABLE #tmp
(ID int primary key)
INSERT INTO #tmp
VALUES
(1)
INSERT INTO #tmp
VALUES
(2)
INSERT INTO #tmp
VALUES
(3)
INSERT INTO #tmp
VALUES
(5)
INSERT INTO #tmp
VALUES
(9)
INSERT INTO #tmp
VALUES
(10)
INSERT INTO #tmp
VALUES
(11)
INSERT INTO #tmp
VALUES
(12)
select Tally.N as MissingNumbers
from Tally
left join #tmp t on t.id = tally.N
where tally.N >= (select min(id) from #tmp) and tally.N <= (select max(id) from #tmp)
and t.ID is null
drop table #tmp
This is based of the tally or numbers table. The "Numbers" or "Tally" Table: What it is and how it replaces a loop.[/url]
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply