How Do You Verify Sequential Record Numbers.

Question

How Do You Verify Sequential Record Numbers.

mcertini

Mr or Mrs. 500

Points: 580
More actions
May 11, 2012 at 11:40 pm

#256303

I have spend the best part of a few hours trying to figure out how to structure a query that looks to the previous record number to identify whether there is a gap in the sequence. I need to know all of the records that are out of sequence. For example here is the dataset:
Record
1
2
3
5
9
10
11
12
I need to know that record #4, #6, #7 and #8 is missing.
I have started with the following code but am not getting anywhere with it.
USE DataBase
GO
SELECT Record, ROW_NUMBER() OVER(ORDER BY Record) AS RowNumber
FROM Table

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Matthew Darwin SSCertifiable Points: 5513 More actions · Answer 1

The way to use ROW_NUMBER to find the gaps in a sequence is by first finding your groups of day were the number is sequential. This is done by subtracting the generated ROW_NUMBER from your ID column; you'll see that this value remains the same until there is a gap, so 0 for those IDs up to 4 then 1 for 5 and so on.

Once you have that, you'll be able to find the minimum and maximum values within that group by using MIN and MAX with a group by on the column we've just derived, and it's a small step then to find the minimum and maximum values where there is a gap.

I'm not at my desk at the moment, so don't want to stab at the actual code without testing, but I'll post back later if I can.

Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

There are lots of ways to solve this, most incredibly slow.

Google "Gaps and Islands" Itzik Ben Gan

He's done an article recently comparing several methods of identifying gaps in a sequence.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

mcertini Mr or Mrs. 500 Points: 580 More actions · Answer 3

Matthew Darwin & Gila Monster,

Thank you for your replies. Taking a solution as you recommended Gila Monster in SQL Server MVP Deep Dives, I have found a solution by Itzik Ben-Ganin in chapter 5 to work.

http://www.manning.com/nielsen/SampleChapter5.pdf

Thank you for your help.

Matthew Darwin SSCertifiable Points: 5513 More actions · Answer 4

This is the code for what I was referring to earlier; but I agree that Itzik Ben-Gan's gaps and islands techniques are well worth checking out.

CREATE TABLE #tmp

(ID int primary key)

INSERT INTO #tmp

VALUES

(1)

INSERT INTO #tmp

VALUES

(2)

INSERT INTO #tmp

VALUES

(3)

INSERT INTO #tmp

VALUES

(5)

INSERT INTO #tmp

VALUES

(9)

INSERT INTO #tmp

VALUES

(10)

INSERT INTO #tmp

VALUES

(11)

INSERT INTO #tmp

VALUES

(12)

;WITH cte AS (

SELECT ID

, ID - ROW_NUMBER() OVER (ORDER BY ID) AS GroupingFactor

FROM #tmp)

, cte2 AS (SELECT MIN(ID) MinimumIDInGroup

, MAX(ID) MaxIDInGroup

, GroupingFactor

, ROW_NUMBER() OVER (ORDER BY GroupingFactor) AS IslandID

FROM cte

GROUP BY GroupingFactor

)

SELECT mn.MaxIDInGroup + 1 AS MinMissingIDInGroup

, mn.MaxIDInGroup + mx.GroupingFactor AS MaxMissingIdInGroup

, mn.IslandID AS MissingRange

FROM cte2 mn

INNER JOIN cte2 mx

ON mn.IslandID = mx.IslandID - 1

Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

Jeff Moden SSC Guru Points: 1004475 More actions · Answer 5

It wasn't my original idea but here's a method that I've changed a bit over time. It turns out to be similar to Ben-Gan's and is nasty fast when compared to many row numbering methods especially when there are very large gaps.

SELECT GapStart = ISNULL((SELECT MAX(N+1) FROM SomeTable m WHERE m.N < lo.N),1),

GapEnd = lo.N-1

FROM SomeTable lo

WHERE lo.N NOT IN (SELECT N+1 FROM SomeTable )

AND lo.N > 1

"N" is the identity column, what ever it is actually called.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mcertini Mr or Mrs. 500 Points: 580 More actions · Answer 6

Thanks Jeff and Matthew,

I am using extremely large data sets (ie. 66,000,000 records). I will have to test both of these to see how fast they process through the data.

Appreciate the input.

Mike

Jeff Moden SSC Guru Points: 1004475 More actions · Answer 7

mcertini (5/12/2012)
Thanks Jeff and Matthew,
I am using extremely large data sets (ie. 66,000,000 records). I will have to test both of these to see how fast they process through the data.
Appreciate the input.
Mike

Thanks. Looking forward to your feedback. I guess my question would be, why are you worried about such gaps?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mcertini Mr or Mrs. 500 Points: 580 More actions · Answer 8

mcertini

Mr or Mrs. 500

Points: 580

May 14, 2012 at 12:43 pm

#1487326

To ensure all of my data is complete.

CapnHector SSCoach Points: 16143 More actions · Answer 9

mcertini (5/14/2012)
To ensure all of my data is complete.

If you are using an identity column you will get gaps in the incrementing numbers. any time a record is deleted it will leave a gap. also any time a transaction is rolled back the inserts will increment the identity but on roll back the identity remains the same.

as an example you have 1 through 10 then delete 5. you have a gap at 5 but its perfectly valid to have a gap there. the second you have 1 through 10 then have a failed insert of 5 records. the next insert will be an ID of 16. unless you manually insert the number keeping track of it some how (either MAX(ID) + 1 or in a counters table) and you never delete from the table, there will be gaps that develop over time.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 10

The problem there is if that's an identity column, there can be gaps even with the data complete. If any insert is ever rolled back and there's an identity column, then the identity column is incremented even though the insert never completes.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

RBarryYoung SSC Guru Points: 143329 More actions · Answer 11

mcertini (5/12/2012)
...Taking a solution as you recommended Gila Monster in SQL Server MVP Deep Dives, I have found a solution by Itzik Ben-Ganin in chapter 5 to work.
http://www.manning.com/nielsen/SampleChapter5.pdf

Heh. You know, one of the solutions in that chapter (Islands, #3) is actually based off my answer to one of Itzik's puzzler articles several years ago that he himself admittedly could not solve. (OK, yes, two other guys posted similar answers, but mine was first ... 😀 )

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Sean Lange SSC Guru Points: 286590 More actions · Answer 12

Using the example provided previously something like this would work too.

CREATE TABLE #tmp

(ID int primary key)

INSERT INTO #tmp

VALUES

(1)

INSERT INTO #tmp

VALUES

(2)

INSERT INTO #tmp

VALUES

(3)

INSERT INTO #tmp

VALUES

(5)

INSERT INTO #tmp

VALUES

(9)

INSERT INTO #tmp

VALUES

(10)

INSERT INTO #tmp

VALUES

(11)

INSERT INTO #tmp

VALUES

(12)

select Tally.N as MissingNumbers

from Tally

left join #tmp t on t.id = tally.N

where tally.N >= (select min(id) from #tmp) and tally.N <= (select max(id) from #tmp)

and t.ID is null

drop table #tmp

This is based of the tally or numbers table. The "Numbers" or "Tally" Table: What it is and how it replaces a loop.[/url]

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/