How Do You Verify Sequential Record Numbers.

  • I have spend the best part of a few hours trying to figure out how to structure a query that looks to the previous record number to identify whether there is a gap in the sequence. I need to know all of the records that are out of sequence. For example here is the dataset:

    Record

    1

    2

    3

    5

    9

    10

    11

    12

    I need to know that record #4, #6, #7 and #8 is missing.

    I have started with the following code but am not getting anywhere with it.

    USE DataBase

    GO

    SELECT Record, ROW_NUMBER() OVER(ORDER BY Record) AS RowNumber

    FROM Table

  • The way to use ROW_NUMBER to find the gaps in a sequence is by first finding your groups of day were the number is sequential. This is done by subtracting the generated ROW_NUMBER from your ID column; you'll see that this value remains the same until there is a gap, so 0 for those IDs up to 4 then 1 for 5 and so on.

    Once you have that, you'll be able to find the minimum and maximum values within that group by using MIN and MAX with a group by on the column we've just derived, and it's a small step then to find the minimum and maximum values where there is a gap.

    I'm not at my desk at the moment, so don't want to stab at the actual code without testing, but I'll post back later if I can.

    Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

  • There are lots of ways to solve this, most incredibly slow.

    Google "Gaps and Islands" Itzik Ben Gan

    He's done an article recently comparing several methods of identifying gaps in a sequence.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Matthew Darwin & Gila Monster,

    Thank you for your replies. Taking a solution as you recommended Gila Monster in SQL Server MVP Deep Dives, I have found a solution by Itzik Ben-Ganin in chapter 5 to work.

    http://www.manning.com/nielsen/SampleChapter5.pdf

    Thank you for your help.

  • This is the code for what I was referring to earlier; but I agree that Itzik Ben-Gan's gaps and islands techniques are well worth checking out.

    CREATE TABLE #tmp

    (ID int primary key)

    INSERT INTO #tmp

    VALUES

    (1)

    INSERT INTO #tmp

    VALUES

    (2)

    INSERT INTO #tmp

    VALUES

    (3)

    INSERT INTO #tmp

    VALUES

    (5)

    INSERT INTO #tmp

    VALUES

    (9)

    INSERT INTO #tmp

    VALUES

    (10)

    INSERT INTO #tmp

    VALUES

    (11)

    INSERT INTO #tmp

    VALUES

    (12)

    ;WITH cte AS (

    SELECT ID

    , ID - ROW_NUMBER() OVER (ORDER BY ID) AS GroupingFactor

    FROM #tmp)

    , cte2 AS (SELECT MIN(ID) MinimumIDInGroup

    , MAX(ID) MaxIDInGroup

    , GroupingFactor

    , ROW_NUMBER() OVER (ORDER BY GroupingFactor) AS IslandID

    FROM cte

    GROUP BY GroupingFactor

    )

    SELECT mn.MaxIDInGroup + 1 AS MinMissingIDInGroup

    , mn.MaxIDInGroup + mx.GroupingFactor AS MaxMissingIdInGroup

    , mn.IslandID AS MissingRange

    FROM cte2 mn

    INNER JOIN cte2 mx

    ON mn.IslandID = mx.IslandID - 1

    Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

  • It wasn't my original idea but here's a method that I've changed a bit over time. It turns out to be similar to Ben-Gan's and is nasty fast when compared to many row numbering methods especially when there are very large gaps.

    SELECT GapStart = ISNULL((SELECT MAX(N+1) FROM SomeTable m WHERE m.N < lo.N),1),

    GapEnd = lo.N-1

    FROM SomeTable lo

    WHERE lo.N NOT IN (SELECT N+1 FROM SomeTable )

    AND lo.N > 1

    "N" is the identity column, what ever it is actually called.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff and Matthew,

    I am using extremely large data sets (ie. 66,000,000 records). I will have to test both of these to see how fast they process through the data.

    Appreciate the input.

    Mike

  • mcertini (5/12/2012)


    Thanks Jeff and Matthew,

    I am using extremely large data sets (ie. 66,000,000 records). I will have to test both of these to see how fast they process through the data.

    Appreciate the input.

    Mike

    Thanks. Looking forward to your feedback. I guess my question would be, why are you worried about such gaps?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To ensure all of my data is complete.

  • mcertini (5/14/2012)


    To ensure all of my data is complete.

    If you are using an identity column you will get gaps in the incrementing numbers. any time a record is deleted it will leave a gap. also any time a transaction is rolled back the inserts will increment the identity but on roll back the identity remains the same.

    as an example you have 1 through 10 then delete 5. you have a gap at 5 but its perfectly valid to have a gap there. the second you have 1 through 10 then have a failed insert of 5 records. the next insert will be an ID of 16. unless you manually insert the number keeping track of it some how (either MAX(ID) + 1 or in a counters table) and you never delete from the table, there will be gaps that develop over time.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • The problem there is if that's an identity column, there can be gaps even with the data complete. If any insert is ever rolled back and there's an identity column, then the identity column is incremented even though the insert never completes.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • mcertini (5/12/2012)


    ...Taking a solution as you recommended Gila Monster in SQL Server MVP Deep Dives, I have found a solution by Itzik Ben-Ganin in chapter 5 to work.

    http://www.manning.com/nielsen/SampleChapter5.pdf

    Heh. You know, one of the solutions in that chapter (Islands, #3) is actually based off my answer to one of Itzik's puzzler articles several years ago that he himself admittedly could not solve. (OK, yes, two other guys posted similar answers, but mine was first ... 😀 )

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Using the example provided previously something like this would work too.

    CREATE TABLE #tmp

    (ID int primary key)

    INSERT INTO #tmp

    VALUES

    (1)

    INSERT INTO #tmp

    VALUES

    (2)

    INSERT INTO #tmp

    VALUES

    (3)

    INSERT INTO #tmp

    VALUES

    (5)

    INSERT INTO #tmp

    VALUES

    (9)

    INSERT INTO #tmp

    VALUES

    (10)

    INSERT INTO #tmp

    VALUES

    (11)

    INSERT INTO #tmp

    VALUES

    (12)

    select Tally.N as MissingNumbers

    from Tally

    left join #tmp t on t.id = tally.N

    where tally.N >= (select min(id) from #tmp) and tally.N <= (select max(id) from #tmp)

    and t.ID is null

    drop table #tmp

    This is based of the tally or numbers table. The "Numbers" or "Tally" Table: What it is and how it replaces a loop.[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply