Find missing numbers in sequence 1 thru 5

  • I have a table with records. Each set of records is numbered 1 thru 5. I need  to iterate thru each set and find records that do not have all five values 

    For example I need to delete the records in red since they are not a complete set. How can I do this in sql. Thanks

  • So create a small table (or use VALUES as a temporary table constructor) with the numbers 1-5 in it and do an outer join to it with the table you're checking.  Anything that returns a null means that it's missing values.  If you expose all 5 values in the 1-5 table (or table constructor), you'll even be able to determine which of five values are missing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, March 24, 2019 3:42 PM

    So create a small table (or use VALUES as a temporary table constructor) with the numbers 1-5 in it and do an outer join to it with the table you're checking.  Anything that returns a null means that it's missing values.  If you expose all 5 values in the 1-5 table (or table constructor), you'll even be able to determine which of five values are missing.

    The problem that I see with the table you posted is that there is duplication in both column 1 and 2 and there is no column apparent to "group the dupes" with, which also means that this is pretty much an impossible task to resolve.  Is there some other column that delineates one group of 5 from another or are you simply relying on the unreliable thought of having 5 contiguous rows in a group according to the row position delineated by the ID column (which is a very bad and unreliable way to do such a thing, BTW)?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rpatlan - Sunday, March 24, 2019 12:38 PM

    I have a table with records. Each set of records is numbered 1 thru 5. I need  to iterate thru each set and find records that do not have all five values 

    For example I need to delete the records in red since they are not a complete set. How can I do this in sql. Thanks

    This should give you what you're looking for...

    IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
    DROP TABLE #TestData;

    CREATE TABLE #TestData (
      Id INT NOT NULL PRIMARY KEY,
      Col2 TINYINT NOT NULL
      );
    INSERT #TestData (Id, Col2) VALUES
      (6, 1),(7, 2),(8, 3),(9, 4),(10, 5),(11, 1),(12, 2),
      (13, 3),(14, 1),(15, 2),(16, 3),(17, 4),(18, 5);

    -- SELECT * FROM #TestData td;
    --=============================

    /****** Warning: The following solution relies on a lack of gaps in the Id column, within a group and the proper insertion order of Col2. Any variation in that pattern could result in the deletion of valid data.
    Before DELETEing any data, you are advised to execute this as a SELECT to make sure you aren’t going to delete data valid data. ******/

    WITH
      cte_add_cnt_group AS (
       SELECT
        td.Id,
        td.Col2,
        cnt = COUNT(1) OVER (PARTITION BY Id - td.Col2)
       FROM
        #TestData td
       )
    SELECT *
    --DELETE acg
    FROM
      cte_add_cnt_group acg
    WHERE
      acg.cnt < 5;

    --=============================

    -- check data after delete
    SELECT * FROM #TestData td;

  • Great idea, Jason.  Just a caution though...

    While that will work for the given data, it won't always work... all you need is some gaps in the ID column and it won't work correctly any more.

    However, if you created a computed column as a sequential row number in the same order as the ID column and played the count formula against that instead of the ID column, then that would most likely work all of the time even if there were gaps in the ID column.

    As a bit of a sidebar, if things are supposed to be in complete sets of 5, why is this problem centering around a set of 3?  Seems to me that something went haywire and I'd be loath to just up and delete the evidence.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, March 24, 2019 7:22 PM

    Great idea, Jason.  Just a caution though...

    While that will work for the given data, it won't always work... all you need is some gaps in the ID column and it won't work correctly any more.

    However, if you created a computed column as a sequential row number in the same order as the ID column and played the count formula against that instead of the ID column, then that would most likely work all of the time even if there were gaps in the ID column.

    As a bit of a sidebar, if things are supposed to be in complete sets of 5, why is this problem centering around a set of 3?  Seems to me that something went haywire and I'd be loath to just up and delete the evidence.

    Thank you sir. 🙂

    Yes, I did think about that and also the possibility that the Col2 values could be inserted out of order. That's an inherent problem with data designs that that rely on insert order to create meaning. Given the numerous ways in which this type of schema can go south, I figured it was best to simply solve for the data that was supplied and let the OP identify other anomalous data  patters if they exist.

    Good point about researching, rather than deleting... Which reminds me that I should have issued a warning with my previous solution...
    It will delete potentially valid data if it doesn't follow the established pattern. I would suggest executing it as a SELECT, verify that valid data is not going to be deleted, BEFORE actually deleting it.

  • Thanks for all your comments. This data is coming from text files. It is being parsed and cleaned as much as possible. So that is why some sets of 5 are missing. Although the records will be deleted this is only to allow for the import into a table. This data will be validated against production data. The sql works as I had wanted. Again thanks.

  • The id column is an identity field with numbers generated auto so it should be o.k.

  • Thank you for the feedback and clarification. I'm glad the solution will work for you. 🙂

  • rpatlan - Sunday, March 24, 2019 9:39 PM

    Thanks for all your comments. This data is coming from text files. It is being parsed and cleaned as much as possible. So that is why some sets of 5 are missing. Although the records will be deleted this is only to allow for the import into a table. This data will be validated against production data. The sql works as I had wanted. Again thanks.

    So you've researched why some of the sets have less than 5 items and know 100% for sure that there isn't a problem either on your end or the provider of the data end and that you're not actually deleting meaningful data that could be critical to whatever you're doing?

    I found that a lot of people that say "Yes" to the above question end up in deep Kimchi somewhere down the road.  Be careful and don't assume.  Get something in writing, if for no other reason, than to protect yourself.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jason A. Long - Sunday, March 24, 2019 8:24 PM

    Jeff Moden - Sunday, March 24, 2019 7:22 PM

    Great idea, Jason.  Just a caution though...

    While that will work for the given data, it won't always work... all you need is some gaps in the ID column and it won't work correctly any more.

    However, if you created a computed column as a sequential row number in the same order as the ID column and played the count formula against that instead of the ID column, then that would most likely work all of the time even if there were gaps in the ID column.

    As a bit of a sidebar, if things are supposed to be in complete sets of 5, why is this problem centering around a set of 3?  Seems to me that something went haywire and I'd be loath to just up and delete the evidence.

    Thank you sir. 🙂

    Yes, I did think about that and also the possibility that the Col2 values could be inserted out of order. That's an inherent problem with data designs that that rely on insert order to create meaning. Given the numerous ways in which this type of schema can go south, I figured it was best to simply solve for the data that was supplied and let the OP identify other anomalous data  patters if they exist.

    Good point about researching, rather than deleting... Which reminds me that I should have issued a warning with my previous solution...
    It will delete potentially valid data if it doesn't follow the established pattern. I would suggest executing it as a SELECT, verify that valid data is not going to be deleted, BEFORE actually deleting it.

    Totally agreed on all of that and thank you for taking my comments the right way.  Concerning the bold underlined stuff, let's hope the OP truly understands why the anomalous data is there and that it's not actually an indication of a problem that is getting ready to bite the company.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, it is not a problem. We already have the data so we can cross reference with imported data to determine which records are missing. We can then go and get complete record if needed.

    Thanks again.

  • rpatlan - Monday, March 25, 2019 7:36 AM

    Yes, it is not a problem. We already have the data so we can cross reference with imported data to determine which records are missing. We can then go and get complete record if needed.

    Thanks again.

    I guess I don't understand that.  You already have the data but you're importing it again?  And it comes across with missing data?  Why not have an up-close and personal conversation with the data provider and get them to straighten out their act? 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This data is exported from another older dos system to text file and has additional information that is needed. We have the matching Client ID. If data is missing we will be able to determine by doing left join. The missing data can be retrieved at a later date. Hard to explain process and why it id done this way. Be assured that the Client is aware of the issues.

  • rpatlan - Monday, March 25, 2019 8:28 AM

    This data is exported from another older dos system to text file and has additional information that is needed. We have the matching Client ID. If data is missing we will be able to determine by doing left join. The missing data can be retrieved at a later date. Hard to explain process and why it id done this way. Be assured that the Client is aware of the issues.

    K.  Just trying to watch out for you.  Thank you for the feedback and totally understood on "Hard to explain" processes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply