Adding/updating rows of data that is missing but not in sequential order

  • I don't know how to explain this clearly, but I have the data set below

    with all int data type.

    ID year reportdate

    1 1998 19980101

    1 2000 20001231

    1 2003 20030331

    drop table table_1

    create table table_1

    (

    ID int NULL,

    Year int NULL,

    ReportDate int NULL

    )

    insert into table_1

    values(1, 1998, 19980101)

    insert into table_1

    values(1, 2000, 20001231)

    insert into table_1

    values(1, 2003, 20030331)

    select * from table_1

    I need to add the missing year and missing reportdate in between or among

    these existing years and date, for it to come out as below. as you can see,

    it is kind of tricky because the missing year is easy to add; however, the report

    date of the missing year has to be the date of the following year.

    I created a function to do this, and it works fine when the id has only one year missing,

    or consecutively missing (missing year in order), but it does not work when they

    are missing and out of sequence. the while loop only works when

    it's to be looped through in order of +1 or -1, but how can you loop

    through when you don't know when it would be missing to add them properly?

    Any idea or suggestion would be greatly appreciated.

    -- Result to be

    ID year reportdate

    1 1998 19980101

    1 1999 19991231

    1 2000 20001231

    1 2001 20010331

    1 2002 20020331

    1 2003 20030331

  • How about something like (I hope is right - I haven't got access to a SQL Server instance to check it)

    DECLARE @yearlist cursor for select distinct year from table_1

    DECLARE @EndYear int, @year INT

    OPEN yearlist

    FETCH NEXT FROM @yearlist into @year

    WHILE @@Fetch_status = 0

    BEGIN

    SELECT @endyear = Min (Year)

    from table_1

    where year > @year

    if @year + 1 <> @endyear

    BEGIN

    WJILE @year < @endyear

    BEGIN

    INSERT INTO table_1 (year, reportdate)

    SELECT @year + 1, (@year + 1) * 10000 + CONVERT (INT, Right ((min(year), 4)

    FROM Table_1

    WHERE Year > @year

    SELECT @year = @year + 1

    END

    fetch next from @yearlist into @year

    END

    deallocate @yearlist

  • thanks very much happycat59 , let me try in my procedure.

    Even if it does not work perfectly, you gave me idea and modify

    to my needs and some lights at the end of the tunel. Greatly appreciated.

  • If you have a lot of these to do - you could use a TALLY table to help build this instead of a cursor. with a fairly sizeable number - this would be lots faster.

    A Tally table is something that has an unbroken sequence of numbers in it. Can be VERY useful to keep around for just these kinds of scenarios and lots more.

    Here's a quick way to build one:

    create table Tally(number as int primary key clustered)

    Insert Tally(number)

    select top 100000 --pick how many you want

    identity(int,1,1)

    from sys.all_columns SC1 cross join sys.all_columns SC2

    Once you build that your query becomes really easy:

    insert table1 (year, reportdate)

    select tally.number as [year],

    cast(tally.number as varchar(4))+'1231' as reportdate

    from tally

    LEFT OUTER JOIN table_1 on tally.number=table_1.year

    where

    tally.number between 1990 and 2010 --pick your daterange here

    and table_1.year IS NULL

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks very much Matt Miller.

    I am still working the cursor into my solution

    and certainly will try yours as well, because

    you are right, if it can be done without cursor

    it would be preferred way. But I will see what

    best for my problem. Appreciated a lot for

    helping me.

  • hi happycat59

    your cursor was right, I just had to change the right(year) portion

    to right(reportdate) and added one more end.

    however, even after it has inserted all the correct years

    and date, the cursor will not stop, it just kept running until I manually stop it.

    is there anything I missed to do?

    thanks again.

  • Matt is correct about the Tally table... but a Tally table isn't really worth a hoot when it comes to performance unless you also build the necessary Clustered Primary Key... please see the script for building a Tally table at the following URL...

    http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff for the link and info. I have never heard of it until Matt wrote it to me.

    It is very helpful and agree that better with proper index.

    I was be able to replace my function and use the tally to

    generate the misisng year, but i'm still having

    trouble with generating the associated date

    with the criteria that I described. So, still playing around with it

    and the cursor solution, so will see.

  • As you're finding out, the hard part is to "smear" the existing data into the new rows for the missing years. This type of smearing is VERY procedural and most folks think the only way to do it is either with a cursor or a While loop. Since you've never heard of the Tally Table before, might as well show you a little SQL Server trick that you've never seen before to replace the cursor. And, on large datasets, it's very fast...

    ... here's all the code including your original table generator. The details are in the comments in the code...

    [font="Courier New"]

    --=============================================================================

    --      Recreate the posted test data... this is NOT part of the solution.

    --=============================================================================

    create table table_1

    (

            ID int NULL,

            Year int NULL,

            ReportDate int NULL

    )

    insert into table_1

    values(1, 1998, 19980101)

    insert into table_1

    values(1, 2000, 20001231)

    insert into table_1

    values(1, 2003, 20030331)

    --=============================================================================

    -- Using the Tally table as a source of numbers for the missing years, copy

    -- all existing rows and make new rows for missing years in a temp table.

    --=============================================================================

    SELECT IDENTITY(INT,1,1) AS RowNum,

            t1.ID, yrs.Year, t1.ReportDate

       INTO #AllYears

       FROM dbo.Table_1 t1

      RIGHT OUTER JOIN

            (--==== Returns all years from the first year to the last year

             SELECT t.N AS Year

               FROM dbo.Tally T  --List of numbers

              INNER JOIN

                    (--==== Find first and last years to use as criteria

                         -- in the inner join with the Tally table

                     SELECT MIN(Year) AS YearLo,

                            MAX(Year) AS YearHi

                       FROM Table_1

                    ) y

                 ON t.N BETWEEN y.YearLo AND y.YearHi

            ) yrs

         ON t1.Year = yrs.Year

      ORDER BY yrs.Year DESC

    --===== Add the required clustered key to force the physical order of data

         -- for the "smear" update that will follow

      ALTER TABLE #AllYears

        ADD PRIMARY KEY CLUSTERED (RowNum)

    --=============================================================================

    --      Using the clustered key to automatically order the data in the correct

    --      descending order for Year, "smear" the data from existing rows down

    --      into the new rows that are missing data using SQL Server's proprietary

    --      UPDATE tablename SET @vaiable = column = expression

    --=============================================================================

    --===== Declare some variables to remember values for the previous row updated.

    DECLARE @PrevID   INT  --The previous ID

    DECLARE @PrevMMDD INT  --Right 4 digits of the previous ReportDate

    SELECT @PrevID   = 0, --Start off w/non-null values that are not in the table.

            @PrevMMDD = 0

    --===== Do the "smear" update.  Using the variables to remember the values from

         -- each previous row, add the missing data to the table where the data is

         -- missing.

    UPDATE #AllYears

        SET @PrevID    = ID = CASE WHEN ID IS NULL THEN @PrevID ELSE ID END,

            @PrevMMDD  =      CASE WHEN RIGHT(ReportDate,4) IS NULL THEN @PrevMMDD

                              ELSE RIGHT(ReportDate,4) END,

            ReportDate =      REPLACE(STR(Year,4)+STR(@PrevMMDD,4),' ','0')

       FROM #AllYears

    --=============================================================================

    --      Display the desired result

    --=============================================================================

    SELECT ID, Year, ReportDate

       FROM #AllYears

      ORDER BY ID,ReportDate

    --===== Housekeeping so can run the test again from SMS

       DROP TABLE #AllYears

       DROP TABLE Table_1

    [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s.

    Longer code is not always slower code. Longer code does the ol' "Divide and Conquer". The code I posted is what they call "Set Based" programming and that doesn't mean doing it all in a single query. It's a powerful way to write code for many reasons...

    1. Easier to develop because you "peel on potato at a time" (Divide and Conquer). Solving one problem at a time in a larger task just makes it easier to write code for.

    2. Easier to document. Each step is naturally broken out from the whole task and leaves plenty of room for documentation.

    3. The documentation makes it easier for the next poor slob (or, maybe you a year from now) to figure out what you're doing in the code for troubleshooting. The managers at work are amazed that research on larger procs to make simple changes has dropped from as much as 2 days to mere minutes just because we forced folks to document their code. And, fewer mistakes in modifications are made to boot.

    4. The documentation also makes it easier to develop the code because you don't have to remember what you've already written and you don't need to read/analyze the code to see what you've already done. Pays off big time on larger procs but is just as important on the smaller ones.

    And, that's my soapbox for the day 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, your procedure works wonderfully.

    Thanks so much for going above-and-beyond for helping

    me in creating the proc and for giving me guidance

    on doing things the proper ways. You should write a book

    about this. I really like the way you comment your proc

    and put each part into section. It is very easy to read and

    clean. I'm deeply grateful, so thanks x a million.

    Have a beautiful Sunday.

  • Thanks, SQLBlue. :blush: Very happy it does what you want and I sure do appreciate the compliment... you made my weekend. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/12/2008)


    As you're finding out, the hard part is to "smear" the existing data into the new rows for the missing years. This type of smearing is VERY procedural and most folks think the only way to do it is either with a cursor or a While loop. Since you've never heard of the Tally Table before, might as well show you a little SQL Server trick that you've never seen before to replace the cursor. And, on large datasets, it's very fast...

    ... here's all the code including your original table generator. The details are in the comments in the code...

    [font="Courier New"]

    --=============================================================================

    --      Recreate the posted test data... this is NOT part of the solution.

    --=============================================================================

    create table table_1

    (

            ID int NULL,

            Year int NULL,

            ReportDate int NULL

    )

    insert into table_1

    values(1, 1998, 19980101)

    insert into table_1

    values(1, 2000, 20001231)

    insert into table_1

    values(1, 2003, 20030331)

    --=============================================================================

    -- Using the Tally table as a source of numbers for the missing years, copy

    -- all existing rows and make new rows for missing years in a temp table.

    --=============================================================================

    SELECT IDENTITY(INT,1,1) AS RowNum,

            t1.ID, yrs.Year, t1.ReportDate

       INTO #AllYears

       FROM dbo.Table_1 t1

      RIGHT OUTER JOIN

            (--==== Returns all years from the first year to the last year

             SELECT t.N AS Year

               FROM dbo.Tally T  --List of numbers

              INNER JOIN

                    (--==== Find first and last years to use as criteria

                         -- in the inner join with the Tally table

                     SELECT MIN(Year) AS YearLo,

                            MAX(Year) AS YearHi

                       FROM Table_1

                    ) y

                 ON t.N BETWEEN y.YearLo AND y.YearHi

            ) yrs

         ON t1.Year = yrs.Year

      ORDER BY yrs.Year DESC

    --===== Add the required clustered key to force the physical order of data

         -- for the "smear" update that will follow

      ALTER TABLE #AllYears

        ADD PRIMARY KEY CLUSTERED (RowNum)

    --=============================================================================

    --      Using the clustered key to automatically order the data in the correct

    --      descending order for Year, "smear" the data from existing rows down

    --      into the new rows that are missing data using SQL Server's proprietary

    --      UPDATE tablename SET @vaiable = column = expression

    --=============================================================================

    --===== Declare some variables to remember values for the previous row updated.

    DECLARE @PrevID   INT  --The previous ID

    DECLARE @PrevMMDD INT  --Right 4 digits of the previous ReportDate

    SELECT @PrevID   = 0, --Start off w/non-null values that are not in the table.

            @PrevMMDD = 0

    --===== Do the "smear" update.  Using the variables to remember the values from

         -- each previous row, add the missing data to the table where the data is

         -- missing.

    UPDATE #AllYears

        SET @PrevID    = ID = CASE WHEN ID IS NULL THEN @PrevID ELSE ID END,

            @PrevMMDD  =      CASE WHEN RIGHT(ReportDate,4) IS NULL THEN @PrevMMDD

                              ELSE RIGHT(ReportDate,4) END,

            ReportDate =      REPLACE(STR(Year,4)+STR(@PrevMMDD,4),' ','0')

       FROM #AllYears

    --=============================================================================

    --      Display the desired result

    --=============================================================================

    SELECT ID, Year, ReportDate

       FROM #AllYears

      ORDER BY ID,ReportDate

    --===== Housekeeping so can run the test again from SMS

       DROP TABLE #AllYears

       DROP TABLE Table_1

    [/font]

    Jeff,

    How did you get the code to have the proper color coding? Is there a way to do this automatically when posting or did you have to manually color code everything? It does make it a lot easier to read.

    Thanks,

    Ian.

    "If you are going through hell, keep going."
    -- Winston Churchill

  • http://www.Simple-Talk.com (another of RedGate's pages), has an "SQL Pretifier" on it. There's a link on the front page. It will do that kind of color-coding.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Phil Factor also apparently customized one just for here....:

    http://extras.sqlservercentral.com/prettifier/prettifier.aspx

    Use the IFCodes version (that's how to format stuff for SSC).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply