Help needed in migration logic

  • i have to migrate data from a csv file to a SQL server table. The csv file will contain 2 fields area and number. The destination table will have 3 columns : Area, Range From and Range To. The logic behind the migration is to uplaod the numbers for the sam Area in a Range rather than individual rows in the database.

    For Example if csv file looks like below:

    090,1001

    090,1002

    090,1003

    090,1004

    090,1009

    090,1010

    090,1011

    091,1001

    091,1021

    Then the destination table should look like

    AreaRangeFromRangeTo

    09010011004

    09010091011

    09110011001

    09110211021

    Please not that the ranges are merged.

    Now if next time my csv file contains

    090,1003

    090,1004

    090,1005

    090,1006

    090,1007

    090,1008

    091,1000

    091,1002

    Then for the area 090 and number 1003 and 1004 these rows should be ignored as they already exist in the range 1001-1004 in the destination table. The other rows from 1005 to 1008 will be merged to the existing ranges in the table 1001-1004 and 1009-1011 making it 1001-1011 makin the tbale look like

    AreaRangeFromRangeTo

    09010011011

    09110001002

    09110211021

    The logic that i am using right now is to have a OLE DB command component as the destination which first searches the table to check if the range already exists. If it exists it ignores the row. If it finds that the inpout row can form a new range it first deletes that range from the table and inser the new row with new RangeFrom and RangeTo. This process that i am using is taking a lot of time as there are about millions of rows to be migrated.

    Can anyone suggest some better logic or way to accomplish the same so that upload time could be reduced.


    Kindest Regards,

    freak
    techyfreak.blogspot.com

  • I worked it a little differently, getting the ranges in your input identified and then comparing those to what was already recorded.  Not sure about performance with the record counts you're talking about, but it might help...

    set nocount on

    --prep range table

    DECLARE @AreaRanges TABLE (Area varchar(3), RangeFrom int, RangeTo int)

    INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo) VALUES ('090', 1001, 1004)

    INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo) VALUES ('090', 1009, 1011)

    INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo) VALUES ('091', 1001, 1001)

    INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo) VALUES ('091', 1021, 1021)

    --prep input

    DECLARE @Input TABLE (Area varchar(3),  RangeValue int)

    INSERT INTO @Input VALUES ('090',1003)

    INSERT INTO @Input VALUES ('090',1004)

    INSERT INTO @Input VALUES ('090',1005)

    INSERT INTO @Input VALUES ('090',1006)

    INSERT INTO @Input VALUES ('090',1007)

    INSERT INTO @Input VALUES ('090',1008)

    INSERT INTO @Input VALUES ('091',1000)

    INSERT INTO @Input VALUES ('091',1002)

    INSERT INTO @Input VALUES ('090',9999)

    INSERT INTO @Input VALUES ('092',6666)

    INSERT INTO @Input VALUES ('092',6667)

    INSERT INTO @input VALUES ('090',999)

    --identify input ranges

    DECLARE @RangeCandidates TABLE (Area varchar(3), RangeFrom int, RangeTo int, Diff int)

    INSERT INTO @RangeCandidates (Area, RangeFrom, RangeTo, Diff)

    SELECT UpperBounds.Area, LowerBounds.RangeFrom, UpperBounds.RangeTo, UpperBounds.RangeTo - LowerBounds.RangeFrom as Diff

    FROM

     (select 

      a.Area, a.RangeValue as RangeTo

      from @Input a

      where a.RangeValue NOT IN (select RangeValue - 1 from @Input where Area = a.Area)) UpperBounds

     INNER JOIN

     (select 

      a.Area, a.RangeValue as RangeFrom

      from @Input a

      where a.RangeValue NOT IN (select RangeValue + 1 from @Input where Area = a.Area)) LowerBounds

     ON UpperBounds.Area = LowerBounds.Area

    WHERE  UpperBounds.RangeTo - LowerBounds.RangeFrom >= 0

    --get rid of false ranges from the above query

    DELETE a

    FROM @RangeCandidates a

    WHERE a.Diff != (SELECT MIN(Diff) FROM @RangeCandidates WHERE Area = a.Area AND RangeFrom = a.RangeFrom)

    --update existing upper bounds

    UPDATE a

    SET RangeTo = b.RangeTo

    FROM @AreaRanges a INNER JOIN @RangeCandidates b ON a.Area = b.Area

    WHERE b.RangeFrom BETWEEN a.RangeFrom - 1 AND a.RangeTo + 1

    AND b.RangeTo >= a.RangeTo

    --update existing lower bounds

    UPDATE a

    SET RangeFrom = b.RangeFrom

    FROM @AreaRanges a INNER JOIN @RangeCandidates b ON a.Area = b.Area

    WHERE b.RangeTo BETWEEN a.RangeFrom - 1 AND a.RangeTo + 1

    AND b.RangeFrom <= a.RangeFrom

    --you may have crossover in your ranges now.  identify these.

    DECLARE @Crossovers TABLE (Area varchar(3), RangeFrom int, RangeTo int)

    INSERT INTO @Crossovers

    SELECT a.Area, a.RangeFrom, b.Rangeto

    FROM @AreaRanges a INNER JOIN @AreaRanges b ON a.Area = b.Area

    WHERE b.RangeFrom BETWEEN a.RangeFrom AND a.RangeTo

    AND a.RangeFrom != b.RangeFrom

    --remove the existing range entries for crossed-over ranges

    DELETE  a

    FROM @AreaRanges a INNER JOIN @Crossovers b ON a.Area = b.Area

    WHERE a.RangeTo <= b.RangeTo

    AND a.RangeFrom >= b.RangeFrom

    --insert new range record for crossovers

    INSERT INTO @AreaRanges

    SELECT Area, RangeFrom, RangeTo

    FROM @Crossovers

    --insert new ranges, new areas

    INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo)

    SELECT  b.Area, b.RangeFrom, b.RangeTo

    FROM @RangeCandidates b LEFT OUTER JOIN @AreaRanges a

       ON  a.Area = b.Area AND

        (b.RangeFrom BETWEEN a.RangeFrom AND a.RangeTo OR b.RangeTo BETWEEN a.RangeFrom AND a.RangeTo)

    WHERE a.Area IS NULL

    --report ranges

    SELECT * FROM @AreaRanges

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply