April 16, 2007 at 5:06 am
i have to migrate data from a csv file to a SQL server table. The csv file will contain 2 fields area and number. The destination table will have 3 columns : Area, Range From and Range To. The logic behind the migration is to uplaod the numbers for the sam Area in a Range rather than individual rows in the database.
For Example if csv file looks like below:
090,1001
090,1002
090,1003
090,1004
090,1009
090,1010
090,1011
091,1001
091,1021
Then the destination table should look like
AreaRangeFromRangeTo
09010011004
09010091011
09110011001
09110211021
Please not that the ranges are merged.
Now if next time my csv file contains
090,1003
090,1004
090,1005
090,1006
090,1007
090,1008
091,1000
091,1002
Then for the area 090 and number 1003 and 1004 these rows should be ignored as they already exist in the range 1001-1004 in the destination table. The other rows from 1005 to 1008 will be merged to the existing ranges in the table 1001-1004 and 1009-1011 making it 1001-1011 makin the tbale look like
AreaRangeFromRangeTo
09010011011
09110001002
09110211021
The logic that i am using right now is to have a OLE DB command component as the destination which first searches the table to check if the range already exists. If it exists it ignores the row. If it finds that the inpout row can form a new range it first deletes that range from the table and inser the new row with new RangeFrom and RangeTo. This process that i am using is taking a lot of time as there are about millions of rows to be migrated.
Can anyone suggest some better logic or way to accomplish the same so that upload time could be reduced.
April 19, 2007 at 8:23 am
I worked it a little differently, getting the ranges in your input identified and then comparing those to what was already recorded. Not sure about performance with the record counts you're talking about, but it might help...
set nocount on
--prep range table
DECLARE @AreaRanges TABLE (Area varchar(3), RangeFrom int, RangeTo int)
INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo) VALUES ('090', 1001, 1004)
INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo) VALUES ('090', 1009, 1011)
INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo) VALUES ('091', 1001, 1001)
INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo) VALUES ('091', 1021, 1021)
--prep input
DECLARE @Input TABLE (Area varchar(3), RangeValue int)
INSERT INTO @Input VALUES ('090',1003)
INSERT INTO @Input VALUES ('090',1004)
INSERT INTO @Input VALUES ('090',1005)
INSERT INTO @Input VALUES ('090',1006)
INSERT INTO @Input VALUES ('090',1007)
INSERT INTO @Input VALUES ('090',1008)
INSERT INTO @Input VALUES ('091',1000)
INSERT INTO @Input VALUES ('091',1002)
INSERT INTO @Input VALUES ('090',9999)
INSERT INTO @Input VALUES ('092',6666)
INSERT INTO @Input VALUES ('092',6667)
INSERT INTO @input VALUES ('090',999)
--identify input ranges
DECLARE @RangeCandidates TABLE (Area varchar(3), RangeFrom int, RangeTo int, Diff int)
INSERT INTO @RangeCandidates (Area, RangeFrom, RangeTo, Diff)
SELECT UpperBounds.Area, LowerBounds.RangeFrom, UpperBounds.RangeTo, UpperBounds.RangeTo - LowerBounds.RangeFrom as Diff
FROM
(select
a.Area, a.RangeValue as RangeTo
from @Input a
where a.RangeValue NOT IN (select RangeValue - 1 from @Input where Area = a.Area)) UpperBounds
INNER JOIN
(select
a.Area, a.RangeValue as RangeFrom
from @Input a
where a.RangeValue NOT IN (select RangeValue + 1 from @Input where Area = a.Area)) LowerBounds
ON UpperBounds.Area = LowerBounds.Area
WHERE UpperBounds.RangeTo - LowerBounds.RangeFrom >= 0
--get rid of false ranges from the above query
DELETE a
FROM @RangeCandidates a
WHERE a.Diff != (SELECT MIN(Diff) FROM @RangeCandidates WHERE Area = a.Area AND RangeFrom = a.RangeFrom)
--update existing upper bounds
UPDATE a
SET RangeTo = b.RangeTo
FROM @AreaRanges a INNER JOIN @RangeCandidates b ON a.Area = b.Area
WHERE b.RangeFrom BETWEEN a.RangeFrom - 1 AND a.RangeTo + 1
AND b.RangeTo >= a.RangeTo
--update existing lower bounds
UPDATE a
SET RangeFrom = b.RangeFrom
FROM @AreaRanges a INNER JOIN @RangeCandidates b ON a.Area = b.Area
WHERE b.RangeTo BETWEEN a.RangeFrom - 1 AND a.RangeTo + 1
AND b.RangeFrom <= a.RangeFrom
--you may have crossover in your ranges now. identify these.
DECLARE @Crossovers TABLE (Area varchar(3), RangeFrom int, RangeTo int)
INSERT INTO @Crossovers
SELECT a.Area, a.RangeFrom, b.Rangeto
FROM @AreaRanges a INNER JOIN @AreaRanges b ON a.Area = b.Area
WHERE b.RangeFrom BETWEEN a.RangeFrom AND a.RangeTo
AND a.RangeFrom != b.RangeFrom
--remove the existing range entries for crossed-over ranges
DELETE a
FROM @AreaRanges a INNER JOIN @Crossovers b ON a.Area = b.Area
WHERE a.RangeTo <= b.RangeTo
AND a.RangeFrom >= b.RangeFrom
--insert new range record for crossovers
INSERT INTO @AreaRanges
SELECT Area, RangeFrom, RangeTo
FROM @Crossovers
--insert new ranges, new areas
INSERT INTO @AreaRanges (Area, RangeFrom, RangeTo)
SELECT b.Area, b.RangeFrom, b.RangeTo
FROM @RangeCandidates b LEFT OUTER JOIN @AreaRanges a
ON a.Area = b.Area AND
(b.RangeFrom BETWEEN a.RangeFrom AND a.RangeTo OR b.RangeTo BETWEEN a.RangeFrom AND a.RangeTo)
WHERE a.Area IS NULL
--report ranges
SELECT * FROM @AreaRanges
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply