April 24, 2008 at 7:58 am
Hi All,
I have created an SSIS package to import data from an Excel file, then lookup additional info from an SQL datatable (based on multiple criteria), then export to a datatable destination.
I would like to add another step to this process but i'm not sure what I would need to do this, possibly a custom script/SQL task.
After the lookup task has completed I would like to dedupe records based on whether the reference has appeared already in the last 7 days (each row has a reference number and datetimestamp). The data file has records in date order, so I think what would work is a rolling equation that looks at those records in the last 7 days only, then either assigns "1" for unique or "0" for duplicates, in a seperate column (the earliest recorded reference would be the unique value).
I'm not sure how I achieve this though, so I am looking for inspiration, or open to any other suggestions.
Thanks in advance
April 24, 2008 at 8:36 am
piet_dj (4/24/2008)
Hi All,I have created an SSIS package to import data from an Excel file, then lookup additional info from an SQL datatable (based on multiple criteria), then export to a datatable destination.
I would like to add another step to this process but i'm not sure what I would need to do this, possibly a custom script/SQL task.
After the lookup task has completed I would like to dedupe records based on whether the reference has appeared already in the last 7 days (each row has a reference number and datetimestamp). The data file has records in date order, so I think what would work is a rolling equation that looks at those records in the last 7 days only, then either assigns "1" for unique or "0" for duplicates, in a seperate column (the earliest recorded reference would be the unique value).
I'm not sure how I achieve this though, so I am looking for inspiration, or open to any other suggestions.
Thanks in advance
From what I can gather from your request, it looks like the easiest way is to join the data table to a subquery that groups all records in the last 7 days by reference and filters out the unique records using the having clause. Maybe you could change the following to fit your needs?
ALTER TABLE dataTable ADD is_unique bit DEFAULT 1
UPDATEdataTable
SET is_unique = 0
FROMdataTable dt join (
SELECTreference
, count(*) as ref_count
FROMdataTable
WHEREdatetimestamp >= dateadd(d,-7,getdate())
GROUP BY reference
HAVING count(*) > 1) as rc on
dt.reference = rc.reference
WHERE dt.datetimestamp >= dateadd(d,-7,getdate())
--------
[font="Tahoma"]I love deadlines. I like the whooshing sound they make as they fly by. -Douglas Adams[/font]
April 24, 2008 at 10:37 am
Hi Rob,
Thanks for your quick response. I'm new to SSIS and I don't think I explained enough earlier. I want every record to be included in the destination import, with a rolling calculation looking back 7 days, for every row, then moving on to the next and applying the same 7 day rule.
This is what I would like to achieve:
Ref No-----------Date-----------Unique
123456--------01/04/2008---------- 1
123457--------01/04/2008---------- 1
123456--------05/04/2008---------- 0
123457--------10/04/2008---------- 1
I'd like to do this after the lookup step has completed, then import the results into the SQL datatable.
many thanks
April 25, 2008 at 4:10 am
Ah, in that case, although a cursor or table variable would probably be the obvious answer, try the following and see if it hits the spot. It adds the column marking them all as unique, and then updates those rows where the the ref has cropped up in the 7 days prior to the record, and the counterpart. Is that what you're after?
ALTER TABLE dataTable ADD is_unique bit NOT NULL DEFAULT 1
UPDATE dataTable
SET is_unique = 0
FROM dataTable dt join (
SELECT dt1.reference
, dt1.datetimestamp
, count(*) as ref_count
FROM dataTable dt1 left outer join dataTable dt2 on
dt1.reference = dt2.reference
WHERE dt2.datetimestamp between dateadd(d,-7,dt1.datetimestamp) and dt1.datetimestamp
GROUP BY dt1.reference, dt1.datetimestamp
HAVING count(*) > 1) as rc on
dt.reference = rc.reference
and dt.datetimestamp between dateadd(d,-7,rc.datetimestamp) and rc.datetimestamp
There are far more talented SQL Ninjas on this site, so hopefully someone else will refine this / point out the obvious flaws!
--------
[font="Tahoma"]I love deadlines. I like the whooshing sound they make as they fly by. -Douglas Adams[/font]
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply