May 2, 2012 at 7:52 am
Hello All,
Let me explain you the entire scenario.
I am loading the data to my DB from a warehouse by using SSIS ETL jobs.
This process is taking bit long as I am extracting the data from 5 different tables.
This Job is scheduled to run daily. In clean up I am truncating the fact data that means daily it is truncating and loading new data which taking long time.
Please advise how to overcome this situation.
Thanks..!!
May 2, 2012 at 7:58 am
Hi a couple of questions:
Define a bit long?
How large are the tables?
Are you using complex queries to retrieve your datasets?
How many transformations are you performing in your ETL's?
Are you using fast table loads for your inserts?
Andy
==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
May 2, 2012 at 8:04 am
To extract last 9months data it is taking around 1hour.
And I am not using any complex queries only using few Left outer joins. The Main table has 4 million + records.
Yes, I am using fast load to insert data.
May 2, 2012 at 8:11 am
Does seem a bit long...
Are the destination tables heavily indexed?
Are there no transformations in the ETL?
Could you post the SQL?
Andy
==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
May 2, 2012 at 8:34 am
This is the SQL: SELECT b.comp_code,
a.cust_code,
b.cust_name,
d.country_code,
e.country_desc,
a.item_code,
c.item_desc,
a.order_no,
a.reference_a_no,
a.reference_b_no,
a.po_no,
a.position_no,
a.delivery_no,
a.selling_price,
a.cost_price,
a.order_qty,
a.delivered_qty,
a.invoice_date,
a.invoice_no,
a.trans_type,
a.invoice_amnt
FROM tableA a, table b, table d, table c, table e
WHERE (a.cust_code = b.cust_code) AND (a.order_no = d.order_no) AND (a.item_code= c.item_code) AND (d.country_code= e.country_code)
AND (invoice_Date BETWEEN '2011-07-01' AND TODAY - 1)
AND (a.delivered_qty > 0) AND (comp_code = '1')
Yes I am using transformation.
May 2, 2012 at 9:06 am
Pihu.awasthi (5/2/2012)
In clean up I am truncating the fact data that means daily it is truncating and loading new data which taking long time.
Why are you truncating the fact table?
Why aren't you doing deltas?
It seems that you are spending a lot of time loading data you already DID.
CEWII
May 2, 2012 at 9:15 am
Can you please tell me how to do the deltas??
May 2, 2012 at 9:24 am
Pihu.awasthi (5/2/2012)
Can you please tell me how to do the deltas??
To all intents and purposes its an incremental load, only loading records that have changed or been loaded since your last refresh.
this stairway explains the basics:
http://www.sqlservercentral.com/stairway/72494/
Again
What transformations are you performing?
Are the tables heavily indexed?
Both of these can slow down loading time.
==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
May 3, 2012 at 1:39 am
That link really helps.
The tables are not heavily indexed. I am using UnionAll, RowCount, DerivedCoulmn transformations.
I guess I have to use something like this query to insert new rows:
Delete src
From Person.Contact src
Join StageDeletes stage
On stage.Email = src.EmailAddress
Thanks.!
May 3, 2012 at 2:05 am
Glad it helped 🙂
Delete src
From Person.Contact src
Join StageDeletes stage
On stage.Email = src.EmailAddress
Just be careful doing something like this - using the email address to identify new records wasn't the best example. Much better to use a date last modified column, hash values or something along those lines..
Andy
==========================================================================================================================
A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe
May 3, 2012 at 2:43 am
Ohh Yes..!!
I am using the invoice_date field.
DELETE FROM tableA
FROM tableA a
INNER JOIN (SELECT DISTINCT invoice_date FROM tableB) b
ON (a.invoice_date = b.invoice_date)
But this query deleting all the data then inserting new data..
Is it right to use this query for deleting fact?
May 3, 2012 at 6:48 am
Hi Guys,
I am new is ssis.... i want to data populate from source DB to destination DB (insert / update / delete) how can i achieve this...
Is it need to use more than one execute SQL task component?
Please help me to achieve this..
Advance thanks...
Pradeep
May 4, 2012 at 1:45 pm
Have you determined what is actually taking the most time in the job? Is it taking forever to purge the old records, but loading the new rows is quick? The opposite? Can you determine if the query is taking a long time to return records (that is, can you run the query in SSMS and get results back quickly) or if it's taking a long time to write to the target? Are these databases on different servers? Network latency issues?
pradeep: Please start your own thread and don't hijack someone else's.
July 31, 2014 at 9:23 am
This stairway is going to save my life. Thanks so much for posting!:-)
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply