Data Flow Task working very slow due to left join in source

Question

Data Flow Task working very slow due to left join in source

Re1

SSC Eights!

Points: 991
More actions
November 26, 2015 at 3:57 am

#305498

Hi frnds,
I have one DFT which is having straight source to destination mapping.In Source I am using a T- SQL Which is using Left outer Join.As Source query output 20 millions + rows hence I an using batch wise data load to avoid buffer issue. The issue is DFT is taking very much time to load single batch.
I have done below changes to improve the performance
1. Used column names instead of "*" in Select list.
2. Used Fast Load option in Oledb destination
3. connection manager's packet size is set to 32767
4. Used batch wise data load
I want a solution where i can improve the DFT performance.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Have you tested the individual components of the flow to see where the bottleneck is? How long does it take to run the query locally? How long does it take to copy a comparable amount of data across the network between the source and destination servers? And how long does a simple INSERT (with the same amount of data) into the destination table take?

John

Phil Parkin SSC Guru Points: 246985 More actions · Answer 2

John Mitchell-245523 (11/26/2015)
Have you tested the individual components of the flow to see where the bottleneck is? How long does it take to run the query locally? How long does it take to copy a comparable amount of data across the network between the source and destination servers? And how long does a simple INSERT (with the same amount of data) into the destination table take?
John

This is good advice :smooooth:

If the query runs slowly outside of SSIS, it won't magically speed up when included in a package.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Re1 SSC Eights! Points: 991 More actions · Answer 3

Hi,

The source query is taking very much time to produce the data.

below is a sample query used in source component.

tbl_documents table is having 700 rows and tbl_documentdetails table is having 1.6 millions+ rows.

I have tried using inner join instead of exists for DocumentCodeList table it is giving me same perfomanace issue.

SELECT

a.*

,b.value1

,b.value2

FROM

tbl_documents AS a

LEFT OUTER JOIN

dbo.tbl_documentdetails b WITH(NOLOCK)

ON

a.invoiceid=b.invoiceid

AND

a.orderId=b.orderid

AND

a.billid=b.billid

WHERE

EXISTS

(

SELECT

c.invoice

FROM

DocumentCodeList c

WHERE

a.invoiceid = c.invoiceid

AND

b.srnumber BETWEEN 1 AND 100

)

Phil Parkin SSC Guru Points: 246985 More actions · Answer 4

Re1 (11/26/2015)
Hi,
The source query is taking very much time to produce the data.
below is a sample query used in source component.
tbl_documents table is having 700 rows and tbl_documentdetails table is having 1.6 millions+ rows.
I have tried using inner join instead of exists for DocumentCodeList table it is giving me same perfomanace issue.
SELECT
a.*
,b.value1
,b.value2
FROM
tbl_documents AS a
LEFT OUTER JOIN
dbo.tbl_documentdetails b WITH(NOLOCK)
ON
a.invoiceid=b.invoiceid
AND
a.orderId=b.orderid
AND
a.billid=b.billid
WHERE
EXISTS
(
SELECT
c.invoice
FROM
DocumentCodeList c
WHERE
a.invoiceid = c.invoiceid
AND
b.srnumber BETWEEN 1 AND 100
)

Have you tried getting an execution plan and optimising the query?

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Re1 SSC Eights! Points: 991 More actions · Answer 5

Yes I have used execution plan and added missing index also.

The query is taking to much time in Nested Loops(Left outer join).

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

The percentages in a plan are not times, they're estimated cost (emphasis *estimated*)

Can you post the plan and the table and index definitions?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

MadAdmin SSChampion Points: 11261 More actions · Answer 7

does the bit below effectively make the left join an inner join?

b.srnumber BETWEEN 1 AND 100

Also, for anyone else who struggles to read unformatted sql.

SELECT

a.*

, b.value1

, b.value2

FROM

tbl_documents AS a

LEFT OUTER JOIN dbo.tbl_documentdetails AS b WITH

( NOLOCK

) ON a.invoiceid = b.invoiceid

AND a.orderId = b.orderid

AND a.billid = b.billid

WHERE EXISTS

(

SELECT

c.invoice

FROM

DocumentCodeList AS c

WHERE a.invoiceid = c.invoiceid

AND b.srnumber BETWEEN 1 AND 100

);

Catch-all queries done right [/url]
Gail Shaw's Performance Blog[/url]

MMartin1 One Orange Chip Points: 27799 More actions · Answer 8

It looks like that Left Join is essentially an inner join... you are filtering the results back down AFTER you are first putting together the bigger recordset. Make this

tbl_documents AS a

INNER JOIN dbo.tbl_documentdetails AS b --WITH(NOLOCK)

ON a.invoiceid = b.invoiceid

AND a.orderId = b.orderid

AND a.billid = b.billid

AND b.srnumber BETWEEN 1 AND 100

Having Clustered indexes on the ID columns are a good bet as well.

----------------------------------------------------

Re1 SSC Eights! Points: 991 More actions · Answer 9

Thanks for Ur reply MMartin1

I have tried with inner join also, still the same performance issues I am facing.

I have added clustered index on invoiceid,orderid,billid column on tbl_documentdetails table and added clustered index on srnumber coumn on DocumentCodeList table.

MMartin1 One Orange Chip Points: 27799 More actions · Answer 10

Try to get the actual query plan attached here. A couple of people have mentioned that before, plus your table definitions. That will go further in helping us to help you.

----------------------------------------------------

Barkstuff Say Hey Kid Points: 662 More actions · Answer 11

/* try this - it may not make a difference but it might help you find the bottleneck */

;with cte as

(

select a.*

from tbl_documents AS a

inner join DocumentCodeList AS c

WHERE a.invoiceid = c.invoiceid

)

select cte.*,

, b.value1

, b.value2

from cte

LEFT OUTER JOIN dbo.tbl_documentdetails AS b with (nolock)

on a.invoiceid = b.invoiceid

AND a.orderId = b.orderid

AND a.billid = b.billid

where b.srnumber >= 1 AND b.srnumber <= 100