Data Flow Task working very slow due to left join in source

  • Hi frnds,

    I have one DFT which is having straight source to destination mapping.In Source I am using a T- SQL Which is using Left outer Join.As Source query output 20 millions + rows hence I an using batch wise data load to avoid buffer issue. The issue is DFT is taking very much time to load single batch.

    I have done below changes to improve the performance

    1. Used column names instead of "*" in Select list.

    2. Used Fast Load option in Oledb destination

    3. connection manager's packet size is set to 32767

    4. Used batch wise data load

    I want a solution where i can improve the DFT performance.

  • Have you tested the individual components of the flow to see where the bottleneck is? How long does it take to run the query locally? How long does it take to copy a comparable amount of data across the network between the source and destination servers? And how long does a simple INSERT (with the same amount of data) into the destination table take?

    John

  • John Mitchell-245523 (11/26/2015)


    Have you tested the individual components of the flow to see where the bottleneck is? How long does it take to run the query locally? How long does it take to copy a comparable amount of data across the network between the source and destination servers? And how long does a simple INSERT (with the same amount of data) into the destination table take?

    John

    This is good advice :smooooth:

    If the query runs slowly outside of SSIS, it won't magically speed up when included in a package.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi,

    The source query is taking very much time to produce the data.

    below is a sample query used in source component.

    tbl_documents table is having 700 rows and tbl_documentdetails table is having 1.6 millions+ rows.

    I have tried using inner join instead of exists for DocumentCodeList table it is giving me same perfomanace issue.

    SELECT

    a.*

    ,b.value1

    ,b.value2

    FROM

    tbl_documents AS a

    LEFT OUTER JOIN

    dbo.tbl_documentdetails b WITH(NOLOCK)

    ON

    a.invoiceid=b.invoiceid

    AND

    a.orderId=b.orderid

    AND

    a.billid=b.billid

    WHERE

    EXISTS

    (

    SELECT

    c.invoice

    FROM

    DocumentCodeList c

    WHERE

    a.invoiceid = c.invoiceid

    AND

    b.srnumber BETWEEN 1 AND 100

    )

  • Re1 (11/26/2015)


    Hi,

    The source query is taking very much time to produce the data.

    below is a sample query used in source component.

    tbl_documents table is having 700 rows and tbl_documentdetails table is having 1.6 millions+ rows.

    I have tried using inner join instead of exists for DocumentCodeList table it is giving me same perfomanace issue.

    SELECT

    a.*

    ,b.value1

    ,b.value2

    FROM

    tbl_documents AS a

    LEFT OUTER JOIN

    dbo.tbl_documentdetails b WITH(NOLOCK)

    ON

    a.invoiceid=b.invoiceid

    AND

    a.orderId=b.orderid

    AND

    a.billid=b.billid

    WHERE

    EXISTS

    (

    SELECT

    c.invoice

    FROM

    DocumentCodeList c

    WHERE

    a.invoiceid = c.invoiceid

    AND

    b.srnumber BETWEEN 1 AND 100

    )

    Have you tried getting an execution plan and optimising the query?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Yes I have used execution plan and added missing index also.

    The query is taking to much time in Nested Loops(Left outer join).

  • The percentages in a plan are not times, they're estimated cost (emphasis *estimated*)

    Can you post the plan and the table and index definitions?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • does the bit below effectively make the left join an inner join?

    b.srnumber BETWEEN 1 AND 100

    Also, for anyone else who struggles to read unformatted sql.

    SELECT

    a.*

    , b.value1

    , b.value2

    FROM

    tbl_documents AS a

    LEFT OUTER JOIN dbo.tbl_documentdetails AS b WITH

    ( NOLOCK

    ) ON a.invoiceid = b.invoiceid

    AND a.orderId = b.orderid

    AND a.billid = b.billid

    WHERE EXISTS

    (

    SELECT

    c.invoice

    FROM

    DocumentCodeList AS c

    WHERE a.invoiceid = c.invoiceid

    AND b.srnumber BETWEEN 1 AND 100

    );

    Catch-all queries done right [/url]
    Gail Shaw's Performance Blog[/url]

  • It looks like that Left Join is essentially an inner join... you are filtering the results back down AFTER you are first putting together the bigger recordset. Make this

    tbl_documents AS a

    INNER JOIN dbo.tbl_documentdetails AS b --WITH(NOLOCK)

    ON a.invoiceid = b.invoiceid

    AND a.orderId = b.orderid

    AND a.billid = b.billid

    AND b.srnumber BETWEEN 1 AND 100

    Having Clustered indexes on the ID columns are a good bet as well.

    ----------------------------------------------------

  • Thanks for Ur reply MMartin1

    I have tried with inner join also, still the same performance issues I am facing.

    I have added clustered index on invoiceid,orderid,billid column on tbl_documentdetails table and added clustered index on srnumber coumn on DocumentCodeList table.

  • Try to get the actual query plan attached here. A couple of people have mentioned that before, plus your table definitions. That will go further in helping us to help you.

    ----------------------------------------------------

  • /* try this - it may not make a difference but it might help you find the bottleneck */

    ;with cte as

    (

    select a.*

    from tbl_documents AS a

    inner join DocumentCodeList AS c

    WHERE a.invoiceid = c.invoiceid

    )

    select cte.*,

    , b.value1

    , b.value2

    from cte

    LEFT OUTER JOIN dbo.tbl_documentdetails AS b with (nolock)

    on a.invoiceid = b.invoiceid

    AND a.orderId = b.orderid

    AND a.billid = b.billid

    where b.srnumber >= 1 AND b.srnumber <= 100

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply