JOIN in date range slow

  • JOIN on date range is very slow..

    Look at the bottom where it's using ">=" and "<="

    FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

    and

    FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED

    tables have about 80,000,000 records

    Other tables are tiny.

    SELECT ASSOCIATION.SOURCE_CODE, ASSOCIATION.CUSTOMER_NUMBER

    --INTO #COMMON_I_SS_ARRANGEMENT_LOCATION_ADDR_IDS_CURRENT_VALID_ARRANGEMENT

    FROM (SELECT MAX(DATA_SET.BUS_PROC_DATE) AS BUS_PROC_DATE FROM [FATCA_TDS].[I_DATA_SET] DATA_SET

    WHERE [IS_ANNUAL]='Y' AND DATA_SET.BUS_PROC_DATE <= '2015-11-02')

    LAST_ANNUAL

    JOIN (SELECT ENUM_REF.MNEMONIC FROM [FATCA_STG].[R_SOURCE_SYSTEM_DEFAULT] SYSTEM_DEFAULT

    JOIN FATCA_STG.R_ENUM_REF ENUM_REF ON ( ENUM_REF.SET_NAME = 'LEGAL ENTITY'

    AND SYSTEM_DEFAULT.SET_NAME = 'ACCT-LGL-ENT' AND ENUM_REF.CODE = SYSTEM_DEFAULT.[STR_VALUE] )

    WHERE SYSTEM_DEFAULT.SOURCE_CODE='IDS')

    IDS_LGL_ENT ON (1=1)

    join FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED

    on ( CLIENT_SHARED.SOURCE_CODE = 'IDS'

    AND CLIENT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND CLIENT_SHARED.BUS_PROC_DT<='2015-11-02'

    )

    JOIN

    FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED ASSOCIATION

    on ( ASSOCIATION.SOURCE_CODE = CLIENT_SHARED.SOURCE_CODE

    AND ASSOCIATION.CUSTOMER_NUMBER = CLIENT_SHARED.CUSTOMER_NUMBER

    AND ASSOCIATION.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE

    AND ASSOCIATION.BUS_PROC_DT <= '2015-11-02'

    )

  • ON (1=1)

    makes the join a cross-join

    Got an index on the date column? Otherwise you'll get a table scan, I think. Check the exec plan

    Gerald Britton, Pluralsight courses

  • You might be a victim of triangular joins. http://www.sqlservercentral.com/articles/T-SQL/61539/

    Can you post the information needed as explained in this article: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Gerald. Thanks for your reply.

    FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED indexes:

    ------------------------------------------------------------------

    SOURCE_CODE, BUS_PROC_DT, ACCOUNT_IDENTIFIER

    SOURCE_CODE, BUS_PROC_DT, CUSTOMER_NUMBER

    BUS_PROC_DT

    FATCA_STG.S_CLIENT_SHARED indexes:

    ------------------------------------------------------------------

    SOURCE_CODE, BUS_PROC_DT, CUSTOMER_NUMBER

    ID

    At the beginning that (1=1) drew my attention as well

    but query slows down only when you

    JOIN

    FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

    Without this join it run 1 sec.

  • Luis,

    I attached Execution Plan.

    The rest I covered already.

    S_CLIENT_SHARED

    S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

    have 80 million records.

    Other tables are very small.

    Without JOIN with S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

    sql runs 1 second.

    With this JOIN it takes 20 seconds

  • Folks,

    I just discovered an interesting thing.

    The original query uses DISTINCT.

    SELECT DISTINCT

    ASSOCIATION.SOURCE_CODE, ASSOCIATION.CUSTOMER_NUMBER

    . . . . . . .

    Without DISTINCT

    it returns 1.8 billion records !!

    With DISTINCT - it returns 236,000 rows.

    I have to mention. This is not my code.

    I was assigned to support this ETL application.

    But the more I look at the code the more I hate it.

    Recursive stored procedures ,

    nested Cursors inside of which SQL is being built dynamically . . . yak

    You can't even understand which SQL is being executed...

  • Original query:

    SELECT DISTINCT

    ACCOUNT_SHARED.SOURCE_CODE

    ,ACCOUNT_SHARED.ACCOUNT_IDENTIFIER

    INTO #COMMON_I_SS_ARRANGEMENT_LOCATION_ADDR_IDS_CURRENT_VALID_ARRANGEMENT

    FROM

    (SELECT MAX(DATA_SET.BUS_PROC_DATE) AS BUS_PROC_DATE

    FROM [FATCA_TDS].[I_DATA_SET] DATA_SET WHERE [IS_ANNUAL]='Y'

    AND DATA_SET.BUS_PROC_DATE <= <P_BUS_PROC_DT>)

    LAST_ANNUAL

    JOIN

    (SELECT ENUM_REF.MNEMONIC

    FROM [FATCA_STG].[R_SOURCE_SYSTEM_DEFAULT] SYSTEM_DEFAULT

    JOIN FATCA_STG.R_ENUM_REF ENUM_REF ON

    ( ENUM_REF.SET_NAME = 'LEGAL ENTITY'

    AND SYSTEM_DEFAULT.SET_NAME = 'ACCT-LGL-ENT'

    AND ENUM_REF.CODE = SYSTEM_DEFAULT.[STR_VALUE] )

    WHERE SYSTEM_DEFAULT.SOURCE_CODE='IDS')

    IDS_LGL_ENT

    ON (1=1)

    JOIN FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED

    ON ( CLIENT_SHARED.SOURCE_CODE = <V_SOURCE_CODE>

    AND CLIENT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND CLIENT_SHARED.BUS_PROC_DT<=<P_BUS_PROC_DT>

    )

    JOIN FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED ASSOCIATION

    ON ( ASSOCIATION.SOURCE_CODE = CLIENT_SHARED.SOURCE_CODE

    AND ASSOCIATION.CUSTOMER_NUMBER = CLIENT_SHARED.CUSTOMER_NUMBER

    AND ASSOCIATION.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND ASSOCIATION.BUS_PROC_DT<=<P_BUS_PROC_DT>

    )

    JOIN FATCA_STG.S_ACCOUNT_SHARED ACCOUNT_SHARED

    ON ( ACCOUNT_SHARED.SOURCE_CODE = ASSOCIATION.SOURCE_CODE

    AND ACCOUNT_SHARED.ACCOUNT_IDENTIFIER = ASSOCIATION.ACCOUNT_IDENTIFIER

    AND NOT (ACCOUNT_SHARED.SOURCE_CODE IN ('GMS2','IDP','FCA','IDS') AND ACCOUNT_SHARED.ACCOUNT_STATUS IN ('PENDING'))

    AND ACCOUNT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND ACCOUNT_SHARED.BUS_PROC_DT<=<P_BUS_PROC_DT>

    )

    JOIN FATCA_STG.S_ACCOUNT_SHARED FIS2_ACCOUNT_SHARED

    ON ( FIS2_ACCOUNT_SHARED.SOURCE_CODE = 'FIS2'

    AND FIS2_ACCOUNT_SHARED.ACCOUNT_NUMBER = ACCOUNT_SHARED.ACCOUNT_IDENTIFIER

    AND FIS2_ACCOUNT_SHARED.BUS_PROC_DT = <P_BUS_PROC_DT>

    )

    JOIN FATCA_STG.R_ENUM_REF FIS2_ACCT_LGL_ENT

    ON ( FIS2_ACCT_LGL_ENT.SET_NAME='LEGAL ENTITY'

    AND FIS2_ACCT_LGL_ENT.DESCRIPTION IN (SELECT [FILTER_STRING_VALUE] FROM [FATCA_TDS].[T_ETL_SP_SS_TRANSFORMER_EXTRA_FILTERS] WHERE [FILTER_NAME] = 'ACCEPTED LEGAL ENTITY' AND [INCLUDE_EXCLUDE]='INCLUDE')

    AND ( UPPER(LTRIM(RTRIM(FIS2_ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_ACCT_LGL_ENT.MNEMONIC)))

    OR UPPER(LTRIM(RTRIM(FIS2_ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')

    )

    )

    JOIN FATCA_STG.R_ENUM_REF ACCT_LGL_ENT

    ON ( ACCT_LGL_ENT.SET_NAME='LEGAL ENTITY'

    AND ACCT_LGL_ENT.DESCRIPTION IN (SELECT [FILTER_STRING_VALUE] FROM [FATCA_TDS].[T_ETL_SP_SS_TRANSFORMER_EXTRA_FILTERS] WHERE [FILTER_NAME] = 'ACCEPTED LEGAL ENTITY' AND [INCLUDE_EXCLUDE]='INCLUDE')

    AND ( UPPER(LTRIM(RTRIM(CASE WHEN ACCOUNT_SHARED.SOURCE_CODE='IDS' THEN IDS_LGL_ENT.MNEMONIC ELSE ACCOUNT_SHARED.TD_LEGAL_ENTITY END))) = UPPER(LTRIM(RTRIM(ACCT_LGL_ENT.MNEMONIC)))

    OR UPPER(LTRIM(RTRIM(ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')

    )

    )

  • regarding

    ON (1=1)

    after all maybe it's not a big deal because LAST_ANNUAL table

    is only one row (always).

    so it's not really a CROSS JOIN.

  • It seems that you had some fun working on this query and you'll have more ahead. 😀

    I'd specify the cross join explicitly, just to let clear that I know what I'm doing.

    The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.

    I'd help you but I'm on my own nightmare right now.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks Luis.

    So you insist CROSS JOIN is a bottleneck.. I guess.

    I actually did a test.

    I removed JOIN ON (1=1).

    But the execution time did not change at all...

    Luis,

    I didn't understand this line:

    ". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."

    What does it mean ?

  • RVO (11/13/2015)


    So you insist CROSS JOIN is a bottleneck.. I guess.

    I actually did a test.

    I removed JOIN ON (1=1).

    But the execution time did not change at all...

    No, I'm just saying that if it's a cross join, you should write it as such. It's basically formatting, but I like to see it as good coding practices.

    Luis,

    I didn't understand this line:

    ". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."

    What does it mean ?

    These images from the posted execution plan should show it.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • First of all, please tell us that all your columns in your join predicates are of the same data type. Implicit conversion can kill the performance of any query.

    You also have several join predicates that use (according to the column names) date columns with a <= or >= comparison against either a literal or another date column. I hope those are of the same datatypes as well. Also, like Gerald asked above, do you have an index on those columns? In fact, I would check for covering indexes on each table, the columns in your join predicates should be the key columns and the returned columns should be the included columns.

    You have several different parts to your query. Can you isolate each one in a "divide-and-conquer" approach? This would let you determine the best way to query each one. It might also make it easier to see where you needs indexes to support your joins.

    Your 1.8B rows versus your 236K rows with the distinct is probably due to a cross join. Whether or not this is accidental is something you have to answer from your knowledge of the process and your expected output. There are times when cross joins are appropriate, but in other places they can get you into trouble.

    Lastly, regarding your question about estimated rows, they're determined by the statistics. If your estimated rows is too far off from your actual rows, then your statistics are probably out of date.

  • Thanks Luis.

    Without DISTINCT, row count is actually 20 billion something.

    Maybe that's what Estimated Row counts show ?

    To me if without DISTINCT query returns 20 billion records from 80 million tables,

    the JOINS are incorrect (incomplete)..

  • Added DDL for two problematical tables...

    S_CLIENT_SHARED

    S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

  • RVO (11/13/2015)


    Thanks Luis.

    Without DISTINCT, row count is actually 20 billion something.

    Maybe that's what Estimated Row counts show ?

    To me if without DISTINCT query returns 20 billion records from 80 million tables,

    the JOINS are incorrect (incomplete)..

    There's no distinct in the plan that you attached. The distinct is in the query that you posted after the execution plan and you didn't post an execution plan for that query.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply