JOIN in date range slow

  • RVO (11/13/2015)


Added DDL for two problematic tables...

    S_CLIENT_SHARED

    S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

    You're lacking indexes on those tables. If we're talking about millions of rows, that will definitely be a problem.
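For example, something along these lines (just a sketch: I'm guessing the key columns from the join columns mentioned in this thread, so adjust them to your actual joins):

-- Index the columns the two tables are joined on.
CREATE NONCLUSTERED INDEX IX_S_CLIENT_SHARED_SourceCust
ON S_CLIENT_SHARED (SOURCE_CODE, CUSTOMER_NUMBER, BUS_PROC_DT);

CREATE NONCLUSTERED INDEX IX_S_CLIENT_ACCT_ASSOC_SourceCust
ON S_CLIENT_ACCOUNT_ASSOCIATION_SHARED (SOURCE_CODE, CUSTOMER_NUMBER, BUS_PROC_DT);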

    Luis C.
  • Luis,

    Indexes are there.

I didn't notice the DDL script did not generate index info.

I re-attached a new DDL.

  • Hello Luis,

    "....No, I'm just saying that if it's a cross join, you should write it as such..."

Do you mean you would like to see

SELECT .....
FROM a
CROSS JOIN b

style?

  • RVO (11/17/2015)


    Hello Luis,

    "....No, I'm just saying that if it's a cross join, you should write it as such..."

Do you mean you would like to see

SELECT .....
FROM a
CROSS JOIN b

style?

    Yes, it's just personal preference and has no real impact on performance.
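For example (a and b are just placeholder table names):

-- Implicit cross join, old style:
SELECT a.col1, b.col2
FROM a, b;

-- Explicit CROSS JOIN: same result, but the intent is obvious:
SELECT a.col1, b.col2
FROM a
CROSS JOIN b;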

    Luis C.
  • Luis,

    So what should I do about

    ". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."

  • You have a few places in your query where you apparently need trimming on join fields:

    UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_Acct_Lgl_Ent.MNEMONIC)))

    OR UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')

    In my limited experience this is really bad for performance.

    If the tables are under your control, you should ensure that the data in these fields is of such quality that you don't need to use trimming (or case conversion for that matter).

    Also, you might want to rewrite the WHERE XX IN(SELECT YY...) to WHERE EXISTS(SELECT ... WHERE XX=YY). That's often (but not always) more effective.
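For example, a rewrite along these lines (the table and column names are placeholders):

-- IN form:
SELECT t.*
FROM SomeTable t
WHERE t.XX IN (SELECT s.YY FROM OtherTable s);

-- EXISTS form:
SELECT t.*
FROM SomeTable t
WHERE EXISTS (SELECT 1 FROM OtherTable s WHERE s.YY = t.XX);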

  • RVO (11/17/2015)


    Luis,

    So what should I do about

    ". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."

If you look at the images I included in a previous post, you can see that the estimated row counts and the actual row counts are very different. This can be caused by stale statistics, which usually cause the opposite effect (actual row count greater than estimated), or by bad/uncommon queries that prevent the cardinality estimator from working correctly. Both issues will prevent optimal execution of the query.

    Luis C.
  • kaj (11/17/2015)


    You have a few places in your query where you apparently need trimming on join fields:

    UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_Acct_Lgl_Ent.MNEMONIC)))

    OR UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')

    In my limited experience this is really bad for performance.

    If the tables are under your control, you should ensure that the data in these fields is of such quality that you don't need to use trimming (or case conversion for that matter).

This is true: (almost) any function on columns in WHERE clauses and JOIN predicates will prevent index seeks. However, trailing spaces won't affect comparisons. (See the sketch at the end of this post.)

    Also, you might want to rewrite the WHERE XX IN(SELECT YY...) to WHERE EXISTS(SELECT ... WHERE XX=YY). That's often (but not always) more effective.

    Check this: http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

    No real performance difference.
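To illustrate the index-seek point, here is a sketch using the column from your query (the cleanup UPDATE is an assumption on my part, so test it first):

-- Non-sargable: wrapping the column in functions forces a scan.
SELECT a.*
FROM FIS2_Account_Shared a
WHERE UPPER(LTRIM(RTRIM(a.TD_LEGAL_ENTITY))) = 'TD SECURITIES CANADA';

-- Sargable: with clean data and the default case-insensitive collation,
-- a plain comparison can use an index seek (trailing spaces are ignored
-- in SQL Server string comparisons anyway).
SELECT a.*
FROM FIS2_Account_Shared a
WHERE a.TD_LEGAL_ENTITY = 'TD SECURITIES CANADA';

-- One-time cleanup so the trimming isn't needed at query time:
UPDATE FIS2_Account_Shared
SET TD_LEGAL_ENTITY = LTRIM(RTRIM(TD_LEGAL_ENTITY));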

    Luis C.
  • Luis,

So what is the solution for

"The estimated row counts are way off"

UPDATE STATISTICS?

  • RVO (11/17/2015)


    Luis,

So what is the solution for

"The estimated row counts are way off"

UPDATE STATISTICS?

That should correct that issue, unless it is code related. You still have to work on that query, as the problems go beyond the statistics.
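For example (using your two tables from earlier; FULLSCAN is the thorough but heavier option):

UPDATE STATISTICS S_CLIENT_SHARED WITH FULLSCAN;
UPDATE STATISTICS S_CLIENT_ACCOUNT_ASSOCIATION_SHARED WITH FULLSCAN;

-- Or refresh all statistics in the database (sampled, lighter):
EXEC sp_updatestats;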

    Luis C.
  • Luis,

After I analyzed the next slow block (let's call it "GMS2 INSERT"), I shifted my focus to a much bigger issue (in my opinion). It is very similar to my original query posted above (let's call it "IDS INSERT"). The pattern is the same, just small variations (Source_Code, etc...).

Look at the query. The output is:

DISTINCT
CLIENT_SHARED.SOURCE_CODE
,CLIENT_SHARED.CUSTOMER_NUMBER. . . .

Without DISTINCT it returns around 25,000 records. With DISTINCT it is 127 rows!

I dug in and found out that the JOIN between S_CLIENT_SHARED and S_CLIENT_ACCOUNT_ASSOCIATION_SHARED is not correct. It is only on 2 columns:

SOURCE_CODE
CUSTOMER_NUMBER

It should be on 3:

SOURCE_CODE
CUSTOMER_NUMBER
BUS_PROC_DT

I already tested: instead of 36 min, it now runs in 1.5 min, and the output is identical to the original query. (Sketch of the fix below.)
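Roughly like this (a simplified fragment, since the real SQL is generated dynamically; the aliases are mine):

-- Original join, causes the ~25,000-row multiplication:
FROM S_CLIENT_SHARED cs
INNER JOIN S_CLIENT_ACCOUNT_ASSOCIATION_SHARED caa
ON caa.SOURCE_CODE = cs.SOURCE_CODE
AND caa.CUSTOMER_NUMBER = cs.CUSTOMER_NUMBER

-- Corrected join, 36 min down to 1.5 min, identical output:
FROM S_CLIENT_SHARED cs
INNER JOIN S_CLIENT_ACCOUNT_ASSOCIATION_SHARED caa
ON caa.SOURCE_CODE = cs.SOURCE_CODE
AND caa.CUSTOMER_NUMBER = cs.CUSTOMER_NUMBER
AND caa.BUS_PROC_DT = cs.BUS_PROC_DT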

The problem is that the SQL is generated dynamically from the [FATCA_TDS.T_ETL_SP_SS_TRANSFORMER_DML] table, based on a very convoluted looping mechanism in the SP_SS_TRANSFORMER stored proc, so it is very risky to make the change. So far I have found this pattern in 79 records of the [FATCA_TDS.T_ETL_SP_SS_TRANSFORMER_DML] table. This would be my last option.

I wonder, though... This test I'm doing is in the PAT environment. Why does this original "GMS2 INSERT" piece of code run in only 2 minutes in PROD, while in PAT it runs 36 minutes? PAT and PROD data volumes are almost identical; we copied PROD to PAT two weeks ago.

Does it have anything to do with memory?

PAT server: 16 GB RAM
PROD server: 32 GB RAM

CPU is the same. Disk parameters are close.
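One thing I could compare on both servers is buffer pool pressure, e.g. page life expectancy (just one indicator, but a much lower value on PAT would point at memory):

-- How long pages stay in the buffer pool, in seconds:
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';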
