November 13, 2015 at 1:31 pm
RVO (11/13/2015)
Added DDL for two problematic tables...
S_CLIENT_SHARED
S_CLIENT_ACCOUNT_ASSOCIATION_SHARED
You're lacking indexes on those tables. If we're talking about millions of rows, that will definitely be a problem.
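For context, the usual fix is a nonclustered index covering the join columns. A minimal sketch, assuming the join is on SOURCE_CODE and CUSTOMER_NUMBER as discussed later in this thread (the index name and INCLUDE list are hypothetical and must be matched to the actual query):

```sql
-- Hypothetical covering index; adjust key and INCLUDE columns to the
-- actual join predicates and SELECT list of the slow query.
CREATE NONCLUSTERED INDEX IX_S_CLIENT_SHARED_Source_Cust
    ON dbo.S_CLIENT_SHARED (SOURCE_CODE, CUSTOMER_NUMBER)
    INCLUDE (BUS_PROC_DT);
```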
November 16, 2015 at 9:20 am
Luis,
Indexes are there.
I didn't notice that the DDL script did not generate the index info.
I've re-attached a new DDL.
November 17, 2015 at 10:34 am
Hello Luis,
"....No, I'm just saying that if it's a cross join, you should write it as such..."
Do you mean you would like to see
SELECT .....
FROM
a
CROSS JOIN
b
style ?
November 17, 2015 at 10:38 am
RVO (11/17/2015)
Hello Luis,
"....No, I'm just saying that if it's a cross join, you should write it as such..."
Do you mean you would like to see
SELECT .....
FROM
a
CROSS JOIN
b
style ?
Yes. It's just a personal preference and has no real impact on performance.
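To illustrate the two equivalent spellings (table names are hypothetical):

```sql
-- Old-style comma syntax: the cross join is implicit and easy to misread
-- as an accidental missing join predicate.
SELECT a.col1, b.col2
FROM dbo.TableA AS a, dbo.TableB AS b;

-- Explicit ANSI syntax: same result, same plan, but the intent is visible.
SELECT a.col1, b.col2
FROM dbo.TableA AS a
CROSS JOIN dbo.TableB AS b;
```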
November 17, 2015 at 11:07 am
Luis,
So what should I do about
". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."
November 17, 2015 at 11:14 am
You have a few places in your query where you apparently need trimming on join fields:
UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_Acct_Lgl_Ent.MNEMONIC)))
OR UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')
In my limited experience this is really bad for performance.
If the tables are under your control, you should ensure that the data in these fields is of such quality that you don't need to use trimming (or case conversion for that matter).
Also, you might want to rewrite the WHERE XX IN(SELECT YY...) to WHERE EXISTS(SELECT ... WHERE XX=YY). That's often (but not always) more effective.
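A sketch of that rewrite, using hypothetical table and column names:

```sql
-- IN form: the subquery produces a list of values to match against.
SELECT o.OrderID
FROM dbo.Orders AS o
WHERE o.CustomerID IN (SELECT c.CustomerID
                       FROM dbo.Customers AS c
                       WHERE c.IsActive = 1);

-- EXISTS form: correlated, stops probing at the first match per outer row.
SELECT o.OrderID
FROM dbo.Orders AS o
WHERE EXISTS (SELECT 1
              FROM dbo.Customers AS c
              WHERE c.CustomerID = o.CustomerID
                AND c.IsActive = 1);
```

With non-nullable columns the optimizer usually produces the same plan for both, which is why the difference is "often (but not always)".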
November 17, 2015 at 11:43 am
RVO (11/17/2015)
Luis,
So what should I do about
". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."
If you look at the images I included in a previous post, you can see that the estimated row counts and the actual row counts are very different. This can be caused by stale statistics (although those usually cause the opposite effect, actual row count greater than estimated), or by bad/uncommon query patterns that prevent the cardinality estimator from working correctly. Either issue will prevent an optimal execution of the query.
November 17, 2015 at 11:53 am
kaj (11/17/2015)
You have a few places in your query where you apparently need trimming on join fields:
UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_Acct_Lgl_Ent.MNEMONIC)))
OR UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')
In my limited experience this is really bad for performance.
If the tables are under your control, you should ensure that the data in these fields is of such quality that you don't need to use trimming (or case conversion for that matter).
This is true, (almost) any functions on columns in WHERE clauses and JOIN predicates will prevent index seeks. However, trailing spaces won't affect comparisons.
Also, you might want to rewrite the WHERE XX IN(SELECT YY...) to WHERE EXISTS(SELECT ... WHERE XX=YY). That's often (but not always) more effective.
Check this: http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
No real performance difference.
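To make the sargability point concrete, here is a sketch using the column from the query above. The rewrite assumes the data can be stored already trimmed and that the column uses a case-insensitive collation (the SQL Server default); note that trailing spaces are ignored in T-SQL string equality anyway:

```sql
-- Not sargable: wrapping the column in functions prevents an index seek
-- and forces a scan.
SELECT a.TD_LEGAL_ENTITY
FROM dbo.FIS2_Account_Shared AS a
WHERE UPPER(LTRIM(RTRIM(a.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA');

-- Sargable: with clean data and a case-insensitive collation, a plain
-- comparison can use an index seek on TD_LEGAL_ENTITY.
SELECT a.TD_LEGAL_ENTITY
FROM dbo.FIS2_Account_Shared AS a
WHERE a.TD_LEGAL_ENTITY = 'TD SECURITIES CANADA';
```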
November 17, 2015 at 11:53 am
Luis,
So what is the solution
for
The estimated row counts are way off
UPDATE STATISTICS ?
November 17, 2015 at 12:01 pm
RVO (11/17/2015)
Luis,
So what is the solution
for
The estimated row counts are way off
UPDATE STATISTICS ?
That should correct that issue, unless it's code related. You still have to work on that query, as the problems go beyond the statistics.
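For reference, a minimal statistics refresh on the two tables from this thread (FULLSCAN is a hypothetical choice here; plain UPDATE STATISTICS uses the default sampling rate):

```sql
-- Rebuild all statistics on both tables by reading every row.
UPDATE STATISTICS dbo.S_CLIENT_SHARED WITH FULLSCAN;
UPDATE STATISTICS dbo.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED WITH FULLSCAN;
```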
November 17, 2015 at 12:25 pm
Luis,
After I analyzed the next slow block (let's call it "GMS2 INSERT"),
I shifted my focus to much bigger issue (in my opinion).
It is very similar to my original query I posted above (let's call it "IDS INSERT").
The pattern is the same.
Just small variations (Source_Code, etc...)
Look at the query.
The output is:
DISTINCT
CLIENT_SHARED.SOURCE_CODE
,CLIENT_SHARED.CUSTOMER_NUMBER. . . .
Without DISTINCT
it returns around 25,000 records.
With DISTINCT it is 127 rows!
I dug in and found out that the JOIN between
S_CLIENT_SHARED
and
S_CLIENT_ACCOUNT_ASSOCIATION_SHARED
is not correct.
It is only on 2 columns:
SOURCE_CODE
CUSTOMER_NUMBER
It should be:
SOURCE_CODE
CUSTOMER_NUMBER
BUS_PROC_DT
I have already tested it.
Instead of 36 min, it now runs in 1.5 min,
and the output is identical to the original query.
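A sketch of the fix, using the column list from the post (aliases hypothetical):

```sql
-- Joining on only 2 columns multiplies rows across BUS_PROC_DT values,
-- which is why DISTINCT collapsed ~25,000 rows down to 127.
FROM dbo.S_CLIENT_SHARED AS c
JOIN dbo.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED AS a
    ON  a.SOURCE_CODE     = c.SOURCE_CODE
    AND a.CUSTOMER_NUMBER = c.CUSTOMER_NUMBER
    AND a.BUS_PROC_DT     = c.BUS_PROC_DT  -- the missing third column
```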
The problem is that the SQL is generated dynamically from the
[FATCA_TDS.T_ETL_SP_SS_TRANSFORMER_DML] table,
based on a very convoluted looping mechanism
in the SP_SS_TRANSFORMER stored procedure.
It is very risky to make the change.
So far I found this pattern in 79 records of
[FATCA_TDS.T_ETL_SP_SS_TRANSFORMER_DML] table.
This would be my last option.
I wonder, though...
This test I'm doing is in the PAT environment.
Why does this original "GMS2 INSERT" piece of code run in only 2 minutes in PROD?
In PAT it runs 36 minutes!
PAT and PROD data volumes are almost identical.
We copied PROD to PAT
two weeks ago.
Does it have anything to do with Memory?
PAT server: 16 GB RAM
PROD server: 32 GB RAM
CPU is the same.
Disk parameters are close.