JOIN in date range slow

  • RVO (11/13/2015)


Added DDL for two problematic tables...

    S_CLIENT_SHARED

    S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

    You're lacking indexes on those tables. If we're talking about millions of rows, that will definitely be a problem.
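For example, something along these lines (just a sketch: I'm guessing the key columns from the join columns mentioned in this thread, so adjust them to your actual joins):

-- Index the columns the two tables are joined on.
CREATE NONCLUSTERED INDEX IX_S_CLIENT_SHARED_SourceCust
ON S_CLIENT_SHARED (SOURCE_CODE, CUSTOMER_NUMBER, BUS_PROC_DT);

CREATE NONCLUSTERED INDEX IX_S_CLIENT_ACCT_ASSOC_SourceCust
ON S_CLIENT_ACCOUNT_ASSOCIATION_SHARED (SOURCE_CODE, CUSTOMER_NUMBER, BUS_PROC_DT);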

    Luis C.
  • Luis,

    Indexes are there.

I didn't notice the DDL script did not generate index info.

I re-attached a new DDL.

  • Hello Luis,

    "....No, I'm just saying that if it's a cross join, you should write it as such..."

Do you mean you would like to see

SELECT .....
FROM a
CROSS JOIN b

style?

  • RVO (11/17/2015)


    Hello Luis,

    "....No, I'm just saying that if it's a cross join, you should write it as such..."

Do you mean you would like to see

SELECT .....
FROM a
CROSS JOIN b

style?

    Yes, it's just personal preference and has no real impact on performance.
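For example (a and b are just placeholder table names):

-- Implicit cross join, old style:
SELECT a.col1, b.col2
FROM a, b;

-- Explicit CROSS JOIN: same result, but the intent is obvious:
SELECT a.col1, b.col2
FROM a
CROSS JOIN b;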

    Luis C.
  • Luis,

    So what should I do about

    ". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."

  • You have a few places in your query where you apparently need trimming on join fields:

    UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_Acct_Lgl_Ent.MNEMONIC)))

    OR UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')

    In my limited experience this is really bad for performance.

    If the tables are under your control, you should ensure that the data in these fields is of such quality that you don't need to use trimming (or case conversion for that matter).

    Also, you might want to rewrite the WHERE XX IN(SELECT YY...) to WHERE EXISTS(SELECT ... WHERE XX=YY). That's often (but not always) more effective.
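For example, a rewrite along these lines (the table and column names are placeholders):

-- IN form:
SELECT t.*
FROM SomeTable t
WHERE t.XX IN (SELECT s.YY FROM OtherTable s);

-- EXISTS form:
SELECT t.*
FROM SomeTable t
WHERE EXISTS (SELECT 1 FROM OtherTable s WHERE s.YY = t.XX);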

  • RVO (11/17/2015)


    Luis,

    So what should I do about

    ". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."

If you look at the images I included in a previous post, you can see that the estimated row counts and the actual row counts are very different. This can be caused by stale statistics, which usually cause the opposite effect (actual row count greater than estimated), or by bad/uncommon queries that prevent the cardinality estimator from working correctly. Both issues will prevent optimal execution of the query.

    Luis C.
  • kaj (11/17/2015)


    You have a few places in your query where you apparently need trimming on join fields:

    UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_Acct_Lgl_Ent.MNEMONIC)))

    OR UPPER(LTRIM(RTRIM(FIS2_Account_Shared.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')

    In my limited experience this is really bad for performance.

    If the tables are under your control, you should ensure that the data in these fields is of such quality that you don't need to use trimming (or case conversion for that matter).

This is true: (almost) any function on columns in WHERE clauses and JOIN predicates will prevent index seeks. However, trailing spaces won't affect comparisons. (See the sketch at the end of this post.)

    Also, you might want to rewrite the WHERE XX IN(SELECT YY...) to WHERE EXISTS(SELECT ... WHERE XX=YY). That's often (but not always) more effective.

    Check this: http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

    No real performance difference.
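To illustrate the index-seek point, here is a sketch using the column from your query (the cleanup UPDATE is an assumption on my part, so test it first):

-- Non-sargable: wrapping the column in functions forces a scan.
SELECT a.*
FROM FIS2_Account_Shared a
WHERE UPPER(LTRIM(RTRIM(a.TD_LEGAL_ENTITY))) = 'TD SECURITIES CANADA';

-- Sargable: with clean data and the default case-insensitive collation,
-- a plain comparison can use an index seek (trailing spaces are ignored
-- in SQL Server string comparisons anyway).
SELECT a.*
FROM FIS2_Account_Shared a
WHERE a.TD_LEGAL_ENTITY = 'TD SECURITIES CANADA';

-- One-time cleanup so the trimming isn't needed at query time:
UPDATE FIS2_Account_Shared
SET TD_LEGAL_ENTITY = LTRIM(RTRIM(TD_LEGAL_ENTITY));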

    Luis C.
  • Luis,

So what is the solution for

"The estimated row counts are way off"

UPDATE STATISTICS?

  • RVO (11/17/2015)


    Luis,

So what is the solution for

"The estimated row counts are way off"

UPDATE STATISTICS?

That should correct that issue, unless it is code related. You still have to work on that query, as the problems go beyond the statistics.
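For example (using your two tables from earlier; FULLSCAN is the thorough but heavier option):

UPDATE STATISTICS S_CLIENT_SHARED WITH FULLSCAN;
UPDATE STATISTICS S_CLIENT_ACCOUNT_ASSOCIATION_SHARED WITH FULLSCAN;

-- Or refresh all statistics in the database (sampled, lighter):
EXEC sp_updatestats;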

    Luis C.
  • Luis,

After I analyzed the next slow block (let's call it "GMS2 INSERT"), I shifted my focus to a much bigger issue (in my opinion). It is very similar to my original query posted above (let's call it "IDS INSERT"). The pattern is the same, just small variations (Source_Code, etc...).

Look at the query. The output is:

DISTINCT
CLIENT_SHARED.SOURCE_CODE
,CLIENT_SHARED.CUSTOMER_NUMBER. . . .

Without DISTINCT it returns around 25,000 records. With DISTINCT it is 127 rows!

I dug in and found out that the JOIN between S_CLIENT_SHARED and S_CLIENT_ACCOUNT_ASSOCIATION_SHARED is not correct. It is only on 2 columns:

SOURCE_CODE
CUSTOMER_NUMBER

It should be on 3:

SOURCE_CODE
CUSTOMER_NUMBER
BUS_PROC_DT

I already tested: instead of 36 min, it now runs in 1.5 min, and the output is identical to the original query. (Sketch of the fix below.)
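Roughly like this (a simplified fragment, since the real SQL is generated dynamically; the aliases are mine):

-- Original join, causes the ~25,000-row multiplication:
FROM S_CLIENT_SHARED cs
INNER JOIN S_CLIENT_ACCOUNT_ASSOCIATION_SHARED caa
ON caa.SOURCE_CODE = cs.SOURCE_CODE
AND caa.CUSTOMER_NUMBER = cs.CUSTOMER_NUMBER

-- Corrected join, 36 min down to 1.5 min, identical output:
FROM S_CLIENT_SHARED cs
INNER JOIN S_CLIENT_ACCOUNT_ASSOCIATION_SHARED caa
ON caa.SOURCE_CODE = cs.SOURCE_CODE
AND caa.CUSTOMER_NUMBER = cs.CUSTOMER_NUMBER
AND caa.BUS_PROC_DT = cs.BUS_PROC_DT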

The problem is that the SQL is generated dynamically from the [FATCA_TDS.T_ETL_SP_SS_TRANSFORMER_DML] table, based on a very convoluted looping mechanism in the SP_SS_TRANSFORMER stored proc, so it is very risky to make the change. So far I have found this pattern in 79 records of the [FATCA_TDS.T_ETL_SP_SS_TRANSFORMER_DML] table. This would be my last option.

I wonder, though... This test I'm doing is in the PAT environment. Why does this original "GMS2 INSERT" piece of code run in only 2 minutes in PROD, while in PAT it runs 36 minutes? PAT and PROD data volumes are almost identical; we copied PROD to PAT two weeks ago.

Does it have anything to do with memory?

PAT server: 16 GB RAM
PROD server: 32 GB RAM

CPU is the same. Disk parameters are close.
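One thing I could compare on both servers is buffer pool pressure, e.g. page life expectancy (just one indicator, but a much lower value on PAT would point at memory):

-- How long pages stay in the buffer pool, in seconds:
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';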
