JOIN in date range slow

Question

Post reply

JOIN in date range slow

RVO

SSCertifiable

Points: 5495
More actions
November 12, 2015 at 1:10 pm

#399319

JOIN on date range is very slow..
Look at the bottom where it's using ">=" and "<="
FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED
and
FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED
tables have about 80,000,000 records
Other tables are tiny.
SELECT ASSOCIATION.SOURCE_CODE, ASSOCIATION.CUSTOMER_NUMBER
--INTO #COMMON_I_SS_ARRANGEMENT_LOCATION_ADDR_IDS_CURRENT_VALID_ARRANGEMENT
FROM (SELECT MAX(DATA_SET.BUS_PROC_DATE) AS BUS_PROC_DATE FROM [FATCA_TDS].[I_DATA_SET] DATA_SET
WHERE [IS_ANNUAL]='Y' AND DATA_SET.BUS_PROC_DATE <= '2015-11-02')
LAST_ANNUAL
JOIN (SELECT ENUM_REF.MNEMONIC FROM [FATCA_STG].[R_SOURCE_SYSTEM_DEFAULT] SYSTEM_DEFAULT
JOIN FATCA_STG.R_ENUM_REF ENUM_REF ON ( ENUM_REF.SET_NAME = 'LEGAL ENTITY'
AND SYSTEM_DEFAULT.SET_NAME = 'ACCT-LGL-ENT' AND ENUM_REF.CODE = SYSTEM_DEFAULT.[STR_VALUE] )
WHERE SYSTEM_DEFAULT.SOURCE_CODE='IDS')
IDS_LGL_ENT ON (1=1)
join FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED
on ( CLIENT_SHARED.SOURCE_CODE = 'IDS'
AND CLIENT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND CLIENT_SHARED.BUS_PROC_DT<='2015-11-02'
)
JOIN
FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED ASSOCIATION
on ( ASSOCIATION.SOURCE_CODE = CLIENT_SHARED.SOURCE_CODE
AND ASSOCIATION.CUSTOMER_NUMBER = CLIENT_SHARED.CUSTOMER_NUMBER
AND ASSOCIATION.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE
AND ASSOCIATION.BUS_PROC_DT <= '2015-11-02'
)

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply

g.britton SSChampion Points: 13863 More actions · Answer 1

ON (1=1)

makes the join a cross-join

Got an index on the date column? Otherwise you'll get a table scan, I think. Check the exec plan

Gerald Britton, Pluralsight courses

Luis Cazares SSC Guru Points: 183706 More actions · Answer 2

You might be a victim of triangular joins. http://www.sqlservercentral.com/articles/T-SQL/61539/

Can you post the information needed as explained in this article: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

RVO SSCertifiable Points: 5495 More actions · Answer 3

Gerald. Thanks for your reply.

FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED indexes:

------------------------------------------------------------------

SOURCE_CODE, BUS_PROC_DT, ACCOUNT_IDENTIFIER

SOURCE_CODE, BUS_PROC_DT, CUSTOMER_NUMBER

BUS_PROC_DT

FATCA_STG.S_CLIENT_SHARED indexes:

------------------------------------------------------------------

SOURCE_CODE, BUS_PROC_DT, CUSTOMER_NUMBER

ID

At the beginning that (1=1) drew my attention as well

but query slows down only when you

JOIN

FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

Without this join it run 1 sec.

RVO SSCertifiable Points: 5495 More actions · Answer 4

Luis,

I attached Execution Plan.

The rest I covered already.

S_CLIENT_SHARED

S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

have 80 million records.

Other tables are very small.

Without JOIN with S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

sql runs 1 second.

With this JOIN it takes 20 seconds

RVO SSCertifiable Points: 5495 More actions · Answer 5

Folks,

I just discovered an interesting thing.

The original query uses DISTINCT.

SELECT DISTINCT

ASSOCIATION.SOURCE_CODE, ASSOCIATION.CUSTOMER_NUMBER

. . . . . . .

Without DISTINCT

it returns 1.8 billion records !!

With DISTINCT - it returns 236,000 rows.

I have to mention. This is not my code.

I was assigned to support this ETL application.

But the more I look at the code the more I hate it.

Recursive stored procedures ,

nested Cursors inside of which SQL is being built dynamically . . . yak

You can't even understand which SQL is being executed...

RVO SSCertifiable Points: 5495 More actions · Answer 6

Original query:

SELECT DISTINCT

ACCOUNT_SHARED.SOURCE_CODE

,ACCOUNT_SHARED.ACCOUNT_IDENTIFIER

INTO #COMMON_I_SS_ARRANGEMENT_LOCATION_ADDR_IDS_CURRENT_VALID_ARRANGEMENT

FROM

(SELECT MAX(DATA_SET.BUS_PROC_DATE) AS BUS_PROC_DATE

FROM [FATCA_TDS].[I_DATA_SET] DATA_SET WHERE [IS_ANNUAL]='Y'

AND DATA_SET.BUS_PROC_DATE <= <P_BUS_PROC_DT>)

LAST_ANNUAL

JOIN

(SELECT ENUM_REF.MNEMONIC

FROM [FATCA_STG].[R_SOURCE_SYSTEM_DEFAULT] SYSTEM_DEFAULT

JOIN FATCA_STG.R_ENUM_REF ENUM_REF ON

( ENUM_REF.SET_NAME = 'LEGAL ENTITY'

AND SYSTEM_DEFAULT.SET_NAME = 'ACCT-LGL-ENT'

AND ENUM_REF.CODE = SYSTEM_DEFAULT.[STR_VALUE] )

WHERE SYSTEM_DEFAULT.SOURCE_CODE='IDS')

IDS_LGL_ENT

ON (1=1)

JOIN FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED

ON ( CLIENT_SHARED.SOURCE_CODE = <V_SOURCE_CODE>

AND CLIENT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND CLIENT_SHARED.BUS_PROC_DT<=<P_BUS_PROC_DT>

)

JOIN FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED ASSOCIATION

ON ( ASSOCIATION.SOURCE_CODE = CLIENT_SHARED.SOURCE_CODE

AND ASSOCIATION.CUSTOMER_NUMBER = CLIENT_SHARED.CUSTOMER_NUMBER

AND ASSOCIATION.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND ASSOCIATION.BUS_PROC_DT<=<P_BUS_PROC_DT>

)

JOIN FATCA_STG.S_ACCOUNT_SHARED ACCOUNT_SHARED

ON ( ACCOUNT_SHARED.SOURCE_CODE = ASSOCIATION.SOURCE_CODE

AND ACCOUNT_SHARED.ACCOUNT_IDENTIFIER = ASSOCIATION.ACCOUNT_IDENTIFIER

AND NOT (ACCOUNT_SHARED.SOURCE_CODE IN ('GMS2','IDP','FCA','IDS') AND ACCOUNT_SHARED.ACCOUNT_STATUS IN ('PENDING'))

AND ACCOUNT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND ACCOUNT_SHARED.BUS_PROC_DT<=<P_BUS_PROC_DT>

)

JOIN FATCA_STG.S_ACCOUNT_SHARED FIS2_ACCOUNT_SHARED

ON ( FIS2_ACCOUNT_SHARED.SOURCE_CODE = 'FIS2'

AND FIS2_ACCOUNT_SHARED.ACCOUNT_NUMBER = ACCOUNT_SHARED.ACCOUNT_IDENTIFIER

AND FIS2_ACCOUNT_SHARED.BUS_PROC_DT = <P_BUS_PROC_DT>

)

JOIN FATCA_STG.R_ENUM_REF FIS2_ACCT_LGL_ENT

ON ( FIS2_ACCT_LGL_ENT.SET_NAME='LEGAL ENTITY'

AND FIS2_ACCT_LGL_ENT.DESCRIPTION IN (SELECT [FILTER_STRING_VALUE] FROM [FATCA_TDS].[T_ETL_SP_SS_TRANSFORMER_EXTRA_FILTERS] WHERE [FILTER_NAME] = 'ACCEPTED LEGAL ENTITY' AND [INCLUDE_EXCLUDE]='INCLUDE')

AND ( UPPER(LTRIM(RTRIM(FIS2_ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_ACCT_LGL_ENT.MNEMONIC)))

OR UPPER(LTRIM(RTRIM(FIS2_ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')

)

JOIN FATCA_STG.R_ENUM_REF ACCT_LGL_ENT

ON ( ACCT_LGL_ENT.SET_NAME='LEGAL ENTITY'

AND ACCT_LGL_ENT.DESCRIPTION IN (SELECT [FILTER_STRING_VALUE] FROM [FATCA_TDS].[T_ETL_SP_SS_TRANSFORMER_EXTRA_FILTERS] WHERE [FILTER_NAME] = 'ACCEPTED LEGAL ENTITY' AND [INCLUDE_EXCLUDE]='INCLUDE')

AND ( UPPER(LTRIM(RTRIM(CASE WHEN ACCOUNT_SHARED.SOURCE_CODE='IDS' THEN IDS_LGL_ENT.MNEMONIC ELSE ACCOUNT_SHARED.TD_LEGAL_ENTITY END))) = UPPER(LTRIM(RTRIM(ACCT_LGL_ENT.MNEMONIC)))

OR UPPER(LTRIM(RTRIM(ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')

)

RVO SSCertifiable Points: 5495 More actions · Answer 7

regarding

ON (1=1)

after all maybe it's not a big deal because LAST_ANNUAL table

is only one row (always).

so it's not really a CROSS JOIN.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 8

It seems that you had some fun working on this query and you'll have more ahead. 😀

I'd specify the cross join explicitly, just to let clear that I know what I'm doing.

The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.

I'd help you but I'm on my own nightmare right now.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

RVO SSCertifiable Points: 5495 More actions · Answer 9

Thanks Luis.

So you insist CROSS JOIN is a bottleneck.. I guess.

I actually did a test.

I removed JOIN ON (1=1).

But the execution time did not change at all...

Luis,

I didn't understand this line:

". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."

What does it mean ?

Luis Cazares SSC Guru Points: 183706 More actions · Answer 10

RVO (11/13/2015)
So you insist CROSS JOIN is a bottleneck.. I guess.
I actually did a test.
I removed JOIN ON (1=1).
But the execution time did not change at all...

No, I'm just saying that if it's a cross join, you should write it as such. It's basically formatting, but I like to see it as good coding practices.

Luis,
I didn't understand this line:
". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."
What does it mean ?

These images from the posted execution plan should show it.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Ed Wagner SSC Guru Points: 287026 More actions · Answer 11

First of all, please tell us that all your columns in your join predicates are of the same data type. Implicit conversion can kill the performance of any query.

You also have several join predicates that use (according to the column names) date columns with a <= or >= comparison against either a literal or another date column. I hope those are of the same datatypes as well. Also, like Gerald asked above, do you have an index on those columns? In fact, I would check for covering indexes on each table, the columns in your join predicates should be the key columns and the returned columns should be the included columns.

You have several different parts to your query. Can you isolate each one in a "divide-and-conquer" approach? This would let you determine the best way to query each one. It might also make it easier to see where you needs indexes to support your joins.

Your 1.8B rows versus your 236K rows with the distinct is probably due to a cross join. Whether or not this is accidental is something you have to answer from your knowledge of the process and your expected output. There are times when cross joins are appropriate, but in other places they can get you into trouble.

Lastly, regarding your question about estimated rows, they're determined by the statistics. If your estimated rows is too far off from your actual rows, then your statistics are probably out of date.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

RVO SSCertifiable Points: 5495 More actions · Answer 12

Thanks Luis.

Without DISTINCT, row count is actually 20 billion something.

Maybe that's what Estimated Row counts show ?

To me if without DISTINCT query returns 20 billion records from 80 million tables,

the JOINS are incorrect (incomplete)..

RVO SSCertifiable Points: 5495 More actions · Answer 13

Added DDL for two problematical tables...

S_CLIENT_SHARED

S_CLIENT_ACCOUNT_ASSOCIATION_SHARED

Luis Cazares SSC Guru Points: 183706 More actions · Answer 14

RVO (11/13/2015)
Thanks Luis.
Without DISTINCT, row count is actually 20 billion something.
Maybe that's what Estimated Row counts show ?
To me if without DISTINCT query returns 20 billion records from 80 million tables,
the JOINS are incorrect (incomplete)..

There's no distinct in the plan that you attached. The distinct is in the query that you posted after the execution plan and you didn't post an execution plan for that query.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2