November 12, 2015 at 1:10 pm
JOIN on date range is very slow..
Look at the bottom where it's using ">=" and "<="
FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED
and
FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED
tables have about 80,000,000 records
Other tables are tiny.
SELECT ASSOCIATION.SOURCE_CODE, ASSOCIATION.CUSTOMER_NUMBER
--INTO #COMMON_I_SS_ARRANGEMENT_LOCATION_ADDR_IDS_CURRENT_VALID_ARRANGEMENT
FROM (SELECT MAX(DATA_SET.BUS_PROC_DATE) AS BUS_PROC_DATE FROM [FATCA_TDS].[I_DATA_SET] DATA_SET
WHERE [IS_ANNUAL]='Y' AND DATA_SET.BUS_PROC_DATE <= '2015-11-02')
LAST_ANNUAL
JOIN (SELECT ENUM_REF.MNEMONIC FROM [FATCA_STG].[R_SOURCE_SYSTEM_DEFAULT] SYSTEM_DEFAULT
JOIN FATCA_STG.R_ENUM_REF ENUM_REF ON ( ENUM_REF.SET_NAME = 'LEGAL ENTITY'
AND SYSTEM_DEFAULT.SET_NAME = 'ACCT-LGL-ENT' AND ENUM_REF.CODE = SYSTEM_DEFAULT.[STR_VALUE] )
WHERE SYSTEM_DEFAULT.SOURCE_CODE='IDS')
IDS_LGL_ENT ON (1=1)
join FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED
on ( CLIENT_SHARED.SOURCE_CODE = 'IDS'
AND CLIENT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND CLIENT_SHARED.BUS_PROC_DT<='2015-11-02'
)
JOIN
FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED ASSOCIATION
on ( ASSOCIATION.SOURCE_CODE = CLIENT_SHARED.SOURCE_CODE
AND ASSOCIATION.CUSTOMER_NUMBER = CLIENT_SHARED.CUSTOMER_NUMBER
AND ASSOCIATION.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE
AND ASSOCIATION.BUS_PROC_DT <= '2015-11-02'
)
November 12, 2015 at 1:29 pm
ON (1=1)
makes the join a cross-join
Got an index on the date column? Otherwise you'll get a table scan, I think. Check the exec plan
Gerald Britton, Pluralsight courses
November 12, 2015 at 1:41 pm
You might be a victim of triangular joins. http://www.sqlservercentral.com/articles/T-SQL/61539/
Can you post the information needed as explained in this article: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
November 12, 2015 at 5:06 pm
Gerald. Thanks for your reply.
FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED indexes:
------------------------------------------------------------------
SOURCE_CODE, BUS_PROC_DT, ACCOUNT_IDENTIFIER
SOURCE_CODE, BUS_PROC_DT, CUSTOMER_NUMBER
BUS_PROC_DT
FATCA_STG.S_CLIENT_SHARED indexes:
------------------------------------------------------------------
SOURCE_CODE, BUS_PROC_DT, CUSTOMER_NUMBER
ID
At the beginning that (1=1) drew my attention as well
but query slows down only when you
JOIN
FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED
Without this join it run 1 sec.
November 12, 2015 at 5:16 pm
Luis,
I attached Execution Plan.
The rest I covered already.
S_CLIENT_SHARED
S_CLIENT_ACCOUNT_ASSOCIATION_SHARED
have 80 million records.
Other tables are very small.
Without JOIN with S_CLIENT_ACCOUNT_ASSOCIATION_SHARED
sql runs 1 second.
With this JOIN it takes 20 seconds
November 12, 2015 at 5:33 pm
Folks,
I just discovered an interesting thing.
The original query uses DISTINCT.
SELECT DISTINCT
ASSOCIATION.SOURCE_CODE, ASSOCIATION.CUSTOMER_NUMBER
. . . . . . .
Without DISTINCT
it returns 1.8 billion records !!
With DISTINCT - it returns 236,000 rows.
I have to mention. This is not my code.
I was assigned to support this ETL application.
But the more I look at the code the more I hate it.
Recursive stored procedures ,
nested Cursors inside of which SQL is being built dynamically . . . yak
You can't even understand which SQL is being executed...
November 12, 2015 at 5:53 pm
Original query:
SELECT DISTINCT
ACCOUNT_SHARED.SOURCE_CODE
,ACCOUNT_SHARED.ACCOUNT_IDENTIFIER
INTO #COMMON_I_SS_ARRANGEMENT_LOCATION_ADDR_IDS_CURRENT_VALID_ARRANGEMENT
FROM
(SELECT MAX(DATA_SET.BUS_PROC_DATE) AS BUS_PROC_DATE
FROM [FATCA_TDS].[I_DATA_SET] DATA_SET WHERE [IS_ANNUAL]='Y'
AND DATA_SET.BUS_PROC_DATE <= <P_BUS_PROC_DT>)
LAST_ANNUAL
JOIN
(SELECT ENUM_REF.MNEMONIC
FROM [FATCA_STG].[R_SOURCE_SYSTEM_DEFAULT] SYSTEM_DEFAULT
JOIN FATCA_STG.R_ENUM_REF ENUM_REF ON
( ENUM_REF.SET_NAME = 'LEGAL ENTITY'
AND SYSTEM_DEFAULT.SET_NAME = 'ACCT-LGL-ENT'
AND ENUM_REF.CODE = SYSTEM_DEFAULT.[STR_VALUE] )
WHERE SYSTEM_DEFAULT.SOURCE_CODE='IDS')
IDS_LGL_ENT
ON (1=1)
JOIN FATCA_STG.S_CLIENT_SHARED CLIENT_SHARED
ON ( CLIENT_SHARED.SOURCE_CODE = <V_SOURCE_CODE>
AND CLIENT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND CLIENT_SHARED.BUS_PROC_DT<=<P_BUS_PROC_DT>
)
JOIN FATCA_STG.S_CLIENT_ACCOUNT_ASSOCIATION_SHARED ASSOCIATION
ON ( ASSOCIATION.SOURCE_CODE = CLIENT_SHARED.SOURCE_CODE
AND ASSOCIATION.CUSTOMER_NUMBER = CLIENT_SHARED.CUSTOMER_NUMBER
AND ASSOCIATION.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND ASSOCIATION.BUS_PROC_DT<=<P_BUS_PROC_DT>
)
JOIN FATCA_STG.S_ACCOUNT_SHARED ACCOUNT_SHARED
ON ( ACCOUNT_SHARED.SOURCE_CODE = ASSOCIATION.SOURCE_CODE
AND ACCOUNT_SHARED.ACCOUNT_IDENTIFIER = ASSOCIATION.ACCOUNT_IDENTIFIER
AND NOT (ACCOUNT_SHARED.SOURCE_CODE IN ('GMS2','IDP','FCA','IDS') AND ACCOUNT_SHARED.ACCOUNT_STATUS IN ('PENDING'))
AND ACCOUNT_SHARED.BUS_PROC_DT >= LAST_ANNUAL.BUS_PROC_DATE AND ACCOUNT_SHARED.BUS_PROC_DT<=<P_BUS_PROC_DT>
)
JOIN FATCA_STG.S_ACCOUNT_SHARED FIS2_ACCOUNT_SHARED
ON ( FIS2_ACCOUNT_SHARED.SOURCE_CODE = 'FIS2'
AND FIS2_ACCOUNT_SHARED.ACCOUNT_NUMBER = ACCOUNT_SHARED.ACCOUNT_IDENTIFIER
AND FIS2_ACCOUNT_SHARED.BUS_PROC_DT = <P_BUS_PROC_DT>
)
JOIN FATCA_STG.R_ENUM_REF FIS2_ACCT_LGL_ENT
ON ( FIS2_ACCT_LGL_ENT.SET_NAME='LEGAL ENTITY'
AND FIS2_ACCT_LGL_ENT.DESCRIPTION IN (SELECT [FILTER_STRING_VALUE] FROM [FATCA_TDS].[T_ETL_SP_SS_TRANSFORMER_EXTRA_FILTERS] WHERE [FILTER_NAME] = 'ACCEPTED LEGAL ENTITY' AND [INCLUDE_EXCLUDE]='INCLUDE')
AND ( UPPER(LTRIM(RTRIM(FIS2_ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER(LTRIM(RTRIM(FIS2_ACCT_LGL_ENT.MNEMONIC)))
OR UPPER(LTRIM(RTRIM(FIS2_ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')
)
)
JOIN FATCA_STG.R_ENUM_REF ACCT_LGL_ENT
ON ( ACCT_LGL_ENT.SET_NAME='LEGAL ENTITY'
AND ACCT_LGL_ENT.DESCRIPTION IN (SELECT [FILTER_STRING_VALUE] FROM [FATCA_TDS].[T_ETL_SP_SS_TRANSFORMER_EXTRA_FILTERS] WHERE [FILTER_NAME] = 'ACCEPTED LEGAL ENTITY' AND [INCLUDE_EXCLUDE]='INCLUDE')
AND ( UPPER(LTRIM(RTRIM(CASE WHEN ACCOUNT_SHARED.SOURCE_CODE='IDS' THEN IDS_LGL_ENT.MNEMONIC ELSE ACCOUNT_SHARED.TD_LEGAL_ENTITY END))) = UPPER(LTRIM(RTRIM(ACCT_LGL_ENT.MNEMONIC)))
OR UPPER(LTRIM(RTRIM(ACCOUNT_SHARED.TD_LEGAL_ENTITY))) = UPPER('TD SECURITIES CANADA')
)
)
November 12, 2015 at 6:11 pm
regarding
ON (1=1)
after all maybe it's not a big deal because LAST_ANNUAL table
is only one row (always).
so it's not really a CROSS JOIN.
November 13, 2015 at 11:35 am
It seems that you had some fun working on this query and you'll have more ahead. 😀
I'd specify the cross join explicitly, just to let clear that I know what I'm doing.
The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.
I'd help you but I'm on my own nightmare right now.
November 13, 2015 at 11:55 am
Thanks Luis.
So you insist CROSS JOIN is a bottleneck.. I guess.
I actually did a test.
I removed JOIN ON (1=1).
But the execution time did not change at all...
Luis,
I didn't understand this line:
". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."
What does it mean ?
November 13, 2015 at 12:12 pm
RVO (11/13/2015)
So you insist CROSS JOIN is a bottleneck.. I guess.I actually did a test.
I removed JOIN ON (1=1).
But the execution time did not change at all...
No, I'm just saying that if it's a cross join, you should write it as such. It's basically formatting, but I like to see it as good coding practices.
Luis,
I didn't understand this line:
". . . The estimated row counts are way off, but I'm not sure if they're like that from the non-equi joins.."
What does it mean ?
These images from the posted execution plan should show it.
November 13, 2015 at 12:28 pm
First of all, please tell us that all your columns in your join predicates are of the same data type. Implicit conversion can kill the performance of any query.
You also have several join predicates that use (according to the column names) date columns with a <= or >= comparison against either a literal or another date column. I hope those are of the same datatypes as well. Also, like Gerald asked above, do you have an index on those columns? In fact, I would check for covering indexes on each table, the columns in your join predicates should be the key columns and the returned columns should be the included columns.
You have several different parts to your query. Can you isolate each one in a "divide-and-conquer" approach? This would let you determine the best way to query each one. It might also make it easier to see where you needs indexes to support your joins.
Your 1.8B rows versus your 236K rows with the distinct is probably due to a cross join. Whether or not this is accidental is something you have to answer from your knowledge of the process and your expected output. There are times when cross joins are appropriate, but in other places they can get you into trouble.
Lastly, regarding your question about estimated rows, they're determined by the statistics. If your estimated rows is too far off from your actual rows, then your statistics are probably out of date.
November 13, 2015 at 1:18 pm
Thanks Luis.
Without DISTINCT, row count is actually 20 billion something.
Maybe that's what Estimated Row counts show ?
To me if without DISTINCT query returns 20 billion records from 80 million tables,
the JOINS are incorrect (incomplete)..
November 13, 2015 at 1:26 pm
Added DDL for two problematical tables...
S_CLIENT_SHARED
S_CLIENT_ACCOUNT_ASSOCIATION_SHARED
November 13, 2015 at 1:26 pm
RVO (11/13/2015)
Thanks Luis.Without DISTINCT, row count is actually 20 billion something.
Maybe that's what Estimated Row counts show ?
To me if without DISTINCT query returns 20 billion records from 80 million tables,
the JOINS are incorrect (incomplete)..
There's no distinct in the plan that you attached. The distinct is in the query that you posted after the execution plan and you didn't post an execution plan for that query.
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply