June 22, 2020 at 4:27 am
Hi ,
Below is the query which has two tables, based on the business requirement , have written the query as below. but it is taking 30 mins to complete the query. There is no Insert or Update or Delete happen when the query is executed. This query is executed from SSIS package.Is there a better way to write the query ? or is the below query is fine.
When in the final query when C.columnid is removed then i see it run in 3 mins, but where as it is needed based on business requirement.
Below are the number if records in each table, TableA has 200000 and TableB has 200000000.
Select Count(*) from [TableA] --> 200000
Select Count(*) from [TableB]--> 200000000
;WITH CTETableA AS
(
SELECT DISTINCT
[ColumnID]
FROM TableA
WHERE [ColumnID] IS NOT NULL
AND [Date] >= <Day before yesterday date>
AND [Date] < <Yesterday date>
)
,CTETableB AS
(
SELECT DISTINCT
[ColumnID] --- Int
,[Column_S] -- Varchar(20)
FROM TableB
WHERE [Date] >= <Day before yesterday date>
AND [Date] < <Yesterday date>
)
SELECT
b.[ColumnID] --Int
,s.[Column_var]-- Varchar(25)
,s.[Column_int]-- Int
,s.[Date] --- Datetime
,b.[Column_Var]-- Varchar(20)
,b.[ColumnTime] --- Time
,b.[Column_S] -- Varchar(20)
,b.[Column_float] -- Float
FROM TableB b
LEFT JOIN CTETableB c ON c.[ColumnID] = b.[ColumnID]
LEFT JOIN TableA d ON d.[ColumnID] = b.[ColumnID] AND d.[Column_S] = b.[Column_S]
LEFT JOIN CTETableA s ON s.[ColumnID] = b.[ColumnID]
WHERE ((b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>) OR c.[ColumnID] IS NOT NULL OR d.[ColumnID] IS NOT NULL)
June 22, 2020 at 8:14 am
Giving sample sql like this may lead to errors - and for this you may need to supply us with a full explain plan (both slow and fast).
we will also need to know which indexes these tables have
but I may be missing something but it seems that, at least the way you posted, CTETableB is not required at all and should be removed.
regardless - this seems like a typical case of "if master has changed or child has changed" extract data - that being the case doing a union of 2 queries would normally give better results - first query detects changes to master, second detects changes to child.
June 22, 2020 at 11:06 am
CTETableB is needed as we are having the not null condition [OR c.[ColumnID] IS NOT NULL OR d.[ColumnID] IS NOT NULL] in the query.
TableA [ColumnID] is clustered index and Date with non-cluster index, non- unique.
TableB [ColumnID] is clustered index and Date with non-cluster index , non- unique.
Execution plan- Don't have permission to execute.
How could Union would help us here , in which way we could represent it to achieve?. Again thanks for your time and help.
June 22, 2020 at 12:35 pm
according to the code you posted it does not seem you need that cte - and without your full code we don't know if you missed something on your post or if it accurately represents the full query joins and criteria.
,CTETableB AS
(
SELECT DISTINCT
[ColumnID] --- Int
,[Column_S] -- Varchar(20)
FROM TableB
WHERE [Date] >= <Day before yesterday date>
AND [Date] < <Yesterday date>
)
select *
from TableB b
LEFT JOIN CTETableB c ON c.[ColumnID] = b.[ColumnID]
where (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>) OR c.[ColumnID] IS NOT NULL
in both CTETableB and the main select you are filtering on the same field and criteria so having the cte with a left outer really equates to
select *
from TableB b
where (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>)
how ever having the cte as is may even be giving you duplicate records if there are multilple records on TableB with same ColumnID so I would check that
Regarding the union
-- first select records where TableB date field meets the criteria
SELECT
b.[ColumnID] --Int
,s.[Column_var]-- Varchar(25)
,s.[Column_int]-- Int
,s.[Date] --- Datetime
,b.[Column_Var]-- Varchar(20)
,b.[ColumnTime] --- Time
,b.[Column_S] -- Varchar(20)
,b.[Column_float] -- Float
FROM TableB b
LEFT JOIN TableA s ON s.[ColumnID] = b.[ColumnID] and (s.[Date] >= <Day before yesterday date> AND s.[Date] < <Yesterday date>) -- I wonder if this was correct in the first place
WHERE (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>)
union -- remove duplicate entries - depending on the data this may need to be done slightly differently using rownumber
-- second retrieve records from TableB where there is a match on TableA where date meets the criteria and Column_S has a match
SELECT
b.[ColumnID] --Int
,s.[Column_var]-- Varchar(25)
,s.[Column_int]-- Int
,s.[Date] --- Datetime
,b.[Column_Var]-- Varchar(20)
,b.[ColumnTime] --- Time
,b.[Column_S] -- Varchar(20)
,b.[Column_float] -- Float
FROM TableB b
inner JOIN TableA s ON s.[ColumnID] = b.[ColumnID] AND d.[Column_S] = b.[Column_S]
and (s.[Date] >= <Day before yesterday date> AND s.[Date] < <Yesterday date>) -- TableA filtering
An index on table A with Date, column_S and ColumnID required - potentially with include of column_var and columnn_int
An index on table B with Date, Column_S, ColumnID required - potentially with remaining columns required to cover the select
as we don't know your data neither your true table and query the above is just a possible approach - may be incorrect but only you will be able to determine that.
June 24, 2020 at 6:57 am
Once again thanks for your time.
Yes you are correct on the observation on
"how ever having the cte as is may even be giving you duplicate records if there are multiple records on TableB with same ColumnID so I would check that"
you could see in my code that i have added c.[ColumnID] IS NOT NULL or d.[ColumnID] IS NOT NULL ,which will fetch the record for the whole day when left join is there
Here the query is based on datewith timestamp as well, say if it is day before yesterday(22-06-2020 2:00:05:056) till yesterday(23-06-202 3:34:34:978) , so in the first cte we would be getting set of records say 100 , in second cte we are getting some 1000 records for the dates mentioned, while doing the left join we would be getting the additional rows
cte1
columnid, date
1, '22-06-2020 2:00:05:056'
2, '22-06-2020 5:00:08:232'
1, '22-06-2020 1:12:05:344'
cte2
columnid, date
1, '22-06-2020 7:13:12:234'
2, '22-06-2020 4:11:21:213'
3, '22-06-2020 3:12:08:784'
so while doing this we would be ending with 3 records with left join with [ColumnID] IS NOT NULL and not with 2 records, columnid 1 will be coming 2 time and columnid 2 will be coming 1 time, so total 3 records.
;With CTETableB AS
(
SELECT DISTINCT
[ColumnID] --- Int
,[Column_S] -- Varchar(20)
FROM TableB
WHERE [Date] >= <Day before yesterday date>
AND [Date] < <Yesterday date>
)
select *
from TableB b
LEFT JOIN CTETableB c ON c.[ColumnID] = b.[ColumnID]
where (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>) OR c.[ColumnID] IS NOT NULL
in both CTETableB and the main select you are filtering on the same field and criteria so having the cte with a left outer really equates to
select *
from TableB b
where (b.[Date] >= <Day before yesterday date> AND b.[Date] < <Yesterday date>)
how ever having the cte as is may even be giving you duplicate records if there are multilple records on TableB with same ColumnID so I would check that
July 23, 2020 at 7:39 am
Any suggestions on the above, still the issues persists. the time consumed here is more. Below is the reads and writes and time taken for the query to complete.
Run time Reads Writes
20min 3,596,149 97
45min 73,412,456 378,279
48min 198,270,103 1,558,015
22min 202,130,607 1,303,290
34min 143,565,802 1,123,014
25min 15,137,118 2,233
July 23, 2020 at 9:01 am
The query is a mess.
its difficult to understand the requirements from the code because it lacks of logic, not to mention some constructions in the query eliminate others.
My guess would be you want actually full outer join of 2sets fot the specific date.
you definitely don't need scan each table twice for that.
Just include all required columns into CTE's and then join these CTE's and only them:
Hate this framework.
15 mins reworking the code from a mobile device - and it drops everything at once
So, try following my suggestions yourself, post your code - we'll adjust it if something wrong
_____________
Code for TallyGenerator
July 23, 2020 at 9:07 am
2 more things:
- forget the word "DISTINCT". Use Group by when needed (I bet you don't need it in your query);
- if you have an index rebuilding routine, use that window of opportunity to rebuild the current wrong indexing into correct one:
ColumnID -PK, NON-clustered
[Date] - clustered
this willl bring the response time of your query down to several seconds.
_____________
Code for TallyGenerator
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply