January 24, 2020 at 8:38 pm
Hi,
we had case today where query with index seek provided much worst performance compare to index scan.
original query using 40+ union and 5+ intersect ( it is machine build by java script..)
I used AdventureWorks2017 to recreate similar scenario
Step 1
create nonclustered index IDX_AG_001 on [Person].[Address] (AddressLine1) with(online =on)
Step 2
set statistics time on
SELECT
[AddressLine1]
FROM [Person].[Address]
where
[AddressLine1] like '6%e'
intersect
SELECT
[AddressLine1]
FROM [Person].[Address]
where
[AddressLine1] like '6%e'
intersect
SELECT
[AddressLine1]
FROM [Person].[Address]
where
[AddressLine1] like '6%e'
intersect
(SELECT
[AddressLine1]
FROM [Person].[Address]
where
[AddressLine1] like '6%e'
union
SELECT
[AddressLine1]
FROM [Person].[Address]
where
[AddressLine1] like '6%e'
)
set statistics time off
on my laptop it provide index seek with stat
SQL Server Execution Times: CPU time = 46 ms, elapsed time = 247 ms.
(original query was stopped after 15 minutes of execution)
Then application changed code to
Step 2
set statistics time on
SELECT
[AddressLine1]
FROM [Person].[Address]
where
left([AddressLine1],1) ='6'
and
right([AddressLine1],1) = 'e'
intersect
SELECT
[AddressLine1]
FROM [Person].[Address]
where
left([AddressLine1],1) ='6'
and
right([AddressLine1],1) = 'e'
intersect
SELECT
[AddressLine1]
FROM [Person].[Address]
where
left([AddressLine1],1) ='6'
and
right([AddressLine1],1) = 'e'
intersect
(SELECT
[AddressLine1]
FROM [Person].[Address]
where
left([AddressLine1],1) ='6'
and
right([AddressLine1],1) = 'e'
union
SELECT
[AddressLine1]
FROM [Person].[Address]
where
left([AddressLine1],1) ='6'
and
right([AddressLine1],1) = 'e'
)
set statistics time off
on my laptop it provide index scan with stat
SQL Server Execution Times: CPU time = 16 ms, elapsed time = 169 ms.
( original query executed 363 ms)
Question
Why second query runs faster ?
Thank you
January 25, 2020 at 9:10 pm
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
January 27, 2020 at 12:56 pm
It's not just a seek versus a scan issue here. There are a whole slew of differences between these two queries, even though it looks like the result sets are the same. If you look at the two plans, the second one is filtering information as it retrieves it and/or through the join operations. The first one on the other hand is running Filter operators to remove data after retrieving it.
So, despite the fact that one has more reads, caused by the scan, the extra CPU time for the Filter operations is resulting in a longer execution time.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 27, 2020 at 1:25 pm
Seek and Scan is not under your control unless you specifically instruct SQL Server Optimizer to do it using Hints. In the above case it has nothing to do with scan and seek because hints are nowhere mentioned. The difference in the Execution Time is due to Filter clause (WHERE clause). In the first statement you have done wildcard search using LIKE whereas in the second statement exact search with comparatively strong filter clause. In my opinion the second statement should perform better.
I'm wondering why you need so many UNION's and INTERSECTION's? I can see in each query only a single column ADDRESSLINE1 is used both in SELECT as well as WHERE. I think you can achieve the same result by avoiding UNION and INTERSECTION. Your existing query will cause overhead both in terms of Disk IO as well as CPU due to repeated task of scanning (Logical & Physical) the same dataset/pages again and again.
January 27, 2020 at 4:13 pm
Thanks everybody for help.
I was assigned to project to check why Batch processing takes 100% of CPU
Using query store I found that during batch processing job it calls from 50 to 200 + queries where only difference is number of unions and intersect operations, because of it each query get unique query_id
SELECT T0.product_pk
FROM products T0
WHERE(T0.product_pk IN
(
(
select
product01
from
customer as customer_7
WHERE((customer_7.p_validation LIKE 'productsssss/13333.0/%.kzpd10007666c'
OR customer_7.p_desc LIKE 'Dumbler/KZKTD/%.kzpd10007666c')
AND customer_7.p_condition = '8559317127113')
AND (customer_7.CodeP = @P0)
)
INTERSECT
(
select
product01
from
customer as customer_8
WHERE((customer_8.p_validation LIKE 'productsssss/13333.0/%.kzpd10007666a'
OR customer_8.p_desc LIKE 'Dumbler/KZKTD/%.kzpd10007666a')
AND customer_8.p_condition = '8559317127113')
AND (customer_8.CodeP = @P1)
)
INTERSECT
(
select
product01
from
customer as customer_9
WHERE((customer_9.p_validation LIKE 'productsssss/13333.0/%.kzpd10007666d'
OR customer_9.p_desc LIKE 'Dumbler/KZKTD/%.kzpd10007666d')
AND customer_9.p_condition = '8559317127113')
AND (customer_9.CodeP = @P2)
union
select
product01
from
customer as customer_9
WHERE((customer_7.p_validation LIKE 'productsssss/13333.0/%.kzpd10007666k'
OR customer_9.p_desc LIKE 'Dumbler/KZKTD/%.kzpd10007666k')
AND customer_9.p_condition = '8559317127113')
AND (customer_9.CodeP = @P3)
)
)
.. add another 150 unions ...
)
I was told told that code generated in application by Java class and people can't/want to change it ...
I run
select
count(1),
count(distinct(p_validation)),
count(distinct(product01)),
count(distinct(p_condition)),
count(distinct(CodeP))
from Customer
and got following result
Count(1) = 6885256
count(distinct(p_validation)) = 504100
count(distinct(product01)) = 15
count(distinct(p_condition)) =10
count(distinct(CodeP)) = 1
I created index bellow
CREATE NONCLUSTERED INDEX [IDX_customer_7_DBA_001] ON [dbo].[customer]([p_validation])INCLUDE ( [product01],[p_condition],[CodeP]) with (online =on)
index help to reduce CPU from 100% to 11% ,but job would continue to run from 5 to 12 + hrs
at this point I told developers that they need to consider code changes
one of the developers accessed java class and changed code use different where clause
WHERE
(
(
(LEFT(customer_8.p_validation,37) = 'Eproductsssss/13333.0/%.kzpd10007666a' and RIGHT(customer_8.p_validation,13) = 'kzpd10007666a')
or
(LEFT(customer_8.p_desc,14) = 'Dumbler/KZKTD/' and RIGHT(customer_8.p_desc,9) = 'kzpd10007666a')
)
AND customer_8.p_condition = '8796286059104'
)
AND (
customer_8.CodeP = @P0
)
... and job execution reduced from 12 hrs to 45 minutes
Modified query use read more data but run faster
As you point out in "the second one is filtering information as it retrieves it and/or through the join operations"
January 27, 2020 at 7:12 pm
All those functions on the columns, LEFT/RIGHT, that's horrible. Before we get to why all the INTERSECT & UNION from the generated code, that stuff is going to inflict pain. Then, 100+ UNION queries is also going to be problematic. Also, all the wild card searches in the middle of strings, same problem. This is seriously bad code. No index is going to fix it. The code needs to be adjusted and, just guessing, you probably have some table & design issues too.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply