May 20, 2008 at 8:49 am
I am using following query to build a table. right now this query takes about 3 hours. which is way too long. I see that in the execution plan 73% of weight is assigned to a sort. 9% to parallelism distribute streams and 9% to parallelism gather streams.
I was wondering how can speed up this query based on this info?
currently there are no relationships defined or any constraints or indexes on either table in the selection.
select
TempProjectString,InfoProjID
INTOCPMRSProjectData
from (
select -- a.proj_id,
a.proj_id as TempProjectString
,b.invc_proj_id as InfoProjID
,rn = row_number() over (partition by a.proj_id order by len(a.proj_id) - len(b.invc_proj_id))
from proj_bill B
join TempData A
on charindex(reverse(b.invc_proj_id), reverse(a.proj_id)) > 0
) d
where rn = 1
go
May 20, 2008 at 8:52 am
Creating indexes would be a good start.
if possible simplify your join
May 20, 2008 at 10:52 am
Two things, get an index on the tables, especially a clustered index. Put that on the most likely access path, meaning, if they always select by PK, on the PK or if they always select by a foreign key and a date, use those two columns. It really depends on your app.
Second, check your parallelism threshold. That's an awfully simply query to go through all the process of distributing and gathering streams for parallel execution. If the threshold is set to a good number, you might experiment, on this query only, of trying the max degree of parallelism hint (MAXDOP).
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 20, 2008 at 10:56 am
Also, create the table first if you can. When you run a SELECT INTO statement to create a table, system resources are locked for the duration of the create statement. If you already know the schema, create the table using a CREATE TABLE statement, then load it with INSERT INTO....SELECT FROM construct.
May 20, 2008 at 12:15 pm
shahab (5/20/2008)
I am using following query to build a table. right now this query takes about 3 hours. which is way too long. I see that in the execution plan 73% of weight is assigned to a sort. 9% to parallelism distribute streams and 9% to parallelism gather streams.I was wondering how can speed up this query based on this info?
currently there are no relationships defined or any constraints or indexes on either table in the selection.
select
TempProjectString,InfoProjID
INTOCPMRSProjectData
from (
select -- a.proj_id,
a.proj_id as TempProjectString
,b.invc_proj_id as InfoProjID
,rn = row_number() over (partition by a.proj_id order by len(a.proj_id) - len(b.invc_proj_id))
from proj_bill B
join TempData A
on charindex(reverse(b.invc_proj_id), reverse(a.proj_id)) > 0
) d
where rn = 1
go
the derived table's row_number() calculation will cause every row in proj_bill to be scanned for every matching row in TempData so the query will be very slow and resource intensive.
since both terms of the charindex() are reversed, the reversals are redundant. since you're comparing charindex() to zero, that's the equivalent of like (which is probably better optimized internally than charindex()).
since you're selecting rn=1, it's somewhat arbitrary as to which invc_proj_id you'll receive if multiple invc_proj_id 's have the same length. so, selecting the min(invc_proj_id) from a group by should be equivalent (and more predictable). the group by greatly reduces the number of rows that are considered by row_number().
select
TempProjectString,InfoProjID
INTOCPMRSProjectData
from (
select a.proj_id as TempProjectString,
min(b.invc_proj_id) as InfoProjID,
-- len(a.proj_id) - len(b.invc_proj_id) as xxx,
row_number() over (partition by a.proj_id
order by len(a.proj_id) - len(b.invc_proj_id)) as rn
from
proj_bill B join TempData A
on a.proj_id like '%'+b.invc_proj_id+'%'
group by a.proj_id, len(a.proj_id) - len(b.invc_proj_id)
) d
where d.rn = 1
May 20, 2008 at 6:38 pm
I'm pretty sure indexing won't do much good on this because of all the formulas in the join and the Row_Number partition/order by.
Why must it be ordered by the LEN?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply