tuning query

Question

Post reply

tuning query

sqlstar2011

SSCertifiable

Points: 6000
More actions
May 20, 2008 at 8:49 am

#187977

I am using following query to build a table. right now this query takes about 3 hours. which is way too long. I see that in the execution plan 73% of weight is assigned to a sort. 9% to parallelism distribute streams and 9% to parallelism gather streams.
I was wondering how can speed up this query based on this info?
currently there are no relationships defined or any constraints or indexes on either table in the selection.
select
TempProjectString,InfoProjID
INTOCPMRSProjectData
from (
select -- a.proj_id,
a.proj_id as TempProjectString
,b.invc_proj_id as InfoProjID
,rn = row_number() over (partition by a.proj_id order by len(a.proj_id) - len(b.invc_proj_id))
from proj_bill B
join TempData A
on charindex(reverse(b.invc_proj_id), reverse(a.proj_id)) > 0
) d
where rn = 1
go

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

steveb. SSC-Forever Points: 46733 More actions · Answer 1

Creating indexes would be a good start.

if possible simplify your join

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 2

Two things, get an index on the tables, especially a clustered index. Put that on the most likely access path, meaning, if they always select by PK, on the PK or if they always select by a foreign key and a date, use those two columns. It really depends on your app.

Second, check your parallelism threshold. That's an awfully simply query to go through all the process of distributing and gathering streams for parallel execution. If the threshold is set to a good number, you might experiment, on this query only, of trying the max degree of parallelism hint (MAXDOP).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

John Rowan SSC Guru Points: 56440 More actions · Answer 3

Also, create the table first if you can. When you run a SELECT INTO statement to create a table, system resources are locked for the duration of the create statement. If you already know the schema, create the table using a CREATE TABLE statement, then load it with INSERT INTO....SELECT FROM construct.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

antonio.collins SSCrazy Eights Points: 9600 More actions · Answer 4

shahab (5/20/2008)
I am using following query to build a table. right now this query takes about 3 hours. which is way too long. I see that in the execution plan 73% of weight is assigned to a sort. 9% to parallelism distribute streams and 9% to parallelism gather streams.
I was wondering how can speed up this query based on this info?
currently there are no relationships defined or any constraints or indexes on either table in the selection.
select
TempProjectString,InfoProjID
INTOCPMRSProjectData
from (
select -- a.proj_id,
a.proj_id as TempProjectString
,b.invc_proj_id as InfoProjID
,rn = row_number() over (partition by a.proj_id order by len(a.proj_id) - len(b.invc_proj_id))
from proj_bill B
join TempData A
on charindex(reverse(b.invc_proj_id), reverse(a.proj_id)) > 0
) d
where rn = 1
go

the derived table's row_number() calculation will cause every row in proj_bill to be scanned for every matching row in TempData so the query will be very slow and resource intensive.

since both terms of the charindex() are reversed, the reversals are redundant. since you're comparing charindex() to zero, that's the equivalent of like (which is probably better optimized internally than charindex()).

since you're selecting rn=1, it's somewhat arbitrary as to which invc_proj_id you'll receive if multiple invc_proj_id 's have the same length. so, selecting the min(invc_proj_id) from a group by should be equivalent (and more predictable). the group by greatly reduces the number of rows that are considered by row_number().

select

TempProjectString,InfoProjID

INTOCPMRSProjectData

from (

select a.proj_id as TempProjectString,

min(b.invc_proj_id) as InfoProjID,

-- len(a.proj_id) - len(b.invc_proj_id) as xxx,

row_number() over (partition by a.proj_id

order by len(a.proj_id) - len(b.invc_proj_id)) as rn

from

proj_bill B join TempData A

on a.proj_id like '%'+b.invc_proj_id+'%'

group by a.proj_id, len(a.proj_id) - len(b.invc_proj_id)

) d

where d.rn = 1

Jeff Moden SSC Guru Points: 1004361 More actions · Answer 5

I'm pretty sure indexing won't do much good on this because of all the formulas in the join and the Row_Number partition/order by.

Why must it be ordered by the LEN?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)