query take long time when execute it so how to solve issue ?

Question

Post reply

query take long time when execute it so how to solve issue ?

ahmed_elbarbary.2010

Hall of Fame

Points: 3118

September 10, 2022 at 2:39 pm

Go to Answer

#4086190

I work on sql server 2019 i face issue i can't reduce high cost of sort it reach to 86 percent

so How to reduce it please

additionally it take too much time to execute it .

it take 6:06 minutes

execution plan as below

https://www.brentozar.com/pastetheplan/?id=Sy_IaNtgo

statment that make issue

select a.RecomendationId,cast(STRING_AGG(cast(f1.FeatureValue as varchar(300)) ,'|')  WITHIN GROUP(ORDER BY f1.FeatureId ASC)as varchar(300))  AS DiffFeatures into ExtractReports.dbo.TechnologyOriginalFeaturesEqual from extractreports.dbo.partsrecomendationActive a with(nolock)
 inner join ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 with(nolock) on f1.partid=a.OrignalPartId
 inner join [Technology].Receipe Ft WITH(NOLOCK) on ft.featureid=f1.featureid and ft.operatorid=1
 group by a.RecomendationId

issue is statment execution take too much time to execute

ddl with indexes

create table ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg
       (
       ID int identity(1,1),
       PartId int,
       FeatureID int,
       FeatureName varchar(200),
       FeatureValue varchar(200)
       )
    
       ALTER TABLE ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg
 ADD CONSTRAINT PK_TechnologyPlPartsFeaturValuesOrg PRIMARY KEY (ID);
 create index partidoriginalParts_ix on ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg(partid)
 create index FlagRecomendationorg_ix on ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg(featureid,FeatureValue)include(partid)
    
 create table extractreports.dbo.partsrecomendationActive
 (
 RecomendationId int identity(1,1),
 RecomendationPartId int,
 OrignalPartId int
    
 )
    
 create clustered index recomendations_ix on extractreports.dbo.partsrecomendationActive(RecomendationId)
    
 create nonclustered index recomendationsparts_ix on extractreports.dbo.partsrecomendationActive(RecomendationPartId)
    
 create nonclustered index recomendationsoriginal_ix on extractreports.dbo.partsrecomendationActive(OrignalPartId)
    
 CREATE TABLE [Technology].[Receipe](
     [ReceipeID] [int] IDENTITY(1,1) NOT NULL,
     [PLID] [int] NULL,
     [FeatureID] [int] NULL,
     [OperatorID] [int] NULL,
     [FeatureTypeID] [int] NULL,
 PRIMARY KEY CLUSTERED 
 (
     [ReceipeID] ASC
 ))

This topic was modified 2 years, 2 months ago by ahmed_elbarbary.2010. Reason: add more details for clear

Jonathan AC Roberts

SSCoach

Points: 18352

September 12, 2022 at 6:09 pm

Answer

#4086766

Can you try this?

It should move the sort to the inserts into the clustered index on the temporary table

DROP TABLE IF EXISTS #temp;

CREATE TABLE #temp
(
    RecomendationId int, -- [sic]
    FeatureValue varchar(200), 
    FeatureId int,
    PRIMARY KEY CLUSTERED (RecomendationId, FeatureId)
);

INSERT INTO #temp(RecomendationId, FeatureValue, FeatureId)
SELECT a.RecomendationId, f1.FeatureValue, f1.FeatureId  
  FROM extractreports.dbo.partsrecomendationActive a with(nolock)
 INNER JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 with(nolock) 
         ON f1.partid=a.OrignalPartId
 INNER JOIN [Technology].Receipe Ft WITH(NOLOCK) 
         ON ft.featureid=f1.featureid 
        AND ft.operatorid=1;

select x.RecomendationId, 
       CAST(STRING_AGG(cast(x.FeatureValue as varchar(300)) ,'|')  WITHIN GROUP(ORDER BY x.FeatureId ASC)as varchar(300))  AS DiffFeatures 
  into ExtractReports.dbo.TechnologyOriginalFeaturesEqual 
  from #temp x
 group by x.RecomendationId;

Can you also get the times for the insert into and the select into statements?

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Steve Collins SSCrazy Points: 2157 More actions · Answer 1

Why is there no primary key on the extractreports.dbo.partsrecomendationActive table? You have a clustered index and yet UNIQUE is not specified. This seems to cause a merge join and it's all downhill from there.

The SORT which is causing the issue:

Order By
[ExtractReports].[dbo].[partsrecomendationActive].[a].RecomendationId Ascending
[ExtractReports].[dbo].[TechnologyPlPartsFeaturValuesOrg].[f1].FeatureID Ascending

This seems likely to be caused by the combination of:

WITHIN GROUP(ORDER BY f1.FeatureId ASC)

and

group by a.RecomendationId

The optimizer attempts to SORT 65 million rows after a merge join. Yeeeesh. One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated. How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique? It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 2

I suspect a couple of accidental CROSS JOINs is the issue. Please run the following code and post the results.

 SELECT  f1_AllCount = COUNT(f1.PartId)
        ,f1_UnqCount = COUNT(DISTINCT f1.PartID)
   FROM Technology.Receipe                   ft
   JOIN dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
  WHERE ft.operatorid = 1
;
 SELECT  a_AllCount = COUNT(a.OrignalPartId)
        ,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
   FROM dbo.partsrecomendationActive a
;

Where I'm going with this is a possible separation of the "data layer" and the "presentation" layer. I think you first need to find what you're looking for using "pre-aggregation" to get the correct key lookups for the eventual string aggregation. That also means that I think the ultimate output of more than 13 million rows is wildly incorrect and the condition of the "Sort" is merely a symptom of the accidental CROSS JOIN problem.

I could certainly be wrong in that but that's why I need you to run the code above and post the results.

Remember... "Set-Based" <> "All in one query".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ahmed_elbarbary.2010 Hall of Fame Points: 3118 More actions · Answer 3

so how to use cross apply as you write below

The optimizer attempts to SORT 65 million rows after a merge join. Yeeeesh. One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated. How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique? It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 4

ahmed_elbarbary.2010 wrote:

so how to use cross apply as you write below
The optimizer attempts to SORT 65 million rows after a merge join. Yeeeesh. One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated. How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique? It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?

Please run the 2 queries I asked you to run and provide the results. It's part of the information necessary for CROSS APPLY or any other solution.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ahmed_elbarbary.2010 Hall of Fame Points: 3118 More actions · Answer 5

SELECT f1_AllCount = COUNT(f1.PartId)
,f1_UnqCount = COUNT(DISTINCT f1.PartID)
FROM Technology.Receipe ft
JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
WHERE ft.operatorid = 1;

result is

f1_AllCount         f1_UnqCount
124140                  24828

SELECT a_AllCount = COUNT(a.OrignalPartId)
,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
FROM extractreports.dbo.partsrecomendationActive a with(nolock)

Result is

a_AllCount   a_UnqCount
13052654     24828

two queries result as you tell me

ahmed_elbarbary.2010 Hall of Fame Points: 3118 More actions · Answer 6

i post counts so what i do after that

or what query best for achieve that quickly

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 7

ahmed_elbarbary.2010 wrote:

SELECT f1_AllCount = COUNT(f1.PartId)
,f1_UnqCount = COUNT(DISTINCT f1.PartID)
FROM Technology.Receipe ft
JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
WHERE ft.operatorid = 1;

result is

f1_AllCount         f1_UnqCount
124140                  24828

SELECT a_AllCount = COUNT(a.OrignalPartId)
,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
FROM extractreports.dbo.partsrecomendationActive a with(nolock)

Result is

a_AllCount   a_UnqCount
13052654     24828

two queries result as you tell me

Thank you. This proves that there are no accidental CROSS JOINS (along with what you were kind enough to previously provide as an execution plan but it does prove that we're probably looking at way too much more data that we probably need to be looking at. I'm at work right now and so can't jump into this right now but I believe a little "pre-aggregation" and "Divide'n'Conquer" may work a treat here. In other words, and I could be wrong, but I think we can seriously reduce the data being handled by the STRING_AGG/GROUP BY, which is the source of the painful SORT.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 8

In the ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg table, what does the FeatureValue column contain in reference to the PartId? Also, is FeatureValue a unique value by FeatureID or does it change with each part having the same FeatureID?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 9

In a very real "get real" moment, I have to ask... how many times per day will this be used and how often will the underlying data be modified.

What I'm thinking is that you might want to just consider running it once after any changes to the data and then just read from the table multiple times. In other words, I'm saying it's starting to look like there's very little advantage to trying to make it run in less than 6 minutes. To put it in plain English, I don't see how it will be worth it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

aaron.reese SSChampion Points: 13498 More actions · Answer 10

Would STUFF...FOR XML be faster that STRING_AGG? I think they both will be doing ReBAR under the hood.

Can you pre-calculate the string contatenation and persist it? You could then index it and it would be a single join.

Can you limit the query to changed records and do a MERGE?

Aaron

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 11

The "pre-aggreation" I spoke of earlier is identical to what you're suggesting. I'm just trying to get the OP to provide some extra information.

I'll also tell you there is no "e" in RBAR... it's only pronounced that way because of the hidden meaning. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)