I work on sql server 2019 i face issue i can't reduce high cost of sort it reach to 86 percent
so How to reduce it please
additionally it take too much time to execute it .
it take 6:06 minutes
execution plan as below
https://www.brentozar.com/pastetheplan/?id=Sy_IaNtgo
statment that make issue
select a.RecomendationId,cast(STRING_AGG(cast(f1.FeatureValue as varchar(300)) ,'|') WITHIN GROUP(ORDER BY f1.FeatureId ASC)as varchar(300)) AS DiffFeatures into ExtractReports.dbo.TechnologyOriginalFeaturesEqual from extractreports.dbo.partsrecomendationActive a with(nolock)
inner join ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 with(nolock) on f1.partid=a.OrignalPartId
inner join [Technology].Receipe Ft WITH(NOLOCK) on ft.featureid=f1.featureid and ft.operatorid=1
group by a.RecomendationId
issue is statment execution take too much time to execute
ddl with indexes
create table ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg
(
ID int identity(1,1),
PartId int,
FeatureID int,
FeatureName varchar(200),
FeatureValue varchar(200)
)
ALTER TABLE ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg
ADD CONSTRAINT PK_TechnologyPlPartsFeaturValuesOrg PRIMARY KEY (ID);
create index partidoriginalParts_ix on ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg(partid)
create index FlagRecomendationorg_ix on ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg(featureid,FeatureValue)include(partid)
create table extractreports.dbo.partsrecomendationActive
(
RecomendationId int identity(1,1),
RecomendationPartId int,
OrignalPartId int
)
create clustered index recomendations_ix on extractreports.dbo.partsrecomendationActive(RecomendationId)
create nonclustered index recomendationsparts_ix on extractreports.dbo.partsrecomendationActive(RecomendationPartId)
create nonclustered index recomendationsoriginal_ix on extractreports.dbo.partsrecomendationActive(OrignalPartId)
CREATE TABLE [Technology].[Receipe](
[ReceipeID] [int] IDENTITY(1,1) NOT NULL,
[PLID] [int] NULL,
[FeatureID] [int] NULL,
[OperatorID] [int] NULL,
[FeatureTypeID] [int] NULL,
PRIMARY KEY CLUSTERED
(
[ReceipeID] ASC
))
September 10, 2022 at 4:00 pm
Why is there no primary key on the extractreports.dbo.partsrecomendationActive table? You have a clustered index and yet UNIQUE is not specified. This seems to cause a merge join and it's all downhill from there.
The SORT which is causing the issue:
Order By
[ExtractReports].[dbo].[partsrecomendationActive].[a].RecomendationId Ascending
[ExtractReports].[dbo].[TechnologyPlPartsFeaturValuesOrg].[f1].FeatureID Ascending
This seems likely to be caused by the combination of:
WITHIN GROUP(ORDER BY f1.FeatureId ASC)
and
group by a.RecomendationId
The optimizer attempts to SORT 65 million rows after a merge join. Yeeeesh. One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated. How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique? It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
September 10, 2022 at 6:16 pm
I suspect a couple of accidental CROSS JOINs is the issue. Please run the following code and post the results.
SELECT f1_AllCount = COUNT(f1.PartId)
,f1_UnqCount = COUNT(DISTINCT f1.PartID)
FROM Technology.Receipe ft
JOIN dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
WHERE ft.operatorid = 1
;
SELECT a_AllCount = COUNT(a.OrignalPartId)
,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
FROM dbo.partsrecomendationActive a
;
Where I'm going with this is a possible separation of the "data layer" and the "presentation" layer. I think you first need to find what you're looking for using "pre-aggregation" to get the correct key lookups for the eventual string aggregation. That also means that I think the ultimate output of more than 13 million rows is wildly incorrect and the condition of the "Sort" is merely a symptom of the accidental CROSS JOIN problem.
I could certainly be wrong in that but that's why I need you to run the code above and post the results.
Remember... "Set-Based" <> "All in one query".
--Jeff Moden
Change is inevitable... Change for the better is not.
September 11, 2022 at 12:02 am
so how to use cross apply as you write below
The optimizer attempts to SORT 65 million rows after a merge join. Yeeeesh. One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated. How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique? It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?
September 11, 2022 at 4:47 am
so how to use cross apply as you write below
The optimizer attempts to SORT 65 million rows after a merge join. Yeeeesh. One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated. How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique? It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?
Please run the 2 queries I asked you to run and provide the results. It's part of the information necessary for CROSS APPLY or any other solution.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 11, 2022 at 9:48 am
SELECT f1_AllCount = COUNT(f1.PartId)
,f1_UnqCount = COUNT(DISTINCT f1.PartID)
FROM Technology.Receipe ft
JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
WHERE ft.operatorid = 1;
result is
f1_AllCount f1_UnqCount
124140 24828
SELECT a_AllCount = COUNT(a.OrignalPartId)
,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
FROM extractreports.dbo.partsrecomendationActive a with(nolock)
Result is
a_AllCount a_UnqCount
13052654 24828
two queries result as you tell me
September 11, 2022 at 5:55 pm
i post counts so what i do after that
or what query best for achieve that quickly
September 11, 2022 at 6:29 pm
SELECT f1_AllCount = COUNT(f1.PartId)
,f1_UnqCount = COUNT(DISTINCT f1.PartID)
FROM Technology.Receipe ft
JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
WHERE ft.operatorid = 1;result is
f1_AllCount f1_UnqCount
124140 24828SELECT a_AllCount = COUNT(a.OrignalPartId)
,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
FROM extractreports.dbo.partsrecomendationActive a with(nolock)Result is
a_AllCount a_UnqCount
13052654 24828two queries result as you tell me
Thank you. This proves that there are no accidental CROSS JOINS (along with what you were kind enough to previously provide as an execution plan but it does prove that we're probably looking at way too much more data that we probably need to be looking at. I'm at work right now and so can't jump into this right now but I believe a little "pre-aggregation" and "Divide'n'Conquer" may work a treat here. In other words, and I could be wrong, but I think we can seriously reduce the data being handled by the STRING_AGG/GROUP BY, which is the source of the painful SORT.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 12, 2022 at 12:50 am
In the ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg table, what does the FeatureValue column contain in reference to the PartId? Also, is FeatureValue a unique value by FeatureID or does it change with each part having the same FeatureID?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 12, 2022 at 1:00 am
In a very real "get real" moment, I have to ask... how many times per day will this be used and how often will the underlying data be modified.
What I'm thinking is that you might want to just consider running it once after any changes to the data and then just read from the table multiple times. In other words, I'm saying it's starting to look like there's very little advantage to trying to make it run in less than 6 minutes. To put it in plain English, I don't see how it will be worth it.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 12, 2022 at 4:39 pm
Would STUFF...FOR XML be faster that STRING_AGG? I think they both will be doing ReBAR under the hood.
Can you pre-calculate the string contatenation and persist it? You could then index it and it would be a single join.
Can you limit the query to changed records and do a MERGE?
Aaron
September 12, 2022 at 5:26 pm
The "pre-aggreation" I spoke of earlier is identical to what you're suggesting. I'm just trying to get the OP to provide some extra information.
I'll also tell you there is no "e" in RBAR... it's only pronounced that way because of the hidden meaning. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
Can you try this?
It should move the sort to the inserts into the clustered index on the temporary table
DROP TABLE IF EXISTS #temp;
CREATE TABLE #temp
(
RecomendationId int, -- [sic]
FeatureValue varchar(200),
FeatureId int,
PRIMARY KEY CLUSTERED (RecomendationId, FeatureId)
);
INSERT INTO #temp(RecomendationId, FeatureValue, FeatureId)
SELECT a.RecomendationId, f1.FeatureValue, f1.FeatureId
FROM extractreports.dbo.partsrecomendationActive a with(nolock)
INNER JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 with(nolock)
ON f1.partid=a.OrignalPartId
INNER JOIN [Technology].Receipe Ft WITH(NOLOCK)
ON ft.featureid=f1.featureid
AND ft.operatorid=1;
select x.RecomendationId,
CAST(STRING_AGG(cast(x.FeatureValue as varchar(300)) ,'|') WITHIN GROUP(ORDER BY x.FeatureId ASC)as varchar(300)) AS DiffFeatures
into ExtractReports.dbo.TechnologyOriginalFeaturesEqual
from #temp x
group by x.RecomendationId;
Can you also get the times for the insert into and the select into statements?
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply