query take long time when execute it so how to solve issue ?

  • I work on sql server 2019 i face issue i can't reduce high cost of sort it reach to 86 percent

    so How to reduce it please

    additionally it take too much time to execute it .

    it take 6:06 minutes

    execution plan as below

    https://www.brentozar.com/pastetheplan/?id=Sy_IaNtgo

    statment that make issue

    select a.RecomendationId,cast(STRING_AGG(cast(f1.FeatureValue as varchar(300)) ,'|')  WITHIN GROUP(ORDER BY f1.FeatureId ASC)as varchar(300))  AS DiffFeatures into ExtractReports.dbo.TechnologyOriginalFeaturesEqual from extractreports.dbo.partsrecomendationActive a with(nolock)
    inner join ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 with(nolock) on f1.partid=a.OrignalPartId
    inner join [Technology].Receipe Ft WITH(NOLOCK) on ft.featureid=f1.featureid and ft.operatorid=1
    group by a.RecomendationId

    issue is statment execution take too much time to execute

    ddl with indexes

    create table ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg
    (
    ID int identity(1,1),
    PartId int,
    FeatureID int,
    FeatureName varchar(200),
    FeatureValue varchar(200)
    )

    ALTER TABLE ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg
    ADD CONSTRAINT PK_TechnologyPlPartsFeaturValuesOrg PRIMARY KEY (ID);
    create index partidoriginalParts_ix on ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg(partid)
    create index FlagRecomendationorg_ix on ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg(featureid,FeatureValue)include(partid)

    create table extractreports.dbo.partsrecomendationActive
    (
    RecomendationId int identity(1,1),
    RecomendationPartId int,
    OrignalPartId int

    )

    create clustered index recomendations_ix on extractreports.dbo.partsrecomendationActive(RecomendationId)

    create nonclustered index recomendationsparts_ix on extractreports.dbo.partsrecomendationActive(RecomendationPartId)

    create nonclustered index recomendationsoriginal_ix on extractreports.dbo.partsrecomendationActive(OrignalPartId)

    CREATE TABLE [Technology].[Receipe](
    [ReceipeID] [int] IDENTITY(1,1) NOT NULL,
    [PLID] [int] NULL,
    [FeatureID] [int] NULL,
    [OperatorID] [int] NULL,
    [FeatureTypeID] [int] NULL,
    PRIMARY KEY CLUSTERED
    (
    [ReceipeID] ASC
    ))

    • This topic was modified 2 years, 3 months ago by  ahmed_elbarbary.2010. Reason: add more details for clear
  • Why is there no primary key on the extractreports.dbo.partsrecomendationActive table?  You have a clustered index and yet UNIQUE is not specified.   This seems to cause a merge join and it's all downhill from there.

    The SORT which is causing the issue:

    Order By
    [ExtractReports].[dbo].[partsrecomendationActive].[a].RecomendationId Ascending
    [ExtractReports].[dbo].[TechnologyPlPartsFeaturValuesOrg].[f1].FeatureID Ascending

    This seems likely to be caused by the combination of:

    WITHIN GROUP(ORDER BY f1.FeatureId ASC)

    and

    group by a.RecomendationId

    The optimizer attempts to SORT 65 million rows after a merge join.  Yeeeesh.   One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated.  How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique?  It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • I suspect a couple of accidental CROSS JOINs is the issue.  Please run the following code and post the results.

     SELECT  f1_AllCount = COUNT(f1.PartId)
    ,f1_UnqCount = COUNT(DISTINCT f1.PartID)
    FROM Technology.Receipe ft
    JOIN dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
    WHERE ft.operatorid = 1
    ;
    SELECT a_AllCount = COUNT(a.OrignalPartId)
    ,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
    FROM dbo.partsrecomendationActive a
    ;

    Where I'm going with this is a possible separation of the "data layer" and the "presentation" layer. I think you first need to find what you're looking for using "pre-aggregation" to get the correct key lookups for the eventual string aggregation.  That also means that I think the ultimate output of more than 13 million rows is wildly incorrect and the condition of the "Sort" is merely a symptom of the accidental CROSS JOIN problem.

    I could certainly be wrong in that but that's why I need you to run the code above and post the results.

    Remember... "Set-Based" <> "All in one query".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • so how to use cross apply as you write below

    The optimizer attempts to SORT 65 million rows after a merge join.  Yeeeesh.   One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated.  How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique?  It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?

  • ahmed_elbarbary.2010 wrote:

    so how to use cross apply as you write below

    The optimizer attempts to SORT 65 million rows after a merge join.  Yeeeesh.   One way to re-write the query would be to get rid of the GROUP BY by using the PK of the base table and then CROSS APPLY (or OUTER APPLY) the rows to be aggregated.  How or is it necessary to summarize extractreports.dbo.partsrecomendationActive so that RecomendationId is unique?  It seems intended to be unique but not guaranteed by constraint so duplicate id's could exist?

    Please run the 2 queries I asked you to run and provide the results.  It's part of the information necessary for CROSS APPLY or any other solution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SELECT f1_AllCount = COUNT(f1.PartId)
    ,f1_UnqCount = COUNT(DISTINCT f1.PartID)
    FROM Technology.Receipe ft
    JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
    WHERE ft.operatorid = 1;

    result is

    f1_AllCount         f1_UnqCount
    124140                  24828
    SELECT a_AllCount = COUNT(a.OrignalPartId)
    ,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
    FROM extractreports.dbo.partsrecomendationActive a with(nolock)

    Result is

    a_AllCount   a_UnqCount
    13052654 24828

    two queries result as you tell me

  • i post counts so what i do after that

    or what query best for achieve that quickly

  • ahmed_elbarbary.2010 wrote:

    SELECT f1_AllCount = COUNT(f1.PartId)
    ,f1_UnqCount = COUNT(DISTINCT f1.PartID)
    FROM Technology.Receipe ft
    JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 ON ft.featureid = f1.featureid
    WHERE ft.operatorid = 1;

    result is

    f1_AllCount         f1_UnqCount
    124140                  24828
    SELECT a_AllCount = COUNT(a.OrignalPartId)
    ,a_UnqCount = COUNT(DISTINCT a.OrignalPartId)
    FROM extractreports.dbo.partsrecomendationActive a with(nolock)

    Result is

    a_AllCount   a_UnqCount
    13052654 24828

    two queries result as you tell me

    Thank you.  This proves that there are no accidental CROSS JOINS (along with what you were kind enough to previously provide as an execution plan but it does prove that we're probably looking at way too much more data that we probably need to be looking at.  I'm at work right now and so can't jump into this right now but I believe a little "pre-aggregation" and "Divide'n'Conquer" may work a treat here.  In other words, and I could be wrong, but I think we can seriously reduce the data being handled by the STRING_AGG/GROUP BY, which is the source of the painful SORT.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In the ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg table, what does the FeatureValue column contain in reference to the PartId?  Also, is FeatureValue a unique value by FeatureID or does it change with each part having the same FeatureID?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In a very real "get real" moment, I have to ask... how many times per day will this be used and how often will the underlying data be modified.

    What I'm thinking is that you might want to just consider running it once after any changes to the data and then just read from the table multiple times.  In other words, I'm saying it's starting to look like there's very little advantage to trying to make it run in less than 6 minutes.  To put it in plain English, I don't see how it will be worth it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Would STUFF...FOR XML be faster that STRING_AGG? I think they both will be doing ReBAR under the hood.

    Can you pre-calculate the string contatenation and persist it?  You could then index it and it would be a single join.

    Can you limit the query to changed records and do a MERGE?

     

    Aaron

     

     

     

     

  • The "pre-aggreation" I spoke of earlier is identical to what you're suggesting.  I'm just trying to get the OP to provide some extra information.

    I'll also tell you there is no "e" in RBAR... it's only pronounced that way because of the hidden meaning. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Can you try this?

    It should move the sort to the inserts into the clustered index on the temporary table

    DROP TABLE IF EXISTS #temp;

    CREATE TABLE #temp
    (
    RecomendationId int, -- [sic]
    FeatureValue varchar(200),
    FeatureId int,
    PRIMARY KEY CLUSTERED (RecomendationId, FeatureId)
    );

    INSERT INTO #temp(RecomendationId, FeatureValue, FeatureId)
    SELECT a.RecomendationId, f1.FeatureValue, f1.FeatureId
    FROM extractreports.dbo.partsrecomendationActive a with(nolock)
    INNER JOIN ExtractReports.dbo.TechnologyPlPartsFeaturValuesOrg f1 with(nolock)
    ON f1.partid=a.OrignalPartId
    INNER JOIN [Technology].Receipe Ft WITH(NOLOCK)
    ON ft.featureid=f1.featureid
    AND ft.operatorid=1;

    select x.RecomendationId,
    CAST(STRING_AGG(cast(x.FeatureValue as varchar(300)) ,'|') WITHIN GROUP(ORDER BY x.FeatureId ASC)as varchar(300)) AS DiffFeatures
    into ExtractReports.dbo.TechnologyOriginalFeaturesEqual
    from #temp x
    group by x.RecomendationId;

    Can you also get the times for the insert into and the select into statements?

     

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply