select DISTINCT cost too high?

  • Hi,

    Pl. suggested me, how to resolve this issuse? In select statment sub-tree DISTINCT cost is 79%. as per atttached actual exec.plan. Is it possible wirtting query alternative ways?

    UPDATE CSV_Details_MainFile

    SET Mark_Rev_No = (CASE

    WHEN (SELECT COUNT(mark_rev_no) as mark_rev_no

    FROM CSV_Details_MainFile H1

    WHERE H1.GA_Drg_NO = CSV_Details_MainFile.GA_Drg_NO

    AND H1.Mark_No = CSV_Details_MainFile.Mark_No) != 0

    THEN

    (SELECT ISNULL(mark_rev_no,0) AS mark_rev_no FROM CSV_Details_MainFile H1

    WHERE H1.GA_Drg_NO = CSV_Details_MainFile.GA_Drg_NO

    and H1.Rev_NO = CSV_Details_MainFile.Rev_NO

    AND H1.Mark_No = CSV_Details_MainFile.Mark_No

    and DeleteFlag='1'

    GROUP BY H1.GA_Drg_NO,H1.Mark_No ,H1.mark_rev_no)

    ELSE

    '1'

    END)

    WHERE CSV_Details_MainFile.GA_Drg_NO='C63-GPE105-499-005' AND Rev_NO = '1' AND

    CSV_Details_MainFile.Mark_No in (SELECT DISTINCT Mark_No FROM CSV_Details_MainFile

    WHERE GA_Drg_NO = 'C63-GPE105-499-005' AND Rev_NO ='1' and DeleteFlag is null)

    Thanks

    ananda

  • Yes of course, but folks will need a sample data script and a decent description of what the query is supposed to do.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ok.. I will update you once getting from Dev.Team for the all details..

    thanks

    ananda

  • Is that 79% cost of the DISTINCT a problem?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You are using the Distinct in a sub-select in the where clause using IN, why use distinct at all? Whether you have 1 record that matches or 100 records it won't cause dupicates because it is just checking the "IN" and not a join.

  • Sarah Wagner (8/6/2013)


    You are using the Distinct in a sub-select in the where clause using IN, why use distinct at all? Whether you have 1 record that matches or 100 records it won't cause dupicates because it is just checking the "IN" and not a join.

    The optimiser's smart enough to ignore distincts in an IN subquery. If removing duplicate rows is required, the optimiser will put in some distinct operation, if not it won't. That's regardless of what is specified in the query.

    http://sqlinthewild.co.za/index.php/2011/01/18/distincting-an-in-subquery/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • When you display execution plan in SSMS, does it suggest any new index?

    From what I see, it could at least use an index on Mark_No, but query plan analyzer may suggest a better composite index.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • ananda.murugesan (8/6/2013)


    ok.. I will update you once getting from Dev.Team for the all details..

    thanks

    ananda

    Here's a guess at the business logic:

    Where a set, partitioned on GA_Drg_NO, Rev_NO and Mark_No,

    contains at least one row where DeleteFlag IS NULL and at least one row where DeleteFlag = '1',

    pick the value of mark_rev_no from a row where DeleteFlag = '1'

    and apply it to all rows of the set.

    Can you confirm if this is correct or not and correct it if it it isn't?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi, ChrisM@Work.. wow what a understand that update query logic..Absloute correct...

    Hi, Eric M Russell.. SSMS - actual execution plan suggested to create one Non clustred index, that was created... (index name is Idx_ActualPlan)

    CREATE NONCLUSTERED INDEX [Idx_ByActual_Plan] ON [dbo].[CSV_Details_MainFile]

    (

    [GA_Drg_NO] ASC,

    [Rev_NO] ASC,

    [Mark_No] ASC,

    [DeleteFlag] ASC

    )

    INCLUDE ( [Mark_Rev_No]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    Before total execution time was 00:00:28 seconds, after that it was just took 00:00:02 seconds after created NC index..But second time actual exex.plan was showing 72% of total cost in SELECT DISTINCT sort.

    Thanks

    ananda

  • -- sample data

    DROP TABLE #CSV_Details_MainFile

    CREATE TABLE #CSV_Details_MainFile (GA_Drg_NO VARCHAR(25), Rev_NO CHAR(1), Mark_No VARCHAR(3), mark_rev_no VARCHAR(3), DeleteFlag CHAR(1))

    INSERT INTO #CSV_Details_MainFile (GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no, DeleteFlag) VALUES

    ('C63-GPE105-499-003','1','10','10','1'),

    ('C63-GPE105-499-003','1','10','10','1'),

    ('C63-GPE105-499-004','1','10','10',NULL),

    ('C63-GPE105-499-004','1','10','40','1'),

    ('C63-GPE105-499-004','2','10','10',NULL),

    ('C63-GPE105-499-004','2','10','10',NULL),

    ('C63-GPE105-499-005','1','10','10',NULL),

    ('C63-GPE105-499-005','1','10','10',NULL),

    ('C63-GPE105-499-005','1','10','20','1'),

    ('C63-GPE105-499-005','1','10','10','1')

    CREATE CLUSTERED INDEX cx_LoadsOfStuff ON #CSV_Details_MainFile (GA_Drg_NO, Rev_NO, Mark_No, DeleteFlag)

    ----------------------------------------------------------------------------------------------------------------------------

    -- Stare & Compare

    SELECT *

    FROM #CSV_Details_MainFile m

    INNER JOIN (

    SELECT GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no = MAX(mark_rev_no)

    FROM (

    SELECT GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no, DeleteFlag,

    rn = DENSE_RANK() OVER(PARTITION BY GA_Drg_NO, Rev_NO, Mark_No ORDER BY DeleteFlag)

    FROM #CSV_Details_MainFile

    ) d

    WHERE rn = 2

    GROUP BY GA_Drg_NO, Rev_NO, Mark_No

    ) s

    ON s.GA_Drg_NO = m.GA_Drg_NO

    AND s.Rev_NO = m.Rev_NO

    AND s.Mark_No = m.Mark_No

    ----------------------------------------------------------------------------------------------------------------------------

    -- Execute proposed new method

    UPDATE m SET Mark_Rev_No = s.mark_rev_no

    FROM #CSV_Details_MainFile m

    INNER JOIN (

    SELECT GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no = MAX(mark_rev_no)

    FROM (

    SELECT GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no, DeleteFlag,

    rn = DENSE_RANK() OVER(PARTITION BY GA_Drg_NO, Rev_NO, Mark_No ORDER BY DeleteFlag)

    FROM #CSV_Details_MainFile

    ) d

    WHERE rn = 2

    GROUP BY GA_Drg_NO, Rev_NO, Mark_No

    ) s

    ON s.GA_Drg_NO = m.GA_Drg_NO

    AND s.Rev_NO = m.Rev_NO

    AND s.Mark_No = m.Mark_No

    WHERE m.GA_Drg_NO = 'C63-GPE105-499-005' AND m.Rev_NO = '1'

    -- execute old method

    UPDATE #CSV_Details_MainFile

    SET Mark_Rev_No = (CASE

    WHEN (SELECT COUNT(mark_rev_no) as mark_rev_no

    FROM #CSV_Details_MainFile H1

    WHERE H1.GA_Drg_NO = #CSV_Details_MainFile.GA_Drg_NO

    AND H1.Mark_No = #CSV_Details_MainFile.Mark_No) != 0

    THEN

    (SELECT ISNULL(mark_rev_no,0) AS mark_rev_no FROM #CSV_Details_MainFile H1

    WHERE H1.GA_Drg_NO = #CSV_Details_MainFile.GA_Drg_NO

    and H1.Rev_NO = #CSV_Details_MainFile.Rev_NO

    AND H1.Mark_No = #CSV_Details_MainFile.Mark_No

    and DeleteFlag='1'

    GROUP BY H1.GA_Drg_NO,H1.Mark_No ,H1.mark_rev_no)

    ELSE

    '1'

    END)

    WHERE #CSV_Details_MainFile.GA_Drg_NO='C63-GPE105-499-005' AND Rev_NO = '1'

    AND #CSV_Details_MainFile.Mark_No in (SELECT DISTINCT Mark_No FROM #CSV_Details_MainFile

    WHERE GA_Drg_NO = 'C63-GPE105-499-005' AND Rev_NO ='1' and DeleteFlag is null)


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • ananda.murugesan (8/6/2013)


    Before total execution time was 00:00:28 seconds, after that it was just took 00:00:02 seconds after created NC index..But second time actual exex.plan was showing 72% of total cost in SELECT DISTINCT sort.

    Again, is that 72% a problem? Why are you worrying again about operator costs?

    To emphasise the point, if I have a query that has 95% of its cost in a clustered index scan, is that always, automatically and without further consideration a problem that needs to be addressed?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • No more problem addressed even query working fine with 00:00:02 seconds and DISTINCT cost 72%..

    Thank you..

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply