select DISTINCT cost too high?

Question

select DISTINCT cost too high?

SQL Galaxy

One Orange Chip

Points: 29479
More actions
August 6, 2013 at 5:50 am

#296474

Hi,
Pl. suggested me, how to resolve this issuse? In select statment sub-tree DISTINCT cost is 79%. as per atttached actual exec.plan. Is it possible wirtting query alternative ways?
UPDATE CSV_Details_MainFile
SET Mark_Rev_No = (CASE
WHEN (SELECT COUNT(mark_rev_no) as mark_rev_no
FROM CSV_Details_MainFile H1
WHERE H1.GA_Drg_NO = CSV_Details_MainFile.GA_Drg_NO
AND H1.Mark_No = CSV_Details_MainFile.Mark_No) != 0
THEN
(SELECT ISNULL(mark_rev_no,0) AS mark_rev_no FROM CSV_Details_MainFile H1
WHERE H1.GA_Drg_NO = CSV_Details_MainFile.GA_Drg_NO
and H1.Rev_NO = CSV_Details_MainFile.Rev_NO
AND H1.Mark_No = CSV_Details_MainFile.Mark_No
and DeleteFlag='1'
GROUP BY H1.GA_Drg_NO,H1.Mark_No ,H1.mark_rev_no)
ELSE
'1'
END)
WHERE CSV_Details_MainFile.GA_Drg_NO='C63-GPE105-499-005' AND Rev_NO = '1' AND
CSV_Details_MainFile.Mark_No in (SELECT DISTINCT Mark_No FROM CSV_Details_MainFile
WHERE GA_Drg_NO = 'C63-GPE105-499-005' AND Rev_NO ='1' and DeleteFlag is null)
Thanks
ananda

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

Yes of course, but folks will need a sample data script and a decent description of what the query is supposed to do.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

SQL Galaxy One Orange Chip Points: 29479 More actions · Answer 2

ok.. I will update you once getting from Dev.Team for the all details..

thanks

ananda

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

Is that 79% cost of the DISTINCT a problem?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Sarah Wagner Hall of Fame Points: 3178 More actions · Answer 4

You are using the Distinct in a sub-select in the where clause using IN, why use distinct at all? Whether you have 1 record that matches or 100 records it won't cause dupicates because it is just checking the "IN" and not a join.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

Sarah Wagner (8/6/2013)
You are using the Distinct in a sub-select in the where clause using IN, why use distinct at all? Whether you have 1 record that matches or 100 records it won't cause dupicates because it is just checking the "IN" and not a join.

The optimiser's smart enough to ignore distincts in an IN subquery. If removing duplicate rows is required, the optimiser will put in some distinct operation, if not it won't. That's regardless of what is specified in the query.

http://sqlinthewild.co.za/index.php/2011/01/18/distincting-an-in-subquery/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Eric M Russell SSC Guru Points: 125623 More actions · Answer 6

When you display execution plan in SSMS, does it suggest any new index?

From what I see, it could at least use an index on Mark_No, but query plan analyzer may suggest a better composite index.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

ananda.murugesan (8/6/2013)
ok.. I will update you once getting from Dev.Team for the all details..
thanks
ananda

Here's a guess at the business logic:

Where a set, partitioned on GA_Drg_NO, Rev_NO and Mark_No,

contains at least one row where DeleteFlag IS NULL and at least one row where DeleteFlag = '1',

pick the value of mark_rev_no from a row where DeleteFlag = '1'

and apply it to all rows of the set.

Can you confirm if this is correct or not and correct it if it it isn't?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

SQL Galaxy One Orange Chip Points: 29479 More actions · Answer 8

Hi, ChrisM@Work.. wow what a understand that update query logic..Absloute correct...

Hi, Eric M Russell.. SSMS - actual execution plan suggested to create one Non clustred index, that was created... (index name is Idx_ActualPlan)

CREATE NONCLUSTERED INDEX [Idx_ByActual_Plan] ON [dbo].[CSV_Details_MainFile]

(

[GA_Drg_NO] ASC,

[Rev_NO] ASC,

[Mark_No] ASC,

[DeleteFlag] ASC

)

INCLUDE ( [Mark_Rev_No]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

Before total execution time was 00:00:28 seconds, after that it was just took 00:00:02 seconds after created NC index..But second time actual exex.plan was showing 72% of total cost in SELECT DISTINCT sort.

Thanks

ananda

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 9

-- sample data

DROP TABLE #CSV_Details_MainFile

CREATE TABLE #CSV_Details_MainFile (GA_Drg_NO VARCHAR(25), Rev_NO CHAR(1), Mark_No VARCHAR(3), mark_rev_no VARCHAR(3), DeleteFlag CHAR(1))

INSERT INTO #CSV_Details_MainFile (GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no, DeleteFlag) VALUES

('C63-GPE105-499-003','1','10','10','1'),

('C63-GPE105-499-004','1','10','10',NULL),

('C63-GPE105-499-004','1','10','40','1'),

('C63-GPE105-499-004','2','10','10',NULL),

('C63-GPE105-499-005','1','10','10',NULL),

('C63-GPE105-499-005','1','10','20','1'),

('C63-GPE105-499-005','1','10','10','1')

CREATE CLUSTERED INDEX cx_LoadsOfStuff ON #CSV_Details_MainFile (GA_Drg_NO, Rev_NO, Mark_No, DeleteFlag)

----------------------------------------------------------------------------------------------------------------------------

-- Stare & Compare

SELECT *

FROM #CSV_Details_MainFile m

INNER JOIN (

SELECT GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no = MAX(mark_rev_no)

FROM (

SELECT GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no, DeleteFlag,

rn = DENSE_RANK() OVER(PARTITION BY GA_Drg_NO, Rev_NO, Mark_No ORDER BY DeleteFlag)

FROM #CSV_Details_MainFile

) d

WHERE rn = 2

GROUP BY GA_Drg_NO, Rev_NO, Mark_No

) s

ON s.GA_Drg_NO = m.GA_Drg_NO

AND s.Rev_NO = m.Rev_NO

AND s.Mark_No = m.Mark_No

----------------------------------------------------------------------------------------------------------------------------

-- Execute proposed new method

UPDATE m SET Mark_Rev_No = s.mark_rev_no

FROM #CSV_Details_MainFile m

INNER JOIN (

SELECT GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no = MAX(mark_rev_no)

FROM (

SELECT GA_Drg_NO, Rev_NO, Mark_No, mark_rev_no, DeleteFlag,

rn = DENSE_RANK() OVER(PARTITION BY GA_Drg_NO, Rev_NO, Mark_No ORDER BY DeleteFlag)

FROM #CSV_Details_MainFile

) d

WHERE rn = 2

GROUP BY GA_Drg_NO, Rev_NO, Mark_No

) s

ON s.GA_Drg_NO = m.GA_Drg_NO

AND s.Rev_NO = m.Rev_NO

AND s.Mark_No = m.Mark_No

WHERE m.GA_Drg_NO = 'C63-GPE105-499-005' AND m.Rev_NO = '1'

-- execute old method

UPDATE #CSV_Details_MainFile

SET Mark_Rev_No = (CASE

WHEN (SELECT COUNT(mark_rev_no) as mark_rev_no

FROM #CSV_Details_MainFile H1

WHERE H1.GA_Drg_NO = #CSV_Details_MainFile.GA_Drg_NO

AND H1.Mark_No = #CSV_Details_MainFile.Mark_No) != 0

THEN

(SELECT ISNULL(mark_rev_no,0) AS mark_rev_no FROM #CSV_Details_MainFile H1

WHERE H1.GA_Drg_NO = #CSV_Details_MainFile.GA_Drg_NO

and H1.Rev_NO = #CSV_Details_MainFile.Rev_NO

AND H1.Mark_No = #CSV_Details_MainFile.Mark_No

and DeleteFlag='1'

GROUP BY H1.GA_Drg_NO,H1.Mark_No ,H1.mark_rev_no)

ELSE

'1'

END)

WHERE #CSV_Details_MainFile.GA_Drg_NO='C63-GPE105-499-005' AND Rev_NO = '1'

AND #CSV_Details_MainFile.Mark_No in (SELECT DISTINCT Mark_No FROM #CSV_Details_MainFile

WHERE GA_Drg_NO = 'C63-GPE105-499-005' AND Rev_NO ='1' and DeleteFlag is null)

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 10

ananda.murugesan (8/6/2013)
Before total execution time was 00:00:28 seconds, after that it was just took 00:00:02 seconds after created NC index..But second time actual exex.plan was showing 72% of total cost in SELECT DISTINCT sort.

Again, is that 72% a problem? Why are you worrying again about operator costs?

To emphasise the point, if I have a query that has 95% of its cost in a clustered index scan, is that always, automatically and without further consideration a problem that needs to be addressed?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQL Galaxy One Orange Chip Points: 29479 More actions · Answer 11

No more problem addressed even query working fine with 00:00:02 seconds and DISTINCT cost 72%..

Thank you..