November 20, 2008 at 6:49 am
Hello,
I have been told that the keyword DISTINCT is rather costly for sqlserver and was wondering with witch statement we could bypass this keyword and still become the resultset
consider this:
CREATE TABLE [dbo].[tbl_1](
[tbl1_id] [int] NOT NULL,
[field1] [varchar](50) COLLATE Latin1_General_BIN NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tbl_2](
[tbl2_id] [int] NOT NULL,
[tbl1_id] [int] NOT NULL,
[tbl2_actief] [bit] NOT NULL CONSTRAINT [DF_tbl_2_tbl2_actief] DEFAULT ((1))
) ON [PRIMARY]
INSERT INTO [tbl_1] (
[tbl1_id],
[field1]
)
SELECT 1,'A'
UNION ALL
SELECT 2,'B'
UNION ALL
SELECT 3,'C'
INSERT INTO [tbl_2] (
[tbl2_id],
[tbl1_id],
[tbl2_actief]
)
SELECT 1,1,1
UNION ALL
SELECT 2,1,1
UNION ALL
SELECT 3,2,0
SELECT DISTINCT [t].[tbl1_id], [field1] FROM [tbl_1] AS t INNER JOIN [tbl_2] AS t2 ON t.[tbl1_id] = t2.[tbl1_id] WHERE [tbl2_actief] = 1
shows me only:
tbl1_id field1
1 A
This works for me because it shows me al fields from table1 for witch there is a relation in table 2 and this relation is active in table2
Because i need only all fields from table 1 and none from table 2 the distinct does what it should for me
But can we make another select statement that does not need the distinct en still gives me the same results back .?
PS: is it really really true that the distinct keyword is very expensive.? or am i asking a stupid question here?
wkr,
Eddy
November 20, 2008 at 7:38 am
so you want only UNIQUE Field1 from Table1? is that the requirement?
and it's okay to have duplicate entries in Table2?
GROUP BY or ROW_NUMBER can mimic DISTINCT as well
November 20, 2008 at 8:11 am
That is idd whats the demand,
DISTINCT works fine, don't understand me wrong, but cause i have been told its more expensive i'm looking for a more faster query method if possible
because my resultsets in production will contain several thousands of rows and will be called multiple times a minute during peak hours.
So i have just created my full statement for a view, and for now with the test data is runs lightning fast
Queryplan seems fine to
but still i want to be proactive for when we will reach production numbers.
Total View select statement looks like:
SELECT b.BRL_ID, b.PAR_ID_FIRMA, b.OFF_ID, b.PAR_ID_TYPE, b.PAR_ID_DISC,
b.ADR_ID_KLANT,b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.PAR_ID_EENHEID, b.PAR_ID_MUNT,
b.BRL_BASISNR, b.BRL_BEGDATUM, b.BRL_ENDDATUM, b.BRL_REMOVED,
b.BRL_REGDATUM, dbo.ADRES.ADR_NAAM AS KLANTNAAM, ADRES_1.ADR_NAAM AS LAADPLAATS,
ADRES_2.ADR_NAAM AS LOSPLAATS, dbo.PLAATS.PLS_CODE AS PLAATSCODELADEN, dbo.PLAATS.PLS_OMSCHR AS PLAATSLADEN,
PLAATS_1.PLS_CODE AS PLAATSCODELOSSEN, PLAATS_1.PLS_OMSCHR AS PLAATSLOSSEN, dbo.ADRSRT.ADS_NR AS KLANTNR,
dbo.MASTERREL.PAR_ID_TYPE AS PAR_ID_TYPE_MASTER, dbo.ARTIKEL.ART_OMS_NED AS ARTIKELNAAM
FROM
(SELECT DISTINCT
b.BRL_ID, b.OFF_ID, b.MRL_ID, b.PAR_ID_FIRMA, b.PAR_ID_TYPE, b.PAR_ID_DISC, b.BRL_REGDATUM, b.BRL_BEGDATUM, b.BRL_ENDDATUM,
b.BRL_REMOVED, b.BRL_REMOVED_LISA, b.BRL_BASISNR, b.ADR_ID_KLANT, b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
b.PAR_ID_EENHEID, b.PAR_ID_MUNT, b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.BRL_FACTLIJST, b.BRL_OPM_FACTLIJST, b.BRL_OPM_FACTUUR,
b.BRL_OPM_NED_LAAPLAATS, b.BRL_OPM_FRA_LAADPLAATS, b.BRL_OPM_NED_LOSPLAATS, b.BRL_OPM_FRA_LOSPLAATS,
b.BRL_OPL_EIGENDOM
FROM dbo.BASISREL AS b INNER JOIN
dbo.DEELREL AS d ON b.BRL_ID = d.BRL_ID
WHERE (d.DLR_REMOVED = 0)) AS b
INNER JOIN
dbo.MASTERREL ON b.MRL_ID = dbo.MASTERREL.MRL_ID INNER JOIN
dbo.ARTIKEL ON b.ART_ID = dbo.ARTIKEL.ART_ID LEFT OUTER JOIN
dbo.ADRES AS ADRES_2 ON b.ADR_ID_LOSNR = ADRES_2.ADR_ID LEFT OUTER JOIN
dbo.ADRES AS ADRES_1 ON b.ADR_ID_LAADNR = ADRES_1.ADR_ID LEFT OUTER JOIN
dbo.PLAATS AS PLAATS_1 ON b.PLS_ID_LOSSEN = PLAATS_1.PLS_ID LEFT OUTER JOIN
dbo.PLAATS ON b.PLS_ID_LADEN = dbo.PLAATS.PLS_ID LEFT OUTER JOIN
dbo.ADRES ON b.ADR_ID_KLANT = dbo.ADRES.ADR_ID LEFT OUTER JOIN
dbo.ADRSRT ON dbo.ADRES.ADR_ID = dbo.ADRSRT.ADR_ID AND dbo.ADRSRT.ADS_TYPE = 1
If any expert see's something that could make this "always" run fast no mather how many thousand records there are in Basisrel then please tell me how to improve this one.
Wkr,
Eddy
November 20, 2008 at 8:33 am
How many rows are returned by this...
[font="Courier New"]SELECT DISTINCT
b.BRL_ID, b.OFF_ID, b.MRL_ID, b.PAR_ID_FIRMA, b.PAR_ID_TYPE, b.PAR_ID_DISC, b.BRL_REGDATUM, b.BRL_BEGDATUM, b.BRL_ENDDATUM,
b.BRL_REMOVED, b.BRL_REMOVED_LISA, b.BRL_BASISNR, b.ADR_ID_KLANT, b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
b.PAR_ID_EENHEID, b.PAR_ID_MUNT, b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.BRL_FACTLIJST, b.BRL_OPM_FACTLIJST, b.BRL_OPM_FACTUUR,
b.BRL_OPM_NED_LAAPLAATS, b.BRL_OPM_FRA_LAADPLAATS, b.BRL_OPM_NED_LOSPLAATS, b.BRL_OPM_FRA_LOSPLAATS,
b.BRL_OPL_EIGENDOM
FROM dbo.BASISREL AS b
INNER JOIN dbo.DEELREL AS d ON b.BRL_ID = d.BRL_ID
WHERE d.DLR_REMOVED = 0 [/font]
...with and without the DISTINCT?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 20, 2008 at 9:27 am
in my testtable
i have 4 rows when run without the distinct and 3 rows if i use the distinct keyword
what is exactly what it should return cause i have put 2x DEELREL With the same BASISREL.BRL_ID in it.
But we need to create a selection view for our planning team and they do not need the DEELREL Data at that point, thats why i want only the "Distinct" BASISREL returned here
In our dataschema there is ALWAYS AT LEAST 1 DEELREL for each BASISREL,
Wkr,
Eddy
November 20, 2008 at 9:34 am
How many rows in DEELREL and BASISREL?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 20, 2008 at 9:48 am
Can you make this join 1 to 1 ? b.BRL_ID = d.BRL_ID
Where did you read that DISTINCT had a high cost associated with it?
November 20, 2008 at 10:27 am
Sorts are expensive operations and distinct requires a sort, as do most other methods of removing duplicates from a resuuklt set.
The best 'workaround' is to find out why there are duplicated (bad data, bad database design, bad query) and fix that. If you don't need distinct, don't use it.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 20, 2008 at 10:40 am
Makes sense. That said, I've read that in certain circumstances, such as IN clauses, it's usually beneficial to add DISTINCT, even if you shouldn't really need it. Not sure if it's accurate as I've never done any testing of it. It should help if it eliminates duplicates in an IN clause though.
November 20, 2008 at 10:46 am
Garadin (11/20/2008)
Makes sense. That said, I've read that in certain circumstances, such as IN clauses, it's usually beneficial to add DISTINCT, even if you shouldn't really need it. Not sure if it's accurate as I've never done any testing of it. It should help if it eliminates duplicates in an IN clause though.
It's not only not beneficial, it can be harmful.
Duplicates in an IN are ignored. All that's important to the IN is what values exist, not how many times the value repeats. The optimiser typically uses a semi-join on an in with a subquery where only the presence of the row is important, the join isn't actually done.
Most of the time the optimiser ignores distinct in IN, because it's irrelevant. Sometimes it does run distinct and it's a waste of resources.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 20, 2008 at 10:50 am
GilaMonster (11/20/2008)
It's not only not beneficial, it can be harmful.Duplicates in an IN are ignored. All that's important to the IN is what values exist, not how many times the value repeats. The optimiser typically uses a semi-join on an in with a subquery where only the presence of the row is important, the join isn't actually done.
Most of the time the optimiser ignores distinct in IN, because it's irrelevant. Sometimes it does run distinct and it's a waste of resources.
Good to know. Guess I'll stop doing that :Wow: Stupid multi-database query optimization book.
November 20, 2008 at 2:03 pm
Test, test, test....
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 20, 2008 at 2:41 pm
Indeed. The problem at the moment is that I don't really understand how to read execution plans / see the specifics of which might be better for what reason. I mean, I can of course do the "Well... this one runs in 1/3 the time... it's better!", but the specifics are still much a mystery. :ermm: Definitely need to do some reading on that.
Thanks for the tips Gail.
November 20, 2008 at 3:46 pm
Distinct in my eyes is only good for analysing why/if there are duplicate records in your record set.
just like suquerys are only good for refactoring!
Carlton..
November 20, 2008 at 4:21 pm
Garadin (11/20/2008)
Indeed. The problem at the moment is that I don't really understand how to read execution plans / see the specifics of which might be better for what reason.
Take a look at Grant Fritchey's book. I think Redgate's still giving it away. If mot, mail him. Also take a look at my blog
http://sqlinthewild.co.za/index.php/category/sql-server/execution-plans/
If there's anything you want to see that's not there, let me know.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply