December 3, 2008 at 8:30 am
When constructing WHERE clauses, I know that non-sargable clauses should be avoided. Is this the case for GROUP BY clauses as well?
For example, should the following existing query below be rewritten due to the GROUP BY clause? And if so, how should I go about accomplishing this? Do I union (tbl_pa_products JOIN tbl_pa_brands AS b) and (tbl_pa_productss JOIN tbl_pa_brands as realb)?
SELECT CAST(ISNULL(realb.pk_brand_id, b.pk_brand_id) AS VARCHAR) + '.b' AS pk_id,
ISNULL(realb.brand_name, b.brand_name) AS [name],
( SELECT pk_detail_settings_id
FROM tbl_pa_detail_settings
WHERE fk_brand_id = ISNULL(realb.pk_brand_id, b.pk_brand_id)
) AS default_detail_settings_id
FROM tbl_pa_products AS p
JOIN tbl_pa_brands AS b ON b.pk_brand_id = p.fk_brand_id
LEFT JOIN tbl_pa_brands AS realb ON realb.pk_brand_id = b.fk_brand_map_id
GROUP BY ISNULL(realb.pk_brand_id, b.pk_brand_id),
ISNULL(realb.brand_name, b.brand_name)
ORDER BY COUNT(DISTINCT p.pk_product_id) DESC,
ISNULL(realb.brand_name, b.brand_name)
Here are the table structures:
CREATE TABLE [dbo].[tbl_pa_products](
[pk_product_id] [int] IDENTITY(1,1) NOT NULL,
[parser_batch_id] [int] NOT NULL,
[fk_retailer_id] [int] NOT NULL,
[store_product_id] [varchar](800) NOT NULL,
[fk_brand_id] [int] NOT NULL,
[product_name] [varchar](800) NOT NULL,
[product_link] [varchar](5000) NOT NULL,
[product_post_data] [varchar](8000) NULL
)
CREATE TABLE [dbo].[tbl_pa_brands](
[pk_brand_id] [int] NOT NULL,
[fk_brand_map_id] [int] NULL,
[fk_brand_parent_id] [int] NULL,
[fk_discovery_record_id] [int] NOT NULL,
[brand_name] [varchar](250) NOT NULL,
[date_added] [datetime] NOT NULL,
[in_use] [bit] NOT NULL,
[mapping_user_id] [int] NULL,
[date_mapped] [datetime] NULL,
CONSTRAINT [PK_tbl_pa_brands] PRIMARY KEY NONCLUSTERED
(
[pk_brand_id] ASC
)
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[tbl_pa_brands] WITH CHECK ADD CONSTRAINT [FK_tbl_pa_brands_tbl_pa_brands] FOREIGN KEY([fk_brand_map_id])
REFERENCES [dbo].[tbl_pa_brands] ([pk_brand_id])
GO
ALTER TABLE [dbo].[tbl_pa_brands] CHECK CONSTRAINT [FK_tbl_pa_brands_tbl_pa_brands]
GO
ALTER TABLE [dbo].[tbl_pa_brands] WITH CHECK ADD CONSTRAINT [FK_tbl_pa_brands_tbl_pa_brands1] FOREIGN KEY([fk_brand_parent_id])
REFERENCES [dbo].[tbl_pa_brands] ([pk_brand_id])
GO
ALTER TABLE [dbo].[tbl_pa_brands] CHECK CONSTRAINT [FK_tbl_pa_brands_tbl_pa_brands1]
December 3, 2008 at 9:31 am
It won't matter as much in a Group By as it does in a Where, since the Where clause problem is table/index scans instead of seeks, and that doesn't apply to Group By.
Why not test both queries, look at the their execution plans, run them with stats (time and IO) turned on, and then you'll have an exact answer?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
December 3, 2008 at 9:40 am
What is less clear is SARG-ability in HAVING clauses. That is, to what extent can the having clause conditions carry through to selectivity before the GROUP BY, possibly through an index, of the HAVING conditions.
I don't know the answer but it is an interesting question.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 3, 2008 at 10:07 am
GSquared (12/3/2008)
It won't matter as much in a Group By as it does in a Where, since the Where clause problem is table/index scans instead of seeks, and that doesn't apply to Group By.Why not test both queries, look at the their execution plans, run them with stats (time and IO) turned on, and then you'll have an exact answer?
From my own observations on this - while it won't make as big a difference, having the indexes set up to handle the WHERE first, then the group by stuff next certainly does make a difference. Of course - it may take some really big data sets to notice it.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 4, 2008 at 11:53 am
Matt Miller (12/3/2008)
GSquared (12/3/2008)
It won't matter as much in a Group By as it does in a Where, since the Where clause problem is table/index scans instead of seeks, and that doesn't apply to Group By.Why not test both queries, look at the their execution plans, run them with stats (time and IO) turned on, and then you'll have an exact answer?
From my own observations on this - while it won't make as big a difference, having the indexes set up to handle the WHERE first, then the group by stuff next certainly does make a difference. Of course - it may take some really big data sets to notice it.
I did a little testing, and it can have a difference if the Group By columns are part of an index that can be used for range scans. For example, if you group by OrderID and sum(Price), and have an index on those two columns, it will be faster than if you group by isnull(OrderID, JobID) and sum(isnull(Price, Cost)), or some such, if they aren't in the same index. Since these would, in most cases, be in different tables, unless you have an indexed view, they generally won't be in the same index, and the query will suffer.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
December 4, 2008 at 4:45 pm
GSquared (12/4/2008)
Matt Miller (12/3/2008)
GSquared (12/3/2008)
It won't matter as much in a Group By as it does in a Where, since the Where clause problem is table/index scans instead of seeks, and that doesn't apply to Group By.Why not test both queries, look at the their execution plans, run them with stats (time and IO) turned on, and then you'll have an exact answer?
From my own observations on this - while it won't make as big a difference, having the indexes set up to handle the WHERE first, then the group by stuff next certainly does make a difference. Of course - it may take some really big data sets to notice it.
I did a little testing, and it can have a difference if the Group By columns are part of an index that can be used for range scans. For example, if you group by OrderID and sum(Price), and have an index on those two columns, it will be faster than if you group by isnull(OrderID, JobID) and sum(isnull(Price, Cost)), or some such, if they aren't in the same index. Since these would, in most cases, be in different tables, unless you have an indexed view, they generally won't be in the same index, and the query will suffer.
Thanks for the info. It's very much appreciated!
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply