July 10, 2015 at 7:05 am
Hi All,
I've been trying to make the following query more performant by breaking it up into smaller pieces.
SELECT MT.A3+MT.A4 AS A34,MT.A3
-- ,M.*
FROM
Master_TAB M
JOIN (SELECT M.A1,t3.A3,t3.A4,M.A6,M.A2,ROW_NUMBER() OVER (PARTITION BY A1,A6,A3,A4 ORDER BY A5 DESC) AS rownum
FROM Master_TAB M
JOIN TABle2 t2 ON M.A2=t2.A2
JOIN Table3 t3 ON t2.A2=t3.A2
) MT ON M.A1=MT.A1 AND M.A2=MT.A2 AND MT.rownum=1
ORDER BY MT.A3+MT.A4,MT.A1;
I know that the Spill is caused by the Sort but I can't remove the sort (sort can't be done in front end).
my master table had 1.7 million rows and almost 200 columns (bad design? I know but can't be changed as there's too much that would be affected)
every row is little over 1KB
here's my attempt...
-- MASTER_TAB has 1.7 million rows and 50 columns
CREATE TABLE [dbo].[tmp_ABC](
[A1] [varchar](13) NOT NULL,
[A2] [varchar](5) NOT NULL,
[A3] [varchar](4) NOT NULL,
[A4] [varchar](4) NOT NULL,
[A5] [int] NULL
) ON [PRIMARY]
-- inserted data into tmp_ABC from master Table MASTER_TAB (PK= A1,A2,A3)
create index IDX_tmp_ABC on [tmp_all_price_Target] (A1, A3, A4, A5) INCLUDE (A2)
CREATE TABLE [dbo].[tmp_DEF](
[A1] [varchar](13) NOT NULL,
[A2] [varchar](5) NOT NULL,
[A3] [varchar](4) NOT NULL,
[A4] [varchar](4) NOT NULL,
[A5] [int] NULL
) ON [PRIMARY]
insert into tmp_DEF
Select p.A1, t.A2, p.A3, p.A4,p.A5
from [tmp_ABC] t
join (SELECT A1, A3, A4, max(A5) as A5
FROM tmp_ABC
group by A1, A3, A4) pon t.A1 = p.A1 and t.A3 = p.A3 and t.A4=p.A4 and t.A5 = p.A5
create index IDX_tmp_DEF on [tmp_all_UPC_Sto_Loc_for_pri] (A1,A2, A3, A4, A5)
create index IDX_tmp_DEF_sort on [tmp_all_UPC_Sto_Loc_for_pri] (A3,A4) INCLUDE (A1,A2, A5)
-- this is the Query that is causing the Spill (in reality I'm supposed to bring back all 200 columns fro the master table but for debug purposes I limited the columns)
Select c.A3+c.A4 as A34, c.A3, c.A1
-- M.*
from tmp_DEF c
join MASTER_TAB M on M.A1 = c.A1 and M.A2 = c.A2
order by c.A3, C.A4
if I just run the following I get no spill:
Select c.A3+c.A4 as A34, c.A3, c.A1
from tmp_DEF c
order by c.A3, C.A4
as soon as I add the Master table as a Join I get the Spill...
I read many articles, tried many suggested things (creating indexes... clustered, non-clustered) without success.
Maybe I'm totally in Left Field and should enhance the performance going another route?
I've been breaking my head for the past 3 days and can't seem to figure it out... I had to turn for help here...
Thank you in advanced
JG
July 10, 2015 at 8:32 am
Just curious as to whether you might be allowed to add the ROW_NUMBER() column to the table as a computed, persisted column... Might simplify things quite a bit, although you might pay for it on INSERTs. Just a thought...
Steve (aka sgmunson) π π π
Rent Servers for Income (picks and shovels strategy)
July 10, 2015 at 9:01 am
Hi sgmunson,
I can't touch the Master Table...
but a computed persistent column is virtual and calculated so it wouldn't affect the inserts or updates which might be a good idea...
but how will it affect the performance?
row numbers can change with every insert... not sure how this will work...
maybe explain more what and how you see it...
I appreciate this very much.
thanks
JG
July 10, 2015 at 9:19 am
I think you can optimise this by using a CROSS APPLY and TOP(1). Doing it with this method you should be able to remove the master table from the subquery and join to the outermost Master_TAB.
SELECT MT.A3+MT.A4 AS A34,
MT.A3
-- ,M.*
FROM Master_TAB M
CROSS APPLY (SELECT TOP(1)
t3.A3,
t3.A4
FROM TABle2 t2
INNER JOIN Table3 t3
ON t2.A2=t3.A2
WHERE t2.A2 = M.A2
ORDER BY A5 DESC
) AS MT
ORDER BY MT.A3 + MT.A4, M.A1;
July 10, 2015 at 9:34 am
-- The outer query makes little sense, it only adds duplicate rows to a set which you're eliminating rows from
SELECT MT.A3+MT.A4 AS A34, MT.A3
-- ,M.*
FROM Master_TAB M
JOIN (
SELECT M.A1, t3.A3, t3.A4, M.A6, M.A2, ROW_NUMBER() OVER (PARTITION BY A1, A6, A3, A4 ORDER BY A5 DESC) AS rownum
FROM Master_TAB M
JOIN TABle2 t2
ON M.A2 = t2.A2
JOIN Table3 t3
ON t2.A2 = t3.A2
) MT
ON M.A1=MT.A1 AND M.A2=MT.A2 AND MT.rownum=1
ORDER BY MT.A3+MT.A4,MT.A1;
-- Try variations on this
SELECT MT.A3 + MT.A4 AS A34, MT.A3
-- ,M.*
FROM (
SELECT
--M.A1,
t3.A3,
t3.A4,
--M.A6, M.A2,
ROW_NUMBER() OVER (PARTITION BY A1, A6, A3, A4 ORDER BY A5 DESC) AS rownum
FROM Master_TAB M
JOIN TABle2 t2
ON M.A2=t2.A2
JOIN Table3 t3
ON t2.A2=t3.A2
) MT
WHERE MT.rownum=1
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 10, 2015 at 9:46 am
normally it would work, but the t2 table is a table that contains locations and groups of locations and the master table is the item master which contains pricing for groups of locations and single locations... depending on the priority which is A5 the price goes to a group or a location... each location can only get one item price.
the t3 table is the links between the locations and groups of locations...
the priority is set at the location and group level...
if Location X is part of 5 groups + itself and an item is only defined for 3 groups, the query should return the highest group of the 3 groups (not 6)
I hope this clarifies things...
July 10, 2015 at 9:55 am
I should add that the final result has to be the item, the location and the price
example if Loc1, Loc2, and Loc3 are part of Group1 and the item price is for Group1,
although the master table contains
Item X, Group1
the final query should return:
item X, Loc1, priceY
item X, Loc2, priceY
item X, Loc3, priceY
and the sort is to sort by location.
July 10, 2015 at 10:14 am
Couple of things. First, it would help if you could post the DDL for the table(s) involved, some sample data for the table(s) that is representative of the problem domain (again, sample data not production data), and the expected results of the query.
All of this should be readily consumable (in other words all we need to do to is cut/paste/execute in SSMS).
Regarding the DDL for the table(s), you only need to provide the columns needed for the query, not all 200 or how ever many there are if only 50 to 10 are actually used. Also need the indexes on the tables, and of course this may mean adding additional columns to the DDL for the table(s).
Second, how often is this query run and is there any filtering done to reduce the data returned?
July 10, 2015 at 10:25 am
I will try to work on sample data and DDL info.... and post it tonight.
thanks
JG
July 10, 2015 at 11:37 am
Here it is... This script creates and populates the necessary tables to optimize the final query.
The final Query spills data to Tempdb which I'm trying to avoid.
CREATE TABLE [dbo].[Master_TAB](
[UPC] [varchar](13) NOT NULL,
[LocGroup] [varchar](5) NOT NULL,
[priceLvl] [int] NOT NULL,
[Price] [money] NULL,
CONSTRAINT [Master_TAB_P01] PRIMARY KEY CLUSTERED
([UPC] ASC,
[LocGroup] ASC,
[PriceLvl] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [MASTER_TAB_LocGroup] ON [dbo].[Master_TAB]
([LocGroup] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
CREATE NONCLUSTERED INDEX [MASTER_TAB_PriceLvl] ON [dbo].[Master_TAB]
([PriceLvl] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
insert into [Master_TAB] values ('ABC','ALL',1,10.25)
insert into [Master_TAB] values ('ABC','00002',1,10.55)
insert into [Master_TAB] values ('ABC','Grp1',1,10.75)
insert into [Master_TAB] values ('DEF','ALL',1,10.25)
insert into [Master_TAB] values ('DEF','00001',1,10.55)
insert into [Master_TAB] values ('DEF','Grp2',1,10.75)
-- insert 10000 skus
Declare @Cnt int
Set @Cnt = 0
while @cnt <10000
begin
insert into [Master_TAB]
Select 'ABC' + convert(varchar(10),@cnt), 'ALL',1,10.25
set @cnt = @cnt + 1
end
CREATE TABLE [dbo].[GroupLoc](
[GroupLoc] [varchar](5) NOT NULL,
[Priority] [int] NULL,
CONSTRAINT [GroupLoc_GroupLoc] PRIMARY KEY CLUSTERED
([GroupLoc] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
insert into GroupLoc values ('00001', 10)
insert into GroupLoc values ('00002', 10)
insert into GroupLoc values ('00003', 10)
insert into GroupLoc values ('00004', 10)
insert into GroupLoc values ('All', 5)
insert into GroupLoc values ('grp1', 15)
insert into GroupLoc values ('grp2', 11)
insert into GroupLoc values ('grp3', 13)
CREATE TABLE [dbo].GroupLocLink(
[GroupLoc] [varchar](5) NOT NULL,
[Loc] [varchar](4) NOT NULL,
[Terminal] [varchar](4) NOT NULL,
CONSTRAINT [GroupLocLink_GroupLoc] PRIMARY KEY CLUSTERED
([GroupLoc] ASC,
[Loc] ASC,
[Terminal] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [GroupLocLink_Loc] ON [dbo].GroupLocLink
([Loc] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
CREATE NONCLUSTERED INDEX [GroupLocLink_Term] ON [dbo].GroupLocLink
([Terminal] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
insert into GroupLocLink values ('00001','0001','0001')
insert into GroupLocLink values ('00001','0001','0002')
insert into GroupLocLink values ('00002','0002','0001')
insert into GroupLocLink values ('00003','0003','0001')
insert into GroupLocLink values ('00004','0004','0001')
insert into GroupLocLink values ('grp1','0001','0001')
insert into GroupLocLink values ('grp1','0001','0002')
insert into GroupLocLink values ('grp1','0003','0001')
insert into GroupLocLink values ('grp2','0002','0001')
insert into GroupLocLink values ('grp2','0004','0001')
insert into GroupLocLink values ('grp3','0002','0001')
insert into GroupLocLink values ('grp3','0003','0001')
insert into GroupLocLink values ('grp3','0004','0001')
insert into GroupLocLink values ('All','0001','0001')
insert into GroupLocLink values ('All','0001','0002')
insert into GroupLocLink values ('All','0002','0001')
insert into GroupLocLink values ('All','0003','0001')
insert into GroupLocLink values ('All','0004','0001')
SELECT MT.loc+MT.terminal AS LocTerm,MT.Loc ,[priority], M.*
FROM
Master_TAB M JOIN
(SELECT M.UPC,L.Loc,L.terminal,M.PriceLvl,L.GroupLoc, [priority], ROW_NUMBER() OVER (PARTITION BY UPC,PriceLvl,L.LOC,terminal ORDER BY [Priority] DESC) AS rownum
FROM Master_TAB M
JOIN GroupLoc G ON M.LocGroup=G.GroupLoc
JOIN GroupLoclink L ON G.GroupLoc = L.GroupLoc
) MT ON M.UPC=MT.UPC AND M.PriceLvl=MT.PriceLvl AND M.LocGroup=MT.GroupLoc AND MT.rownum=1
ORDER BY MT.loc,MT.terminal,M.UPC;
/* CLean Up
drop table [Master_TAB]
drop table [GroupLoc]
drop table GroupLocLink
*/
July 10, 2015 at 12:05 pm
sgmunson (7/10/2015)
Just curious as to whether you might be allowed to add the ROW_NUMBER() column to the table as a computed, persisted column... Might simplify things quite a bit, although you might pay for it on INSERTs. Just a thought...
I don't believe that windowed functions can be used in computed columns, although I can't find anything specifically stating that.
When I tried to create one, I got an error stating that windowed functions can only be used in a SELECT or an ORDER BY clause.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
July 10, 2015 at 1:23 pm
drew.allen (7/10/2015)
sgmunson (7/10/2015)
Just curious as to whether you might be allowed to add the ROW_NUMBER() column to the table as a computed, persisted column... Might simplify things quite a bit, although you might pay for it on INSERTs. Just a thought...I don't believe that windowed functions can be used in computed columns, although I can't find anything specifically stating that.
When I tried to create one, I got an error stating that windowed functions can only be used in a SELECT or an ORDER BY clause.
Drew
Can your column = (SELECT ROW_NUMBER() OVER(PARTITION BY somefield ORDER BY someotherfield) FROM thistablename) ?
Steve (aka sgmunson) π π π
Rent Servers for Income (picks and shovels strategy)
July 10, 2015 at 1:30 pm
If my memory is correct, it will complain about not being able to use subqueries in that context, and requiring a scalar expression. It makes sense, since a query like that could (and likely would) return many rows.
It is a bit strange that it isn't just like other places where subqueries can be used with expressions that expect one value, and fail only if they return more than 1 value. Probably a choice they made for simplicity's sake.
Cheers!
July 13, 2015 at 5:44 am
Firstly, ROW_NUMBER() doesn't necessarily invoke a sort. Compare the execution plan of these two queries:
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [UPC], [LocGroup] ORDER BY PriceLvl) FROM #Master_TAB
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [LocGroup], [UPC] ORDER BY PriceLvl) FROM #Master_TAB
In your case the parameters passed to ROW_NUMBER are not sufficiently aligned with an existing
index sort order for SQL Server to avoid a sort.
Secondly, the sort order of the ROW_NUMBER is different to the sort order of the outer query. Even
if the sort order of ROW_NUMBER() was aligned with an index to make a sort operation unnecessary,
you would still have the sort on the outer query. This outer query sort cost is reported in the plan to
be about 2%. If you hardcode rownum as 1, the sort to support ROW_NUMBER is eliminated from the plan
but the cost of the sort on the outer query rises to about 94%, so the gain in performance is way less
than you would expect.
Thirdly, a spill occurs when the actual number of rows being sorted exceeds the estimate. SQL Server
relies on statistics to derive an excution plan. Try updating statistics on the tables used by the query
to see of the plan changes. You could also consider alternative shapes for the query which might assist
the optimiser choose a better plan. Here's an example:
SELECT *
FROM (
SELECT
l.loc + l.terminal AS LocTerm,
l.Loc,
l.[priority],
l.terminal,
m.*,
rownum = ROW_NUMBER() OVER (PARTITION BY l.LOC, l.terminal, m.UPC, m.PriceLvl ORDER BY l.[Priority] DESC)
FROM #Master_TAB M
CROSS APPLY (
SELECT l.loc, l.terminal, g.[priority]
FROM #GroupLoc G
INNER JOIN #GroupLoclink L
ON M.LocGroup = L.GroupLoc
WHERE M.LocGroup = G.GroupLoc
) l
) d
WHERE rownum = 1
ORDER BY d.loc, d.terminal, d.UPC
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 13, 2015 at 6:42 am
Thanks for your response...
1. the execution plans of the Row_number for:
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [UPC], [LocGroup] ORDER BY PriceLvl) FROM #Master_TAB
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY [LocGroup], [UPC] ORDER BY PriceLvl) FROM #Master_TAB
Tells me that the sort is invoked in the second query due to the fact that there's no index that covers [LocGroup], [UPC]...
The first of the 2 queries doesn't invoke a sort due to the fact that [UPC], [LocGroup] is covered by the PK .
2. correct me if I'm wrong or if I misunderstood, Although the execution plans of the queries are relative, For a query cost, 2 sort spills will have 2 lower costs overall, compared to 1 spill that will have 1 higher %.
conclusion is that sometimes you can't avoid the spills but you need to manage them. right?
3. RE:"Thirdly, a spill occurs when the actual number of rows being sorted exceeds the estimate."
That's one case... In my case the statistics are up to date and the estimated rows match the actual rows.
Based on the articles I read and this thread, We have to try to avoid Spills but sometimes it's just not feasible and you need to minimize the number of spills...
Is this a good conclusion?
thank you
JG
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply