February 4, 2008 at 2:49 pm
Ranking the whole result set is probably your only option.
One last idea, have you tried:
select top 5 ...
from ...
order by ranking desc
union
select ...
from ...
where product = @param
So the first query just gets the top N, and the Union gets the one you want?
I'm not sure that would solve anything, but have you tried it?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 4, 2008 at 2:55 pm
You're missing the point I'm trying to make: the two are related. By not giving it something to work with that it "likes" (i.e. appropriate indexing, etc...), and by requiring the RANK() function over the 1.2M to just to get the rankings, it is going to force a scan of all 1.2M rows, just to find the distinct values and create said ranks.
Depending on how the view is set up - it's very possible that it by itself is forcing the entire view to be materialized just so a really small subset can be used, which is yet another spot to look at.
Try this on:
Use the Top(N) with TIES to get you the records you need, and THEN create the rankings on the results of the Top(N).
such as
declare @g datetime;
select @g=getdate();
select RANK() OVER (ORDER BY orderamount DESC) rnk,
*
from (
select top(10) WITH ties
*
from pivottable
Order by orderamount DESC
) pvt;
select datediff(ms,@g,getdate());
In this way - at least you're running the RANK() on a much smaller subset, versus creating a hash table that might contain 1.2Million values that then need to be ordered at run-time.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 4, 2008 at 3:34 pm
You're missing the point I'm trying to make: the two are related. By not giving it something to work with that it "likes" (i.e. appropriate indexing, etc...), and by requiring the RANK() function over the 1.2M to just to get the rankings, it is going to force a scan of all 1.2M rows, just to find the distinct values and create said ranks.
...snip...
Use the Top(N) with TIES to get you the records you need, and THEN create the rankings on the results of the Top(N).
...snip...
I understand your point, but you missed mine. The RANK() is already based on the dataset corresponding to the given criteria, not all records in the table(s). So if someone provides criteria as size=mid, engine=v6, qtr=3Q 07 they'll get a different rank for Toyota Camry over 3yr_lease_value than they would for engine=4cyl, qtr=3Q 07 ranked over 2yr_lease_value. TOP (99) WITH TIES will not include Toyota Camry if its rank is > 99. Also nearly all rankable metrics are summarizations or calculations that include business rules so the only way to sort/order/rank is to calculate the metric.
February 4, 2008 at 4:17 pm
I did catch that, and I'm not denying in any way that your metric is required to get to the top 10, 15, 20, etc... choices they might want. What I'm trying to impress upon you is that once you have calculated the personalized rating or metric as you call it, using the Rank() function against the TOP(N) with TIES subquery will return your results faster than the RANK() function against the whole set, since the top(N) is an ordering function, and not an aggregation-type function. And you'll still end up with the same results - just quite a bit faster.
In other words - if you replicated the query above , and replaced "orderamount" with the name of your calculated metric, by itself you should see a noticeable difference in efficiency in reads and execution time. RANK() is pretty much always going to go out an build a hash table of all of the values, so it's not going to be good for performance against a large set.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 4, 2008 at 4:58 pm
... using the Rank() function against the TOP(N) with TIES subquery will return your results faster than the RANK() function against the whole set, since the top(N) is an ordering function, and not an aggregation-type function. And you'll still end up with the same results - just quite a bit faster.
No it won't be any faster because RANK() can only be calculated on the entire candidate data set (what's identified by the where clause). If the where clause boils down to 2000 rows, it's going to calculate RANK() for 2000 items; if its 2M rows, all 2M will be ranked.
But maybe I'm wrong. So, here's a simple example.
create table #z ( model varchar(20), metric1 int, metric2 int )
insert into #z values ('honda accord', 1540, 200 )
insert into #z values ('honda civic', 1530, 250 )
insert into #z values ('chevy malibu', 1520, 26 )
insert into #z values ('toyota camry', 1250, 27 )
insert into #z values ('toyota avalon', 1350, 30 )
insert into #z values ('toyota rav4', 1450, 40 )
insert into #z values ('honda pilot', 150, 50 )
insert into #z values ('honda odyssey', 550, 55 )
insert into #z values ('chevy suburban', 750, 600 )
insert into #z values ('pontiac gto', 850, 100 )
insert into #z values ('toyota land cruiser', 950, 300 )
insert into #z values ('pontiac g6', 650, 155 )
insert into #z values ('ford fusion', 250, 255 )
insert into #z values ('ford edge', 440, 270 )
insert into #z values ('mazda 3', 430, 300 )
insert into #z values ('mazda 6', 180, 205 )
insert into #z values ('mazda rx7', 70, 200 )
insert into #z values ('kia spectra', 90, 205 )
insert into #z values ('hyundai tucson', 80, 25 )
insert into #z values ('nissan murano', 110, 50 )
insert into #z values ('nissan altima', 130, 55 )
Toyota Camry is #6 when ranked by metic1, but #19 when ranked by metric2. What's the query to produce this top 10 by metric1
honda accord15401
honda civic15302
chevy malibu15203
toyota rav414504
toyota avalon13505
toyota camry12506
toyota land cruiser9507
pontiac gto8508
chevy suburban7509
pontiac g665010
and this top 10 by metric 2 with Toyota Camry included in both?
chevy suburban6001
toyota land cruiser3002
mazda 33002
ford edge2704
ford fusion2555
honda civic2506
mazda 62057
kia spectra2057
mazda rx72009
honda accord2009
toyota camry2719
February 4, 2008 at 8:39 pm
ACtually - no. RANK just needs to be run over the top N and in everything from "camry" upwards. Everything else is worthless.
with that in mind:
declare @g datetime;
--create index ix_pivot4 on testpivot(orderamount) include(coid,prodid)
declare @camryID int;
declare @camryamount int;
select @camryID=5466326; --#50000 in the ranking
--select @camryID=5466326; --#2415667 in the ranking
select @g=getdate();
select @camryamount=orderamount from testpivot where rid=@camryID
select *
from (
select Top(20) RANK() over (order by orderamount desc) rnk,
*
from (
select top(20) with ties
coid,prodid,orderamount,rid
fromtestpivot
wherecoid between 1 and 5
and prodid between 1 and 40
order by orderamount desc
) pvt
order by orderamount desc
) topN
UNION
select * from
(select Top 1 *
from(
select RANK() over (order by orderamount desc) rnk,
coid,prodid,orderamount,rid
fromtestpivot
where(orderamount>@camryamount
and coid between 1 and 10
and prodid between 1 and 40)
or rid=@camryID
) camry
order by orderamount) c;
select datediff(ms,@g,getdate())
dbcc freeproccache
---and now - for the RANK solution
select @g=getdate();
;With RANKCTE as (
select RANK() over (order by orderamount desc) rnk,
coid,prodid,orderamount,rid
fromtestpivot
wherecoid between 1 and 10
and prodid between 1 and 40)
select * from RANKCTE
where rnk<=20
UNION
select * from RANKCTE where rid=@camryid
order by orderamount desc;
select datediff(ms,@g,getdate())
Results:
WITH the index in place
the TOP(n) syntax : from 190ms to 19453ms
the CTE: 21780 to 24570
WITHOUT any helper index in place
the TOP(n) syntax : from 14760ms to 37516ms
the CTE: 71733 to 78943
(variations seem to have to do with how far down on the list the "camry" is).
So - yes it's a little bit more to write, but you can outpace RANK() fairly dramatically if you should so desire.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 4, 2008 at 11:27 pm
This will give you the result you seek.
set nocount on
declare @z table ( model varchar(20), metric1 int, metric2 int )
insert into @z values ('honda accord', 1540, 200 )
insert into @z values ('honda civic', 1530, 250 )
insert into @z values ('chevy malibu', 1520, 26 )
insert into @z values ('toyota camry', 1250, 27 )
insert into @z values ('toyota avalon', 1350, 30 )
insert into @z values ('toyota rav4', 1450, 40 )
insert into @z values ('honda pilot', 150, 50 )
insert into @z values ('honda odyssey', 550, 55 )
insert into @z values ('chevy suburban', 750, 600 )
insert into @z values ('pontiac gto', 850, 100 )
insert into @z values ('toyota land cruiser', 950, 300 )
insert into @z values ('pontiac g6', 650, 155 )
insert into @z values ('ford fusion', 250, 255 )
insert into @z values ('ford edge', 440, 270 )
insert into @z values ('mazda 3', 430, 300 )
insert into @z values ('mazda 6', 180, 205 )
insert into @z values ('mazda rx7', 70, 200 )
insert into @z values ('kia spectra', 90, 205 )
insert into @z values ('hyundai tucson', 80, 25 )
insert into @z values ('nissan murano', 110, 50 )
insert into @z values ('nissan altima', 130, 55 )
;WITH MyCTE1 (Model, Row, Ranking, Metric1, Metric2)
AS (
SELECTModel,
ROW_NUMBER() OVER (ORDER BY Metric1 DESC),
RANK() OVER (ORDER BY Metric1 DESC),
Metric1,
Metric2
FROM@z
)
-- Show the expected result
SELECTTOP 10 WITH TIES
Ranking,
Model,
Metric1,
Metric2
FROMMyCTE1
WHERERow <= 10
OR Model = 'Toyota Camry'
ORDER BYCASE WHEN Model = 'Toyota Camry' AND Row > 10 THEN 10 ELSE Row END
;WITH MyCTE2 (Model, Row, Ranking, Metric1, Metric2)
AS (
SELECTModel,
ROW_NUMBER() OVER (ORDER BY Metric2 DESC),
RANK() OVER (ORDER BY Metric2 DESC),
Metric1,
Metric2
FROM@z
)
-- Show the expected result
SELECTTOP 10 WITH TIES
Ranking,
Model,
Metric1,
Metric2
FROMMyCTE2
WHERERow <= 10
OR Model = 'Toyota Camry'
ORDER BYCASE WHEN Model = 'Toyota Camry' AND Row > 10 THEN 10 ELSE Row END
Could you please timetest it and post back times here?
N 56°04'39.16"
E 12°55'05.25"
February 5, 2008 at 6:39 am
peso: that's the CTE version that was discussed earlier. it performs the same as the temp table version if the expression is only referred to once.
as i said before, the bulk of the time is calculating the rank. getting the top N results is a breeze (only 70ms to get them out of a temp table). i don't think there is a way to speed up a rank()ed query.
February 5, 2008 at 6:55 am
Notice that my latest calls your Temp table solution, the CTE solution.
You really should give some of this a whirl. Running RANK() to the whole set just to get 10 or 11 records back is ultimately a waste. It's the same as running a count to check for existence - you're using a bazooka to swat a fly.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 5, 2008 at 7:26 am
Here is a fast algorithm. Please notice that NO INDEXES AT ALL is present.
SET NOCOUNT ON
-- Create 100000 sample records
CREATE TABLE#Sample
(
Model CHAR(20) NOT NULL,
Metric1 INT,
Metric2 INT
)
INSERT#Sample
(
Model,
Metric1,
Metric2
)
SELECTLEFT(REPLACE(CAST(NEWID() AS CHAR(36)), '-', ''), 20),
ABS(CHECKSUM(NEWID())) % 600000,
ABS(CHECKSUM(NEWID())) % 600000
FROMmaster..spt_values AS v1
INNER JOINmaster..spt_values AS v2 ON v2.Type = 'p'
WHEREv1.Type = 'p'
AND v1.Number < 100
AND v1.Number < 1000
-- Mimic user supplied parameter
DECLARE@Model VARCHAR(20)
SELECT TOP 1@Model = Model
FROM#Sample
ORDER BYNEWID()
-- Timetest staging table
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
DECLARE @Time DATETIME
SET @Time = GETDATE()
-- Stage query result
DECLARE@Stage TABLE (Model VARCHAR(20), Metric INT, Ranking INT)
INSERT@Stage
(
Model,
Metric
)
SELECT TOP 15Model,
Metric1
FROM#Sample
ORDER BYMetric1 DESC
UPDATEs
SETs.Ranking = 1 + (SELECT COUNT(*) FROM @Stage AS x WHERE x.Metric > s.Metric)
FROM@Stage AS s
IF NOT EXISTS (SELECT * FROM @Stage WHERE Model = @Model)
BEGIN
INSERT@Stage
(
Model,
Metric
)
SELECTModel,
Metric1
FROM#Sample
WHEREModel = @Model
UPDATEs
SETs.Ranking = 1 + (SELECT COUNT(*) FROM #Sample AS x WHERE x.Metric1 > s.Metric)
FROM@Stage AS s
WHEREs.Model = @Model
END
SELECT*
FROM@Stage
-- Average at 80 ms
SELECT DATEDIFF(MILLISECOND, @Time, GETDATE())
-- Timetest CTE
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SET @Time = GETDATE()
;WITH MyCTE (Model, Row, Ranking, Metric1)
AS (
SELECTModel,
ROW_NUMBER() OVER (ORDER BY Metric1 DESC),
RANK() OVER (ORDER BY Metric2 DESC),
Metric1
FROM#Sample
)
SELECTTOP 15 WITH TIES
Ranking,
Model,
Metric1
FROMMyCTE
WHERERow <= 15
OR Model = @Model
ORDER BYCASE WHEN Model = @Model AND Row > 15 THEN 15 ELSE Row END
-- Average at 983 ms
SELECT DATEDIFF(MILLISECOND, @Time, GETDATE())
N 56°04'39.16"
E 12°55'05.25"
February 5, 2008 at 7:40 am
Statistics IO for
"Staging algorithm"
Table '#2180FB33'. Scan count 0, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Sample_____________________________________________________________________________________________________________00000000001C'. Scan count 1, logical reads 940, physical reads 0, read-ahead reads 30, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#2180FB33'. Scan count 16, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#2180FB33'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#2180FB33'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
"CTE algorithm"
Table '#Sample_____________________________________________________________________________________________________________00000000001C'. Scan count 3, logical reads 940, physical reads 0, read-ahead reads 30, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
N 56°04'39.16"
E 12°55'05.25"
February 5, 2008 at 8:00 am
Matt Miller (2/5/2008)
Notice that my latest calls your Temp table solution, the CTE solution.You really should give some of this a whirl. Running RANK() to the whole set just to get 10 or 11 records back is ultimately a waste. It's the same as running a count to check for existence - you're using a bazooka to swat a fly.
Hey guys, thanks for all your suggestions. I was hoping we'd find some syntax or technique that could produce a rank() and allow the query engine to skip/ignore some of the candidate rows, but it looks like that's not possible.
Ranking is extremely valuable for certain uses. Raw figures aren't that useful unless they're put into context with the competition. For example, if sales dropped 15% last month, the first thing a sales manager might do is run a ranking report to figure out where they are relative to everyone else. So if their rank hasn't changed, there's reason to panic (since everyone else's sales dropped also). Top N Plus will let them see the top performers and themselves even if they aren't in the top cut and avoid having to run two seperate reports. This is valuable since once you go past the top 5 or 10, everyone is bunched up very closely.
Ranking also let's them put a spin on numbers and find out where their strengths are. So they've also asked for reports that will locate the categories in which they're ranked N or higher (we're #3 in the southeast and #2 in Michigan) and alerts that will notify them if their category rank changes.
February 5, 2008 at 8:15 am
antonio.collins (2/5/2008)
Hey guys, thanks for all your suggestions. I was hoping we'd find some syntax or technique that could produce a rank() and allow the query engine to skip/ignore some of the candidate rows, but it looks like that's not possible.
Huh?
What about the algorithm posted by me 34 minutes earlier?
http://www.sqlservercentral.com/Forums/FindPost451626.aspx
N 56°04'39.16"
E 12°55'05.25"
February 5, 2008 at 8:29 am
peso: you're trying to speed up the selection of items from the cte/temp table and that's not the issue. even in my original temp table approach, it took 150s to load the temp table and only 70ms to get the proper data out of it.
February 5, 2008 at 8:38 am
Well, in my temp table algorithm above it takes an average of 80 milliseconds to fill the temptable and get the rankings calculated...
Are you sure we are talking about the same thing?
In my previous post, the population of #Sample table is just to mimic your environment, to get the 100000 records to work with.
Have you checked the code below that?
From the "Timetest staging" line to before "Timetest CTE" ?
CREATE PROCEDURE dbo.uspGetMyModels
(
@Top TINYINT,
@Model VARCHAR(20)
)
AS
SET NOCOUNT ON
-- Stage query result
DECLARE @Stage TABLE
(
Model VARCHAR(20),
Metric INT,
Ranking INT
)
-- Populate stagin table
INSERT@Stage
(
Model,
Metric
)
SELECTTOP (@Top)
Model,
Metric1
FROM{YourTableNameHere}
ORDER BYMetric1 DESC
-- Update ranking for top n records
UPDATEs
SETs.Ranking = 1 + (SELECT COUNT(*) FROM @Stage AS x WHERE x.Metric > s.Metric)
FROM@Stage AS s
-- If user supplied model is not among the top n records, add it
IF NOT EXISTS (SELECT * FROM @Stage WHERE Model = @Model)
BEGIN
INSERT@Stage
(
Model,
Metric
)
SELECTModel,
Metric1
FROM{YourTableNameHere}
WHEREModel = @Model
UPDATEs
SETs.Ranking = 1 + (SELECT COUNT(*) FROM {YourTableNameHere} AS x WHERE x.Metric1 > s.Metric)
FROM@Stage AS s
WHEREs.Model = @Model
END
-- Show the expected result
SELECTModel,
Metric
FROM@Stage
N 56°04'39.16"
E 12°55'05.25"
Viewing 15 posts - 16 through 30 (of 41 total)
You must be logged in to reply to this topic. Login to reply