November 26, 2007 at 1:06 pm
I had wondered about that too. Bet Itzik has something up his immense sleeve to super-charge this particular puzzle. 🙂
Hey, what the heck are you doing playing on the forum when you are on vacation?? WHAT A GEEK!! A man after my own heart! LOL
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 26, 2007 at 1:12 pm
TheSQLGuru (11/26/2007)
I had wondered about that too. Bet Itzik has something up his immense sleeve to super-charge this particular puzzle. 🙂Hey, what the heck are you doing playing on the forum when you are on vacation?? WHAT A GEEK!! A man after my own heart! LOL
BWAHAHAHAHAHAHAHAHA (I know the feeling....)
sounds to me like Jeff's gone and hacked SQL Server directly onto his cell phone..... It's just that the keyboard is so %$#$@@#$%^ small....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 26, 2007 at 3:45 pm
Heh... yeah, I know... "Bus-man's holiday" 😛 The really wide code is pretty tough on the display...
Yeah... I'm a "geek"... love this stuff. Some folks do counted cross-stich for fun... I do countless cross-joins, instead :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
November 26, 2007 at 9:44 pm
I had wondered about that too. Bet Itzik has something up his immense sleeve to super-charge this particular puzzle.
Heh... I'll just bet that Jeff Moden does, too! 😛
But, let's not test on a mere 7 rows... let's test on a million rows with an average of 20 people in each group...
--===== Create the temporary test table (NOT PART OF THE SOLUTION)
IF OBJECT_ID('TempDb..#Scores','U') IS NOT NULL
DROP TABLE #Scores
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
PersonID = ISNULL(ABS(CHECKSUM(NEWID()))%50000+1,0),
Grp = ISNULL(ABS(CHECKSUM(NEWID()))%50000+1,0),
Score = ABS(CHECKSUM(NEWID()))%100+1
INTO dbo.#Scores
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== Delete any dupes that may have occurred (expect about 200)
DELETE t1
FROM dbo.#Scores t1,
dbo.#Scores t2
WHERE t1.PersonID = t2.PersonID
AND t1.Grp = t2.Grp
AND t1.RowNum < t2.RowNum
--===== Add same primary key as original test data
ALTER TABLE dbo.#Scores
ADD PRIMARY KEY CLUSTERED (PersonID,Grp)
And, since there are 5 types of ranking according to Wiki-pedia... let's do the 3 most common (Ref: http://en.wikipedia.org/wiki/Ranking). Code deleted due to minor error... please see corrected code below...
Dunno how long it takes on your machine, but it takes 28 seconds on mine. Not bad, huh?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 26, 2007 at 10:45 pm
Was reviewing the results of the code above, and found a small boo-boo on the "Dense Percentile"... corrected in the following code... it did take it up from 28 to 35 seconds on the million row example... but it is calculating 3 different Ranks and the associated Percentile 😛
--===== If the working table already exists, drop it...
IF OBJECT_ID('TempDb..#Working','U') IS NOT NULL
DROP TABLE #Working
--===== Create and populate the working table on the fly
-- using the original data... makes a couple extra
-- columns to hold the Rank and Percentile values.
SELECT PersonID,
Grp,
Score,
ORank = CAST(NULL AS INT), --"Standard" Rank
OPercentile = CAST(NULL AS DECIMAL(3,2)),
CRank = CAST(NULL AS INT), --"Competative" Rank
CPercentile = CAST(NULL AS DECIMAL(3,2)),
DRank = CAST(NULL AS INT), --"Dense" Rank
DPercentile = CAST(NULL AS DECIMAL(3,2))
INTO #Working
FROM #Scores
--===== And the "magic" index that will control the sort order of
-- the whole process (doesn't need to be clustered, but is
-- lot's faster if it is).
CREATE CLUSTERED INDEX IXC_tmpWorking
ON #Working (Grp ASC, Score DESC) WITH FILLFACTOR = 100
--===== This next part has to be all dynamic SQL because the index above
-- does not exist at compile time and we'd get an error about the
-- index missing.
DECLARE @sql VARCHAR(8000)
SET @sql = '
--===== Declare a couple of variables and set them to values we know are
-- not in the table. Names make them self-documenting.
DECLARE @CRank INT,
@DRank INT,
@ORank INT,
@PrevGrp INT,
@PrevScore INT
SELECT @CRank = -1,
@DRank = -1,
@ORank = -1,
@PrevGrp = -1,
@PrevScore = -1
--===== This is the code that does all of the work. It uses SQL Server''s
-- very proprietary form of update. The order of the update is
-- controlled by the WITH (INDEX()) directive. Note, these are NOT
-- mere "hints" even though they are listed like that in BOL... they
-- are directives that "Instructs SQL Server to use the specified
-- indexes for a table." They even come with a warning that specifying
-- an index overrides the optimizer and may cause the query to work
-- more slowly because the index used may not be optimal for the query.
-- Not so in this case because... heh, we know what we''re doing ;-)
UPDATE #Working
SET @ORank = ORank = CASE
WHEN w.Grp = @PrevGrp
THEN @ORank + 1
ELSE 1
END,
OPercentile = (@ORank+0.0)/gc.GrpCount,
@CRank = CRank = CASE -- Grp same, Score changed, inc by 1
WHEN w.Grp = @PrevGrp
AND w.Score <> @PrevScore
THEN @ORank
-- Grp same, Score same, same rank
WHEN w.Grp = @PrevGrp
AND w.Score = @PrevScore
THEN @CRank
-- Grp changed
ELSE 1
END,
CPercentile = (@CRank+0.0)/gc.GrpCount,
@DRank = DRank = CASE -- Grp same, Score changed, inc by 1
WHEN w.Grp = @PrevGrp
AND w.Score <> @PrevScore
THEN @DRank + 1
-- Grp same, Score same, same rank
WHEN w.Grp = @PrevGrp
AND w.Score = @PrevScore
THEN @DRank
-- Grp changed, start Rank over
ELSE 1
END,
DPercentile = (@DRank+0.0)/gcs.ScoreCount,
@PrevGrp = w.Grp,
@PrevScore = w.Score
FROM #Working w WITH (INDEX(IXC_tmpWorking),TABLOCKX),
(--==== Derived table "gc" gets full counts for each group
SELECT Grp,
GrpCount = COUNT(*)
FROM #Working
GROUP BY Grp
)gc,
(--==== Derived table "gcs" gets count of distinct scores for each group
SELECT Grp,
ScoreCount = COUNT(DISTINCT Score)
FROM #Working
GROUP BY Grp
)gcs
WHERE gc.Grp = w.Grp
AND gcs.Grp = w.Grp'
--===== Execute the code above (including the index "hint")
-- Don't worry... Temp table #Working is "in scope"
EXEC (@SQL)
--===== Display the result
SELECT TOP 100 * FROM #Working ORDER BY Grp,SCORE Desc
Sorry for the "mistrake" :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
November 27, 2007 at 5:09 am
You know, you just GOTTA LOVE making single passes through the data. I have used that inline-variable update mechanism in the past to great effect. Almost like the characteristic functions of days gone by. Boy is THAT dating me!! 🙂 Nicely done Jeff - and the documentation provided should help a lot of forumites grok the essence so they can use the techniques shown for other processes!
My testing shows this code smokes the cursor even at 100K rows (2 sec vs 18 sec), which is what you expect when you are doing single-pass set-based logic.
Oh, one more thing. Does everyone else have the problem with copying out the 'code block' lines and pasting them in QA and they come out all on one line, unwrapped?? What gives with that?? Can we PLEASE have the forum-gods correct this?? Took me forever to get the code executable, and it still wasn't formatted nicely like Jeff took the time to do!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 27, 2007 at 5:17 am
I guess others have complained about the wrapping thing too. Just saw this very timely post:
http://www.sqlservercentral.com/articles/SQLServerCentral.com/61520/
I still think (very strongly) this should be fixed. Can we start a poll or campaign to get this in front of the person/people who can make it happen - or is it maybe technically impossible?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 27, 2007 at 6:13 am
The quick & dirty way to get the formatted code until the fix is in place is to paste the code into Word (or other editor) and replace the Manual Line Breaks with Paragraph marks - works like a charm!
...and another thing... I noticed the use of the object type parameter in the OBJECT_ID() system function in Jeff's code and wondered... It works in SQL2K but does not appear in BOL. It 'seemed' to be new funtionality in SQL2005 and is documented. It sure would be nice to know when/if supported [assumed] functionality like that is available.
November 29, 2007 at 12:06 pm
Oh, one more thing. Does everyone else have the problem with copying out the 'code block' lines and pasting them in QA and they come out all on one line, unwrapped?? What gives with that?? Can we PLEASE have the forum-gods correct this?? Took me forever to get the code executable, and it still wasn't formatted nicely like Jeff took the time to do!
Better way than what the article showed...
Put your cursor one line above the code box... click and drag to 1 line below the code box. Copy. That preserves all of the spacing for some reason. Then, do the paste to Word, replace ^l with ^p and copy to Query Analyzer. All formatting will be preserved.
Boy is THAT dating me!! Nicely done Jeff - and the documentation provided should help a lot of forumites grok the essence so they can use the techniques shown for other processes!
My testing shows this code smokes the cursor even at 100K rows
Heh... us old guys rule... 😛 Thank you for the compliment.
Someday, you're gonna have to tell me your first name.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 29, 2007 at 1:46 pm
Kevin Boles. Glad to meet you. You must be Jeff Moden, right?? 😉
BTW, do you LIVE on this forum or something?? I needed the link to this very thread to pass to someone on the MS forums and did a search for posts you did. Seemed like pages of returns PER DAY!! SHEESH!! 🙂 If I didn't know any better I would swear that answering forum posts was your full-time paid job!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 29, 2007 at 2:38 pm
TheSQLGuru (11/29/2007)
Kevin Boles. Glad to meet you. You must be Jeff Moden, right?? 😉BTW, do you LIVE on this forum or something?? I needed the link to this very thread to pass to someone on the MS forums and did a search for posts you did. Seemed like pages of returns PER DAY!! SHEESH!! 🙂 If I didn't know any better I would swear that answering forum posts was your full-time paid job!
Yeah... Jeff Moden is correct :hehe:
Glad to finally "meet" you, Kevin.
Heh... I'm just addicted 😀 ... kinda of like some people do counted cross-stitch, Sudoku, or Cross-Word puzzles... it requires a fair amount of detail but it's fun and maybe even relaxing...
... I think I'll try to convice Steve Jones that I should get a cot with "SQLServerCentral.com" embroidered on it for going over the 5k mark 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 16 through 25 (of 25 total)
You must be logged in to reply to this topic. Login to reply