How to display a ranking with the total rank in it

  • I have sorted the data and added a ranking for the highest sales by city, but I want to see the ranking together with the highest rank in that city, i.e. '1 of 3'.

    CREATE TABLE #Sample (Product varchar(10), City varchar(50), Sales decimal(12,2), Rank int)

    INSERT INTO #Sample VALUES('12345','NewYork','20.00',1)
    INSERT INTO #Sample VALUES('54321','NewYork','15.00',2)
    INSERT INTO #Sample VALUES('54321','NewYork','9.00',3)
    INSERT INTO #Sample VALUES('12345','NewHaven','8.00',1)

    SELECT * FROM #Sample

    DROP TABLE #Sample

    CREATE TABLE #INeed (Product varchar(10), City varchar(50), Sales decimal(12,2), Rank varchar(10))

    INSERT INTO #INeed VALUES('12345','NewYork','20.00','1 of 3')
    INSERT INTO #INeed VALUES('54321','NewYork','15.00','2 of 3')
    INSERT INTO #INeed VALUES('54321','NewYork','9.00','3 of 3')
    INSERT INTO #INeed VALUES('12345','NewHaven','8.00','1 of 1')

    SELECT * FROM #INeed

    DROP TABLE #INeed

  • Shree-903371 (4/23/2012)


    I have sorted the data and added a ranking for the highest sales by city, but I want to see the ranking together with the highest rank in that city, i.e. '1 of 3'.

    CREATE TABLE #Sample (Product varchar(10), City varchar(50), Sales decimal(12,2), Rank int)

    INSERT INTO #Sample VALUES('12345','NewYork','20.00',1)
    INSERT INTO #Sample VALUES('54321','NewYork','15.00',2)
    INSERT INTO #Sample VALUES('54321','NewYork','9.00',3)
    INSERT INTO #Sample VALUES('12345','NewHaven','8.00',1)

    SELECT * FROM #Sample

    DROP TABLE #Sample

    CREATE TABLE #INeed (Product varchar(10), City varchar(50), Sales decimal(12,2), Rank varchar(10))

    INSERT INTO #INeed VALUES('12345','NewYork','20.00','1 of 3')
    INSERT INTO #INeed VALUES('54321','NewYork','15.00','2 of 3')
    INSERT INTO #INeed VALUES('54321','NewYork','9.00','3 of 3')
    INSERT INTO #INeed VALUES('12345','NewHaven','8.00','1 of 1')

    SELECT * FROM #INeed

    DROP TABLE #INeed

    Looks like you need a window function with a PARTITION BY clause. Check out the blog in my signature; I just completed a post on this, along with a link to Itzik Ben-Gan's fantastic blogs on the subject.
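
    For illustration, a minimal sketch of the pattern being suggested, assuming the #Sample table from the original post (the CityRank alias is invented for the example):

    -- RANK() with PARTITION BY restarts the ranking for each city.
    SELECT Product, City, Sales,
           RANK() OVER (PARTITION BY City ORDER BY Sales DESC) AS CityRank
    FROM #Sample;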

    Jared
    CE - Microsoft

  • I used the ROW_NUMBER function with a partition to get the rank, but I could not get the total rank for each city.

    I figured out a workaround: get the MAX rank for each city, join it back to the sales table on City, and display the rank out of the total rank.

    I thought maybe something simpler exists.
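
    For reference, a minimal sketch of that workaround, assuming the #Sample table from the original post (the Ranked CTE and RowRank alias are illustrative, not from the original thread):

    -- Compute a per-city rank with ROW_NUMBER, then join back the per-city
    -- MAX rank to build the 'rank of total' string.
    WITH Ranked AS (
        SELECT Product, City, Sales,
               ROW_NUMBER() OVER (PARTITION BY City ORDER BY Sales DESC) AS RowRank
        FROM #Sample
    )
    SELECT r.Product, r.City, r.Sales,
           CAST(r.RowRank AS varchar(10)) + ' of ' + CAST(m.MaxRank AS varchar(10)) AS [Rank]
    FROM Ranked r
    INNER JOIN (SELECT City, MAX(RowRank) AS MaxRank
                FROM Ranked
                GROUP BY City) m ON r.City = m.City;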

  • I would test the performance of MAX versus COUNT. Not sure offhand which would be better... Also, I hope you are not storing '1 of 3' in a column. Seems like a waste. Print that in your report, but don't store the value.
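
    One hypothetical way to frame that comparison, written as window aggregates over the OP's #Sample table (the TotalByMax and TotalByCount aliases are invented for illustration; this is a sketch of what to test, not a definitive answer):

    -- Two candidate ways to get the per-city total:
    -- MAX over the stored Rank column versus COUNT(*) over the partition.
    SELECT Product, City, Sales, Rank,
           MAX(Rank) OVER (PARTITION BY City) AS TotalByMax,
           COUNT(*)  OVER (PARTITION BY City) AS TotalByCount
    FROM #Sample;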

    Jared
    CE - Microsoft

  • Are you saying you just want to add "of x" to the Rank column, where x is the row count for the city in question? Do you want to update a table with this information, or just return a result set? This is something that should really be done in the presentation layer, but if you don't have that option, it's simple, if a little messy, to do it with T-SQL.

    John

  • I wouldn't say calculating the 'of x' part of the ranking is messy at all. Notice how I removed the Rank column from your sample table - you can calculate the 'of' part at the same time you calculate the rank.

    DECLARE @Sample TABLE (Product varchar(10), City varchar(50), Sales decimal(12,2))

    INSERT INTO @Sample VALUES('12345','NewYork','20.00')
    INSERT INTO @Sample VALUES('54321','NewYork','15.00')
    INSERT INTO @Sample VALUES('54321','NewYork','9.00')
    INSERT INTO @Sample VALUES('12345','NewHaven','8.00')

    SELECT * FROM @Sample

    ;WITH cte AS (
        SELECT Product, City, Sales
            ,RANK() OVER (PARTITION BY City ORDER BY City, Sales DESC) AS Rank
            ,COUNT(*) OVER (PARTITION BY City) AS Total
        FROM @Sample
    )
    SELECT Product, City, Sales, CAST(Rank AS VARCHAR) + ' of ' + CAST(Total AS VARCHAR)
    FROM cte

    First edit: I initially omitted the DESC after Sales; it is needed so that the highest sales is ranked #1.

    Second edit:

    I'm not sure that this is something that is desirable in the presentation layer. Clearly it is derived data, and since it is easily done in T-SQL, why obfuscate the presentation-layer code with it? In a report, the 'rank of n' may be easier to do than in a web form, but given the ease with which it can be done in T-SQL, why bother? Especially when you can implement the above as a VIEW and then use it in both places without implementing it twice.

    Also, if you're concerned about data normalization (as you should be), neither rank nor rank of n should be stored as a column in the table.

    Finally, if you use my example, you'll need to choose between RANK() and DENSE_RANK(), depending on how you want ties on Sales handled - a quick illustration follows.
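
    A quick, hypothetical illustration of the difference (the @Ties table and its duplicate 15.00 rows are invented for the example):

    -- RANK() leaves gaps in the sequence after ties; DENSE_RANK() does not.
    DECLARE @Ties TABLE (Product varchar(10), Sales decimal(12,2));
    INSERT INTO @Ties VALUES ('A','20.00'),('B','15.00'),('C','15.00'),('D','9.00');

    SELECT Product, Sales,
           RANK()       OVER (ORDER BY Sales DESC) AS [Rank],      -- 1, 2, 2, 4
           DENSE_RANK() OVER (ORDER BY Sales DESC) AS [DenseRank]  -- 1, 2, 2, 3
    FROM @Ties;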


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • dwain.c (4/23/2012)


    I wouldn't say calculating the 'of x' part of the ranking is messy at all.

    All that casting from int to varchar looks messy to me - that's all I meant, and I admit that it's a matter of personal preference.

    I'm not sure that this is something that is desirable in the presentation layer. Clearly it is derived data, and since it is easily done in T-SQL, why obfuscate the presentation-layer code with it? In a report, the 'rank of n' may be easier to do than in a web form, but given the ease with which it can be done in T-SQL, why bother? Especially when you can implement the above as a VIEW and then use it in both places without implementing it twice.

    Like most things, it depends. My view is that it is the job of the data layer to return the raw data (the rank and the row count) and of the presentation layer to make it look nice (put the "of" in). This applies especially if there will be several consumers of the same data, each wanting it presented in a slightly different way. Or maybe some wish to do further processing of the results and require numeric values. I wouldn't want to be writing a separate report for each. On the other hand, the original poster may not even have a presentation layer that is easily configured, in which case he'll go with the T-SQL method.
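
    To make that division of labour concrete, a minimal sketch of what the data layer might return under this approach (numeric columns only, with illustrative aliases; each consumer adds the "of" formatting itself):

    -- Return the raw rank and row count; leave all formatting to the consumers.
    SELECT Product, City, Sales,
           RANK()   OVER (PARTITION BY City ORDER BY Sales DESC) AS SalesRank,
           COUNT(*) OVER (PARTITION BY City)                     AS CityTotal
    FROM #Sample;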

    John

  • John Mitchell-245523 (4/24/2012)


    dwain.c (4/23/2012)


    I wouldn't say calculating the 'of x' part of the ranking is messy at all.

    All that casting from int to varchar looks messy to me - that's all I meant, and I admit that it's a matter of personal preference.

    I'm not sure that this is something that is desirable in the presentation layer. Clearly it is derived data, and since it is easily done in T-SQL, why obfuscate the presentation-layer code with it? In a report, the 'rank of n' may be easier to do than in a web form, but given the ease with which it can be done in T-SQL, why bother? Especially when you can implement the above as a VIEW and then use it in both places without implementing it twice.

    Like most things, it depends. My view is that it is the job of the data layer to return the raw data (the rank and the row count) and of the presentation layer to make it look nice (put the "of" in). This applies especially if there will be several consumers of the same data, each wanting it presented in a slightly different way. Or maybe some wish to do further processing of the results and require numeric values. I wouldn't want to be writing a separate report for each. On the other hand, the original poster may not even have a presentation layer that is easily configured, in which case he'll go with the T-SQL method.

    John

    +1 In addition, you let the data be the data, so that the same query, view, stored proc, etc. can be reused for different presentations. Why burden the database server with the work (I know it is not much, but it can compound) of concatenating data when that can be done on the web/application server or the client's machine?

    Jared
    CE - Microsoft

  • Jared + John,

    I can see where you're headed and I'm almost there. The idea of doing away entirely with derived columns in SQL is revolutionary, not to mention imaginative! I can see the advantages, the most obvious one being making the learning experience much simpler for newbies. Just a little more convincing and I'll probably be on board. Being a contrary thinker just makes it that much easier for me.

    Of course, there is the pesky issue of the ISO/ANSI standards allowing derived columns. Perhaps a write-in campaign to the standards committee - you can write me in. If they make that change to the standards, it's likely to turn the SQL world topsy-turvy, not to mention causing Celko endless headaches in rewrites. But what the heck, the cause is just! 😀

    Seriously though, guys, as you've both commented, there are pros and cons in both directions, and individual scenarios will generally sway the argument one way or the other. Personally, I like SQL because it doesn't hide anything: I don't need to open Visual Studio and pore over hundreds of lines of obtuse C# code to understand the business logic. I believe there are too many cases of developers not understanding SQL well enough to take advantage of what it can do, so they spend inordinate amounts of time making the presentation layer do more than it needs to. But that's just my opinion generally, and it has a lot to do with the work environment I'm in. And I'm always willing to keep an open mind to a good argument for or against.

    I hope you realize I wasn't really trying to poke fun at your arguments. They were good ones. I'm just trying to have a little fun in general, which helps me to be enthusiastic in helping others on this forum.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • dwain.c (4/24/2012)


    I hope you realize I wasn't really trying to poke fun at your arguments.

    Of course I do. A good debate is a good debate, and if we can have a little fun in the process, so much the better!

    John

  • I don't think you were poking fun at anything. You have valid points. I think it really all comes down to the specific situation, but also personal preference. I have had one CEO want the presentation one way, the CFO another, the VP of Sales another, and so on. So I personally try to keep the SQL general, and then I can use the same stored proc for a number of reports: e.g. '1 of 3', 'page 1 of 3', '1 of 3 total', '1 of 3 sales', or even simply separate columns. I feel that the utility is greater when the presentation is handled outside of SQL. Some people find that more complex, so for me it may really be a personal preference.

    A great example contrary to my "preference" is an application we have at my current company. The developer had been loading data into cache and concatenating 2 columns with his application. We found it much faster to do it with SQL instead of having his program do it. Of course, this is not "presentation," but data used in the application (we are not an MS shop outside of SQL). So this kind of goes along with the situational aspect.

    I think, in general, it is best to teach people NOT to format data in SQL, e.g. adding $ to money or % to statistical data. That can always be changed when the final design comes around, but it should not be the starting point.
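
    As a tiny before/after sketch of that advice (the Sales column comes from the earlier examples; the formatted variant is shown only to illustrate what to avoid):

    -- Avoid: baking display formatting into the data layer.
    SELECT '$' + CAST(Sales AS varchar(20)) AS FormattedSales FROM #Sample;

    -- Prefer: return the raw value and format it in the report or application.
    SELECT Sales FROM #Sample;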

    Jared
    CE - Microsoft

  • dwain.c (4/23/2012)


    DECLARE @Sample TABLE (Product varchar(10), City varchar(50), Sales decimal(12,2))

    INSERT INTO @Sample VALUES('12345','NewYork','20.00')
    INSERT INTO @Sample VALUES('54321','NewYork','15.00')
    INSERT INTO @Sample VALUES('54321','NewYork','9.00')
    INSERT INTO @Sample VALUES('12345','NewHaven','8.00')

    SELECT * FROM @Sample

    ;WITH cte AS (
        SELECT Product, City, Sales
            ,RANK() OVER (PARTITION BY City ORDER BY City, Sales DESC) AS Rank
            ,COUNT(*) OVER (PARTITION BY City) AS Total
        FROM @Sample
    )
    SELECT Product, City, Sales, CAST(Rank AS VARCHAR) + ' of ' + CAST(Total AS VARCHAR)
    FROM cte

    This is a bad idea. COUNT(*) OVER is often worse than just joining to a pre-aggregated derived table, e.g.:

    SELECT Product, a.City, Sales,
           CAST(RANK() OVER (PARTITION BY a.City ORDER BY a.City, Sales DESC) AS VARCHAR) + ' of ' + CAST(Total AS VARCHAR)
    FROM @Sample a
    INNER JOIN (SELECT City, COUNT(*) AS Total
                FROM @Sample
                GROUP BY City) b ON a.City = b.City;

    Here's some evidence to support my sweeping statement 😀

    SET NOCOUNT ON;

    IF object_id('tempdb..#testEnvironment') IS NOT NULL
    BEGIN
        DROP TABLE #testEnvironment
    END

    --1,000,000 random rows of data
    SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
           CHAR((ABS(CHECKSUM(NEWID())) % 24) + 65) AS randomChar
    INTO #testEnvironment
    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

    --Holder variable to take the display time out of the equation
    DECLARE @HOLDER AS VARCHAR(MAX);

    PRINT REPLICATE('-',80);
    PRINT 'COUNT(*) OVER';
    PRINT REPLICATE('-',80);
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    WITH cte AS (
        SELECT ID, randomChar, RANK() OVER (PARTITION BY randomChar ORDER BY randomChar, ID DESC) AS Rank,
               COUNT(*) OVER (PARTITION BY randomChar) AS Total
        FROM #testEnvironment)
    SELECT @HOLDER = CAST(Rank AS VARCHAR) + ' of ' + CAST(Total AS VARCHAR)
    FROM cte;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    PRINT REPLICATE('-',80);
    PRINT 'INNER JOIN';
    PRINT REPLICATE('-',80);
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT @HOLDER = CAST(RANK() OVER (PARTITION BY a.randomChar ORDER BY a.randomChar, ID DESC) AS VARCHAR) + ' of ' + CAST(Total AS VARCHAR)
    FROM #testEnvironment a
    INNER JOIN (SELECT randomChar, COUNT(*) AS Total
                FROM #testEnvironment
                GROUP BY randomChar) b ON a.randomChar = b.randomChar;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;

    --------------------------------------------------------------------------------
    COUNT(*) OVER
    --------------------------------------------------------------------------------
    Table '#testEnvironment____________________________________________________________________________________________________000000000178'. Scan count 5, logical reads 1732, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 12, logical reads 2022490, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 8188 ms, elapsed time = 5217 ms.

    --------------------------------------------------------------------------------
    INNER JOIN
    --------------------------------------------------------------------------------
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#testEnvironment____________________________________________________________________________________________________000000000178'. Scan count 10, logical reads 3464, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 6485 ms, elapsed time = 3580 ms.

    Even adding some indexes doesn't help the COUNT(*) OVER enough:

    SET NOCOUNT ON;

    IF object_id('tempdb..#testEnvironment') IS NOT NULL
    BEGIN
        DROP TABLE #testEnvironment
    END

    --1,000,000 random rows of data
    SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
           CHAR((ABS(CHECKSUM(NEWID())) % 24) + 65) AS randomChar
    INTO #testEnvironment
    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

    CREATE CLUSTERED INDEX pk_test ON #testEnvironment (ID);
    CREATE NONCLUSTERED INDEX test ON #testEnvironment (randomChar);

    --Holder variable to take the display time out of the equation
    DECLARE @HOLDER AS VARCHAR(MAX);

    PRINT REPLICATE('-',80);
    PRINT 'COUNT(*) OVER';
    PRINT REPLICATE('-',80);
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    WITH cte AS (
        SELECT ID, randomChar, RANK() OVER (PARTITION BY randomChar ORDER BY randomChar, ID DESC) AS Rank,
               COUNT(*) OVER (PARTITION BY randomChar) AS Total
        FROM #testEnvironment)
    SELECT @HOLDER = CAST(Rank AS VARCHAR) + ' of ' + CAST(Total AS VARCHAR)
    FROM cte;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    PRINT REPLICATE('-',80);
    PRINT 'INNER JOIN';
    PRINT REPLICATE('-',80);
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT @HOLDER = CAST(RANK() OVER (PARTITION BY a.randomChar ORDER BY a.randomChar, ID DESC) AS VARCHAR) + ' of ' + CAST(Total AS VARCHAR)
    FROM #testEnvironment a
    INNER JOIN (SELECT randomChar, COUNT(*) AS Total
                FROM #testEnvironment
                GROUP BY randomChar) b ON a.randomChar = b.randomChar;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;

    --------------------------------------------------------------------------------
    COUNT(*) OVER
    --------------------------------------------------------------------------------
    Table '#testEnvironment____________________________________________________________________________________________________00000000017A'. Scan count 5, logical reads 1365, physical reads 0, read-ahead reads 21, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 12, logical reads 2022497, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 7704 ms, elapsed time = 4860 ms.

    --------------------------------------------------------------------------------
    INNER JOIN
    --------------------------------------------------------------------------------
    Table '#testEnvironment____________________________________________________________________________________________________00000000017A'. Scan count 25, logical reads 2824, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 2359 ms, elapsed time = 2352 ms.

    Always make sure you test. I have found some occasions where the OVER clause on an aggregate function is faster than the pre-aggregated sub-query, but more often I find the sub-query is faster.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre,

    Funny you should mention the performance of aggregates with OVER. I just happened to use SUM() OVER for the first time today (where's the blushing smiley when you need him?) and got the exact opposite results. At least I think they're on the same basis as what you've done here.

    http://www.sqlservercentral.com/Forums/Topic1287374-1292-1.aspx#bm1288680

    I agree you should test alternatives, although in this case my post was aiming for elegance.

    I'll need to look at what you've done here after I've had some sleep. Thanks for keeping me honest.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Cadavre,

    First some compliments for you:

    1. I love your @HOLDER variable concept.

    2. I also like the PRINTs to the messages pane - very effective for isolating the interesting results!

    Observations:

    1. The overhead of COUNT(*) OVER appears to be caused by a sort being performed in the execution plan. I also saw this (with SUM() OVER) in the other thread I referenced. I am curious why a sort is required to compute the aggregate, but presumably SQL Server has its reasons.

    2. We're talking about an overhead of about 1.5 CPU seconds (out of 6), which may not be too high a price to pay for the added readability. Not that I'm trying to justify a poor-performing solution, mind you. It's just that for display purposes such as this, the original query is unlikely to return 1M rows to the presentation layer.

    3. There's also a nested loop join going on in the COUNT(*) OVER query, adding overhead.

    4. It appears that for some reason the INNER JOIN version is better able to use statistics, because when I add the following statement after populating the temp table, the COUNT(*) OVER query ends up being faster.

    UPDATE STATISTICS #testEnvironment WITH ROWCOUNT = 1, PAGECOUNT = 1

    Very interesting playing around with this example. I can find no flaw in your analysis (and I tried various indexing combinations and query hints).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
