Return the most frequent value

  • Christopher Stobbs (7/22/2008)


    HI Jeff,

    I think he doesn't mind which is returned if they are all different.

    I added an extra ordering to my rank query so that the first is returned but that is up to him really.

    Thanks

    Chris

    Thanks for that, Chris, but we need the OP to tell us so we can be sure. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think this works but gosh it's pretty horrible...

    DROP TABLE #Customers

    CREATE TABLE #Customers (AccountNum int, DM_Source_Id int, Address varchar(60))

    INSERT INTO #Customers (AccountNum, DM_Source_Id, Address)

    SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 2, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 5, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 2, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 1, 'McMahn Road/N/n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 2, 'McDonald Road/N/46787 /n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 4, 'St. Ives Road/N /n/nDarlington/nDurham'

    SELECT d1.AccountNum, MIN(d1.Address) AS Address, d1.Copies AS CopyCount

    FROM (SELECT AccountNum, Address, COUNT(*) AS Copies

    FROM #Customers

    GROUP BY AccountNum, Address

    ) d1

    INNER JOIN (SELECT AccountNum, MAX(Copies) AS MaxCopies

    FROM (SELECT AccountNum, Address, COUNT(*) AS Copies

    FROM #Customers

    GROUP BY AccountNum, Address

    ) d3 GROUP BY AccountNum

    ) d2 ON d2.AccountNum = d1.AccountNum AND d2.MaxCopies = d1.Copies

    GROUP BY d1.AccountNum, d1.Copies

    It would take another aggregated SELECT to bring in MAX or MIN DM_Source_Id.

    Of course this is a SQL2k5 forum section...

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (7/22/2008)


    I think this works but gosh it's pretty horrible...

    It would take another aggregated SELECT to bring in MAX or MIN DM_Source_Id.

    Of course this is a SQL2k5 forum section...

    Thank you for setting up the test data, Chris... here's my take on it and it does bring out DM_Source_ID. I also broke out the two separate examples that Neal originally had by assigning them to a different AccountNum. Heh... ya just gotta love ROW_NUMBER with a descending sort in the OVER clause...

    --===== Create a test table with some test data

    -- This is NOT part of the solution.

    CREATE TABLE #Customers (AccountNum int, DM_Source_Id int, Address varchar(60))

    INSERT INTO #Customers (AccountNum, DM_Source_Id, Address)

    SELECT 46785, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46785, 2, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46785, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46785, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46785, 5, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 2, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 1, 'McMahn Road/N/n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 2, 'McDonald Road/N/46787 /n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 4, 'St. Ives Road/N /n/nDarlington/nDurham'

    ;WITH

    cteRankedAddresses AS

    (--==== Note the descending order in the OVER clause...

    SELECT ROW_NUMBER() OVER (PARTITION BY q.AccountNum ORDER BY q.Qty DESC) AS MyRank,

    AccountNum, Address, Qty

    FROM (--==== Get a count for each address by account number

    SELECT AccountNum, Address, COUNT(*) AS Qty

    FROM #Customers

    GROUP BY AccountNum, Address)q

    )

    SELECT c.AccountNum, MAX(c.DM_Source_ID) AS MaxDM_Source_ID, c.Address

    FROM #Customers c

    INNER JOIN cteRankedAddresses ra

    ON c.AccountNum = ra.AccountNum

    AND c.Address = ra.Address

    AND ra.MyRank = 1

    GROUP BY c.AccountNum,c.Address

    Neal... as a bit of a sidebar... the code above shows one of the better ways to post data. See the link in my signature line below for how and why. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Top work, Jeff! It looks so much nicer in SQL2k5.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (7/22/2008)


    Top work, Jeff! It looks so much nicer in SQL2k5.

    Top work yourself, Chris... it was a real bugger to pull off using 2k code, but you did it! 🙂 Nicely done!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • HI Jeff,

    I think I might have found another solumtion

    [Code]

    ;WITH MyCTE AS

    (

    SELECT

    RANK() OVER (PARTITION BY [AccountNum] ORDER BY COUNT([Address]) DESC)

    + ROW_NUMBER() OVER (PARTITION BY [AccountNum] ORDER BY COUNT([Address]) DESC) as [Rank]

    ,[AccountNum]

    ,[Address]

    FROM #customers

    GROUP BY [AccountNum],[Address] )

    SELECT DISTINCT * FROM MyCTE WHERE [Rank] = 2

    [/Code]

    I add the Row number to the Rank in case everything is different.

    I've run yours and mine aagainst a 1mil rows and it seems

    Results

    Mine:

    SQL Server Execution Times:

    CPU time = 4219 ms, elapsed time = 9937 ms.

    Yours:

    SQL Server Execution Times:

    CPU time = 7156 ms, elapsed time = 15779 ms.

    However I'm not sure what affect using the RANK and ROW_Number will have on a server?

    Any idea?

    thanks

    Chris

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Jeff Moden (7/21/2008)


    Jack Corbett (7/21/2008)


    I think this may do what you want (I created some dummy data):

    Jack... after you get done building it, change all the Addresses to 0 for where the DM_Source_ID = 1... see what happens... 🙂 I think the data you built is great but I also think it leads to an incorrect result.

    Jeff,

    I could make excuse, but stupidity is no excuse (I obviously needed to read up on Order By):D. Here is what I came up with to fix it.

    [font="Courier New"]DECLARE @customers TABLE(AccountNum INT, DM_Source_Id INT, Address VARCHAR(10))

    DECLARE @i INT,

                 @j INT

    SET @i = 1

    SET @j = 1

    WHILE @i <= 10

        BEGIN

            WHILE @j <= 10

                BEGIN

                    IF @i = 1

                        BEGIN

                            INSERT INTO @customers

                                SELECT

                                    @i,

                                    @j,

                                    @j

                        END

                    ELSE

                        BEGIN

                            IF @j % 3 = 0

                                BEGIN

                                    INSERT INTO @customers

                                        SELECT

                                            @i,

                                            @j,

                                            @j

                                END

                            ELSE

                                BEGIN

                                       INSERT INTO @customers

                                            SELECT

                                                @i,

                                                @j,

                                                @i

                                END

                        END

                    SET @j = @j+ 1

                END

            

            SET @i = @i + 1

            SET @j = 1

        END

    UPDATE @customers

        SET Address = 0

    WHERE

        DM_Source_ID = 1

    ;WITH cteTopAddresses AS

        (

        SELECT

            Row_Number() OVER(Partition BY AccountNum ORDER BY COUNT(*) DESC) AS row_Id,

             AccountNum,

            Address,

            COUNT(*) AS AddressCount

        FROM

            @customers

        GROUP BY

            AccountNum,

            Address

        )

        SELECT * FROM cteTopAddresses WHERE row_id = 1[/font]

    Of course while I was working on this there were other solutions offered that are similar.

  • Here's another one to add to the pile...

    ; WITH cteRankedAddresses AS (

    SELECT AccountNum, MAX(DM_Source_ID) AS MaxDM_Source_ID, Address,

    ROW_NUMBER() OVER (PARTITION BY AccountNum ORDER BY count(*) DESC, MAX(DM_Source_ID) DESC) AS MyRank

    FROM #Customers GROUP BY AccountNum, Address)

    SELECT AccountNum, MaxDM_Source_ID, Address FROM cteRankedAddresses WHERE MyRank = 1 ORDER BY AccountNum

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Christopher Stobbs (7/22/2008)


    I've run yours and mine aagainst a 1mil rows and it seems

    Results

    Mine:

    SQL Server Execution Times:

    CPU time = 4219 ms, elapsed time = 9937 ms.

    Yours:

    SQL Server Execution Times:

    CPU time = 7156 ms, elapsed time = 15779 ms.

    However I'm not sure what affect using the RANK and ROW_Number will have on a server?

    Any idea?

    The difference between your code and mine is that I need the extra code to return the DM_Source_ID. Without returning that, your extra ROW_NUMBER causes two extra steps in the execution plan... the execution plan says that they don't amount to a hill of beans, though, so no problems there.

    Here's my code modified to only return what you've returned...

    ;WITH

    cteRankedAddresses AS

    (--==== Note the descending order in the OVER clause...

    SELECT ROW_NUMBER() OVER (PARTITION BY q.AccountNum ORDER BY q.Qty DESC) AS MyRank,

    AccountNum, Address, Qty

    FROM (--==== Get a count for each address by account number

    SELECT AccountNum, Address, COUNT(*) AS Qty

    FROM Customers

    GROUP BY AccountNum, Address)q

    )

    SELECT * FROM cteRankedAddresses WHERE MyRank = 1

    Lemme know how that does for performance against your test code.

    A couple of favors please, Chris. Would you edit your post to break the long line at some mathematical operators so the line isn't so long? Those types of long lines make it so you have to pan right just to read a paragraph.

    Also, would you mind posting your million row code generator? I'd like to test against the same data you are.

    By the way, very nice shortcut in the CTE... I've gotta study that one. Looks like a good trick to know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • HI All,

    Wow this has been fun:

    I have tested all the solutions with 1Mil rows of Jack's Data and here is what happened:

    [Code]

    DECLARE @customers TABLE(AccountNum INT, DM_Source_Id INT, Address VARCHAR(10))

    DECLARE @i INT,

    @j-2 INT

    SET @i = 1

    SET @j-2 = 1

    WHILE @i <= 1000

    BEGIN

    WHILE @j-2 <= 1000

    BEGIN

    IF @i = 1

    BEGIN

    INSERT INTO @customers

    SELECT

    @i,

    @j-2,

    @j-2

    END

    ELSE

    BEGIN

    IF @j-2 % 3 = 0

    BEGIN

    INSERT INTO @customers

    SELECT

    @i,

    @j-2,

    @j-2

    END

    ELSE

    BEGIN

    INSERT INTO @customers

    SELECT

    @i,

    @j-2,

    @i

    END

    END

    SET @j-2 = @j-2+ 1

    END

    SET @i = @i + 1

    SET @j-2 = 1

    END

    UPDATE @customers

    SET Address = 0

    WHERE

    DM_Source_ID = 1

    SELECT COUNT(*)

    FROM @customers

    SET STATISTICS TIME ON

    PRINT '----------Chris-----------'

    ;WITH MyCTE AS

    (

    SELECT

    RANK() OVER (PARTITION BY [AccountNum] ORDER BY COUNT([Address]) DESC)

    + ROW_NUMBER() OVER (PARTITION BY [AccountNum] ORDER BY COUNT([Address])DESC ,MAX(DM_Source_ID) ASC) as [Rank]

    ,[AccountNum]

    ,[Address]

    FROM @customers

    GROUP BY [AccountNum],[Address] )

    SELECT DISTINCT * FROM MyCTE WHERE [Rank] = 2

    PRINT '--------------------------'

    PRINT '---------Jeff-------------'

    ;WITH

    cteRankedAddresses AS

    (--==== Note the descending order in the OVER clause...

    SELECT ROW_NUMBER() OVER (PARTITION BY q.AccountNum ORDER BY q.Qty DESC) AS MyRank,

    AccountNum, Address, Qty

    FROM (--==== Get a count for each address by account number

    SELECT AccountNum, Address, COUNT(*) AS Qty

    FROM @customers

    GROUP BY AccountNum, Address)q

    )

    SELECT c.AccountNum, MAX(c.DM_Source_ID) AS MaxDM_Source_ID, c.Address

    FROM @customers c

    INNER JOIN cteRankedAddresses ra

    ON c.AccountNum = ra.AccountNum

    AND c.Address = ra.Address

    AND ra.MyRank = 1

    GROUP BY c.AccountNum,c.Address

    ORDER BY AccountNum

    PRINT '--------------------------'

    PRINT '------------Ryan----------'

    ; WITH cteRankedAddresses2 AS (

    SELECT AccountNum, MAX(DM_Source_ID) AS MaxDM_Source_ID, Address,

    ROW_NUMBER() OVER (PARTITION BY AccountNum ORDER BY count(*) DESC, MAX(DM_Source_ID) DESC) AS MyRank

    FROM @customers GROUP BY AccountNum, Address)

    SELECT AccountNum, MaxDM_Source_ID, Address FROM cteRankedAddresses2 WHERE MyRank = 1 ORDER BY AccountNum

    PRINT '--------------------------'

    PRINT '-----_-----Jack-----------'

    ;WITH cteTopAddresses AS

    (

    SELECT

    Row_Number() OVER(Partition BY AccountNum ORDER BY COUNT(*) DESC) AS row_Id,

    AccountNum,

    Address,

    COUNT(*) AS AddressCount

    FROM

    @customers

    GROUP BY

    AccountNum,

    Address

    )

    SELECT * FROM cteTopAddresses WHERE row_id = 1

    PRINT '--------------------------'

    SET STATISTICS TIME OFF

    [/Code]

    WELLDONE Jack:

    Your is the fastest, here are all the results:

    ----------Chris-----------

    SQL Server Execution Times:

    CPU time = 6938 ms, elapsed time = 14458 ms.

    --------------------------

    ---------Jeff-------------

    SQL Server Execution Times:

    CPU time = 8953 ms, elapsed time = 17680 ms.

    --------------------------

    ------------Ryan----------

    SQL Server Execution Times:

    CPU time = 5672 ms, elapsed time = 11342 ms.

    --------------------------

    -----------Jack-----------

    SQL Server Execution Times:

    CPU time = 5484 ms, elapsed time = 9636 ms.

    --------------------------

    Just a last note, that Ryan your AccountNum one does not bring back the First DM_Source_ID address.

    Otherwise thanks all for sharing I know I'm not the OP on this one but have learn't alot 🙂

    Chris

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Jack Corbett (7/22/2008)


    I could make excuse, but stupidity is no excuse (I obviously needed to read up on Order By):D. Here is what I came up with to fix it.

    Heh... No, not stupid, Jack... You create some data, write some code, get the answer you want,and think you're done. I don't know anyone who hasn't done a similar thing at one time or another. I sure do appreciate the feedback!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Christopher Stobbs (7/22/2008)


    HI All,

    Wow this has been fun:

    I have tested all the solutions with 1Mil rows of Jack's Data and here is what happened:

    Not trying to take anything away form one, but if you really want to have some fun, make it so you're testing just apples... right now, you're testing apples and oranges... some of the code returns DM_Source_ID (like mine does and Ryan's do) and some doesn't (like Chris' and Jack's).

    You'll also find that some of the returns are not correct...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Chris, if you still have your test environment handy, would you mind comparing the SQL2k version against the 2k5 versions?

    Cheers

    ChrisM

    DROP TABLE #Customers

    CREATE TABLE #Customers (AccountNum int, DM_Source_Id int, Address varchar(60))

    INSERT INTO #Customers (AccountNum, DM_Source_Id, Address)

    SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 2, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 5, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 2, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 1, 'McMahn Road/N/n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 2, 'McDonald Road/N/46787 /n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL

    SELECT 46787, 4, 'St. Ives Road/N /n/nDarlington/nDurham'

    SELECT c.AccountNum, MIN(c.DM_Source_Id) AS DM_Source_Id, c.Address

    FROM #Customers c

    INNER JOIN (SELECT d1.AccountNum, MIN(d1.Address) AS Address, d1.Copies AS CopyCount

    FROM (SELECT AccountNum, Address, COUNT(*) AS Copies

    FROM #Customers

    GROUP BY AccountNum, Address

    ) d1

    INNER JOIN (SELECT AccountNum, MAX(Copies) AS MaxCopies

    FROM (SELECT AccountNum, Address, COUNT(*) AS Copies

    FROM #Customers

    GROUP BY AccountNum, Address

    ) d3 GROUP BY AccountNum

    ) d2 ON d2.AccountNum = d1.AccountNum AND d2.MaxCopies = d1.Copies

    GROUP BY d1.AccountNum, d1.Copies

    ) d4 ON d4.AccountNum = c.AccountNum AND d4.Address = c.Address

    GROUP BY c.AccountNum, c.Address

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • HI Chris,

    I have run 2k version on 10000 rows and it takes 3000ms where the 2k5 versions run in sub 50ms

    I tried to run it on a 1Mil but it took for ever.

    Thanks

    Chris

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Cheers Chris - it's pretty much what you'd expect really.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply