Work out which column value is repeated the most

  • CREATE Table #Fred(OrganisationID int, LocationID int)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,2)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,2)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,7)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,7)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,7)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,5)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,6)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(2,18)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(2,19)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(2,19)

    SELECT * FROM #Fred

    DROP Table #Fred

    The code above produces data that looks like:

    1,2

    1,2

    1,7

    1,7

    1,7

    1,5

    1,6

    2,18

    2,19

    2,19

    In the second column - for any distinct value in the first column - how can I find out which value in the second column occurs most often - and update all the values in Column 2 to that value which occurs most often.

    So, in the example above - how do I establish that, for value 1 in the first column, 7 occurs more often than 2, 5 and 6 - and update all rows where the value in the first column is 1 so that the value in the second column is 7.

    And ... that, for value 2 in the first column, 19 occurs more often than 18 - and update all rows where the value in the first column is 2 so that the value in the second column is 19.

    Etc.

    I haven't shown it above, but each row has a Primary Key.

    Thanks for any help.

  • THis?

    SELECT * FROM #Fred;

    ; WITH CTE AS

    (

    SELECT OrganisationID , LocationID ,CT = COUNT(*) --, Cnt = COUNT(*) OVER ( PARTITION BY OrganisationID , LocationID)

    FROM #Fred

    GROUP BY OrganisationID , LocationID

    )

    , Rankd AS

    (

    SELECT OrganisationID , LocationID

    , RN = ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY CT DESC)

    FROM CTE

    )

    , TopPerformer AS

    (

    SELECT OrganisationID , LocationID

    FROM Rankd

    WHERE RN = 1

    )

    UPDATE F

    SET F.LocationID = TP.LocationID

    FROM #Fred F

    INNER JOIN TopPerformer TP

    ON TP.OrganisationID = F.OrganisationID

    SELECT * FROM #Fred;

  • Wow, thanks very much. Yes, that's exactly what I need.

    Seeing your code makes me realise what a novice I am at this lark.

    Can I ask ...

    Using that 'With' statement, you seem to be creating several temporary tables but you don't have to drop them - is that right?

    And you start that statement with a semi-colon - and use commas to separate the statements that create the temporary tables?

    Cheers

  • sku370870 (3/1/2012)


    Wow, thanks very much. Yes, that's exactly what I need.

    Seeing your code makes me realise what a novice I am at this lark.

    Can I ask ...

    Using that 'With' statement, you seem to be creating several temporary tables but you don't have to drop them - is that right?

    And you start that statement with a semi-colon - and use commas to separate the statements that create the temporary tables?

    Cheers

    Does it make it easier for you to understand if I rewrite his query like this?

    UPDATE F

    SET F.LocationID = updateValues.LocationID

    FROM #Fred F

    INNER JOIN (SELECT OrganisationID, LocationID

    FROM (SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)

    FROM (SELECT OrganisationID, LocationID, COUNT(*)

    FROM #Fred

    GROUP BY OrganisationID, LocationID) originalDataCount(OrganisationID, LocationID, Cnt)

    ) dataRanked(OrganisationID, LocationID, Rn)

    WHERE RN = 1) updateValues ON updateValues.OrganisationID = F.OrganisationID;

    Basically, a CTE is a derived table that can be referenced in your query. ColdCoffee has created 3 CTEs (named CTE, Rankd and TopPerformer) then used each one to build up the data he needed to update your table.

    The semi-colon at the start of his query is actually bad when demonstrating the method used for creating a CTE. It's there to terminate the previous statement but is often written next to the "WITH" instead.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks for your help in trying to get me to understand this.

    I realise in your code you are doing up update query joining #fred to a table created by this statement.

    SELECT OrganisationID, LocationID

    FROM (SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)

    FROM (SELECT OrganisationID, LocationID, COUNT(*) FROM #Fred

    GROUP BY OrganisationID, LocationID)

    originalDataCount(OrganisationID, LocationID, Cnt))

    dataRanked(OrganisationID, LocationID, Rn)

    WHERE RN = 1

    Although I get the idea ... I don't understand the syntax at all ... there seem to be three consecutive 'FROMS'

    I have only ever written

    SELECT FieldA FROM TableB

    I don't get .. SELECT something FROM something FROM something FROM #Fred

    I'd be grateful for any further explanation - I'd really like to understand this stuff. Cheers.

    Edit: Just as an aside - I spend most of time writing in C# in a .net environment ... where every line ends with a semi-colon or it won't compile.

    In SQL Server I never use semi-colons to finish lines ... but you do?

  • OK, I'll give it a go but I've been told before that my teaching method comes across as abrasive.

    First, we want to figure out the counts for each organisationID and locationID. We do this by doing a simple count:

    SELECT OrganisationID, LocationID, COUNT(*)

    FROM #Fred

    GROUP BY OrganisationID, LocationID

    Which returns:

    OrganisationID LocationID

    -------------- ----------- -----------

    1 2 2

    1 5 1

    1 6 1

    1 7 3

    2 18 1

    2 19 2

    We need to know which is the highest count for each orgnanisationID, so we use a ranking function, but we want to base it from the data above so we use a derived table.

    SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)

    FROM (

    --This is your derived table

    SELECT OrganisationID, LocationID, COUNT(*)

    FROM #Fred

    GROUP BY OrganisationID, LocationID

    ) derivedTable(OrganisationID, LocationID, Cnt) --alias for derived table and column names

    Which returns:

    OrganisationID LocationID

    -------------- ----------- --------------------

    1 7 1

    1 2 2

    1 5 3

    1 6 4

    2 19 1

    2 18 2

    Now, those with a ROW_NUMBER of 1 are the highest counts for each organisationID, so we want to use another derived table to grab the above data and filter where ROW_NUMBER is 1.

    SELECT OrganisationID, LocationID

    FROM (

    --This is your derived table

    SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)

    FROM (

    --This is your derived table

    SELECT OrganisationID, LocationID, COUNT(*)

    FROM #Fred

    GROUP BY OrganisationID, LocationID

    ) derivedTable(OrganisationID, LocationID, Cnt) --alias for derived table and column names

    ) derivedTable2(OrganisationID, LocationID, Rnk) --alias for derived table and column names

    WHERE Rnk = 1

    Which returns:

    OrganisationID LocationID

    -------------- -----------

    1 7

    2 19

    Now we can join this back to #Fred based on the organisationID, allowing us to use the highest locationID.

    SELECT f.OrganisationID AS originalOrganisationID, f.LocationID AS originalLocationID,

    derivedTable3.OrganisationID AS derivedOrganisationID, derivedTable3.LocationID AS derivedLocationID

    FROM #Fred f

    INNER JOIN (--This is your derived table

    SELECT OrganisationID, LocationID

    FROM (

    --This is your derived table

    SELECT OrganisationID, LocationID, ROW_NUMBER() OVER (PARTITION BY OrganisationID ORDER BY Cnt DESC)

    FROM (

    --This is your derived table

    SELECT OrganisationID, LocationID, COUNT(*)

    FROM #Fred

    GROUP BY OrganisationID, LocationID

    ) derivedTable(OrganisationID, LocationID, Cnt) --alias for derived table and column names

    ) derivedTable2(OrganisationID, LocationID, Rnk) --alias for derived table and column names

    WHERE Rnk = 1

    ) derivedTable3 ON f.OrganisationID = derivedTable3.OrganisationID

    Which results in:

    originalOrganisationID originalLocationID derivedOrganisationID derivedLocationID

    ---------------------- ------------------ --------------------- -----------------

    1 2 1 7

    1 2 1 7

    1 7 1 7

    1 7 1 7

    1 7 1 7

    1 5 1 7

    1 6 1 7

    2 18 2 19

    2 19 2 19

    2 19 2 19

    Does that make it clearer? Derived tables are sort of essential to almost all SQL development. . .

    sku370870 (3/1/2012)


    Edit: Just as an aside - I spend most of time writing in C# in a .net environment ... where every line ends with a semi-colon or it won't compile.

    In SQL Server I never use semi-colons to finish lines ... but you do?

    No, I don't. But I should 😀

    CTE is one of the first things in SQL that require the previous statement to have been terminated correctly.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre (3/1/2012)


    OK, I'll give it a go but I've been told before that my teaching method comes across as abrasive.

    Not at all Cadavre! Awesome explanation! Thanks for that. Hope the OP sees the code clearer now.

  • I must also ask a question here. What should happen if there are 2 LocationIDs with same number of appearances for a particular OrgID ?

  • The semi-colon at the start of his query is actually bad when demonstrating the method used for creating a CTE. It's there to terminate the previous statement but is often written next to the "WITH" instead.

    To the OP, although the semi-colon is used to terminate a preceding statement... a CTE must ALWAYS begin with one. This is why many people put it before the WITH. If there is no preceding statement, it still requires the semi-colon. So it has become practice to just place it before the WITH as demonstrated.

    Jared
    CE - Microsoft

  • SQLKnowItAll (3/1/2012)


    The semi-colon at the start of his query is actually bad when demonstrating the method used for creating a CTE. It's there to terminate the previous statement but is often written next to the "WITH" instead.

    To the OP, although the semi-colon is used to terminate a preceding statement... a CTE must ALWAYS begin with one. This is why many people put it before the WITH. If there is no preceding statement, it still requires the semi-colon. So it has become practice to just place it before the WITH as demonstrated.

    That's not fully accurate.

    A CTE in a View definition not only doesn't require a semicolon, it won't work with one.

    CREATE VIEW dbo.MyCTEView

    AS

    WITH CTE

    AS (SELECT *

    FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))

    SELECT *

    FROM CTE ;

    GO

    SELECT *

    FROM dbo.MyCTEView ;

    A CTE at the beginning of a batch doesn't require a preceding semicolon.

    WITH CTE

    AS (SELECT *

    FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))

    SELECT *

    FROM CTE

    A CTE as the first item in a stored procedure doesn't require one.

    CREATE PROC dbo.MyCTEProc

    AS

    WITH CTE

    AS (SELECT *

    FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))

    SELECT *

    FROM CTE

    GO

    EXEC dbo.MyCTEProc

    A CTE in the definition of an inline-select UDF doesn't require one and won't work with one (just like a View).

    They're only required when there's a prior DML or DDL statement in a script/batch/procedure/function, and the use is to terminate that item, not to start the CTE. The reason for this is that With is an overloaded word, which can be used to add options to some commands, or to define a CTE.

    select *

    from MyTable

    with CTEName as (CTE definition)

    Is that a syntax error in defining a table hint, or is it a CTE? The dev can tell, but the SQL compiler can't. Hence, a semicolon is needed to terminate the prior statement.

    If it's the first item in a batch, use one or not, at your option. I generally do, out of habit, but not out of necessity.

    Edit: Note, all scripts except the last one work. The last one deliberately does NOT work, and will generate an error, which is the point for it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/1/2012)


    SQLKnowItAll (3/1/2012)


    The semi-colon at the start of his query is actually bad when demonstrating the method used for creating a CTE. It's there to terminate the previous statement but is often written next to the "WITH" instead.

    To the OP, although the semi-colon is used to terminate a preceding statement... a CTE must ALWAYS begin with one. This is why many people put it before the WITH. If there is no preceding statement, it still requires the semi-colon. So it has become practice to just place it before the WITH as demonstrated.

    That's not fully accurate.

    A CTE in a View definition not only doesn't require a semicolon, it won't work with one.

    CREATE VIEW dbo.MyCTEView

    AS

    WITH CTE

    AS (SELECT *

    FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))

    SELECT *

    FROM CTE ;

    GO

    SELECT *

    FROM dbo.MyCTEView ;

    A CTE at the beginning of a batch doesn't require a preceding semicolon.

    WITH CTE

    AS (SELECT *

    FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))

    SELECT *

    FROM CTE

    A CTE as the first item in a stored procedure doesn't require one.

    CREATE PROC dbo.MyCTEProc

    AS

    WITH CTE

    AS (SELECT *

    FROM ( VALUES ( 1), ( 2) ) MyTVF (Col))

    SELECT *

    FROM CTE

    GO

    EXEC dbo.MyCTEProc

    A CTE in the definition of an inline-select UDF doesn't require one and won't work with one (just like a View).

    They're only required when there's a prior DML or DDL statement in a script/batch/procedure/function, and the use is to terminate that item, not to start the CTE. The reason for this is that With is an overloaded word, which can be used to add options to some commands, or to define a CTE.

    select *

    from MyTable

    with CTEName as (CTE definition)

    Is that a syntax error in defining a table hint, or is it a CTE? The dev can tell, but the SQL compiler can't. Hence, a semicolon is needed to terminate the prior statement.

    If it's the first item in a batch, use one or not, at your option. I generally do, out of habit, but not out of necessity.

    Edit: Note, all scripts except the last one work. The last one deliberately does NOT work, and will generate an error, which is the point for it.

    Well heck! That's good to know! I was "taught" that a CTE must always have the semi-colon. Those who teach... 🙂 Thanks G!

    Jared
    CE - Microsoft

  • To the question about the nested From statements:

    Those are what are called "derived tables".

    You can do those in the body of the "outer query":

    select *

    from

    (select *

    from dbo.MyTable) as DerivedTable ;

    or in a "Common Table Expression" (CTE):

    with CTE (ColumnNames) as

    (select *

    from dbo.MyTable)

    select *

    from CTE ;

    The advantage to CTEs is that they can be recursive. That means a CTE can query itself. This is used to query hierarchies, where each level in the hierarchy needs to query each level below it, till there are no more levels left, for example. There are other uses, but that's the most common good one. (There are a LOT of bad ways to use recursive CTEs, mainly because they can become serious performance killers.)

    The other advantage to CTEs is that many people find them easier to read than the inline ones in the From clause. Not everyone benefits from this, but it is common.

    Anything you can do in a regular query, you can do in a derived table. This can be very advantageous in breaking code up to make it more readable, without having to do things like nest views inside each other (which does the same thing as a derived table, but hides it, also often hiding performance-inhibiting code at the same time).

    Does that help?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thank you very much for taking the time to explain that - and - I understand it!

    I will read, re-read and write if for myself with different tables and columns so that I can learn how to apply it without getting a pen and paper out and drawing tables all over it.

    I've been using SQL Server for years now - and although I often join to what I call 'tables created on the fly with a select statement' - which, I think, is a 'derived table'(?), I had no idea you could use derived tables as you have - that reference the one above etc.

    I rarely use cursors in stored procedures - seem to manage set based for most things. But, sometimes, when needing to manipulate data on a one off basis - I would usually create a temp table - bung it in cursor, loop through, create a second temp table, loop through etc. - gradually working my way down to get the result set I need - so I can do an update or whatever.

    But your code has opened my eyes - I really must learn this stuff properly. It's a lot more interesting than boring .net front end stuff.

    Edit: this reply was supposed to be a reply to Cadavre for his explanation ... and I want to thank Cold Coffee again for his original response and further explanations .. and SQLKnowItAll and GSQuared (wow, beginning to sound like an acceptance speech at the Oscars!) - anyway many thanks to all.

    My real world example is a table with about 40,000 contacts belonging to about 3000 organisations where each organisation has between 1 and 10 locations - with the organisations / locations / contacts belong to one of about 200 projects. In the table (imported from another database) many of the contacts are assigned to lots of different locations for the same organisation. I need to update the data so that each contact is only assigned to one location for each organisation - and I want to assign them to the one that appears most often for that organisation.

    Now, I can do it without a severe headache and about 4 nested cursors and temp tables.

  • ColdCoffee (3/1/2012)


    I must also ask a question here. What should happen if there are 2 LocationIDs with same number of appearances for a particular OrgID ?

    Hey there folks. ColdCoffee asked a really valuable question here. If you think not, try using the following data and tell me whether or not the "1"s should be updated to 1,5 or 1,7.

    CREATE Table #Fred(OrganisationID int, LocationID int)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,2)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,2)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,7)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,7)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,7)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,5)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,5)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,5)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(1,6)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(2,18)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(2,19)

    INSERT INTO #Fred(OrganisationID, LocationID)

    VALUES(2,19)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQLKnowItAll (3/1/2012)


    To the OP, although the semi-colon is used to terminate a preceding statement... a CTE must ALWAYS begin with one. This is why many people put it before the WITH. If there is no preceding statement, it still requires the semi-colon. So it has become practice to just place it before the WITH as demonstrated.

    That's incorrect. If there is no preceding statement the CTE does not need a semi-colon. The reason it has become common practise is because people are misinformed.

    Jeff Moden (3/1/2012)


    ColdCoffee (3/1/2012)


    I must also ask a question here. What should happen if there are 2 LocationIDs with same number of appearances for a particular OrgID ?

    Hey there folks. ColdCoffee asked a really valuable question here. If you think not, try using the following data and tell me whether or not the "1"s should be updated to 1,5 or 1,7.

    Most definitely, I guess the OP missed it due to the multitude of posts.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply