Ranking number of occurrences

  • I am trying to add a column to a table that ranks the number of occurrences of repeated data. All I can currently find is how to count the repeated data, but I want the first occurrence to return 1, the second to return 2, and the third to return 3

    ID   Subtype    Column I am Trying to Create

    1       A                 1

    2      B                 1

    3       A                 1

    3       A                2

    4      B                 1

    5      A                 1

    5      B                 1

    6       A                 1

    6       A                2

    6     A                3

    7       A                 1

    8       A                 1

    9      B                 1

  • IF OBJECT_ID('tempdb..#Data', 'U') IS NOT NULL
    DROP TABLE #Data
    GO
    select *
    into #Data
    from (values
    (1,'A',1),
    (2,'B',1),
    (3,'A',1),
    (3,'A',2),
    (4,'B',1),
    (5,'A',1),
    (5,'B',1),
    (6,'A',1),
    (6,'A',2),
    (6,'A',3),
    (7,'A',1),
    (8,'A',1),
    (9,'B',1)) x(Id,SubType,Ref)

    GO

    select Id, SubType, Ref,
    ROW_NUMBER() OVER (PARTITION BY Id, SubType ORDER BY Id, SubType) ColumnIAmTryingToCreate
    from #Data
  • >> I am trying to add a column to a table that ranks the number of occurrences of repeated data. All I can currently find is how to count the repeated data, but I want the first occurrence to return 1, the second to return 2, and the third to return 3 <<

    The whole goal of normalization and RDBMS is to remove duplicated rows. Actually, that was the goal of all the databases RDBMS, network, hierarchical. We hate redundancy

    Since tables in SQL are based on sets, there is no such thing as a first, second, third, etc. occurrence because there is no ordering. I think what you are trying to do was implement Dr. Codd's "degree of duplication" concept.

    Please read a book on basic data modeling and RDBMS. You don't seem to know that by definition, a table must have a key. What you posted. Here is a bunch of punchcards written in SQL! There is no such thing as a generic "_id" in RDBMS; it has to be the identifier of something in particular. Since identifier is measured on a nominal scale, it can never be an integer. It is not a quantity or magnitude. The belief in a magic universal identifier is called Kabbalah magic has no place in RDBMS. Likewise, there is no such thing as a generic magic "subtype"; in fact, that's not how you would do type encoding in SQL at all. And what the heck is a "ref"? That's a pointer chain concept, and SQL doesn't use pointers. Here's an attempt at cleaning up your non-table.

    CREATE TABLE Foobars

    (foobar_id CHAR(2) NOT NULL,

    foobar_type CHAR(1) NOT NULL,

    dup_cnt INTEGER NOT NULL

    CHECK (dup_cnt >= 1)

    DEFAULT 1,

    PRIMARY KEY (foobar_id, foobar_type)); -- required, not optional

    Instead of repeating punchcards, your table should have looked like this in a properly designed schema:

    INSERT INTO Foobars

    VALUES

    ('01', 'A', 1),

    ('02', 'B', 1),

    ('03', 'A', 2),

    ('04', 'B', 1),

    ('05', 'A', 1),

    ('05', 'B', 1),

    ('06', 'A', 3),

    ('07', 'A', 1),

    ('08', 'A', 1),

    ('09', 'B', 1));

    Instead of materializing a whole new row for your redundant information, just increment the count of duplicates.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 wrote:

    >> The whole goal of normalization and RDBMS is to remove duplicated rows. Actually, that was the goal of all the databases RDBMS, network, hierarchical. We hate redundancy

    Since tables in SQL are based on sets, there is no such thing as a first, second, third, etc. occurrence because there is no ordering. I think what you are trying to do was implement Dr. Codd's "degree of duplication" concept.

    Please read a book on basic data modeling and RDBMS. You don't seem to know that by definition, a table must have a key. What you posted. Here is a bunch of punchcards written in SQL! There is no such thing as a generic "_id" in RDBMS; it has to be the identifier of something in particular. Since identifier is measured on a nominal scale, it can never be an integer. It is not a quantity or magnitude. The belief in a magic universal identifier is called Kabbalah magic has no place in RDBMS. Likewise, there is no such thing as a generic magic "subtype"; in fact, that's not how you would do type encoding in SQL at all. And what the heck is a "ref"? That's a pointer chain concept, and SQL doesn't use pointers. Here's an attempt at cleaning up your non-table.

    Thanks for the patronizing response. The response I marked as the Answer was able to provide exactly what I needed without making me feel like an idiot. I am new to SQL and trying to learn, sorry for inconveniencing you!

    • This reply was modified 3 years, 10 months ago by  zoeooo.
    • This reply was modified 3 years, 10 months ago by  zoeooo.
  • I've been teaching SQL and RDBMS for a few decades now. I spent 10 years of my life. The standards committee for the language and have 10 books on it that are still in print. I have done a few thousand articles in the trade press over the decades on this topic

    I am not trying to make you "feel like an idiot." I am trying to educate you. When is the last time you took a class from someone who actually created the tool you're working with the very beginning of its existence? Hey! That would be me. And instead of paying $$$$ for one of my classes, you're getting it for free. I am still kicking myself in the ass after all these years because I had a chance to go to the University of Texas, sneak into classrooms and learn from Ed Dykstra. I learned a teaching trick from him; when people are doing something wrong, tell them, and do not excuse them. I sometimes use his actual quote, "you're doing it completely wrong!" Followed by a detailed brutal explanation. Decades ago, I used to give people little apologies (that's not quite right, you're almost there, that's not considered the best way to do it, etc.) And I found it gave them the feeling that they were learning something it had done something right. So they kept repeating their mistakes over and over. I also used it. Just post solutions to problems without any comments. And people kept repeating their mistakes because they didn't know they were mistakes. Nobody told them.

    >> I am new to SQL and trying to learn, sorry for inconveniencing you! <<

    I wasn't inconvenienced. My code is still up to ANSI/ISO standards, follows industry conventions, and can stand scrutiny. Your code is what's inconvenient. Your code, as you're writing it now, is going to bite you in the ass. You probably got five more years of education in order to be a functional RDBMS/SQL programmer. When you get in the habit of writing good code, normalization will come without thought, pick the right data types for a column, and you'll be able to design a schema, the way a Master Mason can lay a straight course of brick automatically. Yes, it will probably take you that long to get to be competent. Did you expect to be an engineer after freshman year? To learn to speak Chinese in a week? Good, competent IT is the result of decades of learning and experience.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • I blocked him a long time ago and it has definitely improved my experience on this very valuable site.  I can't even read his response to your excellent reply to him.  Your question was a great one and the answer from Jonathan has armed you with the knowledge of the very cool partition function.  Carry on, welcome aboard and happy coding!  🙂

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply