Adding and totaling duplicate data from a single column

  • I have a large column with >1 million rows with 8 digit #'s in it (in varchar format). I need to sum all of the same #'s together and then take the top 300 totals afterwards. Like this:

    data set -

    column x column y

    a 3

    b 4

    a 3

    a 3

    c 5

    c 5

    output-

    a occurred 3 times

    b occurred 1 time

    c occurred 2 times

    (assume this repeats x ~5000, and then take only the top 300 occurrence #'s only)

    This is on SS2005. I'm a newb, but really want to learn more. That's why I joined SSC~! Help would be great. Thanks,

    Josh

  • You're going to have to do something like:

    DECLARE @test-2 TABLE (ColumnX char(1), ColumnY int);

    INSERT INTO @test-2

    SELECT 'a', 3 UNION ALL

    SELECT 'b', 4 UNION ALL

    SELECT 'a', 3 UNION ALL

    SELECT 'a', 3 UNION ALL

    SELECT 'c', 5 UNION ALL

    SELECT 'c', 5 UNION ALL

    SELECT 'd', 1 UNION ALL

    SELECT 'd', 2 UNION ALL

    SELECT 'd', 3 UNION ALL

    SELECT 'd', 4;

    SELECT TOP (3)

    ColumnX, Qty = count(*)

    FROM @test-2

    GROUP BY ColumnX

    ORDER BY Qty DESC;

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • but i cannot do "select 'a' and then 'union all' for millions of rows!

    is there no other way?

  • pharmboy4u (9/12/2010)


    but i cannot do "select 'a' and then 'union all' for millions of rows!

    is there no other way?

    The portion of Wayne's code that you are referencing is only there to populate test data to show you that the subsequent select statement works.

    Regards,

    Jason P. Burnett
    Senior DBA

  • Well, you guys are much better than me at this stuff. I need to query the table given to me, not create one, and I kinda get lost in the examples when you create it like that. Perhaps this will help, here is where I'm at currently with the code:

    select

    top(30) rx.BillCode, av.AccountNumber, av.Name, rx.RxNumber, rx.EnterUserID, qty = count(*)

    from PhaRx rx

    LEFT JOIN AdmVisits AS av

    ON rx.VisitID = av.VisitID

    where rx.BillCode is not null

    group by rx.RxNumber, rx.BillCode, rx.EnterUserID, av.AccountNumber, av.Name

    order by BillCode

    This gives me this type of response (well I took out the name column for privacy's sake):

    11010561Y0000025482001086858AWADKSHE1

    11010561Y0000026342001108246AGRAYLIN1

    11010561Y0000026988101136912APILLMEG1

    11010561Y0000036405501555991AGUIKELI 1

    11010561Y0000039795801702786AHAVEBAR1

    11010561Y00000077784U00854320AMILNMAR1

    11010561Y0000025575401065419ASNYDCHR1

    11010561Y0000022511301075360AGREENIK1

    11010561Y0000027415301151983AMCKNSAR1

    11010561Y0000027644201161643ABERBSAM1

    Now the first column is the billing code. As I've said above, there are MANY billing codes, but they all repeat from multiple ordering accounts. What I need to do is count how many times each billing code is present (in a given timeframe which I will tackle later) and take the top 300 or so totals of billing codes. eg- there are 2187 times billing code '11010561' was present, 2003 times billing code... etc, for the top 300 codes.

    Currently I'm stuck in that I can't even get them counted. In the above data, I would like it to say "10" for that billing #, as that is all that is present. But all I get is '1' for each row, and I cannot figure out how to count them.

    I've checked into the 'count', 'add', 'total' functions etc., and these do not seem to be what I need, at least not in the ways I've seen examples and tried to use them. There must be a way to do this, but I am stuck. Thank you for taking the time to help.

    Josh

  • So you want the total count by billcode?

    Try this:

    select

    top(30) rx.BillCode, av.AccountNumber, av.Name, rx.RxNumber, rx.EnterUserID, qty = count(*) OVER (PARTITON BY BillCode)

    from PhaRx rx

    LEFT JOIN AdmVisits AS av

    ON rx.VisitID = av.VisitID

    where rx.BillCode is not null

    order by BillCode

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • CELKO (9/14/2010)


    Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Please learn to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect.

    Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html

    Celko,

    I apologize that I do not understand much here. Actually, I did read the smart-questions post here on a link I followed from GilaMonster, but it has many things in it I do not understand! Ha, so much for my smart-style questions.

    I am a Pharmacist actually. At our hospital I've been given access to our "Data Repository" (DR) with all the info therin and SQL Server 2005 to query it. I'm Brand New at this stuff. I took 2x4hr classes to just learn the basic "select *" stuff and some other small items. I have no rights to do anything but query in this system for basic pharmacy info. Fortunately I've been a computer geek my whole life, and I do love this stuff! It's addicting and I'm totally self-learning.

    You mentioned using "DDL" and "ISO" standard formats- I have absolutely no clue what that is. I am in the "newbie" section, right? This is my 3rd post ever. As far as standard code vs local dialect, I have a vague idea of some differences. Mainly because when I go google something to try and learn where I'm stuck, there are 5 version of the same thing out there, and I don't know how to parse this info! It is very confusion for a beginner, especially when I don't even know which "dialect" I can focus on for my system!

    (fyi- IT is no help in our hospital, the group that does the database stuff is not at our local system and does not respond to mere loser newbies like me for help.)

    I tried to research this issue for at least 3-4 hours online in several attempts trying different ways to use fuctions that I do not understand because I like problem solving. SQL Server "help" in the program has been absolutely no help for me, and you can get lost (or bad info) online.

    At any rate, I appreciate your help. And yours too Wayne! I will try to keep learning and try not to 'offend' anyone with my improper posts, but honestly I'm doing all I know to do. I will try these two examples and see what happens! Thanks,

    Josh

    ps- Given the little info I just related about myself and background, what would you suggest is the best possible learning source for someone like me who is self-learned and self-paced in doing so?

  • pharmboy4u (9/14/2010)


    CELKO (9/14/2010)


    Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Please learn to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect.

    Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html

    Celko,

    I apologize that I do not understand much here. Actually, I did read the smart-questions post here on a link I followed from GilaMonster, but it has many things in it I do not understand! Ha, so much for my smart-style questions.

    I am a Pharmacist actually. At our hospital I've been given access to our "Data Repository" (DR) with all the info therin and SQL Server 2005 to query it. I'm Brand New at this stuff. I took 2x4hr classes to just learn the basic "select *" stuff and some other small items. I have no rights to do anything but query in this system for basic pharmacy info. Fortunately I've been a computer geek my whole life, and I do love this stuff! It's addicting and I'm totally self-learning.

    You mentioned using "DDL" and "ISO" standard formats- I have absolutely no clue what that is. I am in the "newbie" section, right? This is my 3rd post ever. As far as standard code vs local dialect, I have a vague idea of some differences. Mainly because when I go google something to try and learn where I'm stuck, there are 5 version of the same thing out there, and I don't know how to parse this info! It is very confusion for a beginner, especially when I don't even know which "dialect" I can focus on for my system!

    (fyi- IT is no help in our hospital, the group that does the database stuff is not at our local system and does not respond to mere loser newbies like me for help.)

    I tried to research this issue for at least 3-4 hours online in several attempts trying different ways to use fuctions that I do not understand because I like problem solving. SQL Server "help" in the program has been absolutely no help for me, and you can get lost (or bad info) online.

    At any rate, I appreciate your help. And yours too Wayne! I will try to keep learning and try not to 'offend' anyone with my improper posts, but honestly I'm doing all I know to do. I will try these two examples and see what happens! Thanks,

    Josh

    ps- Given the little info I just related about myself and background, what would you suggest is the best possible learning source for someone like me who is self-learned and self-paced in doing so?

    Josh,

    DDL (Data Definition Language) builds things in SQL. What Joe is looking for is the CREATE TABLE statements for the tables involved in the queries. ISO (I think) stands for International Standards Organization - there are a LOT of ISO standards for all kinds of data things, from currency, gender, etc. Sorta like the HIPAA standards for how to transfer data between the insurance companies and your pharmacy.

    For self-learning, everything is in "Books Online" (BOL). It's accessible by pressing the F1 key in any query window. However, finding what you're looking for isn't so easy. So, feel free to ask.

    Don't worry about your lack of SQL experience... we'll help anyway. I do recommend that you include that 4 sentence blurb starting with "I am a pharmacist" in your posts - this will clue people in what level you're at, and they will respond more, well, gentle to you. Plus, with this, they will give you step-by-step help in how to do things like generate that DDL if they really need it.

    Best of luck,

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • TY guys for the help. Celko, your code worked for what I was looking for. I do not understand the "DENSE_RANK() OVER" portion or how you referenced a little table inside another (is what is looks like), but this will give me another good opportunity to learn.

    If you can just whip this sort of thing out after looking at the bit of info you got in a few minutes, that is where I need to be!

    The hardest part I have in finding answers is knowing what to ask or look for! There's lots of stuff online, but I'm wasting a lot of time in the wrong places. Now the 'count' function is what I needed, I just didn't know how to use it right. I didn't come across any examples of using it like this online, so I keep wondering how I can learn these things.

    I will check into your reference material. TY both.

    Josh

    (ps- I will be back since this is the place to get my questions answered. I'll need a posting lesson too, besides the one I read and didn't understand!)

  • pharmboy4u (9/15/2010)


    TY guys for the help. Celko, your code worked for what I was looking for. I do not understand the "DENSE_RANK() OVER" portion or how you referenced a little table inside another (is what is looks like), but this will give me another good opportunity to learn.

    When we refer to ordinal functions, we're referring to a process that orders items by importance. Dense_Rank() gives the result set a rank, or priority, based on the returned result set. It's more complicated than this, but think of it like this:

    You have a bunch of anti-inflamatories on your counter that all look the same, but are different doses. You sort them by dose size, by whether they're OTC or prescription, etc. That's ranking.

    As far as the table within a table, that's circular referencing or recursion. If you want to drive a car, you need to have a license, but if you want to get a license, you need to know how to drive a car. <- Notice it's similar to, but not quite, an infinite loop. It has a natural break in the "need to know" part.

    Does that help you understand what you're looking at?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • It does help me, thank you both.

    I can look at the complete code and break it down backwards and understand fairly well what it is doing. It's like a picture you reverse engineer, and that's not too hard. The hard part for me is, when I don't have the code given to me of course, I don't know how to get where I need to go, or the little place I'm wrong, even if I'm soooo close.

    I have two huge problems so far in my experience:

    1. Syntax- if I miss a comma or don't know the placement of or when to use parenthesis, I'm cooked. I could be so close but I'll never get there, and the error messages in SS are worthless practically. (eg- if you have a select statement in line 1 of code with 3 selected columns but accidentally put a comma at the end of the 3rd thing, the error message will say the error is in LINE 2! No no SS, it is line 1!) Single quotes, double quotes, commas, parenthesis- I haven't seen any good info on explaining proper syntax with these things, and it all changes with different fucntions.

    (Celko- I'm gonna look at that reading material sometime, and if it isn't up to snuff, I'm going to write my own dang book for newbies! I know exactly what they need at the moment, and I'm not advanced enough to lose them in the code jargon knowing that this will only hurt the understanding of the poor souls!)

    2. Functions- I don't know the many kinds there are or what many do. In this instance I looked at ~15 examples of the "count" function figuring that's what I want to do with this problem. Duh! I want to count them! And I was right in the end about that, but not knowing the different ways to utilize the function cooked me. I kept getting "1" for every row, because as Celko said, each row is unique, and I couldn't get past that mental road block which lead me to try looking at other functions to get what I needed, which went even worse.

    At any rate, I am enjoying the learning. Now I'm trying to get a description of all the billing codes in a different row associated with the totaled #'s (b/c each billing code its own description), and I'm getting all kinds of trouble. Ha! Another fine example of just adding 1 more thing which probably necessitates changing a lot of what I currently have in code, and I just don't know how or why yet. But I don't want help on this one yet, I'm going to keep playing with it for a while.

    If I come back in 3-4 weeks with my tail between my legs, you'll know I failed!

    I appreciate all your time and info guys, this is a great place to be...

    Josh

  • Regarding syntax & such, check out a book called "SQL Queries for Mere Mortals." It's a wonderful reference to have handy.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply