grouping/rollup help

  • I am trying to a non aggregate rollup... sort of.

    I have columns for student, term1, term2, term3, etc.

    The data under term is actually a course that is being taken that term so there can be multiple rows for a student. I create the query dynamically based on a date parameter. All of this works fine except I am getting duplicated rows that I can't get rid of using DISTINCT. I am POSITIVE every one of the columns is identical to another row. I have tried using isnull around every data column to make sure i have a blank string in the column (in the back of my head I always have the little voice that says "a NULL may not be equal to another NULL").

    Now the REALLY UGLY part. Typically I'm going beyond 32 columns so SQL throws up with the can't have more than 32 groups error. I can do this manually in Excel but thats manual. I don't like doing things twice or even having to use Excel to manipulate data.

    Any ideas? I can't really post the code as the company I work for is really paranoid about things getting posted publicly.

    TIA,

    jb

  • Completely duplicate rows, where all columns are the same, will be eliminted by Distinct (at a cost to performance, of course). So there's got to be something different about the rows, or something misdefined in the query.

    Without seeing the query, I really can't help much.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • OK, I've gotten rid of the duplicates using a case but now I have data that looks like this:

    ID COL1 COL2 COL3 COL4 COL5

    10 AA NULL NULL NULL NULL

    10 NULL NULL NULL DD NULL

    10 NULL NULL CC NULL NULL

    11 NULL NULL NULL AA NULL

    11 NULL NULL DD NULL NULL

    11 NULL NULL NULL BB NULL

    10 BB NULL NULL DD NULL

    10 NULL EE NULL NULL NULL

    What I NEED to show is

    10 AA EE CC DD NULL

    10 BB NULL NULL NULL NULL

    11 NULL NULL DD AA NULL

    11 NULL NULL NULL BB NULL

    Major pain in the _ _ _ _

    Any ideas?????

    TIA

    jb

  • How does is the system supposed to tell that the rows that start with 10, some of them go with the AA first value, others go with the BB first value?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Get the first column value for the row. If the first column has the same value in the next row,move to the next row, do parse else do nextid

    parse

    Start at the second column if the value in the current row is not null and the value in the same column in the previous row is null copy the not null value to the previous column

    repeat the above until no more columns or a not null value is found and the column in the previous row is not null

    if no more columns then delete the current row move to the next row.

    As you can see I can do it in code but I'm looking at 48-250K rows to step thru.

    Make sense?

  • (I've been out for a week, and nobody else picked this one up, so maybe you've already resolved this or don't need help on it any more.)

    Is there a column available that tells SQL what sequence the rows are in? Like an ID or a date-and-time-inserted? Without something like that, SQL Server can't know which rows are "next" and "previous".

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi JBANKO,

    I'm afraid your description is quite messy and inconclusive.

    It does not explain the logic whaich must be used to get the desired set.

    You say:

    What I NEED to show is

    10 AA EE CC DD NULL

    10 BB NULL NULL NULL NULL

    11 NULL NULL DD AA NULL

    11 NULL NULL NULL BB NULL

    But why this way?

    Would it be correct if it would be

    10 AA NULL NULL NULL NULL

    10 BB EE CC DD NULL

    11 NULL NULL AA NULL NULL

    11 NULL NULL DD BB NULL

    ?

    Or:

    10 AA CC DD EE NULL

    10 BB NULL NULL NULL NULL

    11 AA DD NULL NULL NULL

    11 BB NULL NULL NULL NULL

    ?

    Or:

    10 AA BB CC DD EE

    11 AA BB DD NULL NULL

    ?

    _____________
    Code for TallyGenerator

  • thanks for all the responses. yes it's random when the data gets changed (at the whim of an academic advisor) so unfortunately I had to do it in excel, works with a parser as well but in either case, YUK. Oh well, on to the next silly task.

    Thanks again,

    Joe B

  • whoops in response to the messy inconclusive.... yes it is. The first column is an id and no it doesn't matter, the ordering going down as long as the ids are grouped, but is critical going across. If u think what I presented is ugly you ought to see the real data. Think of it this way, you have classes (represented by the AA,BB,etc). Across the top you have the school term a class can be given (which is why AA is in the first and fourth columns). For student 10 you want to see his courses for a particular term, for some terms he has more than one class, for another term he has one, and in another he's a slacker and doesn't take anything so he can go to the south of France with his family. The ugliness comes when he makes a decision to change something while in France and the school has to move things around because his family donated a new wing to the school and he has priority over other students. So if in one class there are too many students someone gets bumped to allow for him. Real reasons would be to keep him with his graduate research group, or a specific course advisor or to take advantage of a scholarship, or 20 other reasons. Of course what happens to the bumped student. Can't hire another professor or use resources for one student (yes there are rules for class min/max and how flexible those limits are). So based on his major you try to find another place to put him or find if there are other students who got bumped from the same class and now there are enough to meet a minimum.

    PHEWWWwwwwwww......

    That whole decision process uses these lists to optimize the schedule for a term. And of course if an instructor decides to take some time off he fires the whole process off again. Like I said, ugly but a fact of life and one that requires the lists to be processed as quickly as possible and of course automatically.

    But then again thats what I get paid to figure out.

    Thanks again for the responses!

    Joe B

  • Yeah, that's what you get whet you try to sort out French...

    🙂

    If seriously,

    in your data sample:

    10, 11 are "students",

    Col1, Col2, Col3 contain list of courses taken.

    Where are terms?

    _____________
    Code for TallyGenerator

  • Col1, Col2,.... Are the terms.

    Classes for the student for each term go under the term they are scheduled to be taken in and of course you can take more than 1 in a particular term.

    :w00t::w00t::w00t::w00t:

    Joe B

  • Normalize your data and all your problems will go away.

    🙂

    Put the data into:

    (

    TermID,

    StudentID,

    CourseID

    )

    Then you're not gonna have a problem running queries for:

    - list of courses for any student on any term,

    - number of students taken any course on any term;

    - courses with sufficient and insufficient number of students on any term;

    - you name it.

    You may even build a report in the format you've indicated in the initial post.

    But I bet you won't need it anymore.

    _____________
    Code for TallyGenerator

  • In reality

    Student FALL2012 WINT2012 SPRN2012

    10 Bio Chem Math

    10 Hist Gov NULL

    11 Gov Hist Chem

    11 NULL NULL Math

    12 Math Chem Gov

    10 is taking Bio and Hist in the fall, Chem and Gov in the winter, and just Math in the spring

    11 is taking Gov in the fall, Hist in the winter, Chem and Math in the spring, etc, etc

    Make sense now?

  • Two things kinda negate that. The tables are inviolate. They are what they are and can't be changed, been there since the 5th day of creation. The people who have to look at this data want it formatted that way

    :crazy::crazy::crazy::crazy::crazy::crazy:

    jb

  • JBANKO (8/30/2011)


    But then again thats what I get paid to figure out.

    Joe B

    ok...forgive me if I have missed some posts, but I do not see any set up scripts or test data that will help us help you...(pls see link in my sig )

    neither do I see any attempts at coding from yourself ???

    .....so you are asking us to help you so that you can get paid :-):-D:-P

    <rant over>...I see you are new around here...so for future posts, please please post some scripts that make it easy for us

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    pls try following script....this may give you a few ideas ...but only you know the number of possible terms/courses/students

    it has no indexes and can probably be optimised ...but for now

    USE [tempdb]

    GO

    IF OBJECT_ID('tempdb..jbanko','U') IS NOT NULL DROP TABLE jbanko;

    CREATE TABLE [dbo].[jbanko](

    [ID] [int] NULL,

    [Term1] [varchar](2) NULL,

    [Term2] [varchar](2) NULL,

    [Term3] [varchar](2) NULL,

    [Term4] [varchar](2) NULL

    ) ON [PRIMARY]

    GO

    INSERT INTO [dbo].[jbanko]([ID], [Term1], [Term2], [Term3], [Term4])

    SELECT 10, 'AA', NULL, NULL, NULL UNION ALL

    SELECT 10, NULL, NULL, NULL, 'DD' UNION ALL

    SELECT 10, NULL, NULL, 'CC', NULL UNION ALL

    SELECT 11, NULL, NULL, NULL, 'AA' UNION ALL

    SELECT 11, NULL, NULL, 'DD', NULL UNION ALL

    SELECT 11, NULL, NULL, NULL, 'BB' UNION ALL

    SELECT 10, 'BB', NULL, NULL, NULL UNION ALL

    SELECT 10, NULL, 'EE', NULL, NULL

    ;with cte as

    (

    SELECT ID,[term],[course],

    ROW_NUMBER () OVER (PARTITION BY ID,[term] ORDER BY ID, course) rn

    from

    (

    SELECT ID

    ,ca.*

    FROM jbanko

    CROSS APPLY ( VALUES ('Term1', term1),

    ('Term2', term2),

    ('Term3', term3),

    ('Term4', term4)

    )

    ca ([term], [course])

    WHERE course <> 'NULL'

    ) x

    )

    SELECT ID,

    MAX(case when term = 'Term1' then [course] else '' end) as T1,

    MAX(case when term = 'Term2' then [course] else '' end) as T2,

    MAX(case when term = 'Term3' then [course] else '' end) as T3,

    MAX(case when term = 'Term4' then [course] else '' end) as T4

    FROM CTE

    GROUP BY ID, rn

    ORDER BY ID, rn

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply