August 18, 2011 at 11:04 am
I am trying to a non aggregate rollup... sort of.
I have columns for student, term1, term2, term3, etc.
The data under term is actually a course that is being taken that term so there can be multiple rows for a student. I create the query dynamically based on a date parameter. All of this works fine except I am getting duplicated rows that I can't get rid of using DISTINCT. I am POSITIVE every one of the columns is identical to another row. I have tried using isnull around every data column to make sure i have a blank string in the column (in the back of my head I always have the little voice that says "a NULL may not be equal to another NULL").
Now the REALLY UGLY part. Typically I'm going beyond 32 columns so SQL throws up with the can't have more than 32 groups error. I can do this manually in Excel but thats manual. I don't like doing things twice or even having to use Excel to manipulate data.
Any ideas? I can't really post the code as the company I work for is really paranoid about things getting posted publicly.
TIA,
jb
August 18, 2011 at 11:12 am
Completely duplicate rows, where all columns are the same, will be eliminted by Distinct (at a cost to performance, of course). So there's got to be something different about the rows, or something misdefined in the query.
Without seeing the query, I really can't help much.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 18, 2011 at 1:48 pm
OK, I've gotten rid of the duplicates using a case but now I have data that looks like this:
ID COL1 COL2 COL3 COL4 COL5
10 AA NULL NULL NULL NULL
10 NULL NULL NULL DD NULL
10 NULL NULL CC NULL NULL
11 NULL NULL NULL AA NULL
11 NULL NULL DD NULL NULL
11 NULL NULL NULL BB NULL
10 BB NULL NULL DD NULL
10 NULL EE NULL NULL NULL
What I NEED to show is
10 AA EE CC DD NULL
10 BB NULL NULL NULL NULL
11 NULL NULL DD AA NULL
11 NULL NULL NULL BB NULL
Major pain in the _ _ _ _
Any ideas?????
TIA
jb
August 19, 2011 at 6:45 am
How does is the system supposed to tell that the rows that start with 10, some of them go with the AA first value, others go with the BB first value?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 19, 2011 at 8:14 am
Get the first column value for the row. If the first column has the same value in the next row,move to the next row, do parse else do nextid
parse
Start at the second column if the value in the current row is not null and the value in the same column in the previous row is null copy the not null value to the previous column
repeat the above until no more columns or a not null value is found and the column in the previous row is not null
if no more columns then delete the current row move to the next row.
As you can see I can do it in code but I'm looking at 48-250K rows to step thru.
Make sense?
August 29, 2011 at 6:12 am
(I've been out for a week, and nobody else picked this one up, so maybe you've already resolved this or don't need help on it any more.)
Is there a column available that tells SQL what sequence the rows are in? Like an ID or a date-and-time-inserted? Without something like that, SQL Server can't know which rows are "next" and "previous".
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 29, 2011 at 6:54 pm
Hi JBANKO,
I'm afraid your description is quite messy and inconclusive.
It does not explain the logic whaich must be used to get the desired set.
You say:
What I NEED to show is
10 AA EE CC DD NULL
10 BB NULL NULL NULL NULL
11 NULL NULL DD AA NULL
11 NULL NULL NULL BB NULL
But why this way?
Would it be correct if it would be
10 AA NULL NULL NULL NULL
10 BB EE CC DD NULL
11 NULL NULL AA NULL NULL
11 NULL NULL DD BB NULL
?
Or:
10 AA CC DD EE NULL
10 BB NULL NULL NULL NULL
11 AA DD NULL NULL NULL
11 BB NULL NULL NULL NULL
?
Or:
10 AA BB CC DD EE
11 AA BB DD NULL NULL
?
_____________
Code for TallyGenerator
August 30, 2011 at 8:36 am
thanks for all the responses. yes it's random when the data gets changed (at the whim of an academic advisor) so unfortunately I had to do it in excel, works with a parser as well but in either case, YUK. Oh well, on to the next silly task.
Thanks again,
Joe B
August 30, 2011 at 9:02 am
whoops in response to the messy inconclusive.... yes it is. The first column is an id and no it doesn't matter, the ordering going down as long as the ids are grouped, but is critical going across. If u think what I presented is ugly you ought to see the real data. Think of it this way, you have classes (represented by the AA,BB,etc). Across the top you have the school term a class can be given (which is why AA is in the first and fourth columns). For student 10 you want to see his courses for a particular term, for some terms he has more than one class, for another term he has one, and in another he's a slacker and doesn't take anything so he can go to the south of France with his family. The ugliness comes when he makes a decision to change something while in France and the school has to move things around because his family donated a new wing to the school and he has priority over other students. So if in one class there are too many students someone gets bumped to allow for him. Real reasons would be to keep him with his graduate research group, or a specific course advisor or to take advantage of a scholarship, or 20 other reasons. Of course what happens to the bumped student. Can't hire another professor or use resources for one student (yes there are rules for class min/max and how flexible those limits are). So based on his major you try to find another place to put him or find if there are other students who got bumped from the same class and now there are enough to meet a minimum.
PHEWWWwwwwwww......
That whole decision process uses these lists to optimize the schedule for a term. And of course if an instructor decides to take some time off he fires the whole process off again. Like I said, ugly but a fact of life and one that requires the lists to be processed as quickly as possible and of course automatically.
But then again thats what I get paid to figure out.
Thanks again for the responses!
Joe B
August 31, 2011 at 4:49 am
Yeah, that's what you get whet you try to sort out French...
🙂
If seriously,
in your data sample:
10, 11 are "students",
Col1, Col2, Col3 contain list of courses taken.
Where are terms?
_____________
Code for TallyGenerator
August 31, 2011 at 11:58 am
Col1, Col2,.... Are the terms.
Classes for the student for each term go under the term they are scheduled to be taken in and of course you can take more than 1 in a particular term.
:w00t::w00t::w00t::w00t:
Joe B
August 31, 2011 at 4:01 pm
Normalize your data and all your problems will go away.
🙂
Put the data into:
(
TermID,
StudentID,
CourseID
)
Then you're not gonna have a problem running queries for:
- list of courses for any student on any term,
- number of students taken any course on any term;
- courses with sufficient and insufficient number of students on any term;
- you name it.
You may even build a report in the format you've indicated in the initial post.
But I bet you won't need it anymore.
_____________
Code for TallyGenerator
September 5, 2011 at 11:51 am
In reality
Student FALL2012 WINT2012 SPRN2012
10 Bio Chem Math
10 Hist Gov NULL
11 Gov Hist Chem
11 NULL NULL Math
12 Math Chem Gov
10 is taking Bio and Hist in the fall, Chem and Gov in the winter, and just Math in the spring
11 is taking Gov in the fall, Hist in the winter, Chem and Math in the spring, etc, etc
Make sense now?
September 5, 2011 at 12:02 pm
Two things kinda negate that. The tables are inviolate. They are what they are and can't be changed, been there since the 5th day of creation. The people who have to look at this data want it formatted that way
:crazy::crazy::crazy::crazy::crazy::crazy:
jb
September 5, 2011 at 6:23 pm
JBANKO (8/30/2011)
But then again thats what I get paid to figure out.Joe B
ok...forgive me if I have missed some posts, but I do not see any set up scripts or test data that will help us help you...(pls see link in my sig )
neither do I see any attempts at coding from yourself ???
.....so you are asking us to help you so that you can get paid :-):-D:-P
<rant over>...I see you are new around here...so for future posts, please please post some scripts that make it easy for us
http://www.sqlservercentral.com/articles/Best+Practices/61537/
pls try following script....this may give you a few ideas ...but only you know the number of possible terms/courses/students
it has no indexes and can probably be optimised ...but for now
USE [tempdb]
GO
IF OBJECT_ID('tempdb..jbanko','U') IS NOT NULL DROP TABLE jbanko;
CREATE TABLE [dbo].[jbanko](
[ID] [int] NULL,
[Term1] [varchar](2) NULL,
[Term2] [varchar](2) NULL,
[Term3] [varchar](2) NULL,
[Term4] [varchar](2) NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[jbanko]([ID], [Term1], [Term2], [Term3], [Term4])
SELECT 10, 'AA', NULL, NULL, NULL UNION ALL
SELECT 10, NULL, NULL, NULL, 'DD' UNION ALL
SELECT 10, NULL, NULL, 'CC', NULL UNION ALL
SELECT 11, NULL, NULL, NULL, 'AA' UNION ALL
SELECT 11, NULL, NULL, 'DD', NULL UNION ALL
SELECT 11, NULL, NULL, NULL, 'BB' UNION ALL
SELECT 10, 'BB', NULL, NULL, NULL UNION ALL
SELECT 10, NULL, 'EE', NULL, NULL
;with cte as
(
SELECT ID,[term],[course],
ROW_NUMBER () OVER (PARTITION BY ID,[term] ORDER BY ID, course) rn
from
(
SELECT ID
,ca.*
FROM jbanko
CROSS APPLY ( VALUES ('Term1', term1),
('Term2', term2),
('Term3', term3),
('Term4', term4)
)
ca ([term], [course])
WHERE course <> 'NULL'
) x
)
SELECT ID,
MAX(case when term = 'Term1' then [course] else '' end) as T1,
MAX(case when term = 'Term2' then [course] else '' end) as T2,
MAX(case when term = 'Term3' then [course] else '' end) as T3,
MAX(case when term = 'Term4' then [course] else '' end) as T4
FROM CTE
GROUP BY ID, rn
ORDER BY ID, rn
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply