December 20, 2013 at 5:00 am
I've one scenario:
create table dbo.T_Master(StudentId int, StudentName varchar(50), StudentAge int)
go
create table dbo.T_Detail(StudentId int, ColX varchar(20), ColY varchar(20), Subject varchar(20), Marks int)
go
Then I populated the data like this:
insert into dbo.T_Master select 1001, 'James', 14
insert into dbo.T_Master select 1002, 'Raghav', 15
insert into dbo.T_Master select 1003, 'Maxx', 14
insert into dbo.T_Master select 1004, 'Aaron', 14
insert into dbo.T_Master select 1005, 'Jamie', 15
insert into dbo.T_Detail select 1001, 'ColXdata A', 'ColYdata A', 'English', 60
insert into dbo.T_Detail select 1001, 'ColXdata A', 'ColYdata A', 'Maths', 80
insert into dbo.T_Detail select 1001, 'ColXdata A', 'ColYdata A', 'Science', 45
insert into dbo.T_Detail select 1002, 'ColXdata B', 'ColYdata B', 'English', 65
insert into dbo.T_Detail select 1002, 'ColXdata B', 'ColYdata B', 'Maths', 75
insert into dbo.T_Detail select 1002, 'ColXdata B', 'ColYdata B', 'Science', 60
insert into dbo.T_Detail select 1003, 'ColXdata C', 'ColYdata C', 'English', 55
insert into dbo.T_Detail select 1003, 'ColXdata C', 'ColYdata C', 'Maths', 90
insert into dbo.T_Detail select 1003, 'ColXdata C', 'ColYdata C', 'Science', 65
insert into dbo.T_Detail select 1004, 'ColXdata D', 'ColYdata D', 'English', 70
insert into dbo.T_Detail select 1004, 'ColXdata D', 'ColYdata D', 'Maths', 60
insert into dbo.T_Detail select 1004, 'ColXdata D', 'ColYdata D', 'Science', 60
insert into dbo.T_Detail select 1005, 'ColXdata E', 'ColYdata E', 'English', 75
insert into dbo.T_Detail select 1005, 'ColXdata E', 'ColYdata E', 'Maths', 55
insert into dbo.T_Detail select 1005, 'ColXdata E', 'ColYdata E', 'Science', 80
Now I want to have the data by joining above two tables like this:
A.StudentIdA.StudentNameA.StudentAgeB.ColXB.ColYSubject1Marks1Subject2Marks2Subject3Marks3
1001James14ColXdata AColYdata AEnglish60Maths80Science45
I am trying but not getting success. Also please consider that this is a parallel demo scenario created for student scenario. In actual, both the tables contain huge number of records (in millions).
December 20, 2013 at 7:53 am
Thanks for the ddl and sample data!!! What you are describing is a cross tab or pivot table. I generally use the cross tab approach because I find the syntax easier to read and the performance is more consistent. There are two different types of cross tabs. There is the static cross tab which you use when you know how many sets you have. The other is the dynamic cross tab which you use when you don't know how many sets you have.
I put together one version of doing this with a static cross tab.
with SortedData as
(
select m.StudentId, m.StudentName, m.StudentAge, d.ColX, d.ColY, d.Subject, d.Marks, ROW_NUMBER() over(partition by m.StudentID order by d.Subject) as RowNum
from dbo.T_Master m
join dbo.T_Detail d on m.StudentID = d.StudentId
)
select StudentId, StudentName, StudentAge,
MAX(case when RowNum = 1 then Subject end) as Subject1,
MAX(case when RowNum = 1 then Marks end) as Marks1,
MAX(case when RowNum = 2 then Subject end) as Subject2,
MAX(case when RowNum = 2 then Marks end) as Marks2,
MAX(case when RowNum = 3 then Subject end) as Subject3,
MAX(case when RowNum = 3 then Marks end) as Marks3
from SortedData s
group by StudentId, StudentName, StudentAge
If you need the dynamic version I will leave that one up to you to start working on. You can read more about both types of cross tabs by following the links in my signature.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 20, 2013 at 8:20 am
Sean;
Interesting solution... I had something much like yours getting the same results the only difference is I split out the 3 different subjects into separate CTEs then joining them back much like you did in your query. One thing you forgot (and so did I) were the ColX & ColY columns.
To solve that it can be done this way:
with SortedData as
(
select m.StudentId, m.StudentName, m.StudentAge, d.ColX, d.ColY, d.Subject, d.Marks, ROW_NUMBER() over(partition by m.StudentID order by d.Subject) as RowNum
from @T_Master m
join @T_Detail d on m.StudentID = d.StudentId
)
select StudentId, StudentName, StudentAge,
MAX(case when RowNum = 1 then ColX end) as ColX,
MAX(case when RowNum = 1 then ColY end) as ColY,
MAX(case when RowNum = 1 then Subject end) as Subject1,
MAX(case when RowNum = 1 then Marks end) as Marks1,
MAX(case when RowNum = 2 then Subject end) as Subject2,
MAX(case when RowNum = 2 then Marks end) as Marks2,
MAX(case when RowNum = 3 then Subject end) as Subject3,
MAX(case when RowNum = 3 then Marks end) as Marks3
from SortedData s
group by StudentId, StudentName, StudentAge
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
December 20, 2013 at 8:25 am
Kurt W. Zimmerman (12/20/2013)
Sean;Interesting solution... I had something much like yours getting the same results the only difference is I split out the 3 different subjects into separate CTEs then joining them back much like you did in your query. One thing you forgot (and so did I) were the ColX & ColY columns.
To solve that it can be done this way:
with SortedData as
(
select m.StudentId, m.StudentName, m.StudentAge, d.ColX, d.ColY, d.Subject, d.Marks, ROW_NUMBER() over(partition by m.StudentID order by d.Subject) as RowNum
from @T_Master m
join @T_Detail d on m.StudentID = d.StudentId
)
select StudentId, StudentName, StudentAge,
MAX(case when RowNum = 1 then ColX end) as ColX,
MAX(case when RowNum = 1 then ColY end) as ColY,
MAX(case when RowNum = 1 then Subject end) as Subject1,
MAX(case when RowNum = 1 then Marks end) as Marks1,
MAX(case when RowNum = 2 then Subject end) as Subject2,
MAX(case when RowNum = 2 then Marks end) as Marks2,
MAX(case when RowNum = 3 then Subject end) as Subject3,
MAX(case when RowNum = 3 then Marks end) as Marks3
from SortedData s
group by StudentId, StudentName, StudentAge
Kurt
I included ColX and ColY in my output. I wasn't quite sure what to do with those actually because in the OP stated output they were only listed once. I just decided to use MAX and figure I would let the OP figure out what to do with that one.
You should post your solution. It may be we can come up with something between us better than either of us could do solo.
--EDIT--
I just looked at my post again and I didn't have those two columns in there. I had them in the SortedData cte but not in the final result. 😉
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 20, 2013 at 8:36 am
Here is my solution:
with details_English_cte (StudentID, Subject, Marks) as (
select StudentId
,Subject
,Marks
from dbo.T_Detail
where Subject = 'English'
),
details_maths_cte (StudentID, Subject, Marks) as (
select StudentId
,Subject
,Marks
from dbo.T_Detail
where Subject = 'Maths'
),
details_Science_cte (StudentID, Subject, Marks) as (
select StudentId
,Subject
,Marks
from dbo.T_Detail
where Subject = 'Science'
),
ColXY_cte (StudentID, Colx, Coly) as (
select distinct StudentID
,ColX
,ColY
from dbo.T_Detail
)
select mstr.StudentId
, mstr.StudentName
, mstr.StudentAge
, ColXY.Colx
, ColXY.Coly
, English.Subject as Subject1
, English.Marks as Marks1
, Maths.Subject as Subject2
, Maths.Marks as Marks2
, Science.Subject as Subject3
, Science.Marks as Marks3
from dbo.T_Master mstr inner join
ColXY_cte ColXY on mstr.StudentID = ColXY.StudentID inner join
details_English_cte English on mstr.StudentId = English.StudentID inner join
details_maths_cte Maths on mstr.StudentId = Maths.StudentID inner join
details_Science_cte Science on mstr.StudentId = Science.StudentID;
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
December 20, 2013 at 8:58 am
Thanks Sean/Kurt for the wonderful query.
One update to the requirement and one query.
Modification is that there might be some students who have only one or two or all of the three subjects (No of subjkects may vary). In that case, will it work too ? ColX & ColY will have same values so no issues with that.
Query is performance wise. I've created this scenario only as example. In reality it will be required to fetch data from the join where both the tables has around 10-20 billion records. In that case, will we need to tune this query ? Though i'll check at my end as well.
December 20, 2013 at 9:10 am
sqlnaive (12/20/2013)
Thanks Sean/Kurt for the wonderful query.One update to the requirement and one query.
Modification is that there might be some students who have only one or two or all of the three subjects (No of subjkects may vary). In that case, will it work too ? ColX & ColY will have same values so no issues with that.
Query is performance wise. I've created this scenario only as example. In reality it will be required to fetch data from the join where both the tables has around 10-20 billion records. In that case, will we need to tune this query ? Though i'll check at my end as well.
The one I posted will handle any number of t_Detail values and there is no requirement to have 3.
Try this.
insert T_Master
select 1006, 'Sean', 45
insert T_Detail
select 1006, 'ColX', 'ColY', 'Writing', 33
Then rerun the query I posted. It will just return NULL for the second and third groups.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 20, 2013 at 9:25 am
Sean Lange (12/20/2013)
sqlnaive (12/20/2013)
Thanks Sean/Kurt for the wonderful query.One update to the requirement and one query.
Modification is that there might be some students who have only one or two or all of the three subjects (No of subjkects may vary). In that case, will it work too ? ColX & ColY will have same values so no issues with that.
Query is performance wise. I've created this scenario only as example. In reality it will be required to fetch data from the join where both the tables has around 10-20 billion records. In that case, will we need to tune this query ? Though i'll check at my end as well.
The one I posted will handle any number of t_Detail values and there is no requirement to have 3.
Try this.
insert T_Master
select 1006, 'Sean', 45
insert T_Detail
select 1006, 'ColX', 'ColY', 'Writing', 33
Then rerun the query I posted. It will just return NULL for the second and third groups.
It is important to understand the scope of your data so that you can determine what solution will be the one that works. If you know there will be a finite number of subjects then either solution will work. If the subjects vary then developing a more robust solution, such as Sean has pointed out is the direction you need to go.
These are the kinds of nuances you will learn to ask for as you are given requirements to develop solutions. Often times you may develop a solution thinking it is the correct solution only to find out that someone failed to mention something that may be subtle to them but change the entire design of your solution. Happens all the time.
More experienced developers know the right questions to ask to get as much if not all of the requirements to develop a solution.
As Sean and I have pointed out there are multiple ways to solve a single solution. You will see that both solutions gives you the correct answer and yet are developed in a way that from a developers standpoint is somewhat different.
Once you have a solution it is important to insure the solution will work in accordance to the requirements. The solution will need to be tested to insure the end result is what is to be expected. Quite often a solution will have to be reworked because the end result is not what is expected. That is the cycle of development & unit testing and quality control which is commonplace in any development shop.
All the best and good luck.
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
December 23, 2013 at 2:03 am
The solution provided looks good. However in my actual scenario, there are around 15-20 columns in master table (all have to be included) and around 30 columns in details (of which 15 have to be included). Both the tables contain around 10-20 million records. I'm doing a normal join like this:
insert into table3
(col1, col2....coln)
select a.StudentId, a.col1, a.col2... a.col15,
b.col1 as col16, b.col2 as col17.... b.col15 as col30,
b.subject as Subject1, b.marks as sub1_marks,
c.subject as Subject2, c.marks as sub2_marks,
d.subject as Subject3, d.marks as sub3_marks
from dbo.t_master a
left outer join dbo.t_detail b on a.StudentId = b.StudentId and b.Subject = 'English'
left outer join dbo.t_detail c on a.StudentId = c.StudentId and c.Subject = 'Maths'
left outer join dbo.t_detail d on a.StudentId = d.StudentId and d.Subject = 'Science'
Is it possible to do it in efficient way ? With the number of records i have in both master and details table, it's taking too much time and the query you provided is taking around 15-20 mins more than this query. Any help would be appreciated. We have clustered indexes on Studentid on both tables.
December 23, 2013 at 9:09 pm
sqlnaive (12/23/2013)
The solution provided looks good. However in my actual scenario, there are around 15-20 columns in master table (all have to be included) and around 30 columns in details (of which 15 have to be included). Both the tables contain around 10-20 million records. I'm doing a normal join like this:
insert into table3
(col1, col2....coln)
select a.StudentId, a.col1, a.col2... a.col15,
b.col1 as col16, b.col2 as col17.... b.col15 as col30,
b.subject as Subject1, b.marks as sub1_marks,
c.subject as Subject2, c.marks as sub2_marks,
d.subject as Subject3, d.marks as sub3_marks
from dbo.t_master a
left outer join dbo.t_detail b on a.StudentId = b.StudentId and b.Subject = 'English'
left outer join dbo.t_detail c on a.StudentId = c.StudentId and c.Subject = 'Maths'
left outer join dbo.t_detail d on a.StudentId = d.StudentId and d.Subject = 'Science'
Is it possible to do it in efficient way ? With the number of records i have in both master and details table, it's taking too much time and the query you provided is taking around 15-20 mins more than this query. Any help would be appreciated. We have clustered indexes on Studentid on both tables.
Did you try the approach I posted? With that amount of data it is more likely to perform better because you don't have so many table scans. Read the two articles I suggested about cross tabs in my signature.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 24, 2013 at 8:01 am
SELECT T1.StudentID,StudentName, StudentAge,ColX,Coly,
MAX(CASE WHEN Subject = 'English' THEN 'English' ELSE '' END) AS Subject1,
MAX(CASE WHEN Subject = 'English' THEN Marks ELSE 0 END) AS Marks1,
MAX(CASE WHEN Subject = 'English' THEN 'Maths' ELSE '' END) AS Subject2,
MAX(CASE WHEN Subject = 'Maths' THEN Marks ELSE 0 END) AS Marks2,
MAX(CASE WHEN Subject = 'English' THEN 'Science' ELSE '' END) AS Subject3,
MAX(CASE WHEN Subject = 'Science' THEN Marks ELSE 0 END) AS Marks3
FROM dbo.T_Master T1
INNER JOIN dbo.T_Detail T2
ON T1.StudentID = T2.StudentID
GROUP BY T1.StudentID,StudentName, StudentAge,ColX,Coly
Regards,
Mitesh OSwal
+918698619998
January 7, 2014 at 6:03 am
Hi Sean/Kurt, Highly appreciate your help on this. It was a good query. However some conditions changed and there was some extra requirements and I lost focus in between so created new thread instead of going on with this one.
The other thread is here:
http://www.sqlservercentral.com/Forums/Topic1528153-392-2.aspx?Update=1
Kindly provide your valuable feedback there.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply