What are TSQL Challenges?
TSQL Challenges aim at helping people enhance their SET based query writing skills by presenting real life TSQL challenges. TSQL Challenges encourages people to think differently and come up with better solutions than what they are used do. It also provides a platform to see the solutions of other experts around and learn from their techniques.
TSQL Challenges is managed by ‘TSQL Challenge Committee’, a group of SQL Server Experts from around the world. Many of them are SQL Server MVPs, some are SQL Server consultants and others are developers working with SQL Server for a number of years.
What is this new series of articles all about?
This is the first of a series of articles that the TSQL Challenge Team brings to the SQL Server community. In this series, we will look at a number of TSQL Challenges and review the different ways of solving the given challenge.
TSQL Challenge #1
TSQL Challenge #1 presents a simple but interesting problem. TSQL Challenges are all about solving a given TSQL problem using a SET based query. Most of the times, the task is to format a set of data from one shape to another using a single SET operation. The Challenge in TSQL Challenge #1 is to write a query that takes data from three tables into a given shape. Here are the three tables.
Table A
code aname ----------- ---------- 1 Cat 2 Dog 3 Bird
Table B
code bname
----------- ----------
1 aaa
1 bbb
2 ccc
2 ddd
Table C
code cname
----------- ----------
1 xxx
1 yyy
1 zzz
2 www
The task is to write a query that produces the following output from the above tables. The query should run on SQL Server 2000 as well as on SQL Server 2005.
Output
code aname bname cname
----------- ---------- ---------- ----------
1 Cat aaa xxx
1 Cat bbb yyy
1 Cat NULL zzz
2 Dog ccc www
2 Dog ddd NULL
3 Bird NULL NULL
Sample Data
DECLARE @a TABLE (code INT, aname VARCHAR(10)) INSERT INTO @a(code, aname) SELECT 1,'Cat' INSERT INTO @a(code, aname) SELECT 2,'Dog' INSERT INTO @a(code, aname) SELECT 3,'Bird' DECLARE @b TABLE (code INT, bname VARCHAR(10)) INSERT INTO @b(code, bname) SELECT 1,'aaa' INSERT INTO @b(code, bname) SELECT 1,'bbb' INSERT INTO @b(code, bname) SELECT 2,'ccc' INSERT INTO @b(code, bname) SELECT 2,'ddd' DECLARE @c TABLE (code INT, cname VARCHAR(10)) INSERT INTO @c(code, cname) SELECT 1,'xxx' INSERT INTO @c(code, cname) SELECT 1,'yyy' INSERT INTO @c(code, cname) SELECT 1,'zzz' INSERT INTO @c(code, cname) SELECT 2,'www'
The Challenge
Is there really a challenge? Well, there is. At first glance one might think that a simple query joining the three tables will do the trick. However, that is not true. If you join the three tables it will produce a number of duplicate rows because the ‘code’ column in Table B and Table C are not unique.
A join between @b and @c will produce duplicate rows. The "code" column in @b contains 2 records having value "1". Similarly the "code" column in @c contains 3 records having value "1". If you try to join them with a FULL JOIN you will receive 12 records. But our query should produce only 3 rows. Similarly, our query should produce only 2 records for rows having "2" in the column "code". The total number of rows we are looking for is 5 where as a FULL JOIN between @b and @c currently produces 16 records. See this example:
SELECT * FROM @b b
FULL JOIN @c c ON b.code = b.code
/*
code bname code cname
----------- ---------- ----------- ----------
1 aaa 1 xxx
1 bbb 1 xxx
2 ccc 1 xxx
2 ddd 1 xxx
1 aaa 1 yyy
1 bbb 1 yyy
2 ccc 1 yyy
2 ddd 1 yyy
1 aaa 1 zzz
1 bbb 1 zzz
2 ccc 1 zzz
2 ddd 1 zzz
1 aaa 2 www
1 bbb 2 www
2 ccc 2 www
2 ddd 2 www
*/
Now it looks like a challenge! isn’t it?
Solving the Challenge
How do we remove those duplicate rows? Well, the join that we apply should do the following. It should take the first row in @b with value "1" and join it with the first row in @c having "1". Then it should take the second row in @b having "1" and join it with the second row in @c having value "1". It should then take the third row in @c having value "1". This does not have a matching row in @b, so it will return the information from @c and will return NULL from @b. If we can apply such a join, we can receive back exactly 3 records.
To join the rows in the manner described above, we need to generate a sequence number for each row. The sequence number should reset for each distinct value in the "code" column. So the trick here is to generate this sequence number.
SQL Server 2005 introduced ROW_NUMBER() that can be used to generate a sequence number. Here is an example:
SELECT bname, code, ROW_NUMBER() OVER(PARTITION BY code ORDER BY code) AS row FROM @b /* bname code row ---------- ----------- -------------------- aaa 1 1 bbb 1 2 ccc 2 1 ddd 2 2 */
It is little trickier in SQL Server 2000. In SQL Server 2000, this can be achieved by doing a self join and counting the number of records smaller than equal to the current row. This trick is demonstrated in the below example.
SELECT b1.bname, b1.code, COUNT(*) AS row FROM @b b1 INNER JOIN @b b2 ON b1.code = b2.code AND b2.bname <= b1.bname GROUP BY b1.bname, b1.code /* bname code row ---------- ----------- ----------- aaa 1 1 bbb 1 2 ccc 2 1 ddd 2 2 */
This trick is going to help us solve the problem. Using this trick, we can join @b and @c on Code + row and it will make each row unique and will eliminate the duplicate values from the results. A join of @b and @c using this method will produce the following result.
SELECT * FROM ( SELECT b1.bname, b1.code, COUNT(*) AS row FROM @b b1 INNER JOIN @b b2 ON b1.code = b2.code AND b2.bname <= b1.bname GROUP BY b1.bname, b1.code ) b FULL JOIN ( SELECT c1.cname, c1.code, COUNT(*) AS row FROM @c c1 INNER JOIN @c c2 ON c1.code = c2.code AND c2.cname <= c1.cname GROUP BY c1.cname, c1.code ) c ON b.Code = c.Code AND b.row = c.row /* bname code row cname code row ---------- ----------- ----------- ---------- ----------- ----------- ccc 2 1 www 2 1 aaa 1 1 xxx 1 1 bbb 1 2 yyy 1 2 NULL NULL NULL zzz 1 3 ddd 2 2 NULL NULL NULL */
This leads us to the final solution. The final result can be achieved by simply joining this result with @a.
SELECT a.*, b.bname, c.cname FROM @a a LEFT JOIN ( ( SELECT b1.bname, b1.code, COUNT(*) AS row FROM @b b1 JOIN @b b2 ON b1.code = b2.code AND b2.bname <= b1.bname GROUP BY b1.bname, b1.code ) b FULL JOIN ( SELECT c1.cname, c1.code, COUNT(*) AS row FROM @c c1 JOIN @c c2 ON c1.code = c2.code AND c2.cname <= c1.cname GROUP BY c1.cname, c1.code ) c ON b.code = c.code AND b.row = c.row ) ON a.code IN (b.code, c.code) /* code aname bname cname ----------- ---------- ---------- ---------- 1 Cat aaa xxx 1 Cat bbb yyy 1 Cat NULL zzz 2 Dog ccc www 2 Dog ddd NULL 3 Bird NULL NULL */
Here is the SQL Server 2005 version of the code, which is much simpler.
SELECT a.*, b.bname, c.cname FROM @a a LEFT JOIN ( ( SELECT bname, code, ROW_NUMBER() OVER(PARTITION BY code order by code) AS row FROM @b ) b FULL JOIN ( SELECT cname, code, ROW_NUMBER() OVER(PARTITION BY code order by code) AS row FROM @c ) c ON b.code = c.code AND b.row = c.row ) ON a.code IN (b.code, c.code) /* code aname bname cname ----------- ---------- ---------- ---------- 1 Cat aaa xxx 1 Cat bbb yyy 1 Cat NULL zzz 2 Dog ccc www 2 Dog ddd NULL 3 Bird NULL NULL */
Winning Solutions
We just saw a quick tutorial that shows how to solve TSQL Challenge #1. The solution I presented above is certainly one of the simplest, but not one of the best in terms of performance. I choose a simple solution for the purpose of this tutorial. Now it is time to see a few elegant solutions from some of the TOP SQL Server guys in the SQL Server community.
Solution by Adam Machanic
Adam Machanic is a Boston-based independent database consultant, writer, and speaker.He is a Microsoft Most Valuable Professional (MVP) for SQL Server, Microsoft Certified IT Professional (MCITP), and a member of the INETA North American Speakers.
SELECT a.*, b.bname, c.cname FROM @a a LEFT OUTER JOIN ( ( SELECT b1.bname, b1.code, count(*) AS row FROM @b b1 JOIN @b b2 ON b1.code = b2.code AND b2.bname <= b1.bname GROUP BY b1.bname, b1.code ) b FULL OUTER JOIN ( SELECT c1.cname, c1.code, COUNT(*) AS row FROM @c c1 JOIN @c c2 ON c1.code = c2.code AND c2.cname <= c1.cname GROUP BY c1.cname, c1.code ) c ON b.code = c.code AND b.row = c.row ) ON a.code IN (b.code, c.code)
Solution by Ashish Gilhotra
Ashish Gilhotra a Web Developer from Jaipur, Rajasthan. He is an MCAD in .net with Sql Server.
SELECT T1.code, T1.aname, D.bname, D.cname FROM @a T1 LEFT JOIN ( SELECT COALESCE(D2.code, D3.code) AS code ,D2.bname ,D3.cname FROM ( SELECT e.bname, e.code, ( SELECT COUNT(*) FROM @b e2 WHERE e2.bname <= e.bname ) AS rownumber FROM @b e ) D2 FULL JOIN ( SELECT e.cname, e.code, replace( ( SELECT COUNT(*) FROM @c e2 WHERE e2.cname <= e.cname )-1,0,4 ) AS rownumber FROM @c e ) D3 ON D2.code = D3.code AND D2.rownumber = D3.rownumber ) D ON T1.code = D.code
Solution by Leonid Koyfman
Leonid Koyfman is a Senior Developer with Razorfish ( San Francisco). He is in software development over 10 years. His focus is data visualization and delivering reporting solutions from various data sources, but spending most of the time with SQL Server.
SELECT a.code, aname, bname, cname FROM @a a LEFT JOIN ( SELECT COALESCE(t1.code,t2.code) AS code, bname, cname FROM( SELECT code, bname, ( SELECT count(9) FROM @b b2 WHERE b1.code=b2.code and b1.bname>=b2.bname ) AS ord FROM @b b1 )t1 FULL OUTER JOIN( SELECT code, cname, ( SELECT count(9) FROM @c c2 WHERE c1.code=c2.code and c1.cname>=c2.cname ) AS ord FROM @c c1 )t2 on t1.code=t2.code and t1.ord=t2.ord )t ON a.code=t.code ORDER BY a.code
Download solutions
All the solutions listed above are available for download at codeplex.
Inspired? Write a better solution or take the next challenge!
If you have got a better idea and come up with a different solution, please let me know. I invite you to share your ideas in the discussion thread below. If your solution is as elegant or better than the solutions already listed above, we can publish them in the codeplex project.
If TSQL Challenges interests you, take the next challenge or try to play with one of the existing challenges.
About the Author
Jacob Sebastian is a SQL Server MVP and is working with SQL Server for over 11 years. He is a Moderator of MSDN / Technet Forums, Regional Mentor for PASS Asia and a regular columnist at SQLServerCentral. He is the president of Ahmedabad SQL Server User Group and a regular speaker in the UG events. Jacob is the author of The Art of XSD - SQL Server XML SChema Collections, a contributing author in "SQL Server 2008 Bible". He blogs at http://beyondrelational.com/blogs/jacob/.