June 11, 2015 at 2:02 am
I have Database Library, which has a lot of tables and we need 3 tables for quary:
Table Librarians: field ID, field Surname;
Table StudentCard: field ID, foreign key on table Librarians and other fields, which we don't use
Table TeacherCard: field ID, foreign key on table Librarians and other fields, which we don't use
Query: Select the librarian's surname, which gave the most count of books.
I know, how to resolve, when i took datas only from one table, e. g. TeacherCard
SELECT TOP 1 WITH TIES Librarians.LastName, MAX(Librarians.CountOfBooks) AS Books FROM
(SELECT L.LastName, COUNT(*) AS CountOfBooks FROM Libs L, T_Cards T
WHERE T.Id_Lib IN (SELECT L.Id)
GROUP BY L.LastName) AS Librarians
GROUP BY Librarians.LastName
ORDER BY MAX(Librarians.CountOfBooks) DESC
GO
I dont know, how to use datas from TeacherCard and from StudetnCard at the same time. Please, help to write this quary.
June 11, 2015 at 2:45 am
Would you be able to provide some DDL and sample data for this?
What data are you looking to use from the other 2 tables?
June 11, 2015 at 2:48 am
A resolved yet !
June 11, 2015 at 2:51 am
koles2004 (6/11/2015)
A resolved yet !
You've found a solution? If so, please share it.
June 11, 2015 at 9:40 am
I have 2 right variants: big and small.
Small variant:
SELECT TOP 1 WITH TIES LastName, COUNT (*) AS [count] FROM
(SELECT LastName FROM Libs L, S_Cards S
WHERE S.id_lib = L.id
UNION ALL
SELECT LastName FROM Libs L, T_Cards T
WHERE T.id_lib = L.id) As Res
GROUP By LastName
ORDER BY [count] DESC
June 11, 2015 at 12:39 pm
I would encourage you to change the way you write JOINs. That syntax is the old way and can become confusing with more complex queries and develop into unwanted cartesian products.
Here's an alternative using a more recent format for JOINs.
SELECT TOP 1 WITH TIES
LastName,
COUNT (*) AS [count]
FROM
(SELECT LastName
FROM Libs L
JOIN S_Cards S ON S.id_lib = L.id
UNION ALL
SELECT LastName
FROM Libs L
JOIN T_Cards T ON T.id_lib = L.id) As Res
GROUP By LastName
ORDER BY [count] DESC
After stating that, I would reduce the number of times the Librarians table is read by implementing something like this:
SELECT TOP 1 WITH TIES
L.LastName,
SUM(Res.CountOfBooks) AS [count]
FROM Libs L
JOIN (SELECT S.id_lib, COUNT(*) CountOfBooks
FROM S_Cards S
GROUP BY S.id_lib
UNION ALL
SELECT T.id_lib, COUNT(*) CountOfBooks
FROM T_Cards T
GROUP BY T.id_lib) As Res ON L.id = Res.id_lib
GROUP By LastName
ORDER BY [count] DESC
June 11, 2015 at 1:43 pm
I wanna specify: the second your variant is faster than first ?
June 11, 2015 at 1:48 pm
koles2004 (6/11/2015)
I wanna specify: the second your variant is faster than first ?
I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.
June 11, 2015 at 1:54 pm
Both variants are right for my task )
I like more my variant and your first variant.
Your second variant i dont like )
June 11, 2015 at 1:56 pm
koles2004 (6/11/2015)
Both variants are right for my task )I like more my variant and your first variant.
Your second variant i dont like )
Any reason to that?
June 11, 2015 at 1:57 pm
Luis Cazares (6/11/2015)
koles2004 (6/11/2015)
I wanna specify: the second your variant is faster than first ?I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.
Both of them are inefficient. This should perform better than either of the other two, because you are only aggregating once instead of once in the subquery and again in the main query.
SELECT TOP 1 WITH TIES
L.LastName,
COUNT(Res.id_lib) AS [count]
FROM Libs L
JOIN (
SELECT S.id_lib
FROM S_Cards S
UNION ALL
SELECT T.id_lib
FROM T_Cards T
) As Res ON L.id = Res.id_lib
GROUP By LastName
ORDER BY [count] DESC
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 11, 2015 at 2:40 pm
drew.allen (6/11/2015)
Luis Cazares (6/11/2015)
koles2004 (6/11/2015)
I wanna specify: the second your variant is faster than first ?I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.
Both of them are inefficient. This should perform better than either of the other two, because you are only aggregating once instead of once in the subquery and again in the main query.
Drew
When doing such strong and conclusive statements, you should show proof of that. I can't give a conclusive statement even after some testing that shows that one is particularly better than the other as they have very similar execution times and the fastest isn't constant.
I'm sure after testing that the original query is slower than the other two, almost twice as slower.
Here's my testing harness.
SET NOCOUNT ON;
IF OBJECT_ID( 'Librarians') IS NOT NULL DROP TABLE Librarians;
CREATE TABLE Librarians(
LibrarianId int IDENTITY PRIMARY KEY,
Surname varchar(100)
);
--Using AdvertureWorks to have significant names
INSERT INTO Librarians( Surname)
SELECT TOP 100
LastName
FROM AdventureWorks2012.Person.Person;
IF OBJECT_ID( 'StudentCards') IS NOT NULL DROP TABLE StudentCards;
CREATE TABLE StudentCards(
StudentCardId int IDENTITY PRIMARY KEY,
LibrarianId int);
INSERT INTO StudentCards( LibrarianId)
SELECT TOP 1000000
ABS( CHECKSUM(NEWID())) % 100 + 1
FROM sys.all_columns, sys.all_columns b;
CREATE INDEX IX_StudentCardsLibrarian ON StudentCards(LibrarianId);
IF OBJECT_ID( 'TeacherCards') IS NOT NULL DROP TABLE TeacherCards;
CREATE TABLE TeacherCards(
TeacherCardId int IDENTITY PRIMARY KEY,
LibrarianId int);
INSERT INTO TeacherCards( LibrarianId)
SELECT TOP 1000000
ABS( CHECKSUM(NEWID())) % 100 + 1
FROM sys.all_columns, sys.all_columns b;
CREATE INDEX IX_TeacherCardsLibrarian ON TeacherCards(LibrarianId)
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
GO
--Use variables to measure only execution times discarding any results
DECLARE
@LibrarianId int,
@LastName varchar(100),
@Count int
--Clean the data stored in buffer
CHECKPOINT;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
PRINT 'Original Query';
SET STATISTICS TIME ON; --Allow the engine to show times
SELECT TOP 1 WITH TIES
@LastName = Surname,
@Count = COUNT(*)
FROM
(SELECT Surname
FROM Librarians L
JOIN StudentCards S ON S.LibrarianId = L.LibrarianId
UNION ALL
SELECT Surname
FROM Librarians L
JOIN TeacherCards T ON T.LibrarianId = L.LibrarianId) As Res
GROUP By Surname
ORDER BY 2 DESC;
SET STATISTICS TIME OFF; --Turn it off to clean buffers
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
PRINT 'Single Aggregation';
SET STATISTICS TIME ON;
SELECT TOP 1 WITH TIES
@LibrarianId = L.LibrarianId,
@LastName = L.Surname,
@Count = COUNT(Res.LibrarianId)
FROM Librarians L
JOIN (
SELECT S.LibrarianId
FROM StudentCards S
UNION ALL
SELECT T.LibrarianId
FROM TeacherCards T
) As Res ON L.LibrarianId = Res.LibrarianId
GROUP By L.LibrarianId, L.Surname
ORDER BY 3 DESC;
SET STATISTICS TIME OFF;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
PRINT 'Pre-Aggregation';
SET STATISTICS TIME ON;
SELECT TOP 1 WITH TIES
@LibrarianId = L.LibrarianId,
@LastName = L.Surname,
@Count = SUM(Res.BookCount)
FROM Librarians L
JOIN (
SELECT S.LibrarianId, COUNT(*) AS BookCount
FROM StudentCards S
GROUP BY S.LibrarianId
UNION ALL
SELECT T.LibrarianId, COUNT(*) AS BookCount
FROM TeacherCards T
GROUP BY T.LibrarianId
) As Res ON L.LibrarianId = Res.LibrarianId
GROUP By L.Surname, L.LibrarianId
ORDER BY 3 DESC;
SET STATISTICS TIME OFF;
GO 5 --Run the queries five times
EDIT: The variant posted by koles is exactly the same as the one presented here as "original", SQL Server will interpret both the same way but the one presented here is a preferred syntax.
June 11, 2015 at 2:45 pm
You can't add Librarian id into the GROUP BY and accurately answer the query, since multiple librarians could have the same surname. You could pre-aggregate, but you must still determine the final total using only the surname, not the lib id.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 11, 2015 at 3:05 pm
In the morning I dont have right variant.
Now I have 4 right variants:
SELECT TOP 1 WITH TIES LastName, COUNT (*) AS CountOfBooks FROM
(SELECT LastName FROM Libs L, S_Cards S
WHERE S.id_lib = L.id
UNION ALL
SELECT LastName FROM Libs L, T_Cards T
WHERE T.id_lib = L.id) AS Result
GROUP By LastName
ORDER BY CountOfBooks DESC
GO
/*second */
SELECT TOP 1 WITH TIES LastName, COUNT (*) AS [count] FROM
(SELECT LastName FROM Libs L JOIN S_Cards S
ON S.id_lib = L.id
UNION ALL
SELECT LastName FROM Libs L JOIN T_Cards T
ON T.id_lib = L.id) As Res
GROUP By LastName
ORDER BY [count] DESC
GO
/*third*/
SELECT TOP 1 WITH TIES L.LastName, COUNT(Res.id_lib) AS [count] FROM Libs L
JOIN (
SELECT S.id_lib FROM S_Cards S
UNION ALL
SELECT T.id_lib FROM T_Cards T
) As Res ON L.id = Res.id_lib
GROUP By LastName
ORDER BY [count] DESC
GO
/*fourth*/
SELECT TOP 1 WITH TIES L.LastName, SUM(Res.CountOfBooks) AS [count] FROM Libs L
JOIN (SELECT S.id_lib, COUNT(*) CountOfBooks FROM S_Cards S
GROUP BY S.id_lib
UNION ALL
SELECT T.id_lib, COUNT(*) FROM T_Cards T
GROUP BY T.id_lib) AS Res ON L.id = Res.id_lib
GROUP By LastName
ORDER BY [count] DESC
GO
June 11, 2015 at 3:17 pm
Luis Cazares (6/11/2015)
When doing such strong and conclusive statements, you should show proof of that.
You're right, of course.
I can't give a conclusive statement even after some testing that shows that one is particularly better than the other as they have very similar execution times and the fastest isn't constant.
After doing some testing of my own, it's obvious that joining before doing the union performs much worse than either of the other two, because it requires an additional table/index scan.
It also appears that the performance of doing only one aggregate vs doing an aggregate in both the main and sub queries is not as large as I expected. That being said, it does appear that doing only one aggregate does slightly edge out doing two. This is born out by the execution (not included) where the extra aggregate has an additional Hash Match (Aggregate) and two addition Compute Scalars, although the costs of these pieces amounts to about the same.
If the results are too close to call, I recommend going for the simpler one.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply