Select Maximum, using filelds from 2 tables

Question

Post reply

Select Maximum, using filelds from 2 tables

koles2004

Old Hand

Points: 336
More actions
June 11, 2015 at 2:02 am

#318190

I have Database Library, which has a lot of tables and we need 3 tables for quary:
Table Librarians: field ID, field Surname;
Table StudentCard: field ID, foreign key on table Librarians and other fields, which we don't use
Table TeacherCard: field ID, foreign key on table Librarians and other fields, which we don't use
Query: Select the librarian's surname, which gave the most count of books.
I know, how to resolve, when i took datas only from one table, e. g. TeacherCard
SELECT TOP 1 WITH TIES Librarians.LastName, MAX(Librarians.CountOfBooks) AS Books FROM
(SELECT L.LastName, COUNT(*) AS CountOfBooks FROM Libs L, T_Cards T
WHERE T.Id_Lib IN (SELECT L.Id)
GROUP BY L.LastName) AS Librarians
GROUP BY Librarians.LastName
ORDER BY MAX(Librarians.CountOfBooks) DESC
GO
I dont know, how to use datas from TeacherCard and from StudetnCard at the same time. Please, help to write this quary.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Dohsan SSCrazy Points: 2972 More actions · Answer 1

Would you be able to provide some DDL and sample data for this?

What data are you looking to use from the other 2 tables?

koles2004 Old Hand Points: 336 More actions · Answer 2

koles2004

Old Hand

Points: 336

June 11, 2015 at 2:48 am

#1804614

A resolved yet !

Dohsan SSCrazy Points: 2972 More actions · Answer 3

koles2004 (6/11/2015)
A resolved yet !

You've found a solution? If so, please share it.

koles2004 Old Hand Points: 336 More actions · Answer 4

I have 2 right variants: big and small.

Small variant:

SELECT TOP 1 WITH TIES LastName, COUNT (*) AS [count] FROM
(SELECT LastName FROM Libs L, S_Cards S
WHERE S.id_lib = L.id
UNION ALL
SELECT LastName FROM Libs L, T_Cards T
WHERE T.id_lib = L.id) As Res
GROUP By LastName
ORDER BY [count] DESC

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

I would encourage you to change the way you write JOINs. That syntax is the old way and can become confusing with more complex queries and develop into unwanted cartesian products.

Here's an alternative using a more recent format for JOINs.

SELECT TOP 1 WITH TIES

LastName,

COUNT (*) AS [count]

FROM

(SELECT LastName

FROM Libs L

JOIN S_Cards S ON S.id_lib = L.id

UNION ALL

SELECT LastName

FROM Libs L

JOIN T_Cards T ON T.id_lib = L.id) As Res

GROUP By LastName

ORDER BY [count] DESC

After stating that, I would reduce the number of times the Librarians table is read by implementing something like this:

SELECT TOP 1 WITH TIES

L.LastName,

SUM(Res.CountOfBooks) AS [count]

FROM Libs L

JOIN (SELECT S.id_lib, COUNT(*) CountOfBooks

FROM S_Cards S

GROUP BY S.id_lib

UNION ALL

SELECT T.id_lib, COUNT(*) CountOfBooks

FROM T_Cards T

GROUP BY T.id_lib) As Res ON L.id = Res.id_lib

GROUP By LastName

ORDER BY [count] DESC

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

koles2004 Old Hand Points: 336 More actions · Answer 6

I wanna specify: the second your variant is faster than first ?

Luis Cazares SSC Guru Points: 183706 More actions · Answer 7

koles2004 (6/11/2015)
I wanna specify: the second your variant is faster than first ?

I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

koles2004 Old Hand Points: 336 More actions · Answer 8

Both variants are right for my task )

I like more my variant and your first variant.

Your second variant i dont like )

Luis Cazares SSC Guru Points: 183706 More actions · Answer 9

koles2004 (6/11/2015)
Both variants are right for my task )
I like more my variant and your first variant.
Your second variant i dont like )

Any reason to that?

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

drew.allen SSC Guru Points: 77014 More actions · Answer 10

Luis Cazares (6/11/2015)
koles2004 (6/11/2015)
I wanna specify: the second your variant is faster than first ?
I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.

Both of them are inefficient. This should perform better than either of the other two, because you are only aggregating once instead of once in the subquery and again in the main query.

SELECT TOP 1 WITH TIES

L.LastName,

COUNT(Res.id_lib) AS [count]

FROM Libs L

JOIN (

SELECT S.id_lib

FROM S_Cards S

UNION ALL

SELECT T.id_lib

FROM T_Cards T

) As Res ON L.id = Res.id_lib

GROUP By LastName

ORDER BY [count] DESC

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Luis Cazares SSC Guru Points: 183706 More actions · Answer 11

drew.allen (6/11/2015)
Luis Cazares (6/11/2015)
koles2004 (6/11/2015)
I wanna specify: the second your variant is faster than first ?
I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.
Both of them are inefficient. This should perform better than either of the other two, because you are only aggregating once instead of once in the subquery and again in the main query.
Drew

When doing such strong and conclusive statements, you should show proof of that. I can't give a conclusive statement even after some testing that shows that one is particularly better than the other as they have very similar execution times and the fastest isn't constant.

I'm sure after testing that the original query is slower than the other two, almost twice as slower.

Here's my testing harness.

SET NOCOUNT ON;

IF OBJECT_ID( 'Librarians') IS NOT NULL DROP TABLE Librarians;

CREATE TABLE Librarians(

LibrarianId int IDENTITY PRIMARY KEY,

Surname varchar(100)

);

--Using AdvertureWorks to have significant names

INSERT INTO Librarians( Surname)

SELECT TOP 100

LastName

FROM AdventureWorks2012.Person.Person;

IF OBJECT_ID( 'StudentCards') IS NOT NULL DROP TABLE StudentCards;

CREATE TABLE StudentCards(

StudentCardId int IDENTITY PRIMARY KEY,

LibrarianId int);

INSERT INTO StudentCards( LibrarianId)

SELECT TOP 1000000

ABS( CHECKSUM(NEWID())) % 100 + 1

FROM sys.all_columns, sys.all_columns b;

CREATE INDEX IX_StudentCardsLibrarian ON StudentCards(LibrarianId);

IF OBJECT_ID( 'TeacherCards') IS NOT NULL DROP TABLE TeacherCards;

CREATE TABLE TeacherCards(

TeacherCardId int IDENTITY PRIMARY KEY,

LibrarianId int);

INSERT INTO TeacherCards( LibrarianId)

SELECT TOP 1000000

ABS( CHECKSUM(NEWID())) % 100 + 1

FROM sys.all_columns, sys.all_columns b;

CREATE INDEX IX_TeacherCardsLibrarian ON TeacherCards(LibrarianId)

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

GO

--Use variables to measure only execution times discarding any results

DECLARE

@LibrarianId int,

@LastName varchar(100),

@Count int

--Clean the data stored in buffer

CHECKPOINT;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

PRINT 'Original Query';

SET STATISTICS TIME ON; --Allow the engine to show times

SELECT TOP 1 WITH TIES

@LastName = Surname,

@Count = COUNT(*)

FROM

(SELECT Surname

FROM Librarians L

JOIN StudentCards S ON S.LibrarianId = L.LibrarianId

UNION ALL

SELECT Surname

FROM Librarians L

JOIN TeacherCards T ON T.LibrarianId = L.LibrarianId) As Res

GROUP By Surname

ORDER BY 2 DESC;

SET STATISTICS TIME OFF; --Turn it off to clean buffers

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

PRINT 'Single Aggregation';

SET STATISTICS TIME ON;

SELECT TOP 1 WITH TIES

@LibrarianId = L.LibrarianId,

@LastName = L.Surname,

@Count = COUNT(Res.LibrarianId)

FROM Librarians L

JOIN (

SELECT S.LibrarianId

FROM StudentCards S

UNION ALL

SELECT T.LibrarianId

FROM TeacherCards T

) As Res ON L.LibrarianId = Res.LibrarianId

GROUP By L.LibrarianId, L.Surname

ORDER BY 3 DESC;

SET STATISTICS TIME OFF;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

PRINT 'Pre-Aggregation';

SET STATISTICS TIME ON;

SELECT TOP 1 WITH TIES

@LibrarianId = L.LibrarianId,

@LastName = L.Surname,

@Count = SUM(Res.BookCount)

FROM Librarians L

JOIN (

SELECT S.LibrarianId, COUNT(*) AS BookCount

FROM StudentCards S

GROUP BY S.LibrarianId

UNION ALL

SELECT T.LibrarianId, COUNT(*) AS BookCount

FROM TeacherCards T

GROUP BY T.LibrarianId

) As Res ON L.LibrarianId = Res.LibrarianId

GROUP By L.Surname, L.LibrarianId

ORDER BY 3 DESC;

SET STATISTICS TIME OFF;

GO 5 --Run the queries five times

EDIT: The variant posted by koles is exactly the same as the one presented here as "original", SQL Server will interpret both the same way but the one presented here is a preferred syntax.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

ScottPletcher SSC Guru Points: 101235 More actions · Answer 12

You can't add Librarian id into the GROUP BY and accurately answer the query, since multiple librarians could have the same surname. You could pre-aggregate, but you must still determine the final total using only the surname, not the lib id.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

koles2004 Old Hand Points: 336 More actions · Answer 13

In the morning I dont have right variant.

Now I have 4 right variants:

SELECT TOP 1 WITH TIES LastName, COUNT (*) AS CountOfBooks FROM
(SELECT LastName FROM Libs L, S_Cards S
WHERE S.id_lib = L.id
UNION ALL
SELECT LastName FROM Libs L, T_Cards T
WHERE T.id_lib = L.id) AS Result
GROUP By LastName
ORDER BY CountOfBooks DESC
GO
/*second */
SELECT TOP 1 WITH TIES LastName, COUNT (*) AS [count] FROM
(SELECT LastName FROM Libs L JOIN S_Cards S
ON S.id_lib = L.id
UNION ALL
SELECT LastName FROM Libs L JOIN T_Cards T
ON T.id_lib = L.id) As Res
GROUP By LastName
ORDER BY [count] DESC
GO
/*third*/
SELECT TOP 1 WITH TIES L.LastName, COUNT(Res.id_lib) AS [count] FROM Libs L
JOIN (
SELECT S.id_lib FROM S_Cards S
UNION ALL
SELECT T.id_lib FROM T_Cards T
) As Res ON L.id = Res.id_lib
GROUP By LastName
ORDER BY [count] DESC
GO
/*fourth*/
SELECT TOP 1 WITH TIES L.LastName, SUM(Res.CountOfBooks) AS [count] FROM Libs L
JOIN (SELECT S.id_lib, COUNT(*) CountOfBooks FROM S_Cards S
GROUP BY S.id_lib
UNION ALL
SELECT T.id_lib, COUNT(*) FROM T_Cards T
GROUP BY T.id_lib) AS Res ON L.id = Res.id_lib
GROUP By LastName
ORDER BY [count] DESC
GO

drew.allen SSC Guru Points: 77014 More actions · Answer 14

Luis Cazares (6/11/2015)
When doing such strong and conclusive statements, you should show proof of that.

You're right, of course.

I can't give a conclusive statement even after some testing that shows that one is particularly better than the other as they have very similar execution times and the fastest isn't constant.

After doing some testing of my own, it's obvious that joining before doing the union performs much worse than either of the other two, because it requires an additional table/index scan.

It also appears that the performance of doing only one aggregate vs doing an aggregate in both the main and sub queries is not as large as I expected. That being said, it does appear that doing only one aggregate does slightly edge out doing two. This is born out by the execution (not included) where the extra aggregate has an additional Hash Match (Aggregate) and two addition Compute Scalars, although the costs of these pieces amounts to about the same.

If the results are too close to call, I recommend going for the simpler one.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA