Select Maximum, using filelds from 2 tables

  • I have Database Library, which has a lot of tables and we need 3 tables for quary:

    Table Librarians: field ID, field Surname;

    Table StudentCard: field ID, foreign key on table Librarians and other fields, which we don't use

    Table TeacherCard: field ID, foreign key on table Librarians and other fields, which we don't use

    Query: Select the librarian's surname, which gave the most count of books.

    I know, how to resolve, when i took datas only from one table, e. g. TeacherCard

    SELECT TOP 1 WITH TIES Librarians.LastName, MAX(Librarians.CountOfBooks) AS Books FROM

    (SELECT L.LastName, COUNT(*) AS CountOfBooks FROM Libs L, T_Cards T

    WHERE T.Id_Lib IN (SELECT L.Id)

    GROUP BY L.LastName) AS Librarians

    GROUP BY Librarians.LastName

    ORDER BY MAX(Librarians.CountOfBooks) DESC

    GO

    I dont know, how to use datas from TeacherCard and from StudetnCard at the same time. Please, help to write this quary.

  • Would you be able to provide some DDL and sample data for this?

    What data are you looking to use from the other 2 tables?

  • A resolved yet !

  • koles2004 (6/11/2015)


    A resolved yet !

    You've found a solution? If so, please share it.

  • I have 2 right variants: big and small.

    Small variant:

    SELECT TOP 1 WITH TIES LastName, COUNT (*) AS [count] FROM

    (SELECT LastName FROM Libs L, S_Cards S

    WHERE S.id_lib = L.id

    UNION ALL

    SELECT LastName FROM Libs L, T_Cards T

    WHERE T.id_lib = L.id) As Res

    GROUP By LastName

    ORDER BY [count] DESC

  • I would encourage you to change the way you write JOINs. That syntax is the old way and can become confusing with more complex queries and develop into unwanted cartesian products.

    Here's an alternative using a more recent format for JOINs.

    SELECT TOP 1 WITH TIES

    LastName,

    COUNT (*) AS [count]

    FROM

    (SELECT LastName

    FROM Libs L

    JOIN S_Cards S ON S.id_lib = L.id

    UNION ALL

    SELECT LastName

    FROM Libs L

    JOIN T_Cards T ON T.id_lib = L.id) As Res

    GROUP By LastName

    ORDER BY [count] DESC

    After stating that, I would reduce the number of times the Librarians table is read by implementing something like this:

    SELECT TOP 1 WITH TIES

    L.LastName,

    SUM(Res.CountOfBooks) AS [count]

    FROM Libs L

    JOIN (SELECT S.id_lib, COUNT(*) CountOfBooks

    FROM S_Cards S

    GROUP BY S.id_lib

    UNION ALL

    SELECT T.id_lib, COUNT(*) CountOfBooks

    FROM T_Cards T

    GROUP BY T.id_lib) As Res ON L.id = Res.id_lib

    GROUP By LastName

    ORDER BY [count] DESC

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I wanna specify: the second your variant is faster than first ?

  • koles2004 (6/11/2015)


    I wanna specify: the second your variant is faster than first ?

    I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Both variants are right for my task )

    I like more my variant and your first variant.

    Your second variant i dont like )

  • koles2004 (6/11/2015)


    Both variants are right for my task )

    I like more my variant and your first variant.

    Your second variant i dont like )

    Any reason to that?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (6/11/2015)


    koles2004 (6/11/2015)


    I wanna specify: the second your variant is faster than first ?

    I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.

    Both of them are inefficient. This should perform better than either of the other two, because you are only aggregating once instead of once in the subquery and again in the main query.

    SELECT TOP 1 WITH TIES

    L.LastName,

    COUNT(Res.id_lib) AS [count]

    FROM Libs L

    JOIN (

    SELECT S.id_lib

    FROM S_Cards S

    UNION ALL

    SELECT T.id_lib

    FROM T_Cards T

    ) As Res ON L.id = Res.id_lib

    GROUP By LastName

    ORDER BY [count] DESC

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (6/11/2015)


    Luis Cazares (6/11/2015)


    koles2004 (6/11/2015)


    I wanna specify: the second your variant is faster than first ?

    I can't assure that it's faster or better. The only way to ensure that is by testing over a significant number of rows. That said, I have better expectations on the performance of the second query.

    Both of them are inefficient. This should perform better than either of the other two, because you are only aggregating once instead of once in the subquery and again in the main query.

    Drew

    When doing such strong and conclusive statements, you should show proof of that. I can't give a conclusive statement even after some testing that shows that one is particularly better than the other as they have very similar execution times and the fastest isn't constant.

    I'm sure after testing that the original query is slower than the other two, almost twice as slower.

    Here's my testing harness.

    SET NOCOUNT ON;

    IF OBJECT_ID( 'Librarians') IS NOT NULL DROP TABLE Librarians;

    CREATE TABLE Librarians(

    LibrarianId int IDENTITY PRIMARY KEY,

    Surname varchar(100)

    );

    --Using AdvertureWorks to have significant names

    INSERT INTO Librarians( Surname)

    SELECT TOP 100

    LastName

    FROM AdventureWorks2012.Person.Person;

    IF OBJECT_ID( 'StudentCards') IS NOT NULL DROP TABLE StudentCards;

    CREATE TABLE StudentCards(

    StudentCardId int IDENTITY PRIMARY KEY,

    LibrarianId int);

    INSERT INTO StudentCards( LibrarianId)

    SELECT TOP 1000000

    ABS( CHECKSUM(NEWID())) % 100 + 1

    FROM sys.all_columns, sys.all_columns b;

    CREATE INDEX IX_StudentCardsLibrarian ON StudentCards(LibrarianId);

    IF OBJECT_ID( 'TeacherCards') IS NOT NULL DROP TABLE TeacherCards;

    CREATE TABLE TeacherCards(

    TeacherCardId int IDENTITY PRIMARY KEY,

    LibrarianId int);

    INSERT INTO TeacherCards( LibrarianId)

    SELECT TOP 1000000

    ABS( CHECKSUM(NEWID())) % 100 + 1

    FROM sys.all_columns, sys.all_columns b;

    CREATE INDEX IX_TeacherCardsLibrarian ON TeacherCards(LibrarianId)

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    GO

    --Use variables to measure only execution times discarding any results

    DECLARE

    @LibrarianId int,

    @LastName varchar(100),

    @Count int

    --Clean the data stored in buffer

    CHECKPOINT;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    PRINT 'Original Query';

    SET STATISTICS TIME ON; --Allow the engine to show times

    SELECT TOP 1 WITH TIES

    @LastName = Surname,

    @Count = COUNT(*)

    FROM

    (SELECT Surname

    FROM Librarians L

    JOIN StudentCards S ON S.LibrarianId = L.LibrarianId

    UNION ALL

    SELECT Surname

    FROM Librarians L

    JOIN TeacherCards T ON T.LibrarianId = L.LibrarianId) As Res

    GROUP By Surname

    ORDER BY 2 DESC;

    SET STATISTICS TIME OFF; --Turn it off to clean buffers

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    PRINT 'Single Aggregation';

    SET STATISTICS TIME ON;

    SELECT TOP 1 WITH TIES

    @LibrarianId = L.LibrarianId,

    @LastName = L.Surname,

    @Count = COUNT(Res.LibrarianId)

    FROM Librarians L

    JOIN (

    SELECT S.LibrarianId

    FROM StudentCards S

    UNION ALL

    SELECT T.LibrarianId

    FROM TeacherCards T

    ) As Res ON L.LibrarianId = Res.LibrarianId

    GROUP By L.LibrarianId, L.Surname

    ORDER BY 3 DESC;

    SET STATISTICS TIME OFF;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    PRINT 'Pre-Aggregation';

    SET STATISTICS TIME ON;

    SELECT TOP 1 WITH TIES

    @LibrarianId = L.LibrarianId,

    @LastName = L.Surname,

    @Count = SUM(Res.BookCount)

    FROM Librarians L

    JOIN (

    SELECT S.LibrarianId, COUNT(*) AS BookCount

    FROM StudentCards S

    GROUP BY S.LibrarianId

    UNION ALL

    SELECT T.LibrarianId, COUNT(*) AS BookCount

    FROM TeacherCards T

    GROUP BY T.LibrarianId

    ) As Res ON L.LibrarianId = Res.LibrarianId

    GROUP By L.Surname, L.LibrarianId

    ORDER BY 3 DESC;

    SET STATISTICS TIME OFF;

    GO 5 --Run the queries five times

    EDIT: The variant posted by koles is exactly the same as the one presented here as "original", SQL Server will interpret both the same way but the one presented here is a preferred syntax.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • You can't add Librarian id into the GROUP BY and accurately answer the query, since multiple librarians could have the same surname. You could pre-aggregate, but you must still determine the final total using only the surname, not the lib id.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • In the morning I dont have right variant.

    Now I have 4 right variants:

    SELECT TOP 1 WITH TIES LastName, COUNT (*) AS CountOfBooks FROM

    (SELECT LastName FROM Libs L, S_Cards S

    WHERE S.id_lib = L.id

    UNION ALL

    SELECT LastName FROM Libs L, T_Cards T

    WHERE T.id_lib = L.id) AS Result

    GROUP By LastName

    ORDER BY CountOfBooks DESC

    GO

    /*second */

    SELECT TOP 1 WITH TIES LastName, COUNT (*) AS [count] FROM

    (SELECT LastName FROM Libs L JOIN S_Cards S

    ON S.id_lib = L.id

    UNION ALL

    SELECT LastName FROM Libs L JOIN T_Cards T

    ON T.id_lib = L.id) As Res

    GROUP By LastName

    ORDER BY [count] DESC

    GO

    /*third*/

    SELECT TOP 1 WITH TIES L.LastName, COUNT(Res.id_lib) AS [count] FROM Libs L

    JOIN (

    SELECT S.id_lib FROM S_Cards S

    UNION ALL

    SELECT T.id_lib FROM T_Cards T

    ) As Res ON L.id = Res.id_lib

    GROUP By LastName

    ORDER BY [count] DESC

    GO

    /*fourth*/

    SELECT TOP 1 WITH TIES L.LastName, SUM(Res.CountOfBooks) AS [count] FROM Libs L

    JOIN (SELECT S.id_lib, COUNT(*) CountOfBooks FROM S_Cards S

    GROUP BY S.id_lib

    UNION ALL

    SELECT T.id_lib, COUNT(*) FROM T_Cards T

    GROUP BY T.id_lib) AS Res ON L.id = Res.id_lib

    GROUP By LastName

    ORDER BY [count] DESC

    GO

  • Luis Cazares (6/11/2015)

    When doing such strong and conclusive statements, you should show proof of that.

    You're right, of course.

    I can't give a conclusive statement even after some testing that shows that one is particularly better than the other as they have very similar execution times and the fastest isn't constant.

    After doing some testing of my own, it's obvious that joining before doing the union performs much worse than either of the other two, because it requires an additional table/index scan.

    It also appears that the performance of doing only one aggregate vs doing an aggregate in both the main and sub queries is not as large as I expected. That being said, it does appear that doing only one aggregate does slightly edge out doing two. This is born out by the execution (not included) where the extra aggregate has an additional Hash Match (Aggregate) and two addition Compute Scalars, although the costs of these pieces amounts to about the same.

    If the results are too close to call, I recommend going for the simpler one.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply