Query Help - Slow Performance

  • Greetings all!

    Using the query below I am able to return about 6500 results in roughly 7 seconds.  This seems to me to be entirely too long.   I would expect the query to run in less than a second.   Full disclosure, I am doing the experiments on a home server, basically new DELL T130.    The processor is not super powerful, but I have the tempdb AND data files on a RamDisk, so I am sure it is not a memory bottleneck.   

    I also tried moving them onto an SSD mirrored array.  Neither seemed to affect the peformance of the query.  The entire DB is only about 5GB.   I've attached, code, table sizes and execution plan.  Can anyone help me figure out how I can reduce the speed?  I'm a bit of a SQL newbie when it comes to performance tuning!Please help!


    SELECT     DISTINCT
            DistrictTeachers.LastName,
            DistrictTeachers.FirstName,
            DistrictTeachers.TeacherId,
            SchoolBuilding.SchoolName,
            SchoolBuilding.SchoolCode,
            DistrictStudentSchedule.SchoolYearEnding,
            DistrictStudentSchedule.Semester,
            DistrictStudentSchedule.CountyDistrict
    FROM    DistrictStudentSchedule INNER JOIN
            DistrictTeachers ON DistrictStudentSchedule.CountyDistrict = DistrictTeachers.CountyDistrict AND DistrictStudentSchedule.TeacherId = DistrictTeachers.TeacherId INNER JOIN
            SchoolBuilding ON DistrictStudentSchedule.CountyDistrict = SchoolBuilding.CountyDistrict AND DistrictStudentSchedule.SchoolCode = SchoolBuilding.SchoolCode
            WHERE     DistrictStudentSchedule.CountyDistrict = '111111'
            AND    EXISTS (
                SELECT * FROM
                DistrictStudents
                WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
                AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
                AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
                AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
                AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
            )

    The table sizes are as follows:
    District Students: 51,000 records, 20 columns
    District Student Schedule: 750,000 records, 17  columns
    District Teachers : 10,000 records, 7 columns
    SchoolBuilding: 2,000 records, 4 columns

  • 1) Always compare estimated rows and actual rows. when they are off by 6 order of magnitude, bad stuff going to be happening in the plan. You need to find out why that is happening and fix it to be successful here.

    2) Possible cartesian product? I didn't dig into your stuff well but the row counts explode as you progress through the data.

    3) Massively expensive SORT/DISTINCT, partly from row estimation errors

    4) Heap tables can have negative consequences.

    6) What are the schemas, including indexes, of the various tables?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Like Kevin said, we need the DDL for the tables and please be sure to include the index definitions.  I saw no references to NCIs at all in your plan - are they all heaps?

  • Understood.  When I ran the query analyzer it gave 0 suggestions for indexes.  As the data was copied from another machine (just tables and data) there are 0 indexes.  

    I have attached the schemes for the tables in question.

  • bateman_aaron - Sunday, September 10, 2017 4:49 PM

    Understood.  When I ran the query analyzer it gave 0 suggestions for indexes.  As the data was copied from another machine (just tables and data) there are 0 indexes.  

    I have attached the schemes for the tables in question.

    Can you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I don't have time to dig into it, but this part of the code just smells bad:

    AND  EXISTS (
        SELECT * FROM
        DistrictStudents
        WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
        AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
        AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
        AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
        AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
       )

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jeff Moden - Sunday, September 10, 2017 6:41 PM

    Can you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?

    There are no primary or foreign keys on the tables and there are no indexes on the machine that I am testing on.  The production server does have indexes on all of the tables, but none of them seem to be helping, as both the production and test machine run the same query in about 6-7  seconds.

    I believe when it was setup, the au_id columns were used just so they could delete duplicates quickly.  New files get imported nightly and duplicates of the same students, teachers, buildings, etc are removed.  

    I could setup primary keys, but I do know that a table like Teachers or Students can have the same student exists multiple times, but in different schools or years.  So I guess you would have to setup some kind of multi-column key?   I "think" it would be impractical to recode everything to work based on the au_id's in each table.

  • TheSQLGuru - Sunday, September 10, 2017 7:24 PM

    I don't have time to dig into it, but this part of the code just smells bad:

    AND  EXISTS (
        SELECT * FROM
        DistrictStudents
        WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
        AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
        AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
        AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
        AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
       )

    At a glance, I don't see why it would cause an issue.  I did create some indexes even though they were not recommended and I was able to shave off several seconds.  
    I do still wonder why it would take so long to run without giving a suggested index though.

    I appreciate the time spent digging into it thus far.

  • bateman_aaron - Sunday, September 10, 2017 8:38 PM

    Jeff Moden - Sunday, September 10, 2017 6:41 PM

    Can you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?

    There are no primary or foreign keys on the tables and there are no indexes on the machine that I am testing on.  The production server does have indexes on all of the tables, but none of them seem to be helping, as both the production and test machine run the same query in about 6-7  seconds.

    I believe when it was setup, the au_id columns were used just so they could delete duplicates quickly.  New files get imported nightly and duplicates of the same students, teachers, buildings, etc are removed.  

    I could setup primary keys, but I do know that a table like Teachers or Students can have the same student exists multiple times, but in different schools or years.  So I guess you would have to setup some kind of multi-column key?   I "think" it would be impractical to recode everything to work based on the au_id's in each table.

    You need to setup your test environment to be the same as your production environment, including the PKs and Indexes.  And, we need the full monty DDL for your production environment and for you to also make sure that the Actual Execution Plan is from either the properly setup identical-to-production test database or from the production database itself.

    As for the query itself, I believe a little Divide'n'Conquer in the form of "Pre-aggregation" is in order but I can't tell without the correct full monty DDL.  None of us can.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • bateman_aaron - Sunday, September 10, 2017 8:39 PM

    TheSQLGuru - Sunday, September 10, 2017 7:24 PM

    I don't have time to dig into it, but this part of the code just smells bad:

    AND  EXISTS (
        SELECT * FROM
        DistrictStudents
        WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
        AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
        AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
        AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
        AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
       )

    At a glance, I don't see why it would cause an issue.  I did create some indexes even though they were not recommended and I was able to shave off several seconds.  
    I do still wonder why it would take so long to run without giving a suggested index though.

    I appreciate the time spent digging into it thus far.

    Not sure but trying converting this correlated subquery to inner can help you

    AND  EXISTS (
        SELECT * FROM
        DistrictStudents 
        WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId 
        AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
        AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict 
        AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
        AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
       )

    Regards
    VG

  • Considering the desired output, the real problem is that bloody student schedule table.  It needs to be pre-aggregated to only the information required which will remove thousands and thousands of unnecessary row joins throughout the pipe.  I expect it'll whittle down to being not much larger than the teacher table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Assuming you have multiple CountyDistrict values, and the table isn't clustered already, try clustering the tables:

    CREATE CLUSTERED INDEX DistrictTeachers__CL ON dbo.DistrictTeachers ( CountyDistrict, TeacherId ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];
    CREATE CLUSTERED INDEX DistrictStudentSchedule__CL ON dbo.DistrictStudentSchedule ( CountyDistrict, SchoolCode, SchoolYearEnding, StudentId ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];
    CREATE CLUSTERED INDEX SchoolBuilding__CL ON dbo.SchoolBuilding ( CountyDistrict, SchoolCode ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];

    As noted previously, the DISTINCT is a huge drag on performance.  You should be able to get rid of it with more specific joins between tables and/or CROSS APPLY using a TOP (1), such as for Teachers, rather than returning all matches (presumably the most recent name is good for the teacher even if the name did change).

    It won't hurt to explicitly limit county district in the other tables, although SQL should already be doing that implicitly/automatically:

      WHERE  DistrictStudentSchedule.CountyDistrict = '111111'
       AND  DistrictTeachers.CountyDistrict = '111111'
       AND  SchoolBuilding.CountyDistrict = '111111'
       AND ...

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply