Query Help - Slow Performance

Question

Query Help - Slow Performance

bateman_aaron

Valued Member

Points: 59
More actions
September 9, 2017 at 5:02 pm

#339698

Greetings all!
Using the query below I am able to return about 6500 results in roughly 7 seconds. This seems to me to be entirely too long. I would expect the query to run in less than a second. Full disclosure, I am doing the experiments on a home server, basically new DELL T130. The processor is not super powerful, but I have the tempdb AND data files on a RamDisk, so I am sure it is not a memory bottleneck.
I also tried moving them onto an SSD mirrored array. Neither seemed to affect the peformance of the query. The entire DB is only about 5GB. I've attached, code, table sizes and execution plan. Can anyone help me figure out how I can reduce the speed? I'm a bit of a SQL newbie when it comes to performance tuning!Please help!
SELECT DISTINCT DistrictTeachers.LastName, DistrictTeachers.FirstName, DistrictTeachers.TeacherId, SchoolBuilding.SchoolName, SchoolBuilding.SchoolCode, DistrictStudentSchedule.SchoolYearEnding, DistrictStudentSchedule.Semester, DistrictStudentSchedule.CountyDistrict FROM DistrictStudentSchedule INNER JOIN DistrictTeachers ON DistrictStudentSchedule.CountyDistrict = DistrictTeachers.CountyDistrict AND DistrictStudentSchedule.TeacherId = DistrictTeachers.TeacherId INNER JOIN SchoolBuilding ON DistrictStudentSchedule.CountyDistrict = SchoolBuilding.CountyDistrict AND DistrictStudentSchedule.SchoolCode = SchoolBuilding.SchoolCode WHERE DistrictStudentSchedule.CountyDistrict = '111111' AND EXISTS ( SELECT * FROM DistrictStudents WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool )
The table sizes are as follows:
District Students: 51,000 records, 20 columns
District Student Schedule: 750,000 records, 17 columns
District Teachers : 10,000 records, 7 columns
SchoolBuilding: 2,000 records, 4 columns

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

1) Always compare estimated rows and actual rows. when they are off by 6 order of magnitude, bad stuff going to be happening in the plan. You need to find out why that is happening and fix it to be successful here.

2) Possible cartesian product? I didn't dig into your stuff well but the row counts explode as you progress through the data.

3) Massively expensive SORT/DISTINCT, partly from row estimation errors

4) Heap tables can have negative consequences.

6) What are the schemas, including indexes, of the various tables?

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Ed Wagner SSC Guru Points: 287026 More actions · Answer 2

Like Kevin said, we need the DDL for the tables and please be sure to include the index definitions. I saw no references to NCIs at all in your plan - are they all heaps?

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

bateman_aaron Valued Member Points: 59 More actions · Answer 3

Understood. When I ran the query analyzer it gave 0 suggestions for indexes. As the data was copied from another machine (just tables and data) there are 0 indexes.

I have attached the schemes for the tables in question.

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 4

bateman_aaron - Sunday, September 10, 2017 4:49 PM
Understood. When I ran the query analyzer it gave 0 suggestions for indexes. As the data was copied from another machine (just tables and data) there are 0 indexes.
I have attached the schemes for the tables in question.

Can you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 5

I don't have time to dig into it, but this part of the code just smells bad:

AND EXISTS ( SELECT * FROM DistrictStudents WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool )

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

bateman_aaron Valued Member Points: 59 More actions · Answer 6

Jeff Moden - Sunday, September 10, 2017 6:41 PM
Can you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?

There are no primary or foreign keys on the tables and there are no indexes on the machine that I am testing on. The production server does have indexes on all of the tables, but none of them seem to be helping, as both the production and test machine run the same query in about 6-7 seconds.

I believe when it was setup, the au_id columns were used just so they could delete duplicates quickly. New files get imported nightly and duplicates of the same students, teachers, buildings, etc are removed.

I could setup primary keys, but I do know that a table like Teachers or Students can have the same student exists multiple times, but in different schools or years. So I guess you would have to setup some kind of multi-column key? I "think" it would be impractical to recode everything to work based on the au_id's in each table.

bateman_aaron Valued Member Points: 59 More actions · Answer 7

TheSQLGuru - Sunday, September 10, 2017 7:24 PM
I don't have time to dig into it, but this part of the code just smells bad:
AND EXISTS ( SELECT * FROM DistrictStudents WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool )

At a glance, I don't see why it would cause an issue. I did create some indexes even though they were not recommended and I was able to shave off several seconds.
I do still wonder why it would take so long to run without giving a suggested index though.

I appreciate the time spent digging into it thus far.

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 8

bateman_aaron - Sunday, September 10, 2017 8:38 PM
Jeff Moden - Sunday, September 10, 2017 6:41 PM
Can you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?
There are no primary or foreign keys on the tables and there are no indexes on the machine that I am testing on. The production server does have indexes on all of the tables, but none of them seem to be helping, as both the production and test machine run the same query in about 6-7 seconds.
I believe when it was setup, the au_id columns were used just so they could delete duplicates quickly. New files get imported nightly and duplicates of the same students, teachers, buildings, etc are removed.
I could setup primary keys, but I do know that a table like Teachers or Students can have the same student exists multiple times, but in different schools or years. So I guess you would have to setup some kind of multi-column key? I "think" it would be impractical to recode everything to work based on the au_id's in each table.

You need to setup your test environment to be the same as your production environment, including the PKs and Indexes. And, we need the full monty DDL for your production environment and for you to also make sure that the Actual Execution Plan is from either the properly setup identical-to-production test database or from the production database itself.

As for the query itself, I believe a little Divide'n'Conquer in the form of "Pre-aggregation" is in order but I can't tell without the correct full monty DDL. None of us can.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

GroverVivek SSCommitted Points: 1674 More actions · Answer 9

bateman_aaron - Sunday, September 10, 2017 8:39 PM
TheSQLGuru - Sunday, September 10, 2017 7:24 PM
I don't have time to dig into it, but this part of the code just smells bad:
AND EXISTS ( SELECT * FROM DistrictStudents WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool )
At a glance, I don't see why it would cause an issue. I did create some indexes even though they were not recommended and I was able to shave off several seconds.
I do still wonder why it would take so long to run without giving a suggested index though.
I appreciate the time spent digging into it thus far.

Not sure but trying converting this correlated subquery to inner can help you

AND EXISTS ( SELECT * FROM DistrictStudents WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool )

Regards
VG

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 10

Considering the desired output, the real problem is that bloody student schedule table. It needs to be pre-aggregated to only the information required which will remove thousands and thousands of unnecessary row joins throughout the pipe. I expect it'll whittle down to being not much larger than the teacher table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101103 More actions · Answer 11

Assuming you have multiple CountyDistrict values, and the table isn't clustered already, try clustering the tables:

CREATE CLUSTERED INDEX DistrictTeachers__CL ON dbo.DistrictTeachers ( CountyDistrict, TeacherId ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];
CREATE CLUSTERED INDEX DistrictStudentSchedule__CL ON dbo.DistrictStudentSchedule ( CountyDistrict, SchoolCode, SchoolYearEnding, StudentId ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];
CREATE CLUSTERED INDEX SchoolBuilding__CL ON dbo.SchoolBuilding ( CountyDistrict, SchoolCode ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];

As noted previously, the DISTINCT is a huge drag on performance. You should be able to get rid of it with more specific joins between tables and/or CROSS APPLY using a TOP (1), such as for Teachers, rather than returning all matches (presumably the most recent name is good for the teacher even if the name did change).

It won't hurt to explicitly limit county district in the other tables, although SQL should already be doing that implicitly/automatically:

WHERE DistrictStudentSchedule.CountyDistrict = '111111'
AND DistrictTeachers.CountyDistrict = '111111'
AND SchoolBuilding.CountyDistrict = '111111'
AND ...

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".