September 9, 2017 at 5:02 pm
Greetings all!
Using the query below I am able to return about 6500 results in roughly 7 seconds. This seems to me to be entirely too long. I would expect the query to run in less than a second. Full disclosure, I am doing the experiments on a home server, basically new DELL T130. The processor is not super powerful, but I have the tempdb AND data files on a RamDisk, so I am sure it is not a memory bottleneck.
I also tried moving them onto an SSD mirrored array. Neither seemed to affect the peformance of the query. The entire DB is only about 5GB. I've attached, code, table sizes and execution plan. Can anyone help me figure out how I can reduce the speed? I'm a bit of a SQL newbie when it comes to performance tuning!Please help!
SELECT DISTINCT
DistrictTeachers.LastName,
DistrictTeachers.FirstName,
DistrictTeachers.TeacherId,
SchoolBuilding.SchoolName,
SchoolBuilding.SchoolCode,
DistrictStudentSchedule.SchoolYearEnding,
DistrictStudentSchedule.Semester,
DistrictStudentSchedule.CountyDistrict
FROM DistrictStudentSchedule INNER JOIN
DistrictTeachers ON DistrictStudentSchedule.CountyDistrict = DistrictTeachers.CountyDistrict AND DistrictStudentSchedule.TeacherId = DistrictTeachers.TeacherId INNER JOIN
SchoolBuilding ON DistrictStudentSchedule.CountyDistrict = SchoolBuilding.CountyDistrict AND DistrictStudentSchedule.SchoolCode = SchoolBuilding.SchoolCode
WHERE DistrictStudentSchedule.CountyDistrict = '111111'
AND EXISTS (
SELECT * FROM
DistrictStudents
WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
)
The table sizes are as follows:
District Students: 51,000 records, 20 columns
District Student Schedule: 750,000 records, 17 columns
District Teachers : 10,000 records, 7 columns
SchoolBuilding: 2,000 records, 4 columns
September 9, 2017 at 5:45 pm
1) Always compare estimated rows and actual rows. when they are off by 6 order of magnitude, bad stuff going to be happening in the plan. You need to find out why that is happening and fix it to be successful here.
2) Possible cartesian product? I didn't dig into your stuff well but the row counts explode as you progress through the data.
3) Massively expensive SORT/DISTINCT, partly from row estimation errors
4) Heap tables can have negative consequences.
6) What are the schemas, including indexes, of the various tables?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 10, 2017 at 2:49 pm
Like Kevin said, we need the DDL for the tables and please be sure to include the index definitions. I saw no references to NCIs at all in your plan - are they all heaps?
September 10, 2017 at 4:49 pm
Understood. When I ran the query analyzer it gave 0 suggestions for indexes. As the data was copied from another machine (just tables and data) there are 0 indexes.
I have attached the schemes for the tables in question.
September 10, 2017 at 6:41 pm
bateman_aaron - Sunday, September 10, 2017 4:49 PMUnderstood. When I ran the query analyzer it gave 0 suggestions for indexes. As the data was copied from another machine (just tables and data) there are 0 indexes.I have attached the schemes for the tables in question.
Can you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 10, 2017 at 7:24 pm
I don't have time to dig into it, but this part of the code just smells bad:
AND EXISTS (
SELECT * FROM
DistrictStudents
WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
)
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 10, 2017 at 8:38 pm
Jeff Moden - Sunday, September 10, 2017 6:41 PMCan you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?
There are no primary or foreign keys on the tables and there are no indexes on the machine that I am testing on. The production server does have indexes on all of the tables, but none of them seem to be helping, as both the production and test machine run the same query in about 6-7 seconds.
I believe when it was setup, the au_id columns were used just so they could delete duplicates quickly. New files get imported nightly and duplicates of the same students, teachers, buildings, etc are removed.
I could setup primary keys, but I do know that a table like Teachers or Students can have the same student exists multiple times, but in different schools or years. So I guess you would have to setup some kind of multi-column key? I "think" it would be impractical to recode everything to work based on the au_id's in each table.
September 10, 2017 at 8:39 pm
TheSQLGuru - Sunday, September 10, 2017 7:24 PMI don't have time to dig into it, but this part of the code just smells bad:
AND EXISTS (
SELECT * FROM
DistrictStudents
WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
)
At a glance, I don't see why it would cause an issue. I did create some indexes even though they were not recommended and I was able to shave off several seconds.
I do still wonder why it would take so long to run without giving a suggested index though.
I appreciate the time spent digging into it thus far.
September 10, 2017 at 9:41 pm
bateman_aaron - Sunday, September 10, 2017 8:38 PMJeff Moden - Sunday, September 10, 2017 6:41 PMCan you setup your scripting options to include all keys (PK, FK) and indexes and then generate the table scripts again, please?
There are no primary or foreign keys on the tables and there are no indexes on the machine that I am testing on. The production server does have indexes on all of the tables, but none of them seem to be helping, as both the production and test machine run the same query in about 6-7 seconds.
I believe when it was setup, the au_id columns were used just so they could delete duplicates quickly. New files get imported nightly and duplicates of the same students, teachers, buildings, etc are removed.
I could setup primary keys, but I do know that a table like Teachers or Students can have the same student exists multiple times, but in different schools or years. So I guess you would have to setup some kind of multi-column key? I "think" it would be impractical to recode everything to work based on the au_id's in each table.
You need to setup your test environment to be the same as your production environment, including the PKs and Indexes. And, we need the full monty DDL for your production environment and for you to also make sure that the Actual Execution Plan is from either the properly setup identical-to-production test database or from the production database itself.
As for the query itself, I believe a little Divide'n'Conquer in the form of "Pre-aggregation" is in order but I can't tell without the correct full monty DDL. None of us can.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 11, 2017 at 1:25 am
bateman_aaron - Sunday, September 10, 2017 8:39 PMTheSQLGuru - Sunday, September 10, 2017 7:24 PMI don't have time to dig into it, but this part of the code just smells bad:
AND EXISTS (
SELECT * FROM
DistrictStudents
WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
)At a glance, I don't see why it would cause an issue. I did create some indexes even though they were not recommended and I was able to shave off several seconds.
I do still wonder why it would take so long to run without giving a suggested index though.I appreciate the time spent digging into it thus far.
Not sure but trying converting this correlated subquery to inner can help you
AND EXISTS (
SELECT * FROM
DistrictStudents
WHERE DistrictStudentSchedule.MosisStateId = DistrictStudents.MosisStateId
AND DistrictStudentSchedule.DistrictStudentId = DistrictStudents.DistrictStudentID
AND DistrictStudentSchedule.CountyDistrict = DistrictStudents.CountyDistrict
AND DistrictStudentSchedule.SchoolYearEnding = DistrictStudents.SchoolYearEnding
AND DistrictStudentSchedule.SchoolCode = DistrictStudents.HomeSchool
)
Regards
VG
September 11, 2017 at 7:42 am
Considering the desired output, the real problem is that bloody student schedule table. It needs to be pre-aggregated to only the information required which will remove thousands and thousands of unnecessary row joins throughout the pipe. I expect it'll whittle down to being not much larger than the teacher table.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 12, 2017 at 9:33 am
Assuming you have multiple CountyDistrict values, and the table isn't clustered already, try clustering the tables:
CREATE CLUSTERED INDEX DistrictTeachers__CL ON dbo.DistrictTeachers ( CountyDistrict, TeacherId ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];
CREATE CLUSTERED INDEX DistrictStudentSchedule__CL ON dbo.DistrictStudentSchedule ( CountyDistrict, SchoolCode, SchoolYearEnding, StudentId ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];
CREATE CLUSTERED INDEX SchoolBuilding__CL ON dbo.SchoolBuilding ( CountyDistrict, SchoolCode ) WITH ( FILLFACTOR = 98 ) ON [PRIMARY];
As noted previously, the DISTINCT is a huge drag on performance. You should be able to get rid of it with more specific joins between tables and/or CROSS APPLY using a TOP (1), such as for Teachers, rather than returning all matches (presumably the most recent name is good for the teacher even if the name did change).
It won't hurt to explicitly limit county district in the other tables, although SQL should already be doing that implicitly/automatically:
WHERE DistrictStudentSchedule.CountyDistrict = '111111'
AND DistrictTeachers.CountyDistrict = '111111'
AND SchoolBuilding.CountyDistrict = '111111'
AND ...
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply