March 24, 2011 at 2:28 pm
The stats seem up to date and in line with the data distribution you said earlier (10 run per race).
That nested loop appears to be the culprick but it's hard to test without all the data.
Could you post the query plan for this (added a hint for HASH join)
SELECT
*
FROM
dbo.MstRace
INNER JOIN
dbo.MstRun ON
dbo.MstRun.RaceID = dbo.MstRace.ID
LEFT OUTER JOIN
dbo.MstRace AS MstRaceP1 ON
MstRaceP1.ID = dbo.MstRun.PrvRace1
LEFT OUTER HASH JOIN
dbo.MstRun AS MstRunP1 ON
MstRunP1.ID = dbo.MstRun.PrvRun1
WHERE
dbo.MstRace.ID < 10000
March 24, 2011 at 2:53 pm
Thanks again for your input.
The new execution plan is attached.
March 24, 2011 at 3:04 pm
It's worst :unsure:
I will be offline for a while, I check back later today hopefully with a different angle to look at this problem.
March 24, 2011 at 3:47 pm
Sorry, saw this thread earlier but my life is a bit zany at the moment.
Remove the 100 percent part. There's no purpose to it and the old trick of sorting in a view never really worked well anyway.
That is one seriously WIDE result set. Most of this is numeric, I hope?
Here's where we'll need to concentrate (formatted for my convenience):
FROM
ri_master.dbo.trainer AS trainer_1
CROSS JOIN
dbo.MstRaceType2
JOIN
dbo.MstCourse
JOIN
dbo.MstRaceType1
JOIN
dbo.MstRace AS MstRace_1
ONdbo.MstRaceType1.ID = MstRace_1.RaceType1
JOIN
dbo.MstRider
ONMstRace_1.RiderID = dbo.MstRider.ID
JOIN
dbo.MstDistance
ONMstRace_1.DistanceID = dbo.MstDistance.ID
JOIN
dbo.MstAge
ONMstRace_1.AgeID = dbo.MstAge.ID
ONdbo.MstCourse.ID = MstRace_1.CourseID
ONdbo.MstRaceType2.ID = MstRace_1.RaceType2
RIGHT JOIN
ri_master.dbo.trainer
JOIN
dbo.MstRun AS MstRun_1
ONtrainer_1.tid = MstRun_1.TrainerID
RIGHT JOIN
ri_master.dbo.jockey
JOIN
dbo.MstAge AS MstAge_1
JOIN
dbo.MstRaceType1 AS MstRaceType1_1
JOIN
dbo.MstRace
JOIN
dbo.MstRun
ONdbo.MstRace.ID = dbo.MstRun.RaceID
JOIN
ri_master.dbo.sire
ONdbo.MstRun.SireID = ri_master.dbo.sire.shid
JOIN
ri_master.dbo.dam
ONdbo.MstRun.DamID = ri_master.dbo.dam.dhid
JOIN
dbo.MstRider AS MstRider_1
ONdbo.MstRace.RiderID = MstRider_1.ID
JOIN
dbo.MstCourse AS MstCourse_1
ONdbo.MstRace.CourseID = MstCourse_1.ID
ONMstRaceType1_1.ID = dbo.MstRace.RaceType1
JOIN
dbo.MstGoing
ONdbo.MstRace.GoingID = dbo.MstGoing.ID
JOIN
dbo.MstRaceType2 AS MstRaceType2_1
ONdbo.MstRace.RaceType2 = MstRaceType2_1.ID
JOIN
dbo.MstDistance AS MstDistance_1
ONdbo.MstRace.DistanceID = MstDistance_1.ID
ONMstAge_1.ID = dbo.MstRace.AgeID
JOIN
ri_master.dbo.horse
ONdbo.MstRun.HorseID = ri_master.dbo.horse.hid
ONri_master.dbo.jockey.jid = dbo.MstRun.JockeyID
JOIN
ri_master.dbo.owner
ONdbo.MstRun.OwnerID = ri_master.dbo.owner.oid
ONMstRun_1.ID = dbo.MstRun.PrvRun1
ON MstRace_1.ID = dbo.MstRun.PrvRace1
Was there a particular reasoning between the layered joins? For the Right JOIN subjoin, sure, but this is more embedded then I believe is necessary. Doesn't necessarily hurt anything, but it's hard to read.
Taking this and comparing it (with a bit of further reformatting for my preferences 🙂 ) you've got some estimation malfunctions going on here. A good example in the query plan is the hook off the index seek on MSTRun on RaceID <= 10000. For some reason, the optimizer estimated *10* rows. It came back with over 100k. The keylookup afterwards is estimated 1 row, and fires off the same 110k rows.
Add a few of these together and your optimizer is pretty fried. We're going to need to take this apart in inches.
Alright, trainer_1 out of the mix, it doesn't matter, it's a straight cross join, and is 6000 rows. We'll try to clean that up later, for some reason it thinks hash matching a cross join is a good thing but there's a lot in this plan that's gone a little sideways, and I'm not sure I found the right one.
Here's the core, pre outer join query:
select 1
FROM
ri_master.dbo.trainer AS trainer_1
CROSS JOIN
------
dbo.MstRaceType2
JOIN
dbo.MstRace AS MstRace_1
ONdbo.MstRaceType2.ID = MstRace_1.RaceType2
JOIN
dbo.MstCourse
ONMstRace_1.CourseID = dbo.MstCourse.ID
JOIN
dbo.MstRaceType1
ONMstRace_1.RaceType1 = dbo.MstRaceType1.ID
JOIN
dbo.MstRider
ONMstRace_1.RiderID = dbo.MstRider.ID
JOIN
dbo.MstDistance
ONMstRace_1.DistanceID = dbo.MstDistance.ID
JOIN
dbo.MstAge
ONMstRace_1.AgeID = dbo.MstAge.ID
----- Pre-Right Join Connections.
Let's start with the core, what's that .sqlplan look like?
Please note every row in here is repeated 6600 times because of that cross join from ri_master.dbo.trainer. Is this intentional? There are NO limitations placed against that trainer list. There are right join associations later, but it's just repeated data here... and I'm not sure that's your intent. You do it again for every trainer after the right join (it says join but there's no restrictive on clause).
That's 6600 * 6600 rows that don't seem to have a logical purpose. It looks like you actually want to bring that cross join in after the right join, against MSTRun_1, which would make sense there.
At around here:
RIGHT JOIN
ri_master.dbo.trainer
JOIN
dbo.MstRun AS MstRun_1
ONtrainer_1.tid = MstRun_1.TrainerID
I think that's where you actually want to bring in that first reference to trainer_1, after the right join.
If you can, let me see what the sql plan for that first component looks like (you posted your ddl and indexes earlier so I'll be able to compare to that if necessary), and take a solid look at your trainer logic. If we can clean those two items up I believe we'll clean up most of the problem.
EDIT: I should mention that the SELECT 1 is going to kind of lie to us, because it's going to just use indexing because there's no cause to go to the clustered for covering fields. That's okay. We need to see the core logic in action first before we go anywhere else, like worrying about what columns it wants later.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 24, 2011 at 4:11 pm
Craig, many thanks for your input. I must apologise, as I have since tidied up the source of the original view to at least get it working to an extent, the cross join has been removed. The new source was included as an attachment as I was trying to make the post look neater as I realised it did look terrible to read initially. My original view source was generated by the view designer which I have since modified manually as it was doing unexpected things to my view.
Leaving out all the individual columns, the joins look like this at the moment, this is the version Maxim has kindly been looking at;
FROM
dbo.MstRace INNER JOIN
dbo.MstRun ON dbo.MstRace.ID = dbo.MstRun.RaceID LEFT OUTER JOIN
dbo.MstRace AS MstRaceP1 ON dbo.MstRun.PrvRace1 = MstRaceP1.ID LEFT OUTER JOIN
dbo.MstRun AS MstRunP1 ON dbo.MstRun.PrvRun1 = MstRunP1.ID INNER JOIN
dbo.MstAge ON dbo.MstRace.AgeID = dbo.MstAge.ID INNER JOIN
dbo.MstCourse ON dbo.MstRace.CourseID = dbo.MstCourse.ID INNER JOIN
dbo.MstDistance ON dbo.MstRace.DistanceID = dbo.MstDistance.ID INNER JOIN
dbo.MstGoing ON dbo.MstRace.GoingID = dbo.MstGoing.ID INNER JOIN
dbo.MstRider ON dbo.MstRace.RiderID = dbo.MstRider.ID INNER JOIN
dbo.horse ON dbo.MstRun.HorseID = dbo.horse.hid INNER JOIN
dbo.MstRaceType1 ON dbo.MstRace.RaceType1 = dbo.MstRaceType1.ID INNER JOIN
dbo.MstRaceType2 ON dbo.MstRace.RaceType2 = dbo.MstRaceType2.ID INNER JOIN
dbo.dam ON dbo.MstRun.DamID = dbo.dam.dhid INNER JOIN
dbo.jockey ON dbo.MstRun.JockeyID = dbo.jockey.jid INNER JOIN
dbo.owner ON dbo.MstRun.OwnerID = dbo.owner.oid INNER JOIN
dbo.sire ON dbo.MstRun.SireID = dbo.sire.shid INNER JOIN
dbo.trainer ON dbo.MstRun.TrainerID = dbo.trainer.tid LEFT OUTER JOIN
dbo.MstDistance AS MstDistanceP1 ON MstRaceP1.DistanceID = MstDistanceP1.ID LEFT OUTER JOIN
dbo.MstRaceType1 AS MstRaceType1P1 ON MstRaceP1.RaceType1 = MstRaceType1P1.ID LEFT OUTER JOIN
dbo.MstRaceType2 AS MstRaceType2P1 ON MstRaceP1.RaceType2 = MstRaceType2P1.ID LEFT OUTER JOIN
dbo.trainer AS trainerP1 ON MstRunP1.TrainerID = trainerP1.tid LEFT OUTER JOIN
dbo.MstCourse AS MstCourseP1 ON MstRaceP1.CourseID = MstCourseP1.ID LEFT OUTER JOIN
dbo.MstRider AS MstRiderP1 ON MstRiderP1.ID = MstRaceP1.RiderID
March 24, 2011 at 4:24 pm
Ah hah! I must have missed that when I re-scanned the thread. That makes more sense as to his earlier observations. I do remember thinking to myself "Are we looking at the same query?!" 😛
I'll be back in a bit on this. My system is currently crying while I add a column to some rediculously large table so I have a bit of time. 🙂
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 24, 2011 at 4:58 pm
A couple of first glance notes with the right query.
First, your SQLPlan, which you have to go to the XML to see this, has this problem:
<StmtSimple StatementCompId="1" StatementEstRows="89912" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="TimeOut" StatementSubTreeCost="348.373" StatementText="SELECT * FROM [vw_raceruntrainer002] WHERE [raceid]<=@1" StatementType="SELECT">
You never finished optimizing. It got too complex.
Next, in your index definitions above, it doesn't describe an Index I see being used on MstRun. IX_RaceIDLenghtsBehindWinner.
I'm assuming you ran this with an extra component for raceid <= 10000 to try to get an actual execution plan. It actually goofed up the query plan a little bit, but I realize the necessity.
Am I correct in assuming you're expecting 1.4 million rows out of this query at the end because MSTRun is actually the core table to this, and that's the rowcount? I don't THINK any of these joined items will duplicate rows, but can you confirm?
Nothing in here looks horribly poor. The scans are all on small tables (two or three internal pages at most, no issue), and you're going to almost always scan because you're pulling *everything*, so index seeking is not really up for discussion here.
Your estimated and actual number of rows get close to each other near the end (89k est, 110k actual), and there might be a slightly better shape to the query if it estimated your <=10000 correctly, but that's because of the parameterization of it and it was probably compiled with a very small number, like 1 or 2.
I see why Maxim was going after join mechanics to try to speed this up, but I'm relatively sure this won't help.
You commented earlier this is primarily a home business. Can I assume you're running this on your desktop PC using standard internal drives to the PC?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 24, 2011 at 5:32 pm
Craig, Thanks for the reply again, I am going to go through the execution plan and try and hook up your points with what is included in the plan. Before today, I had never heard of such a thing, it seems to offer quite an insight into the mechanics of sql?
IX_RaceIDLenghtsBehindWinner is an index used for another query / application, I didn't include in the definitions above as I was trying to keep things clear with regard to this view. There are in fact another five indexes on MstRun not detailed in the indexes.rtf, they are over Rundate, HorseIDCourseid etc. I thoguht they were not relevant to this problem:blush:
The <= 10000 was indeed to get the query to execute, it never finished when left to run over the whole file previously.
Ultimately I was looking to run this view / query overnight each night into a table, and then create indexes over quite a few columns to enable some of my appliucations to do some heavy querying of the resultant table. In a word, yes I wanted the whole 1,400,000+ rows.
My machine is fairly standard dual core 2.8mHz processor, on checking I have 2gb ram. Disks two raided 1tb sata HDD, SQL runs on a 100gb partition of the raid (that's raid that spans two disks, not mirroring). Win XP sp3.
I could easily upgrade to 4gb ram, would this give a performance boost of note?
If as you are thinking there is no real improvement to be made from the SQL, perhaps I will do the query without MstRaceP1 and MstRunP1 into a table, then run some code to go through the table and pickup rows associated with PrvRace1 and PrvRun1 and populate the fields this way?
I hope I haven't wasted anyones time with this, all the help is most appreciated.
jake
March 24, 2011 at 5:43 pm
The most important point first:
dji (3/24/2011)
I hope I haven't wasted anyones time with this, all the help is most appreciated.
You haven't Jake. This is an excellent example to learn from. Seemingly complex but actually simple, yet huge, so it shows the problems with large data volume mechanics. That and we're selective volunteers. If we wasted our time, it was our choice. 😀
Craig, Thanks for the reply again, I am going to go through the execution plan and try and hook up your points with what is included in the plan. Before today, I had never heard of such a thing, it seems to offer quite an insight into the mechanics of sql?
In a way, it IS SQL. The SQL engine offers storage, sure. It offers backups of that. It offers some data consistency checks as constraints. Nothing any decent filesystem with a little scripting couldn't do. The biggest part of the engine itself is the optimizer, which builds these execution plans, and all the hooks that allow the optimizer to go faster (partitioning, indexing, join methodology, etc.).
The execution plan is the first point of reference when performance goes out of whack.
IX_RaceIDLenghtsBehindWinner is an index used for another query / application, I didn't include in the definitions above as I was trying to keep things clear with regard to this view. There are in fact another five indexes on MstRun not detailed in the indexes.rtf, they are over Rundate, HorseIDCourseid etc. I thoguht they were not relevant to this problem:blush:
Every index can be relevant, if only for the reason that it's NOT using it.
My machine is fairly standard dual core 2.8mHz processor, on checking I have 2gb ram. Disks two raided 1tb sata HDD, SQL runs on a 100gb partition of the raid (that's raid that spans two disks, not mirroring). Win XP sp3.
I could easily upgrade to 4gb ram, would this give a performance boost of note?
That depends, and I need to go find the correct perfmon counters. The next thing to check is twofold. First is your I/O from drive. The second is your page faults to virtual memory (aka: swapfile or pagefile). The upgrade will only help the second. That you're reading from and writing to the same physical array isn't helping. I believe you're actually hitting hardware limitations, not query limitations. When you want everything, you get everything... and there's not a lot of optimization that can happen when you say "Gimme all of it". It's doing the best it reasonably can from what I'm seeing.
If as you are thinking there is no real improvement to be made from the SQL, perhaps I will do the query without MstRaceP1 and MstRunP1 into a table, then run some code to go through the table and pickup rows associated with PrvRace1 and PrvRun1 and populate the fields this way?
You're reading my mind. If we can't deal with the length, let's try going after the width. I'd start with building out the 1.4 mill rows from the MstRun table first, perhaps moving over all associated data. Next, move over a few more tables worth of data using updates. Rinse, repeat.
It will take longer then an all at once approach if the hardware could keep up, but if it never completes, anything is faster then infinity.
Also, make sure your target database is grown out to a reasonable size before you start this. If you're constantly filegrowthing in 1 megabyte increments, that's certainly not helping.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 24, 2011 at 11:46 pm
Thanks for all the help on this problem to everyone who has posted. It has been a massive learning curve, and a very useful one.
One idea I have thought about for a while to speed things up is a solid state drive for my SQL data directory. The last reviews I read had some problems with the technology, perhaps I need to look at this again. I could spend money on a new machine entirely as it is critical for me to keep upping demands.
I should have added the large drive containing my SQL db is not used for any other i/o of note. The large chunk of space left over from the 2tb riad is used for backup of racing videos overnight which only takes 10 minutes max. So the drive is basically used only for SQL, attached direcetly to a built in riad controller on the mb.
jake
March 25, 2011 at 11:19 am
Hi again Jake
I think it would be a great idea to divide the process into two parts. First populate a table with all the information from the runs and after that add the information regarding the previous run.
I'd be happy to look at this again after you have the first part of the query materialized if you'd like.
It's great to see what problems others are facing, especially when they approach it like you do.
Maxim
June 6, 2011 at 10:00 am
For anyone remotely interested!!
I bought a new laptop recently, intel i7 quadcore, 4gb ram. Loaded SQL 2008 R2 attached my db featured in this thread and ran this query.
Return 1,500,000 records in 10 mins 56 seconds. A remarkable improvement, presumably the combination of bigger processing power and sql 2008 r2 have made this quantum leap in performance possible.
June 7, 2011 at 9:46 am
dji (6/6/2011)
For anyone remotely interested!!I bought a new laptop recently, intel i7 quadcore, 4gb ram. Loaded SQL 2008 R2 attached my db featured in this thread and ran this query.
Return 1,500,000 records in 10 mins 56 seconds. A remarkable improvement, presumably the combination of bigger processing power and sql 2008 r2 have made this quantum leap in performance possible.
Indeed it is true that sometimes better hardware IS the simplest and most cost-effective solution! 😎
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 13 posts - 16 through 27 (of 27 total)
You must be logged in to reply to this topic. Login to reply