August 23, 2010 at 5:11 pm
First off - I apologize if this isn't the correct forum, but I have a question regarding a very slow query that shouldn't be. I have tables of live data (records are constantly being streamed in) that I also query against. I have used the actual execution plan to pinpoint where it is slow, and it always comes back to these 'live' tables. The slowest part is usually a clustered index scan (sometimes I can trick it into doing a clustered index seek, which I've heard is faster - I haven't been able to validate this) on the largest table (15 mil+ records). I have 2 numerical ID fields (both unique) set up as PKs on this table - one ID field ties back to one table, the other to a different table - and (for experimentation) I also indexed one of the PK ID fields. Now, my basic question is: a query to return about 600 rows takes over a minute and a half. Can this be improved upon? I realize my description is somewhat vague, so if more information is needed I'll do my best to accommodate. Thank you in advance.
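For what it's worth, the setup is along these lines (table and column names changed, so treat this as a rough sketch of the structure rather than my actual schema):

create table dbo.LiveData
(
    id1 bigint not null,           -- ties back to one of the other tables
    id2 bigint not null,           -- ties back to a different table
    datefield datetime not null,   -- timestamp of the streamed record
    column1 decimal(18,4) not null,
    constraint PK_LiveData primary key clustered (id1, id2)
)

-- the extra index I added on one of the PK columns, for experimentation
create nonclustered index IX_LiveData_id2 on dbo.LiveData (id2)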
August 24, 2010 at 8:04 am
Ok, you're in the right place... so far so good.
The optimizer is cost-based and SQL chooses the plan from there. Whether a clustered index scan or a seek is faster depends on your query - a scan may or may not be faster. Generally speaking, the optimizer is going to do its job.
You just need to rewrite your query to better tell it what you need.
In order to help, we need to see the query plan and the query; we can help out from there.
Otherwise we can only guess.
take a quick look here...
August 24, 2010 at 8:08 am
To really drill down on specific solutions, yeah, a copy of the query (or queries) you're running, sample structure, sample data, and, most importantly, actual (not estimated) execution plans.
But, in general, a scan on a 125 million row table means that each and every row is examined. Depending on other behaviors, this is usually a bad thing. To get 600 rows out of 125 million, I would expect to see a seek, not a scan, and yes, it should perform scads better.
But without the query and the execution plans, I can't tell you specifically what you need to fix.
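In the meantime, if you want to see the cost of that scan for yourself, run the query with I/O statistics switched on and compare the logical reads before and after any change - this is just the standard SET option, nothing specific to your schema:

set statistics io on
set statistics time on

-- your slow query goes here

set statistics io off
set statistics time off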
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 24, 2010 at 8:13 am
And as soon as you have time
PLEASE READ Grant's book :-D
August 24, 2010 at 8:21 am
Here's the query:
select d.column1, c.column1, b.column1
from table1 a
inner join table2 b on a.id1 = b.id1
inner join table3 c on b.id2 = c.id1
inner join table4 d on a.id2 = d.id1
inner join table5 e on b.field1 = e.id1
where cast(convert(varchar, a.datefield, 101) as smalldatetime) between 'date1' and 'date2'
and c.column1 in ('value1','value2','value3','value4')
and d.column1 in (select value from #temptable)
and b.column1 > 0
group by d.column1, c.column1, b.column1
There's a calculation being done on b.column1 that uses a field in the 'e' table (in case you're wondering why it's joined and not referenced). I'll try to attach the execution plan somehow in another post.
August 24, 2010 at 8:25 am
If you are having trouble posting the plan, please follow Gail's post:
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
August 24, 2010 at 8:30 am
Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 24, 2010 at 8:31 am
Why are you converting datefield into varchar and then back into smalldatetime? What are date1 and date2, and what data types are they? This is why we need DDL as well as the query and the execution plan.
John
August 24, 2010 at 8:45 am
I can't post the execution plan because it may have proprietary information in it. I'm casting and converting a datetime value (2010-08-25 12:34:56.789) so it can be compared against a smalldatetime value (2010-08-25 12:00:00.000).
I think the root of my problem may be the way I have (or don't have) PKs and indexes set up. I haven't had to deal with PKs and indexes on this magnitude of data, so I just have the default PKs set up. Would indexing help, and if so, on which columns and how?
August 24, 2010 at 8:53 am
You don't need to convert at all. Just datefield BETWEEN date1 AND date2 will do. That alone will probably speed up your query a bit. I'm afraid we can't provide any specific help on indexing without seeing DDL and execution plans. A good place to start is to put an index on columns used in WHERE clauses and in JOIN predicates. Once you have created indexes you need to set up a job to maintain them.
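As a rough sketch - assuming date1 and date2 are smalldatetime values and keeping the placeholder names from your post - the query would become something like this:

declare @date1 smalldatetime
declare @date2 smalldatetime
set @date1 = '2010-08-23'  -- example values only; substitute your real range
set @date2 = '2010-08-24'

select d.column1, c.column1, b.column1
from table1 a
inner join table2 b on a.id1 = b.id1
inner join table3 c on b.id2 = c.id1
inner join table4 d on a.id2 = d.id1
inner join table5 e on b.field1 = e.id1
where a.datefield between @date1 and @date2  -- column left alone, so an index on datefield can be used
and c.column1 in ('value1','value2','value3','value4')
and d.column1 in (select value from #temptable)
and b.column1 > 0
group by d.column1, c.column1, b.column1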
John
August 24, 2010 at 8:58 am
First things first...
cast(convert(varchar, a.datefield, 101) as smalldatetime) between 'date1' and 'date2'
That is not SARGable. No indexes on the date columns will help here. What's the reason behind that conversion?
Start with indexes on
c (column1, id1)
d (column1, id1)
b (id1, id2, column1)
Without seeing the exec plan, that's a shot in the dark.
All the exec plan has is table and column names, and any constants passed to the query.
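If it helps, in T-SQL those suggestions would look roughly like this (index names are made up, and the key order should be checked against the actual plan):

create nonclustered index idx_table3_column1_id1 on table3 (column1, id1)
create nonclustered index idx_table4_column1_id1 on table4 (column1, id1)
create nonclustered index idx_table2_id1_id2_column1 on table2 (id1, id2, column1)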
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 24, 2010 at 9:01 am
I apologize for the inability to show the execution plan. I will try your suggestions and let you know if and how I was able to fix it.
August 26, 2010 at 6:52 am
This is probably a crap answer - but it was just a matter of getting the indexes, PKs and fill factors correct. My index seek costs are still kinda high, but the overall query will run for a week of data (somewhere in the neighborhood of 60,000 rows) in about 25-30 seconds. Thanks for everyone's help!
August 26, 2010 at 6:56 am
Please will you post the final version of your query? It may help someone else who's having a similar problem.
Thanks
John
August 26, 2010 at 7:17 am
The query didn't change much. I took GilaMonster's advice and changed the date filter to keep the original datatype of the field, and expanded the date parameters ('date1' and 'date2' - smalldatetime type) to look at a 24-hour window, like so:
select d.column1, c.column1, b.column1
from table1 a
inner join table2 b on a.id1 = b.id1
inner join table3 c on b.id2 = c.id1
inner join table4 d on a.id2 = d.id1
inner join table5 e on b.field1 = e.id1
where a.datefield between 'date1' and 'date2'
and c.column1 in ('value1','value2','value3','value4')
and d.column1 in (select value from #temptable)
and b.column1 > 0
group by d.column1, c.column1, b.column1
The cost savings were minimal but still a savings. Again, it was really a matter of extensively playing with the indexes and fill factors (none of which were set up correctly, if at all) on all of the tables involved. I had read on another site that fill factors impact clustered indexes the most, since the rows are grouped/paged together, so I set all of my clustered indexes to 80% fill (I didn't do 100% because I wasn't sure at the time whether it would help much).

There were also some math calculations being done against the data on the fly, so I ditched that, brought the raw data into a temp table and did the calcs there. I also set up some pretty extensive index maintenance jobs and index statistics updates to run on a nightly basis.
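In case it helps someone else, the nightly maintenance boils down to something like this (object names are placeholders from my earlier sketch, and 80% fill is just what I settled on by trial and error):

-- rebuild every index on the big live table with some free space left on the pages
alter index all on dbo.LiveData rebuild with (fillfactor = 80)

-- refresh the statistics so the optimizer's estimates stay current
update statistics dbo.LiveData with fullscan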