Query of live data too slow

  • First off - I apologize if this isn't the correct forum, but I have a question about a query that is much slower than it should be. I have tables of live data (records are constantly being streamed in) that I also query against. I have used the actual execution plan to pinpoint where it is slow, and it always comes back to these 'live' tables. The slowest part is usually a clustered index scan (sometimes I can trick it into doing a clustered index seek, which I've heard is faster - I haven't been able to validate this) on the largest table (15 mil+ records). I have two numeric ID fields (both unique) set up as PKs on this table - one ID field ties back to one table, the other to another table - and, for experimentation, I also indexed one of the PK ID fields. Now, my basic question is: a query that returns about 600 rows takes over a minute and a half. Can this be improved upon? I realize my description is somewhat vague, so if more information is needed I'll do my best to accommodate. Thank you in advance.

  • OK, you're in the right place... so far so good.

    The optimizer is cost-based and SQL Server chooses the plan from there. Whether a clustered index scan or a seek is better depends on your query; a scan may or may not be faster. Generally speaking, the optimizer is going to do its job.

    You just need to rewrite your query to better tell it what you need.

    In order to help we need to see the query and the query plan; we can work from there.

    Otherwise we can only guess.

    Take a quick look here...

    http://www.qdpma.com/SQLServerCostBasedOptimizer.html

  • To really drill down on specific solutions, yes, we'd need a copy of the query (or queries) you're running, the table structure, sample data and, most importantly, the actual (not estimated) execution plans.

    But, in general, a scan on a 125 million row table means that each and every row is examined. Depending on other behaviors, this is usually a bad thing. To get 600 rows out of 125 million, I would expect to see a seek, not a scan, and yes, it should perform scads better.

    But without the query and the execution plans, I can't tell you specifically what you need to fix.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • And as soon as you have time

    PLEASE READ Grant's book :-D

  • Here's the query:

    select d.column1, c.column1, b.column1
    from table1 a
    inner join table2 b on a.id1 = b.id1
    inner join table3 c on b.id2 = c.id1
    inner join table4 d on a.id2 = d.id1
    inner join table5 e on b.field1 = e.id1
    where cast(convert(varchar, a.datefield, 101) as smalldatetime) between 'date1' and 'date2'
    and c.column1 in ('value1','value2','value3','value4')
    and d.column1 in (select value from #temptable)
    and b.column1 > 0
    group by d.column1, c.column1, b.column1

    There's a calculation being done on b.column1 that uses a field in the 'e' table (in case you're wondering why it's joined but not referenced). I'll try to attach the execution plan somehow in another post.

  • If you are having trouble posting the plan, please follow Gail's post:

    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
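
    If the difficulty is capturing the plan in the first place rather than posting it, one option (just a sketch) is to ask SQL Server for the actual plan as XML and save the result as a .sqlplan file, which can then be attached as the article describes:

    SET STATISTICS XML ON;    -- the actual plan comes back as an XML result alongside the query output

    -- run the problem query here, exactly as posted above

    SET STATISTICS XML OFF;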

  • Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Why are you converting datefield into varchar and then back into smalldatetime? What are date1 and date2 and what data type are they? This is why we need DDL as well as the query and the execution plan.

    John

  • I can't post the execution plan because it may have proprietary information in it. I'm casting and converting a datetime value (2010-08-25 12:34:56.789) so it can be compared against a smalldatetime value (2010-08-25 12:00:00.000).

    I think the root of my problem may be the way I have (or don't have) PKs and indexes set up. I haven't had to deal with PKs and indexes on this magnitude of data, so I just have the default PKs set up. Would indexing help, and if so, on what and how?

  • You don't need to convert at all. Just datefield BETWEEN date1 AND date2 will do. That alone will probably speed up your query a bit. I'm afraid we can't provide any specific help on indexing without seeing DDL and execution plans. A good place to start is to put an index on columns used in WHERE clauses and in JOIN predicates. Once you have created indexes you need to set up a job to maintain them.

    John
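
    For illustration, a minimal sketch of that rewrite, assuming 'date1' and 'date2' become typed parameters (the names and sample values here are made up):

    declare @date1 smalldatetime, @date2 smalldatetime;
    set @date1 = '20100825';
    set @date2 = '20100826';

    select a.id1, a.id2
    from table1 a
    where a.datefield >= @date1     -- the column is left bare, so an index covering
      and a.datefield <  @date2;    -- datefield can be seeked; BETWEEN works too if the
                                    -- upper bound already includes the time you want

    The half-open range (>= and <) is just a common way to cover a whole day without worrying about the time portion of the upper bound.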

  • First things first...

    cast(convert(varchar, a.datefield, 101) as smalldatetime) between 'date1' and 'date2'

    That is not SARGable. No indexes on the date columns will help here. What's the reason behind that conversion?

    Start with indexes on

    c (column1, id1)

    d (column1, id1)

    b (id1, id2, column1)

    Without seeing the exec plan, that's a shot in the dark.

    All the exec plan has is table and column names, and any constants passed to the query.
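
    Translated against the aliases in the posted query (b = table2, c = table3, d = table4), those suggestions would look roughly like this - the index names and the nonclustered choice are assumptions, not part of the original suggestion:

    create nonclustered index IX_table3_column1_id1 on table3 (column1, id1);
    create nonclustered index IX_table4_column1_id1 on table4 (column1, id1);
    create nonclustered index IX_table2_id1_id2_column1 on table2 (id1, id2, column1);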

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I apologize for the inability to show the execution plan. I will try your suggestions and let you know if and how I was able to fix it.

  • This is probably a crap answer - but it was just a matter of getting the indexes, PKs and fill factors correct. The index seek costs are still kind of high, but the overall query now runs for a week of data (somewhere in the neighborhood of 60,000 rows) in about 25-30 seconds. Thanks for everyone's help!

  • Please will you post the final version of your query? It may help someone else who's having a similar problem.

    Thanks

    John

  • The query didn't change much. I took GilaMonster's advice and changed the date filter to keep the original datatype of the field, and expanded the date parameters ('date1' and 'date2' - smalldatetime type) to look at a 24-hour window, like so:

    select d.column1, c.column1, b.column1
    from table1 a
    inner join table2 b on a.id1 = b.id1
    inner join table3 c on b.id2 = c.id1
    inner join table4 d on a.id2 = d.id1
    inner join table5 e on b.field1 = e.id1
    where a.datefield between 'date1' and 'date2'
    and c.column1 in ('value1','value2','value3','value4')
    and d.column1 in (select value from #temptable)
    and b.column1 > 0
    group by d.column1, c.column1, b.column1

    The cost savings were minimal, but still a savings. Again, it was really a matter of extensively playing with the indexes and fill factors (none of which were set up correctly, if at all) on all the tables involved. I had read on another site that fill factors impact clustered keys the most, since the rows are grouped/paged together, so I set all of my clustered indexes to 80% fill (I didn't do 100% because I wasn't sure at the time whether it would help much). There were also some math calculations being done against the data on the fly, so I ditched that, brought the raw data into a temp table, and did the calcs there. I also set up some fairly extensive index maintenance jobs and index stats updates to run on a nightly basis.
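
    For anyone following along, the sort of changes described above look roughly like this (a sketch only - PK_table2 is a placeholder name for a clustered index, and 80 is simply the fill factor settled on here, not a general recommendation):

    -- rebuild the clustered index with the chosen fill factor
    alter index PK_table2 on table2 rebuild with (fillfactor = 80);

    -- simplified nightly maintenance: defragment the indexes and refresh statistics
    alter index all on table2 reorganize;
    update statistics table2 with fullscan;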
