Weird pattern of behaviour

  • Inserts rarely give any issues, including blocking; however, you mention only inserting based on an exists check. I've encountered many cases of procs being written which branch internally and can thus produce alternate plans - the classic is "if exists update else insert" - and these types of procs can be counterproductive (see the sketch below). Parameterisation of proc calls, rather than setting values inside the proc, can sometimes help. I've never found auto stats to do what you want when you want it to!!! If in doubt update stats yourself. As to the select changing, well yes it will as the data grows; I'm sure Kimberly Tripp (and Microsoft) have some good articles on this. I'm reminded of one I read which showed how the addition of one row changed the select from a seek to a scan.
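
    (For anyone not familiar with the pattern being described, the shape is roughly the sketch below - the table and proc names are made up - and it's the internal branch, plus updating stats by hand, that's being referred to:)

    -- the classic branching "upsert" proc shape (hypothetical names)
    CREATE PROC dbo.upsert_thing @id int, @val varchar(64)
    AS
    IF EXISTS (SELECT 1 FROM dbo.thing WHERE id = @id)
        UPDATE dbo.thing SET val = @val WHERE id = @id
    ELSE
        INSERT dbo.thing (id, val) VALUES (@id, @val)
    GO

    -- and updating stats yourself rather than waiting for auto stats to kick in:
    UPDATE STATISTICS dbo.thing WITH FULLSCAN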

    As Gail says you can use profiler to capture the query plans, which is something I do often, you'll likely get a large trace but it should help pinpoint.

    On average a secondary index has to be better than 95% selective for the optimiser to use it, unless it's a covering index.

    One simple idea I've had is to put pauses into the import, at each pause run the queries yourself and capture the plan.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • That select will table scan if the number of records returned exceeds about 1% of the table. I can't tell whether the IN is a single value (and if it is, you should be using =) or if it's multiple. SQL does this because bookmark lookups on a significant portion of the table are very expensive. Since the NC index is only on col8, SQL would have to do the search on the NC index, then do a bookmark lookup to either the clustered index or the heap to retrieve col1 through col7. If a large number of bookmark lookups have to be done, it can be cheaper to simply scan the cluster/heap to retrieve the records.

    The SELECT returns at most a single row for every entry in the IN clause. I know when matching a single value we should be using =, but the queries are generated for us by Kodo. We specify parameters, it builds a query. I don't know what control we have over that. Even though the IN clause can hold multiple values (the example I gave was for a single value), the number of rows returned will always be a very small percentage of the total table, except when the database is empty / new.

    Do you need all 8 columns returned? If not, remove the ones you don't and consider adding the other columns as included columns in the index.

    Yes, we need all the data.
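
    (For reference, the covering-index idea Gail mentions would look something like the sketch below - the index name is made up, and INCLUDE needs SQL 2005. With all 8 columns required it would mean carrying most of the row in the index, which is why it's only worthwhile if some columns can be dropped:)

    CREATE NONCLUSTERED INDEX ix_table1_col8_covering
        ON dbo.table1 (col8)
        INCLUDE (col1, col2, col4, col5, col6, col7);  -- col3 (the primary key) travels with the NC index anyway, assuming it's the clustered key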

    What data type is col8? What is the data type of the parameter?

    col8 is a varchar. The parameter is a String.

    Execution plans are discarded when the statistics of the underlying table change. You do have auto-update statistics on?

    I don't know what is on and what is not. I think all the settings are default values. If this setting is on by default then it will be on. I will check.

    If you want to force recompiles on every exec (not something I'd recommend, but worth trying), then add OPTION (RECOMPILE) to the end of your query.
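
    (For example - a made-up query, just to show where the hint goes:)

    SELECT t0.col1, t0.col8
    FROM dbo.table1 t0
    WHERE t0.col8 = 'some value'
    OPTION (RECOMPILE);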

    Since Kodo generates our queries for us, we are limited to what we can and cannot do to the actual query. We have the option of using passthru SQL statements. We are also considering removing Kodo from the database access layer anyway since it seems to be tying our hands too tight.

    I will use SQL Profiler to capture the execution plans. I have more information to add but I will use a fresh post for that.

    Thanks, Max

  • Here's some more information that may be useful.

    1. I let my data load run for a couple of days and noted the continued degradation of the SELECT statement. When I ran the same statement in SQL Studio, it returned very quickly. I ran a batch of 10 queries which completed in a few milliseconds. Meanwhile, the same query is taking about 200ms in the data load.

    2. I restarted WebLogic and SQL Server - the queries continue to perform the same.

    This really is strange. One would think that the same query would execute in the same amount of time. I'm not even talking about a large table - it currently holds 455,207 rows. I think I need to capture the execution plan in SQL Profiler to see what the query is actually doing.

    UPDATE - while writing this post I ran a batch of 10 SELECTs but I included the WITH(INDEX(0)) option on the table. The query performed just as badly as the data load. I think the source of my problem is that the SELECTs being issued are not using the index for some reason - even though the index IS used when I run the same query manually.
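
    (For anyone wanting to make the same comparison: the plan used by the data load can be captured from Profiler, and the plan for the manual run can be captured with SET STATISTICS PROFILE, e.g. the sketch below - the literal is just a placeholder:)

    SET STATISTICS PROFILE ON;

    SELECT t0.col1, t0.col8
    FROM dbo.table1 t0
    WHERE t0.col8 IN ('some value here');

    SET STATISTICS PROFILE OFF;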

    Thanks for everyone's input!

    Cheers, Max

  • I captured the execution plan from SQL Profiler and it looks like the INDEX is being used. The duration reported is 176ms which is way longer than anything I've seen in SQL Studio except when I used WITH(INDEX(0)). I don't think I can actually format text here in a nice way so I'll chop the Showplan ALL output down:

    StmtText
    --------
    Nested Loops(Inner Join, OUTER REFERENCES:([t0].[epc_id]), OPTIMIZED)
      |--Index Scan
      |--Clustered Index Seek

    I know I have ripped out all the juicy information, but does the above look correct for a SELECT statement using a single index? It appears to me as though the nonclustered index is being used to find a value in the clustered index, and then that pointer is being used to locate the actual row.

    Cheers, Max

  • When I run the slow query in SQL Studio I get fairly consistent times. I set up a test to run the same statement 10 times in a row and record the elapsed time (not a great indicator but it will do for now). I tried the original query and two modified versions - version 1 WITH(INDEX(0)); version 2 WITH(INDEX(index_name)) - see the sketch below. The quickest query was always the one that specified no index. I thought there may be an issue with the index, but when I rebuilt the indexes, no difference. I even dropped and created them again. Same thing, no difference. It seems that no matter what I do to this table I cannot make the queries run quicker - certainly not as quick as when the database holds less data. I hate to think how long the SELECT would take if I have millions of rows in the table, which is likely for production.
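
    (The three variants amount to something like this - the index name is made up, and the column list is trimmed for brevity:)

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    SELECT t0.col8 FROM dbo.table1 t0 WHERE t0.col8 IN ('some value here');                              -- optimiser's choice
    SELECT t0.col8 FROM dbo.table1 t0 WITH (INDEX(0)) WHERE t0.col8 IN ('some value here');              -- force the clustered index scan
    SELECT t0.col8 FROM dbo.table1 t0 WITH (INDEX(ix_table1_col8)) WHERE t0.col8 IN ('some value here'); -- force the nonclustered index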

  • The SELECT returns at most a single row for every entry in the IN clause.

    Yes, but how many entries do you typically have in the IN clause? There comes a point (around 1% of the table in many cases) where a scan is faster than several seeks.

    Can you post a sample slow select and the structure of the table (with indexes)? You can mangle the names if needed. Without that, we're guessing.

    col8 is a varchar. The parameter is a String.

    Make sure the param is not a unicode string. If you see the SQL query, the string shouldn't be specified as N'ABC'. (or if the SQL is built up with params, the param type should be varchar, not nvarchar.) If the column is varchar and the parameter/constant is of type nvarchar, then an implicit conversion takes place, converting the column to nvarchar before doing the search. This prevents index seeks.

    The index scan tells me that, for some reason, your where clause is not sargable. It's cheap enough for SQL to do the bookmark lookups (the clustered index seeks), so it should be cheap enough to do NC index seeks rather than a scan. I can't tell why without being able to see the query.
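
    (A minimal illustration of the conversion point, assuming col8 is a varchar column with a nonclustered index on it:)

    -- varchar literal: the column is not converted, so an index seek on col8 is possible
    SELECT t0.col8 FROM dbo.table1 t0 WHERE t0.col8 = 'ABC';

    -- N'' makes the literal nvarchar, so col8 is implicitly converted before the comparison,
    -- which (on SQL 2000 at least - see the later posts on 2005) prevents the index seek
    SELECT t0.col8 FROM dbo.table1 t0 WHERE t0.col8 = N'ABC';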

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The number of entries in the IN clause can be any number, maybe up to 100-200. It's purely driven by how many items are included with the event. For all of my testing the number of entries is 1. Here is an example of a slow select:

    SELECT t0.col1,
           t0.col2,
           t0.col3,
           t0.col4,
           t0.col5,
           t0.col6,
           t0.col7,
           t0.col8
    FROM dbo.table1 t0
    WHERE (t0.col8 IN ('some value here'))

    The table structure is:

    col1 varchar(64), non-unique nonclustered index
    col2 varchar(64), non-unique nonclustered index
    col3 bigint (primary key)
    col4 varchar(64), non-unique nonclustered index
    col5 varchar(64), non-unique nonclustered index
    col6 varchar(255), non-unique nonclustered index
    col7 varchar(64), non-unique nonclustered index
    col8 varchar(255), unique nonclustered index
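
    (In DDL terms that comes out roughly as the sketch below - the index names and column nullability are guesses:)

    CREATE TABLE dbo.table1 (
        col1 varchar(64)  NULL,
        col2 varchar(64)  NULL,
        col3 bigint       NOT NULL PRIMARY KEY,
        col4 varchar(64)  NULL,
        col5 varchar(64)  NULL,
        col6 varchar(255) NULL,
        col7 varchar(64)  NULL,
        col8 varchar(255) NOT NULL
    );
    CREATE NONCLUSTERED INDEX ix_table1_col1 ON dbo.table1 (col1);
    CREATE NONCLUSTERED INDEX ix_table1_col2 ON dbo.table1 (col2);
    CREATE NONCLUSTERED INDEX ix_table1_col4 ON dbo.table1 (col4);
    CREATE NONCLUSTERED INDEX ix_table1_col5 ON dbo.table1 (col5);
    CREATE NONCLUSTERED INDEX ix_table1_col6 ON dbo.table1 (col6);
    CREATE NONCLUSTERED INDEX ix_table1_col7 ON dbo.table1 (col7);
    CREATE UNIQUE NONCLUSTERED INDEX ix_table1_col8 ON dbo.table1 (col8);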

    The parameter IS being specified as N'value', I've seen it many, many times. I was blissfully unaware of the implicit conversion and I will look into this right away. THANKS!

  • Updating the column definition for col8 from varchar(255) to nvarchar(255) has resulted in a dramatic difference. I'll leave the test running to see if anything else happens but so far so good.
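
    (Roughly what that change amounts to - the index name is made up; the unique index on col8 has to be dropped and recreated around the ALTER, and the column's existing NULL/NOT NULL setting should be restated:)

    DROP INDEX ix_table1_col8 ON dbo.table1;
    ALTER TABLE dbo.table1 ALTER COLUMN col8 nvarchar(255) NOT NULL;  -- restate nullability to match the original definition
    CREATE UNIQUE NONCLUSTERED INDEX ix_table1_col8 ON dbo.table1 (col8);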

    A big THANK YOU to everyone who helped and especially Gail.

    Cheers, Max

  • Pleasure. I've had the problem a few times.

    What happens is that both sides of the expression must be of the same type. If the column is varchar and the parameter is nvarchar, SQL evaluates the expression as CONVERT(nvarchar(255), col8) IN (N'abc', N'def') and the convert prevents index seeks.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I hate to disagree - how unlike me! - but Gail's statement needs some clarification. (I should add that I'm only sort of up on this as I'm tuning an international app which mixes varchar and nvarchar data types.)

    In SQL 2000 the seeks turn to scans only when the nvarchar is on the right-hand side compared to a varchar column. When the table column is nvarchar and the sarg is varchar, everything is fine.

    In SQL 2005 it's a bit different: (in my tests) all combinations did index seeks, although the plan where the nvarchar is the sarg against the varchar column is different. In all cases the reported IO is identical. I don't have anything to hand large enough (table wise) to check for timing differences - these multi-core boxes are so damn fast these days!!!

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Interesting observation, Colin and thanks for sharing.

    When my test started I noted much improved performance. For the first 329 minutes I was getting really good throughput, comparable with Oracle, and then "something" happened. At 329 minutes I was seeing over 3000 events / minute. Then it dropped to <400 events / minute. I tried a couple of things like exec sp_updatestats and I even recreated the index that I was looking at before. Neither helped, and in fact performance went down further still, i.e. <300 events / minute. I ran SQL Profiler and I saw that the same SELECT query that was causing problems was running slower yet again; hence the updatestats and index rebuild. I am going to stop my load, drop the index, create the index and then start the load off from the same place to see if that helps. Fingers crossed.

    Cheers, Max

    Here are all of the things I have tried again and the results (the statements for [1] and [2] are sketched after the list):

    [1] exec sp_updatestats -- no effect.

    [2] rebuild index on table1.col8 -- performance appears to have taken a further hit.

    [3] stop data load, remove all events of Type1, recreate index and restart data load -- no difference.

    [4] restart WLS and MSSQL -- no difference.

    [5] clear out database, recreate index, start data load with Type1 events to ensure those types are not causing the problem -- SELECTs now taking 0ms. Problem is not with Type1 events.

    So it has been shown that fiddling with the index does no good but deleting all data speeds it up. That makes sense but it's also a huge concern. If I am seeing this performance degradation with <500k rows, what's it going to be like with several million! Still diggin'
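
    (The maintenance statements for [1] and [2] above amount to something like this - index name made up, SQL 2005 syntax:)

    EXEC sp_updatestats;
    ALTER INDEX ix_table1_col8 ON dbo.table1 REBUILD;
    -- or, to refresh just the statistics behind that index:
    UPDATE STATISTICS dbo.table1 ix_table1_col8 WITH FULLSCAN;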

  • I do feel for you and it's difficult trying to visualise what I'd be doing in your situation - trouble is so much tuning is almost a black art - I often can't describe what it is that leads me to find solutions to problems - some of which are very obscure.

    So apologies if we've been through some of this already ..

    Database mdf and ldf growths can really slow performance. I'd be watching CPU usage, disk IO completion time, I'd be looking at NTFS fragmentation, I'd be looking at plan reuse and caching, page life expectancy and free pages, and context switching. I'd probably run up dbcc umsstats as this allows you to see if threads are waiting.

    I have custom code I run which captures blocking processes and wait stats. Lots of these are to eliminate rather than locate issues.

    You could try turning off auto update and auto create stats on the database; you can usually catch the events in the proc cache, though, if stats are auto updating.

    There are problems with loads (and other ops) when the whole bunch gets bound up into a transaction, implicit or not. This is one of the differences between using a cursor and a while loop to rebuild tables: the while loop makes the whole lot into one transaction log event, whilst the cursor makes each separate. This is illustrated where the index rebuild takes, say, 40 mins, with tlog backups in full recovery every 5 mins.

    So how are the log growths looking?
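
    (A couple of quick checks along those lines, assuming SQL 2005 is available for the DMV:)

    DBCC SQLPERF(LOGSPACE);  -- log size and percentage used, per database

    SELECT cntr_value AS page_life_expectancy
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Manager%'
      AND counter_name = 'Page life expectancy';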

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • I think it's well and truly nailed now. For some event types the SELECT where clause looked like this:

    WHERE table1.col8 IN (NULL, N'value')

    The code was recently fixed to remove the errant appearance of NULL and subsequent tests have shown performance to be more stable now.
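
    (For what it's worth, the stray NULL never matches any row anyway - comparing col8 to NULL yields UNKNOWN - so the two forms below return the same rows; removing it just simplifies what the optimiser has to deal with:)

    -- what the generated query was doing
    SELECT t0.col8 FROM dbo.table1 t0 WHERE t0.col8 IN (NULL, N'value');

    -- what it does now
    SELECT t0.col8 FROM dbo.table1 t0 WHERE t0.col8 IN (N'value');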

    Thanks, Max

  • In SQL 2000 the seeks turn to scans only when the nvarchar is on the right hand side compared to a varchar. when the table is nvarchar and the sarg is varchar everything is fine.

    That's actually what I said: if the column is varchar and the param is nvarchar. It has nothing to do with the order of the arguments. If SQL is comparing a varchar and an nvarchar, the varchar will be converted before being compared. If it's the column being converted, then you get a scan (as you would if you had any function on the column). If it's the param being converted, it's not an issue and you don't see any change, though you will still notice an implicit conversion in the exec plan - just of the parameter, not the column.

    I'll do a test on 2005 today, if I get a chance. I have some really large tables that I can use.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail - I didn't disagree with you, just clarified - the conversion is only one way which causes a problem in 2000, whereas in 2005 it doesn't stop the seek either way. So technically, although both sides should be the same, they don't have to be. I just thought it was a bit more specific rather than talking about column or parameter conversion.

    where column = parameter

    column      parameter    result
    ---------   ----------   ------------
    varchar     nvarchar     scan (2000)
    nvarchar    varchar      seek (2000)
    varchar     nvarchar     seek (2005)
    nvarchar    varchar      seek (2005)
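
    (A rough repro of the matrix above - names made up; it needs a decent number of rows before the seek/scan difference shows up in the plan:)

    CREATE TABLE dbo.conv_test (
        id     int IDENTITY PRIMARY KEY,
        col_v  varchar(50),
        col_nv nvarchar(50)
    );
    CREATE NONCLUSTERED INDEX ix_conv_v  ON dbo.conv_test (col_v);
    CREATE NONCLUSTERED INDEX ix_conv_nv ON dbo.conv_test (col_nv);

    DECLARE @nv nvarchar(50), @v varchar(50);
    SET @nv = N'abc';
    SET @v  = 'abc';

    SELECT id FROM dbo.conv_test WHERE col_v  = @nv;  -- varchar column vs nvarchar parameter
    SELECT id FROM dbo.conv_test WHERE col_nv = @v;   -- nvarchar column vs varchar parameter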

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/
