June 5, 2009 at 3:42 am
We have a table with 200+ million rows and growing. This table is so heavily updated that there are no indices placed on it: reads are far less occuring then inserts/updates, it's accepted that the the occasional read takes a while. However, there is a read which would improve a process when it would perform faster:
SELECT COUNT(1) FROM tablename
WHERE c1 = @v1 AND c2 - @v2 and c3 = @v3 and c4 = V4
I've created a construction with ROW_NUMBER with partitioning on the WHERE columns, which gives an improvement of about 33%. This is is not bad, but I wonder if there is another way to improve this. Obviously a redesign of this part of the database would be better, but that is not going to happen in the near future.
TIA,
Greetz,
Hans Brouwer
June 5, 2009 at 3:51 am
Hi,
can u give the definition of your table and supposed query to read the table.
Though, for quick view, you can replace your inner query (if there is any) with 'join'; this will boost your execution near to 50% in most cases.
"Don't limit your challenges, challenge your limits"
June 5, 2009 at 5:54 am
Hi Kruti,
There is no inner SELECT, the statement is as I have given. There are some more columns, but that is not important, only the SELECT COUNT(1) is executed. The columns in the WHERE clause are all NVARCHAR(n), max length 20.
Tnx for answering.
Greetz,
Hans Brouwer
June 5, 2009 at 7:44 am
Adding an index will allow the COUNT process to find something with fewer pages than the entire table. Otherwise, you have to do a table scan, every time, no options.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 5, 2009 at 9:23 am
FreeHansje (6/5/2009)
We have a table with 200+ million rows and growing. This table is so heavily updated that there are no indices placed on it:
Have you tested and proven that even one index degrades insert and update performance to unacceptable levels?
SELECT COUNT(1) FROM tablename
WHERE c1 = @v1 AND c2 - @v2 and c3 = @v3 and c4 = V4
Nonclustered index on c1, c2, c3, c4. There is no other way to make that count fast.
Without an index you're doing the equivalent of asking SQL to read every single entry in the entire telephone directory to see how many Piet van Staaden's live in Pofadder. With an index it can go straight to the records required.
Edit: If you're on Enterprise Edition, consider table partitioning.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 6, 2009 at 4:44 am
Hey,
You say that this table has heavy insert/update activity - inserts are not a problem, but I am at a loss to understand how updates can be performed in reasonable time on such a large table without any indexes?
I am also intrigued to see the ROW_NUMBER() OVER (PARTITION BY...) query you mention - as it is not at all clear to me how that could improve read performance? How did you measure it, and what improved by 33%? Logical reads? Execution time?
It would be interesting to know how many forwarded records exist in that heap. Try running the following (it will scan the whole table so may take a few minutes):
DECLARE @db_id SMALLINT;
DECLARE @object_id INT;
SET @db_id = DB_ID(N'name of your database');
SET @object_id = OBJECT_ID(N'fully qualified name of your table');
SELECT * FROM sys.dm_db_index_physical_stats(@db_id, @object_id, 0, NULL , 'DETAILED');
The number of forwarded records is shown in the forwarded_record_count column. Forwarded records in a heap really hurt scan performance. Creating a clustered index (which would generally improve insert and update performance too, but potentially hurt select performance for key lookups) would remove this issue.
There are a number of other potential ways to speed up the query without a redesign, but you didn't give enough information in your original post to say which might work for you. One important question is whether the COUNT needs to be up-to-the-second or whether a delay could be tolerated. Possible non-index solutions include:
Use the READ UNCOMMITTED isolation level (if the COUNT can tolerate dirty reads)
Use replication (or log shipping / mirroring) to move the select load to another place where you can create supporting indexes.
Enable parallelism for the COUNT query (OPTION MAXDOP (n))
Use a local partitioned view over data stored in separate tables with read-only data on read-only filegroups.
Increase memory on the server to enable more of the table to remain in buffer cache.
Place the table data on a faster I/O path.
...and so on...
Paul
edit: removed angle brackets from the code (bless this site's formatting)
June 6, 2009 at 8:56 am
Hi Gila, Paul,
I am not at work(obviously!) so I have to do this from memory:
First of all, a quick(&dirty)solution was needed. A correction: there is a primkey defined, so it's not without ANY index. Information was given to me by some permanent staff, I have been only there for a few weeks, don't know all the systems yet. I have suggested indices on the c1-c4 columns, my collegue told me they have removed those indices 6 months ago and so achieved a big performance boost. Maybe I should check this for myself, but that is in the past.
Table partitioning is something to consider, not only in this situation. However, I do not wish to implement table partitioning on the fly, so it's gonna take a while, if and when.
As how I tested the speed improvement of 30+ %, simply by looking at the time it took with the old query and with a ROW_NUMBER construction, on a test-system with 40 mil rows. The old query took 9-10 seconds, the ROW_NUMBER query took 6-7 seconds, both after compilation. I could have done a more thorough measuring, I agree; I have similar code as Paul gave to do just that, did not think of it. So I looked at executiontime.
As to why the ROW_NUMBER construction works, I dunno. I notice sometimes that a ROW_NUMBER construction instead of the WHERE-clause helps when you are looking at very many records, so in such situations I check if this works.
As for your other suggestions: I will look into some of them, didn't think about that. The file/filegroup config certainly is not optimal, it's on my to-do list. Paralellism: I'm not to keen on messing with SQL Server's own decisions with what the optimal way is to handle things. Memory is already 32 Gb and overall this is plenty for what's happening on this machine.
Tnx for answering, you have been helpful.
Greetz,
Hans Brouwer
June 6, 2009 at 9:18 am
FreeHansje (6/6/2009)
A correction: there is a primkey defined, so it's not without ANY index. Information was given to me by some permanent staff, I have been only there for a few weeks, don't know all the systems yet. I have suggested indices on the c1-c4 columns, my collegue told me they have removed those indices 6 months ago and so achieved a big performance boost. Maybe I should check this for myself, but that is in the past.
Ah - a PK changes everything (assuming it is clustered). If there is a clustered index, disregard the stuff about forwarded records.
FreeHansje (6/6/2009)
Table partitioning is something to consider, not only in this situation. However, I do not wish to implement table partitioning on the fly, so it's gonna take a while, if and when.
Sounds sensible. One needs to be careful with partitioning in 2K5 because of the parallelism issues and the loop join used to access partitions. Very generally speaking, I tend to favour table partitioning for ease of maintenance only - not for performance. Naturally there are exceptions. Also speaking very generally, partitioned views can give better performance in these types of scenarios - again 2K5 specific.
FreeHansje (6/6/2009)
As how I tested the speed improvement of 30+ %, simply by looking at the time it took with the old query and with a ROW_NUMBER construction, on a test-system with 40 mil rows. The old query took 9-10 seconds, the ROW_NUMBER query took 6-7 seconds, both after compilation. I could have done a more thorough measuring, I agree; I have similar code as Paul gave to do just that, did not think of it. So I looked at executiontime.
The code I posted wasn't for performance testing. My worry here is that the cache wasn't cleared between tests so the second run benefited from having much of the table pre-loaded in buffer cache. DBCC DROPCLEANBUFFERS and DBCC FREESYSTEMCACHE('ALL') will clear the caches for you. SET STATISTICS IO, TIME ON before the test and SET STATISTICS IO, TIME OFF after will give you some useful statistics too - or you can use profiler of course.
FreeHansje (6/6/2009)
Paralellism: I'm not to keen on messing with SQL Server's own decisions with what the optimal way is to handle things.
For some reason, I was assuming you are running an OLTP system with MAXDOP 1 at the server level. Honestly don't know why I assumed that! My point was to make sure that you were benefiting from parallel scanning of the table. Testing is obviously required, but parallel table or clustered index scans often out-perform single-threaded scans, though it does rather depend on the I/O characteristics of the system, physical fragmentation, and how aggressive the read-ahead manager gets.
If you are able to provide further details later on or have any further questions, please ask - one of the helpful and knowledgeable people on here will help if they can.
June 6, 2009 at 9:22 am
FreeHansje (6/6/2009)
I have suggested indices on the c1-c4 columns, my collegue told me they have removed those indices 6 months ago and so achieved a big performance boost. Maybe I should check this for myself, but that is in the past.
Not indices, Just one is needed on the combination of c1, c2, c3, c4.
Got a test environment where you can test exactly what performance degradation a single index causes? For what it's worth, it usually takes 10 or more indices to have a noticeable effect. (emphasis on 'usually')
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 7, 2009 at 1:12 pm
I would also consider checking what data can actually be entered in the table and review if you really need Nvarchar rather than varchar. If you have no need of the additional characters and can change it to varchar then you could pretty well halve the space taken by the nvarchar columns. End result will be even if you have to table scan you read far less off disk.
I would want to test extra indexes as well - find it hard to believe adding a few would cause a problem.
Can you spread the table over multiple file groups - and hence multiple disc drives?
Faster disc drives?
Better Raid levels?
etc
Could all help if you really cannot index it..
Mike
June 9, 2009 at 2:21 am
Hi Mike,
These things are indeed going to be looked into, but not on a ad-hoc basis. The configuration of the SQL Servers leaves a lot to be desired.
Greetz,
Hans Brouwer
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply