March 16, 2005 at 8:29 am
I have a table that grows from 0 rows to 420,000+ rows each day and then wiped out each night. The data is sorted on 2 columns, column 1 is an integer, column 2 is a char(4). the index is sorted ASC on both.
Data is inserted into this table every 5 minutes during the day. The integer column is just a counter that denoted which 5 minute period of the day the data in column 2 (and subsequent columns) belongs to. The data looks like this when selected:
0, A 0, B 0, C 1, A 1, B 1, C
My Question: is there a performance benefit to clustering this index if the data is ordered in such a way that new data is always added to the end?
I know clustering places the data in the same order as the index. My fear is that at every 5 minute interval the entire data portion will be re-sorted along with the index. I do not know enough about the internals of sqlserver to know if there is some type of optimize process that is smart enough to do an append rather than resorting 400,000+ rows.
Can anyone shed some light here?
March 16, 2005 at 8:39 am
>>My fear is that at every 5 minute interval the entire data portion will be re-sorted along with the index.
"Re-sorting" may be the outward appearance of what happens, but it is page-splitting and fragmentation that are the real concern behind the scenes.
If your data always arrives "sorted", i.e. if each insert is in the same ascending order as the existing clustered index, then clustering should not cause much, if any degradation on insert, and may make subsequent SELECT's perform better.
"May" improve, because without knowing the typical queries that run against this table, and the other indexes on it, there's no real way of knowing.
March 16, 2005 at 8:48 am
2 things to note - 1) This is the only index on the table and 2) The table is pinned.
We have the memory to spare and the table is hit very hard by user queries and other internal process.
Does this make things better or worse with regard to this index becoming clustered?
March 17, 2005 at 1:54 am
If data is added only to the end, then there won't be a page splitting / fragmentation issue (and you can have a 100% fill factor) but is the index relevant to queries? If not, you might just as well use a heap. Even if it is relevant to the queries, you might get better performance with a non-clustered covering index.
March 17, 2005 at 8:20 am
Yes, as I said, the table is hit very hard with user queries and other internal queries.
Forgive my ignorance as I'm a programmer being asked to fill a DBA's shoes, but what does a 100% fill factor do for performance? As I understand it, the FF is used to let sql server know when to split the page cache to make more room for incoming data. Does setting the FF to 100% mean "append only?"
Thanx,
--Frank
March 17, 2005 at 8:35 am
>>Does setting the FF to 100% mean "append only?"
No, it means that each index page (which for a clustered index also means data page) is completely full, with no room to add more rows. If you had to add another row to an existing page, Sql Server would have to perform an expensive page splitting operation to create 2 pages.
Hence the importance of new inserts being in ascending order of the clustered index - because they are added at the end of the index, they don't result in expensive page splits in the middle of the index. The 100% FF also means that since the pages are full, there is less disk I/O required, so performance increases.
March 17, 2005 at 8:36 am
SQL Server reads Extents which are nothing but eight Pages. The fuller the pages the less I/O it will have to perform to get the requested data.
HTH
* Noel
March 17, 2005 at 9:45 am
The fill factor only applies when the index is created. As inserts occur, this 'spare' space will get filled up until SQL Server is forced to make a page split. Hence, where there are inserts in the middle of the table, you need to reindex periodically so as to reset the fill factor space. The frequency of doing this is a matter of judgement depending on the actual usage encountered and the timing of when you can afford the performance hit of reindexing.
(incidentally, I also am a developer but getting a good appreciation of how the system operates is invaluable in designing efficient code - and makes one much more popular with 'real' DBAs !!)
March 17, 2005 at 2:03 pm
Thank you all - I'll apply these answers to the index this evening.
Stewart: Same here - definitely gaining an strong appreciation for sql server - it's not near the bear of a system I believed it to be. I'm glad to be walking in DBA shoes for a while - very humbling.
--Frank
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply