March 4, 2004 at 7:42 am
Hello all (I posted this under the Performance folder also):
I have a question relating to the efficiency of non-clustered index reads as they relate to both Clustered and Heap tables.
I have a table with over a billion rows. Yes, that’s right… Billion with a “B.” This table is inside of a data warehouse and is appended to every night during the nightly warehouse load.
Currently the table is clustered on a non-unique key and has two non-clustered keys.
My question is “Is it more efficient on large tables to maintain a heap table with all non-clustered indexes?”
I ask this specifically because if a non-clustered index’s row locator is a pointer to the row then by transversing the non-clustered index we can go directly to the data rows in question verses getting the clustered key and then transversing the clustered index into to the data.
It sounds right to me but there may be something I don’t know about the internal access mechanisms of SQL Server.
This is directly out of BOL.
“The structure of the row locator depends on whether the data pages are stored in a heap or are clustered. For a heap, a row locator is a pointer to the row. For a table with a clustered index, the row locator is the clustered index key.”
Any comments?
*****************************************************************
Below are BOL extracts relating to Heap Tables and Indexes
Heaps have one row in sysindexes with indid = 0. The column sysindexes.FirstIAM points to the first IAM page in the chain of IAM pages that manage the space allocated to the heap. Microsoft® SQL Server™ 2000 uses the IAM pages to navigate through the heap. The data pages and the rows within them are not in any specific order, and are not linked together. The only logical connection between data pages is that recorded in the IAM pages.
Table scans or serial reads of a heap can be done by scanning the IAM pages to find the extents holding pages for the heap. Because the IAM represents extents in the same order they exist in the data files, this means that serial heap scans progress uniformly down each file. This is more efficient than following the data page chains used in earlier versions of SQL Server, in which the data page chain often takes a somewhat random path through the files of a database. Using the IAM pages to set the scan sequence also means that rows from the heap are not typically returned in the order in which they were inserted.
Index Allocation Map (IAM) pages map the extents in a database file used by a heap or index. IAM pages also map the extents allocated to the ntext, text, and image page chain for any table that has columns of these types. Each of these objects has a chain of one or more IAM pages recording all the extents allocated to it. Each object has at least one IAM for each file on which it has extents. They may have more than one IAM on a file if the range of the extents on the file allocated to the object exceeds the range that an IAM can record.
IAM pages are allocated as needed for each object and are located randomly in the file. sysindexes.dbo.FirstIAM points to the first IAM page for an object, and all the IAM pages for that object are linked in a chain.
An IAM page has a header indicating the starting extent of the range of extents mapped by the IAM. The IAM also has a large bitmap in which each bit represents one extent. The first bit in the map represents the first extent in the range, the second bit represents the second extent, and so on. If a bit is 0, the extent it represents is not allocated to the object owning the IAM. If the bit is 1, the extent it represents is allocated to the object owning the IAM page.
When Microsoft® SQL Server™ 2000 needs to insert a new row and no space is available in the current page, it uses the IAM and PFS pages to find a page with enough space to hold the row. SQL Server uses the IAM pages to find the extents allocated to the object. For each extent, SQL Server searches the PFS pages to see if there is a page with enough free space to hold the row. Each IAM and PFS page covers a large number of data pages, so there are few IAM and PFS pages in a database. This means that the IAM and PFS pages are generally in memory in the SQL Server buffer pool, so they can be searched quickly.
SQL Server allocates a new extent to an object only when it cannot quickly find a page in an existing extent with enough space to hold the row being inserted. SQL Server allocates extents from those available in the filegroup using a proportional allocation algorithm. If a filegroup has two files, one of which has twice the free space of the other, two pages will be allocated from the file with more empty space for every one page allocated from the other file. This means that every file in a filegroup should have a similar percentage of space used.
Microsoft® SQL Server™ 2000 supports indexes defined on any column in a table, including computed columns.
If a table is created with no indexes, the data rows are not stored in any particular order. This structure is called a heap.
The two types of SQL Server indexes are:
Clustered indexes sort and store the data rows in the table based on their key values. Because the data rows are stored in sorted order on the clustered index key, clustered indexes are efficient for finding rows. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order. The data rows themselves form the lowest level of the clustered index.
The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. If a table has no clustered index, its data rows are stored in a heap.
Nonclustered indexes have a structure completely separate from the data rows. The lowest rows of a nonclustered index contain the nonclustered index key values and each key value entry has pointers to the data rows containing the key value. The data rows are not stored in order based on the nonclustered key.
The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or are clustered. For a heap, a row locator is a pointer to the row. For a table with a clustered index, the row locator is the clustered index key.
The only time the rows in a table are stored in any specific sequence is when a clustered index is created on the table. The rows are then stored in sequence on the clustered index key. If a table only has nonclustered indexes, its data rows are stored in a unordered heap.
Indexes can be unique, which means no two rows can have the same value for the index key. Otherwise, the index is not unique and multiple rows can share the same key value.
There are two ways to define indexes in SQL Server. The CREATE INDEX statement creates and names an index. The CREATE TABLE statement supports the following constraints that create indexes:
When you create an index on SQL Server 2000, you can specify whether the keys are stored in ascending or descending order.
SQL Server 2000 supports indexes defined on computed columns, as long as the expression defined for the column meets certain restrictions, such as only referencing columns from the table containing the computed column, and being deterministic.
A fill factor is a property of a SQL Server index that controls how densely the index is packed when created. The default fill factor usually delivers good performance, but in some cases it may be beneficial to change the fill factor. If the table is going to have many updates and inserts, create an index with a low fill factor to leave more room for future keys. If the table is a read-only table that will not change, create the index with a high fill factor to reduce the physical size of the index, which lowers the number of disk reads SQL Server uses to navigate through the index. Fill factors are only applied when the index is created. As keys are inserted and deleted, the index will eventually stabilize at a certain density.
Indexes not only speed up the retrieval of rows for selects, they also usually increase the speed of updates and deletes. This is because SQL Server must first find a row before it can update or delete the row. The increased efficiency of using the index to locate the row usually offsets the extra overhead needed to update the indexes, unless the table has a lot of indexes.
This example shows the Transact-SQL syntax for creating indexes on a table.
USE pubs
GO
CREATE TABLE emp_sample
(emp_id int PRIMARY KEY CLUSTERED,
emp_name char(50),
emp_address char(50),
emp_title char(25) UNIQUE NONCLUSTERED )
GO
CREATE NONCLUSTERED INDEX sample_nonclust ON emp_sample(emp_name)
GO
Deciding which particular set of indexes will optimize performance depends on the mix of queries in the system. Consider the clustered index on emp_sample.emp_id. This works well if most queries referencing emp_sample have equality or range comparisons on emp_id in their WHERE clauses. If the WHERE clauses of most queries reference emp_name instead of emp_id, performance could be improved by instead making the index on emp_name the clustered index.
Many applications have a complex mix of queries that is difficult to estimate by interviewing users and programmers. SQL Server 2000 provides an Index Tuning Wizard to help design indexes in a database. The easiest way to design indexes for large schemas with complex access patterns is to use the Index Tuning Wizard.
You provide the Index Tuning Wizard with a set of SQL statements. This could be a script of statements you build to reflect a typical mix of statements in the system, but it is usually a SQL Profiler trace of the actual SQL statements processed on the system during a period of time that reflects the typical load on the system. The Index Tuning Wizard analyzes the workload and the database, and then recommends an index configuration that will improve the performance of the workload. You can choose to either replace the existing index configuration, or to keep the existing index configuration and implement new indexes to improve the performance of a slow-running subset of the queries.
March 4, 2004 at 7:49 am
Sorry, but what exactly is now your question?
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
March 4, 2004 at 7:58 am
Is it more efficient on large tables to maintain a heap table with all non-clustered indexes than to maintain a clustered table with non-clustered indexes?
March 4, 2004 at 8:09 am
Who knows?
One thing that comes to mind is that the clustered index pages must be loaded into memory when using a nonclustered index.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
March 5, 2004 at 2:43 am
You can't remove leaf level fragmentation from a table without a clustered index.
Only a covered index will bypass the actual table for data recovery.
A nonclustered index will require more i/o than a clustered index for a retrieval.
That aside do you have a performance issue or a particular problem with your table?
You might want to consider partitioning if you do.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
March 6, 2004 at 6:40 am
Frank, the clustered index is part of the non-clustered index, it doesn't touch the clustered table/index if the query involves ONLY the clustered key and any NCI'ed column. It doesn't always have to hit the clustered table, the clustered key column(s) is not on seperate pages from the table
Roberts,
"A nonclustered index will require more i/o than a clustered index for a retrieval"
Not sure what you meant here, if your NCI is covering you could potentially get exponentially better performance versus hitting the table itself.
Poster, best command to play with and get familiar with is "set statistics IO on" which will allow you to see the actual logical IO (number of times SQL hits a page)
HTH
------------
Ray Higdon MCSE, MCDBA, CCNA
March 8, 2004 at 2:55 am
- Keep in mind that your clustering-index may be uniqueified by sqlserver itself if it is not a unique key _and_ it is added to all nci ! So keep it as small as possible.
- I've noticed that there were issues regarding pagemanagement with heaps (not freeing them correct) with SQL7 (heavy insert/update/delete(no truncate)). Now all our tables have clustering indexes and so I didn't follow it up with sql2k.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
March 8, 2004 at 3:12 am
Heaps still fragment at the leaf level if data changes, however, technically a Heap is generally best for bulk data import, once you need to update or delete the data the Heap structure is not so useful.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
March 8, 2004 at 3:20 am
I agree with Colin. Since 7.0 heaps were built to reclaim space, that means SQL will hunt out free space and try to insert new data where a page has room.
------------
Ray Higdon MCSE, MCDBA, CCNA
March 8, 2004 at 3:31 am
I know that was the goal, but having a 48Kb table using +500Mb is what happened . After creating the CI sizes were back to normal.
Choosing a good CI is a heavy task, especialy with DWH because of the impact with huge volumes.
Playing around with it is a must because there is no straight line to walk on, there are only hints and tips.
It will always be a tradeoff.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
March 8, 2004 at 3:40 am
I was trying to convey that was a bad thing alzdba, I should have mentioned I agreed with you too. Anytime SQL has to perform a lot of logic before an insert especially in regards to IO, it is a bad thing.
------------
Ray Higdon MCSE, MCDBA, CCNA
March 8, 2004 at 3:52 am
Correct.
no offence meant , just adding the symptoms.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply