May 20, 2011 at 1:23 pm
Have a look at DBCC CLEANTABLE to reclaim the space.
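Something like this, as a rough sketch (the database and table names are just placeholders, and the optional third argument limits how many rows are processed per transaction):

USE YourDatabase;
GO
-- Reclaims space left behind by dropped variable-length (varchar/text) columns.
-- The optional batch size of 10,000 keeps each transaction small.
DBCC CLEANTABLE ('YourDatabase', 'dbo.YourTable', 10000);
GO

Keep in mind that DBCC CLEANTABLE only reclaims space from variable-length columns that have been dropped; it won't help with other kinds of wasted space.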
May 20, 2011 at 4:13 pm
TheSQLGuru (5/20/2011)
For THIS poster's table (which has PK on identity) I agreed that your statement was correct. But it is NOT generally applicable, and many people read these forums and a statement like that taken generally can be very problematic. I was addressing the cases where 100% fill factor on a clustered index can lead to horrible fragmentation on pure insert load.
Kevin,
All of my prior statements in this thread were directed toward the specific circumstances of the original poster, no others.
Your post above referred to the PK being on the IDENTITY column. The important point is that the clustered index is on the IDENTITY column, not that the PK is on it. The PK doesn't have to be on the IDENTITY column or on the clustered index, although if it were on a different column or columns, that index might become fragmented by record insertions. I wouldn't want other readers of this forum to be confused by that statement.
Your phrase "pure insert load": Are you referring to an operation such as a Bulk Insert operation?
(Referring to the specific circumstances of the original poster) If so, and "IDENTITY INSERT" were set to "OFF", and FILL FACTOR were set to 100%, wouldn't record insertions always be going to the "front" of a clustered index, filling pages in the database? And wouldn't there be a low level of fragmentation on that index associated with the record insertions, since all newly filled pages would be full or close to full, depending upon the maximum density that could be achieved for a page?
LC
May 20, 2011 at 5:53 pm
Lee Crain (5/20/2011)
(Referring to the specific circumstances of the original poster) If so, and "IDENTITY INSERT" were set to "OFF", and FILL FACTOR were set to 100%, wouldn't record insertions always be going to the "front" of a clustered index, filling pages in the database? And wouldn't there be a low level of fragmentation on that index associated with the record insertions, since all newly filled pages would be full or close to full, depending upon the maximum density that could be achieved for a page?
LC
I am not sure I understand what you are asking here. If you set IDENTITY INSERT to OFF, all this does is allow you to insert your own values for the identity column. If you do not specify a value for that column - SQL Server does not pick one that is 'available' and use it.
I also need to point out that the discussion concerning fill factor has left out one key point. If the table in question has any varchar columns, and those columns are updated with new values that increase the length of the column - that will introduce page splits and thus increase the fragmentation.
Put another way - just because you have an IDENTITY column as the clustered index on a table does not mean the table will never become fragmented. The fragmentation of the table will be determined by both updates and deletes - and inserts will be added to the logical end of the table.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 20, 2011 at 6:53 pm
Jeffrey Williams-493691 (5/20/2011)
Lee Crain (5/20/2011)
(Referring to the specific circumstances of the original poster) If so, and "IDENTITY INSERT" were set to "OFF", and FILL FACTOR were set to 100%, wouldn't record insertions always be going to the "front" of a clustered index, filling pages in the database? And wouldn't there be a low level of fragmentation on that index associated with the record insertions, since all newly filled pages would be full or close to full, depending upon the maximum density that could be achieved for a page?
LC
I am not sure I understand what you are asking here. If you set IDENTITY INSERT to OFF, all this does is allow you to insert your own values for the IDENTITY column. If you do not specify a value for that column - SQL Server does not pick one that is 'available' and use it.
I also need to point out that the discussion concerning fill factor has left out one key point. If the table in question has any varchar columns, and those columns are updated with new values that increase the length of the column - that will introduce page splits and thus increase the fragmentation.
Put another way - just because you have an IDENTITY column as the clustered index on a table does not mean the table will never become fragmented. The fragmentation of the table will be determined by both updates and deletes - and inserts will be added to the logical end of the table.
If you set IDENTITY INSERT to OFF, it prevents insertion of your own values into the identity column. This is the default and I mentioned it to make sure the conditions for my question were clear.
My understanding is that values too large to fit in a record when it is updated are stored off-row, with a pointer to the off-row data kept in the record in place of the actual data. Correct me if I'm mistaken about this.
I did not mention UPDATEs or DELETEs in my last post. My specific question to Kevin referenced what he described as a "pure insert load".
LC
P.S. There may be some ambiguity in my post about my reference to the "front" of a clustered index. I am referring to the leading edge where new records are inserted.
May 20, 2011 at 7:05 pm
My mistake - I switched the ON/OFF for IDENTITY INSERT.
My statement about varchars does not have anything to do with LOB data. When you update a row, if any of the column lengths increase, that row may have to be moved because it no longer fits on the page. For example, say you have a varchar(50) column that is blank when the row is inserted. Later, a user updates the row and that varchar now holds 25 characters; the page where that row is stored is already full, so the row is moved.
What Kevin is talking about is a table where only inserts are ever performed. No updates and no deletes. Purely inserts are performed on the table - think audit trail tables.
Only on a table where purely inserts are performed, and where a monotonically increasing value is defined as the clustering key, are you guaranteed that a 100% fill factor will not lead to index fragmentation.
As soon as you have any updates or deletes, fragmentation of the table is possible, and it will occur if the table was created or rebuilt with a 100% fill factor.
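As a sketch of the kind of table I mean (the names here are made up):

-- Hypothetical append-only audit table: rows are only ever inserted, and the
-- clustering key (an IDENTITY) always increases, so every new row goes to the
-- logical end of the index and a 100% fill factor is safe.
CREATE TABLE dbo.AuditTrail (
    AuditID   int IDENTITY(1,1) NOT NULL,
    EventTime datetime NOT NULL DEFAULT (GETDATE()),
    EventText varchar(200) NOT NULL,
    CONSTRAINT PK_AuditTrail PRIMARY KEY CLUSTERED (AuditID)
        WITH (FILLFACTOR = 100)
);
GO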
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 20, 2011 at 8:50 pm
Jeffrey,
I thought I read that in either SQL Server 2005 or 2008, updating records with fatter varchars no longer caused page splits. That may not be so, but I seem to remember reading something about it; I've read so much in the last 3 years that some of it is starting to blur. Regarding your statement, "...the page where that row is stored is full, the row is moved": does that cause a page split, or does it simply move that particular row off the page and leave a pointer to it in its place? The latter is what I thought I remembered reading about.
What Kevin is talking about is a table where only inserts are ever performed. No updates and no deletes. Purely inserts are performed on the table - think audit trail tables.
And that's what I was asking about. I couldn't see that even with a 100% fill factor, inserting new records would cause page splits.
As soon as you have any updates or deletes - fragmentation on the table is possible and will occur if the table was created or rebuilt with a 100% fill factor.
I am quite familiar with this subject and fully understand why this is so. The only exception I know of is performing record deletions starting with MIN(id) and moving forward, with the intention of deleting the oldest records. I've automated jobs to delete hundreds of thousands of records per day from certain tables. Sometimes fragmentation goes up a little bit, sometimes it goes down a little bit, but mostly it hovers near zero. Over the last 18 months, all but one of the clustered indexes on these tables have never exceeded 6% fragmentation, and only that same one has ever had to be rebuilt because it exceeded 20%, my intentional threshold. All of these tables have one thing in common: only inserts, plus mass deletions of the oldest records every day. The only table in this group that has been the exception has some UPDATEs performed on it.
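For what it's worth, those jobs do something roughly like this (the table name, column names, and batch size are made up):

-- Hypothetical batched purge: delete the oldest rows (lowest id values) first,
-- in chunks, so deletions come off the logical front of the clustered index.
DECLARE @purgeBefore int;

SELECT @purgeBefore = MAX(id)
FROM dbo.SomeLogTable
WHERE LogDate < DATEADD(day, -90, GETDATE());   -- keep the last 90 days, for example

WHILE 1 = 1
BEGIN
    DELETE TOP (10000)
    FROM dbo.SomeLogTable
    WHERE id <= @purgeBefore;

    IF @@ROWCOUNT = 0
        BREAK;
END;
GO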
Thanks for your response,
LC
June 13, 2011 at 1:11 pm
Jeffrey Williams-493691 (5/20/2011)
I also need to point out that the discussion concerning fill factor has left out one key point. If the table in question has any varchar columns, and those columns are updated with new values that increase the length of the column - that will introduce page splits and thus increase the fragmentation.
Jeffrey,
Sorry I took so long to respond on this subject; I've been very busy.
I knew I'd read documentation to the contrary of your assertion. Adding data to VARCHAR columns does not cause page splits.
The following is from Books Online for SQL Server 2005. You can search for it in the Index under Data Storage, Pages:
"Large Row Support: Rows cannot span pages in SQL Server 2005, however portions of the row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. In SQL Server 2005, this restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and variable columns in a table exceeds the 8,060 byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest width. This is done whenever an insert or update operation increases the total size of the row beyond the 8060 byte limit. When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer on the original page in the IN_ROW_DATA allocation unit is maintained. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page. For more information, see Row-Overflow Data Exceeding 8 KB."
LC
June 13, 2011 at 1:55 pm
Lee Crain (6/13/2011)
Jeffrey,
Sorry I took so long to respond on this subject; I've been very busy.
I knew I'd read documentation to the contrary of your assertion. Adding data to VARCHAR columns does not cause page splits.
The following is from Books Online for SQL Server 2005. You can search for it in the Index under Data Storage, Pages:
"Large Row Support: Rows cannot span pages in SQL Server 2005, however portions of the row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. In SQL Server 2005, this restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and variable columns in a table exceeds the 8,060 byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest width. This is done whenever an insert or update operation increases the total size of the row beyond the 8060 byte limit. When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer on the original page in the IN_ROW_DATA allocation unit is maintained. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page. For more information, see Row-Overflow Data Exceeding 8 KB."
LC
I believe that quote relates to LOB data only (e.g. varchar(max), nvarchar(max), etc...). It is easy enough to test and see what happens.
Create a table with an IDENTITY column as the clustered index (primary key) with a second column for the data defined as varchar(200). Populate the table with each varchar column set to 100 characters.
Rebuild the index after inserting the rows so you have 0 fragmentation and each page is filled.
Then, update one of the rows and check to see how many pages you have and the fragmentation on the table.
You will see that you have increased by 1 page - fragmentation has increased and you do not have any ROW_OVERFLOW_DATA related to the table.
I used the following quick test:
Use tempdb;
Go
-- Test table: IDENTITY clustered primary key plus one variable-length column.
Create Table dbo.splitTest (
  id int identity Primary Key Clustered
  ,data varchar(200));
Go
Truncate Table dbo.splitTest;
Go
-- Fill 8,000 rows, each with 100 characters in the varchar column.
Insert Into dbo.splitTest
Select replicate('x', 100)
From dbo.Tally t
Where n <= 8000;
Go
-- Rebuild so the pages are full and fragmentation starts at zero.
Alter Index All On dbo.splitTest Rebuild;
Go
-- Baseline: page count and fragmentation before the update.
Declare @tableID int;
Set @tableID = object_id('dbo.splitTest');
Select *
From sys.dm_db_index_physical_stats(db_id('tempdb'), @tableID, Null, Null, 'Detailed');
Go
-- Lengthen the varchar in one row; it no longer fits on its already-full page.
Update dbo.splitTest
Set data = replicate('x', 200)
Where id = 1;
Go
-- Check again: page count and fragmentation after the update.
Declare @tableID int;
Set @tableID = object_id('dbo.splitTest');
Select *
From sys.dm_db_index_physical_stats(db_id('tempdb'), @tableID, Null, Null, 'Detailed');
Go
Drop Table dbo.splitTest;
Go
You will need a Tally table in tempdb - or change the insert to use your own.
On my test, I get 116 pages after the insert and 117 pages after the update - and all levels are IN_ROW_DATA.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 13, 2011 at 4:51 pm
Jeffrey,
I've read your email and it makes sense. However, I can't execute your code and I'd like to.
It is not intuitively obvious to me what a Tally table is. Please explain.
LC
June 13, 2011 at 4:59 pm
Jeffrey,
I won't be convinced until I execute your code myself.
Reading further in Books Online, in the "Row-overflow data" index listing:
"Surpassing the 8,060-byte row-size limit might affect performance because SQL Server still maintains a limit of 8 KB per page. When a combination of varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns exceeds this limit, the SQL Server Database Engine moves the record column with the largest width to another page in the ROW_OVERFLOW_DATA allocation unit, while maintaining a 24-byte pointer on the original page. Moving large records to another page occurs dynamically as records are lengthened based on update operations. Update operations that shorten records may cause records to be moved back to the original page in the IN_ROW_DATA allocation unit. Also, querying and performing other select operations, such as sorts or joins on large records that contain row-overflow data slows processing time, because these records are processed synchronously instead of asynchronously.
Therefore, when you design a table with multiple varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns, consider the percentage of rows that are likely to flow over and the frequency with which this overflow data is likely to be queried. If there are likely to be frequent queries on many rows of row-overflow data, consider normalizing the table so that some columns are moved to another table. This can then be queried in an asynchronous JOIN operation.
The length of individual columns must still fall within the limit of 8,000 bytes for varchar, nvarchar, varbinary, sql_variant, and CLR user-defined type columns. Only their combined lengths can exceed the 8,060-byte row limit of a table."
The last sentence obviously does not apply to VARCHAR(MAX) or VARBINARY(MAX) data types, since by definition, they have no 8,000 byte limitation.
I'd like to execute your SQL code and evaluate the results for myself. Please explain how to create a Tally table and how it should be populated.
LC
June 13, 2011 at 5:01 pm
It is just a numbers table.
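If you need to create one, a minimal version could be built like this (the column is named n to match the test script above; see Jeff Moden's Tally table article for a fuller treatment):

-- Minimal numbers (Tally) table: a single clustered column of sequential integers.
SELECT TOP (11000)
       IDENTITY(int, 1, 1) AS n
INTO   dbo.Tally
FROM   master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2;

ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_n PRIMARY KEY CLUSTERED (n);
GO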
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 13, 2011 at 5:11 pm
Take a look at Jeff Moden's article on Tally tables; it explains what they are and how to build one.
June 13, 2011 at 11:29 pm
Give this a try to see the correct space usage of the table:
USE <DBName>
GO
DBCC UPDATEUSAGE (<DBName>,"SchemaName.TableName");
GO
and then run sp_spaceused
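For example, with a hypothetical database and table name filled in:

USE SalesDB;
GO
-- Correct any inaccurate page and row counts recorded in the catalog views...
DBCC UPDATEUSAGE (SalesDB, "dbo.OrderHistory");
GO
-- ...then report reserved, data, index, and unused space for the table.
EXEC sp_spaceused 'dbo.OrderHistory';
GO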
June 14, 2011 at 4:25 pm
Thanks, Perry, for pointing me toward that article. And thanks, Jeff Moden, for writing the article on Tally tables. I'll be using that information in the future.
Jeffrey, I created a Tally table and executed your query. If the intuitive interpretation of the results it produced is correct, then at index level 0 the row with the lengthened VARCHAR column is still IN_ROW_DATA and we now have fragmentation in the table; a page split has occurred. I see the results: 116 pages before the UPDATE, 117 pages afterward, and "alloc_unit_type_desc" plainly says "IN_ROW_DATA". But there are some oddities about the results.
If you execute the query consecutively several times, the following columns' values change from execution to execution:
1. "avg_fragmentation_in_percent"
2. "fragment_count"
3. "avg_fragment_size_in_pages"
We are not getting consistent results for each execution of the query. Why?
And why does Books Online make an unambiguous statement about the VARCHAR and VARBINARY data types that says plainly that columns that increase in size during an UPDATE are moved out of row with a pointer to them? It is specifically NOT referring to the (MAX) versions of VARCHAR and VARBINARY.
On the surface, I'd say I agree with your assessment. But under the surface, I'm not convinced we understand what is actually happening during the UPDATE statement, so ultimately I can't say I agree with you. In fact, I'm now agnostic on the subject; I don't know what the correct assessment is.
From my perspective, we appear to have suspect and/or conflicting information.
LC
June 14, 2011 at 4:54 pm
I am still researching this topic. From Books Online, index listing for "IN_ROW_DATA allocation unit":
"An Index Allocation Map (IAM) page maps the extents in a 4-gigabyte (GB) part of a database file used by an allocation unit. An allocation unit is one of three types:
IN_ROW_DATA - Holds a partition of a heap or index.
LOB_DATA - Holds large object (LOB) data types, such as xml, varbinary(max), and varchar(max).
ROW_OVERFLOW_DATA - Holds variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060 byte row size limit."
.....
ROW_OVERFLOW_DATA Allocation Unit
For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8 KB row size limit. When the size limitation is reached, SQL Server moves the column with the largest width from that row to a page in the ROW_OVERFLOW_DATA allocation unit. A 24-byte pointer to this off-row data is maintained on the original page.
Text/Image pages in the ROW_OVERFLOW_DATA allocation unit are managed in the same way pages in the LOB_DATA allocation unit are managed. That is, the Text/Image pages are managed by a chain of IAM pages.
LOB_DATA Allocation Unit
When a table or index has one or more LOB data types, one LOB_DATA allocation unit per partition is allocated to manage the storage of that data. The LOB data types include text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined types.
_____________
I tried an experiment. I increased the number of rows to 200,000. This produced 3 levels of indexes. The UPDATE statement execution left the "alloc_unit_type_desc" column designation as IN_ROW_DATA for all 3 levels of the indexes.
The documentation that I copied into this post, above, would seem to conflict with the results produced by the query. If the UPDATE of a row's VARCHAR column lengthened it, then why wasn't a 4th index level marked with the ROW_OVERFLOW_DATA designator? Or maybe this allocation unit is invisible to the "sys.dm_db_index_physical_stats" function.
The documentation plainly states that "For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8 KB row size limit.".
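One way to check might be to force a single row past the 8,060-byte limit and see whether a ROW_OVERFLOW_DATA allocation unit then appears. A rough sketch, modeled on Jeffrey's test but with two wide varchar columns so the combined row size can exceed the limit:

USE tempdb;
GO
-- Each varchar(5000) column fits in-row on its own, but together they can
-- push a single row past the 8,060-byte in-row limit.
CREATE TABLE dbo.overflowTest (
    id    int IDENTITY PRIMARY KEY CLUSTERED,
    data1 varchar(5000),
    data2 varchar(5000)
);
GO
INSERT INTO dbo.overflowTest (data1, data2)
VALUES (REPLICATE('x', 100), REPLICATE('y', 100));
GO
-- Lengthen both columns so the row exceeds 8,060 bytes; one column should be
-- moved to the ROW_OVERFLOW_DATA allocation unit, leaving a 24-byte pointer.
UPDATE dbo.overflowTest
SET data1 = REPLICATE('x', 5000),
    data2 = REPLICATE('y', 5000)
WHERE id = 1;
GO
SELECT alloc_unit_type_desc, index_level, page_count
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),
        OBJECT_ID('dbo.overflowTest'), NULL, NULL, 'Detailed');
GO
DROP TABLE dbo.overflowTest;
GO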
LC