Performance Suffers after adding column

Question

Post reply

Performance Suffers after adding column

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Frank Kalis SSC Guru Points: 111183 More actions · Answer 1

I don't like your sarcasm here and I didn't want to get personell.

While you made a good observation on this CLUSTERED PRIMARY KEY, it doesn't change a thing.

Run this script:

USE tempdb

CREATE TABLE foolish_consultant

(

c1 VARCHAR(1000)

)

INSERT INTO foolish_consultant

SELECT 'abc1234567890'

UNION ALL

SELECT 'cde'

DECLARE @page_adress AS BINARY(6)

SELECT

@page_adress = [first]

FROM

sysindexes

WHERE

[id] = OBJECT_ID('foolish_consultant')

SELECT

CAST

(

SUBSTRING(@page_adress, 6, 1) +

SUBSTRING(@page_adress, 5, 1)

AS INT

&nbsp AS file#

, CAST

(

SUBSTRING(@page_adress, 4, 1) +

SUBSTRING(@page_adress, 3, 1) +

SUBSTRING(@page_adress, 2, 1) +

SUBSTRING(@page_adress, 1, 1) AS INT

) AS page#

DBCC TRACEON(3604)

DBCC PAGE('tempdb',1,29,3)

UPDATE foolish_consultant SET c1 = '0987654321cba'

DBCC PAGE('tempdb',1,29,3)

ALTER TABLE foolish_consultant

ADD c2 VARCHAR(1000)

GO

DBCC PAGE('tempdb',1,29,3)

UPDATE foolish_consultant SET c2 = 'So what?'

DBCC PAGE('tempdb',1,29,3)

DROP TABLE foolish_consultant

Modify the pages for DBCC PAGE to meet your values and examine closely the output of DBCC PAGE!

Again, I would really be interesting in an explanation why someone thinks data in SQL Server spans over multiple pages or a row spans over multiple slots.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 2

Shawn, Frank is absolutely correct in saying that all the columns of a row are always stored together, whether or not the table is a heap and has any non-clustered indexes or not. I will cut and paste some from the thread for discussion here.

If the table is a heap (no indexes) then the new column will be stored separately from the rest of the row, resulting in extra IO.

This was your original statement, which you later followed up with:

[Books Online url] "If a table is created with no indexes, the data rows are not stored in any particular order. This structure is called a heap."

...

It does go on to say a heap can have non-clustered indexes but the first statement backs up my original post about a heap.

This statement from Books Online in no way says that the columns of a row can be stored apart from eachother. It simply says that for a heap the rows of the table are stored in a seemingly random order. In other words, if you start from the first row and move to the next one, then that row could be any row in the table, regardless of which keys are defined. When a row is created it is simply stored on any page (owned by that table) where there is room, no matter which other rows are already stored there.

And it really means any table without a clustered index, but whether it has any non-clustered indexes or not makes no difference, it is still a heap and functions the same way.

It says nothing about how the columns of each row are stored, and the reason for that is that they are always stored together on one row, on one page*.

Given there is no reason (such as an index) for SQL to keep the data together, then it almost becomes obvious.

I do not see why an index would make any difference as to whether or not SQL Server would store the data together. In your first post you said yourself that storing the extra column separately would require extra I/O. Is that not reason enough not to do it? And on the other hand, what reason would there be to store it separately?

Now, back to what I said in my previous post (which I assume was missed due to the heated debate). I am not sure that the MS consultant is wrong on this, it might be that you misunderstood him (which of course would mean he misunderstood your answer to mean what he actually expected). Either that, or he is just wrong. Anyway, like I said, I think that he was thinking of forward-pointers. Read more about them in my two articles Cluster that index! and Cluster that index - Part 2 (which is actually part 3...)

In short we can say that forward-pointers are used in heaps when a row needs to be moved since it will no longer fit on the row where it is stored. This can be due to a varying size column changing size, or a new column being added to the table (and thereby each row), or something similar. In any case there is now a problem. Either all non-clustered indexes must be updated, since the RIDs (row identifiers that are made up of file#, page# and slot#) are now incorrect with the row moving to a new page. This is of course not an acceptable alternative, so instead a forward-pointer is placed at the old location of the row (where the RIDs are pointing) and the non-clustered indexes do not need to be updated.

However, this can be bad for performance (especially for table scans) since SQL Server will need to follow the pointers back and forth to fetch the rows. It is therefore a good recommendation to check heap tables for forward-pointers (particularly after an operation that might have caused them) and get rid of them by either shrinking the file or by creating a clustered index (which can then be dropped if not needed). Both of these will 'defrag' the table since the rows will be reorganised and the forward-pointers removed.

In any case forward-pointers always point to the entire row, with all columns stored together.

* The exceptions to storing all columns on one page is BLOB data in SQL Server 2000, and in SQL Server 2005 row-overflow data caused by multiple large varying size fields. See this entry in my blog for more info on the latter.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 3

I just remembered that the original question of the thread has not really been answered yet.

Rich, does your Product table have a clustered index? I assume your query looks something like "WHERE SearchTerms LIKE '%glue%'", which would mean a table scan is necessary to find the matching rows, unless there is an index on SearchTerms. Assuming this, plus that the table does not have a clustered index, indicates that forward-pointers might very well be the problem in this case. See my two articles linked to above for information on what these are and how to check for them.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Don Guillermo des Las Colinas SSC Veteran Points: 206 More actions · Answer 4

Thanks for the recall Steve. I was catching up on all the posts when I got in this morning.

I chose "Manage Indexes" off the menu on my Products table for the following info:

Only one entry exists (PK_Product) and it reads "No" in the Clustered Column. There are 2 columns listed in the Column field.

There are only 56,509 rows in this table and of these only 1,895 have anything in the new column.

The code used to search this field is using "LIKE '%glue%" in the where clause.

I'll check out those articles you referred to Chris. Thnks!

Rich

gbn SSCertifiable Points: 6429 More actions · Answer 5

I've been busy and just getting time to post now.

Sorry folks, I'm wrong.

I misunderstood the consultant and should have known better anyway.

After my first post, I went on the defensive and then any reasoning on my part went out of the window. And the hole just kept on getting deeper...

I got mixed up with forward pointers (as Chris H posted). I remember now that it was one of the differences between SQL 6.5 and 7 in how updates could be handled. Also, rows must stay together because of the 8060 byte row size limit to fit within an 8k page (except BLOB data of course).

The consultant has given no wrong information, it was my fault that I've wrongly assumed something despite the fact I should have known better.

Please accept my apologies for wasting your time.

regards

Shawn

Frank Kalis SSC Guru Points: 111183 More actions · Answer 6

Now it's my turn to apologize. My wording was a bit inappropriate, I guess

Btw, is the original question finally answered?

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 7

Hey Frank... I think we all used words we didn't mean at one time or another...

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 8

Hey Shawn, I don't think any time was wasted. It is always nice with a good debate that really has people getting down into the internals of SQL Server to prove their case. I know I learn a lot from those, both by reading and by posting, and I hope others do as well.

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/