November 5, 2008 at 7:29 am
Under what circumstances could you justify/allow a table to be a heap? A temporary dumping ground that is written to far more than it is read from, for example? Or is there no reason at all why a table shouldn't have at least one index?
thanks ~simon
November 5, 2008 at 7:52 am
IMHO every physical table should have a clustered index. It is the absence of a clustered index that makes a table a heap. If there is no suitable (i.e., unique, seldom changing, small size) column then I add an Integer or Big Integer column with the Identity property enabled and use that column as the clustered index.
Note that the clustered index does not have to be the primary key, and the natural primary key does not have to be used as the actual Primary Key of the table. You can make the added column the actual primary key and then add a Unique constraint for the Natural Key. This is useful when the natural key is an unwieldy combination of columns, may change frequently, is not naturally incrementing, etc.
Specifying a Unique index on the Natural Key means you can still define Foreign Keys that reference the column(s), and you can join on them as well. Using the Int/BigInt as the Primary Key and Clustered Index saves space in other indexes (every nonclustered index carries the clustering key), saves space in the base table (the pages are used more efficiently), and probably improves performance on inserts.
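A minimal sketch of that pattern, with hypothetical table and column names (nothing here comes from this thread):
-- Surrogate IDENTITY column as the narrow clustered PK,
-- Unique constraint keeps the natural key enforced and referenceable.
CREATE TABLE dbo.Customer
(
    CustomerID   INT IDENTITY(1,1) NOT NULL,     -- added surrogate column
    CountryCode  CHAR(2)      NOT NULL,          -- natural key part 1
    TaxNumber    VARCHAR(20)  NOT NULL,          -- natural key part 2
    CustomerName VARCHAR(100) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID),
    CONSTRAINT UQ_Customer_NaturalKey UNIQUE (CountryCode, TaxNumber)
);
-- Child tables can reference the narrow surrogate key...
CREATE TABLE dbo.CustomerOrder
(
    OrderID    INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED,
    CustomerID INT NOT NULL CONSTRAINT FK_CustomerOrder_Customer
               REFERENCES dbo.Customer (CustomerID)
);
-- ...or a foreign key could reference the UNIQUE constraint on the natural key instead.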
YMMV 🙂
November 5, 2008 at 8:48 am
Hi Simon
It entirely depends upon the table purpose 😀
November 5, 2008 at 8:54 am
hiyah
so for what purpose would you say it's acceptable to be a heap? 🙂
November 5, 2008 at 10:16 am
Simon_L (11/5/2008)
a temporary dumping ground that was written to far more than it was read from for example ?
Something that's written to frequently is a bad candidate for a heap because of the forwarding pointers that a heap can have. A forwarding pointer occurs when a row is updated and no longer fits on the page. The row is moved and a pointer is left in its place, showing where the row has moved to.
The nice thing is that the nonclustered indexes don't have to be updated when the row moves; the downside is that reading that row then requires two random IOs, which, on a large table with lots of forwarding pointers, is horrendously expensive in terms of IO.
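If you want to check how bad it's gotten on a particular heap, something along these lines works on 2005 and later (the table name is just a placeholder); note that forwarded_record_count is only populated in SAMPLED or DETAILED mode:
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       ps.page_count,
       ps.forwarded_record_count          -- rows that now cost a second IO to reach
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SomeHeap'), 0, NULL, 'DETAILED') AS ps
WHERE  ps.index_id = 0;                   -- index_id 0 = the heap itself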
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 5, 2008 at 9:12 pm
Simon_L (11/5/2008)
Under what circumstances could you justify/allow a table to be a heap? A temporary dumping ground that is written to far more than it is read from, for example? Or is there no reason at all why a table shouldn't have at least one index? thanks ~simon
I'm a great believer that most tables must have a PK and a clustered index on something temporal... but the real answer is "It Depends". If it's just a staging table for imported data, and depending on how you're using it, it's sometimes better to have it with no clustered index or a PK...
I also have HUGE "nearly" online archive tables... all are heaps... none have indexes... they are there just to have the data available for emergencies, and getting at archived data normally isn't an "instant" requirement.
Ok... why do I do that? Like I said, these tables are bloody huge... keeping indexes on them for every query that someone may want to use someday (maybe, once a year) is just too expensive disk-size-wise. I saved nearly 100GB just by dropping the original indexes from the archive tables.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 7, 2008 at 9:21 am
Archive tables with little or no frequent access afterwards are probably the only case where I'd avoid an index. Should I have to retrieve data from one in some manner, I could create a temporary clustered index based on the query I'm going to run, do my retrieval, and drop the index immediately (don't forget DBCC SHRINKDATABASE if space is getting tight).
Also, especially with archive tables that developers have forgotten about but everyone is scared to drop, even though space is getting tight, you may have to forgo any index to reduce space usage.
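A rough sketch of that create/query/drop approach, with made-up archive table and column names:
-- Build a throwaway clustered index that suits the one-off query...
CREATE CLUSTERED INDEX IXC_OrderArchive_OrderDate ON dbo.OrderArchive (OrderDate);

-- ...run the retrieval...
SELECT OrderID, OrderDate, Amount
  FROM dbo.OrderArchive
 WHERE OrderDate >= '20070101'
   AND OrderDate <  '20080101';

-- ...then put the table back to a heap and, only if space is truly tight, shrink.
DROP INDEX IXC_OrderArchive_OrderDate ON dbo.OrderArchive;
-- DBCC SHRINKDATABASE (ArchiveDB);   -- optional; shrinking fragments the remaining indexes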
Gaby
"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
November 7, 2008 at 10:19 am
Gaby A. (11/7/2008)
Archive tables with little or no frequent access afterwards are probably the only case where I'd avoid an index. Should I have to retrieve data from one in some manner, I could create a temporary clustered index based on the query I'm going to run, do my retrieval, and drop the index immediately (don't forget DBCC SHRINKDATABASE if space is getting tight). Also, especially with archive tables that developers have forgotten about but everyone is scared to drop, even though space is getting tight, you may have to forgo any index to reduce space usage.
Even then, a clustered index doesn't hurt because it doesn't take up extra space.
~BOT
Craig Outcalt
November 7, 2008 at 5:34 pm
SQLBOT (11/7/2008)
Even then, a clustered index doesn't hurt because it doesn't take up extra space. ~BOT
Any bets on that? 😉
USE TempDB
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
 SELECT TOP 1000000
        RowNum       = IDENTITY(INT,1,1),
        SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
        SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
        SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
        SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
        SomeHex12    = RIGHT(NEWID(),12)
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns t1
  CROSS JOIN Master.dbo.SysColumns t2
--===== Measure the table size before adding a clustered index
DBCC UPDATEUSAGE (0,'dbo.JBMTest')
EXEC dbo.sp_SpaceUsed 'dbo.JBMTest'
--===== Measure the table size after adding a clustered index
CREATE CLUSTERED INDEX IXC_JBMTest_RowNum ON dbo.JBMTest (RowNum)
DBCC UPDATEUSAGE (0,'dbo.JBMTest')
EXEC dbo.sp_SpaceUsed 'dbo.JBMTest'
DROP TABLE dbo.JBMTest
--Jeff Moden
Change is inevitable... Change for the better is not.
November 7, 2008 at 6:08 pm
Ok, Jeff proved that the table is larger with an index than without, but why would that be true of a clustered index? Doesn't a clustered index primarily order the existing data by the key rather than create new data to be stored?
And to answer the original question, I often leave small tables in a heap that are created as temporary storage places for imported data. That data then gets beaten into shape and validated before being put into the (indexed and partially normalized) primary tables. Then the import tables are dropped.
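For example, a bare-bones version of that kind of import flow might look like this (all the names are made up for illustration):
-- Heap staging table: a bulk-load target with a short life, so no index.
CREATE TABLE dbo.ImportStaging
(
    RawName   VARCHAR(200) NULL,
    RawAmount VARCHAR(50)  NULL
);

-- BULK INSERT dbo.ImportStaging FROM '\\fileserver\feeds\daily.csv' WITH (FIELDTERMINATOR = ',');

-- Beat the data into shape and validate it before it touches the real tables.
INSERT INTO dbo.CleanData (Name, Amount)
SELECT LTRIM(RTRIM(RawName)),
       CAST(RawAmount AS MONEY)
  FROM dbo.ImportStaging
 WHERE RawName IS NOT NULL
   AND ISNUMERIC(RawAmount) = 1;    -- crude check, good enough for a sketch

-- Done with the staging heap.
DROP TABLE dbo.ImportStaging;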
---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
November 7, 2008 at 7:14 pm
Because, contrary to popular belief, clustered indexes are NOT totally contained in the data at the leaf level... there is an intermediate level containing index rows AND a root level, also containing index rows. The extra bytes come from those levels. Look up "Clustered Indexes, Architecture" in Books Online to see more.
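One way to see those extra levels, if you run it before the DROP TABLE in the earlier script, is sys.dm_db_index_physical_stats in DETAILED mode, which breaks the page counts out per level of the clustered index:
SELECT index_level, page_count, record_count
FROM   sys.dm_db_index_physical_stats(DB_ID('tempdb'), OBJECT_ID('tempdb.dbo.JBMTest'), 1, NULL, 'DETAILED')
ORDER  BY index_level;
-- index_level 0 is the leaf (the data pages); the rows for levels 1 and up are the
-- intermediate and root pages, which is where the extra space shows up.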
--Jeff Moden
Change is inevitable... Change for the better is not.
November 8, 2008 at 12:53 am
That makes complete sense. Thanks.
---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
November 8, 2008 at 7:47 am
You bet... thanks for the feedback, Timothy.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 8, 2008 at 8:05 am
LOL,
yes, yes... touché
In those cases when I'm strapped for an extra 208K of disk, I usually drop the clustered index.
~BOT
Craig Outcalt
November 8, 2008 at 9:00 am
http://blogs.sqlserver.org.au/blogs/greg_linwood/archive/2006/09/25/387.aspx
The first and second articles offer the bulk of the argument. Good reading regardless of whether you agree.