November 16, 2011 at 7:39 am
Hugo Kornelis (11/16/2011)
vk-kirov (11/16/2011)
...but the question has a minor issue. When I created the QOD database with the DemoTable table and tried to rebuild the index, I got the following error message: The statement has been terminated.
Msg 1101, Level 17, State 12, Line 1
Could not allocate a new page for database 'QOD' because of insufficient disk space in filegroup 'PRIMARY'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
#$%#$^& !!!
You won't believe the time I spent going over my calculations again and again, to make sure I didn't mess this one up - and then I forgot that rebuilding the index will temporarily require double the size. In a few hours, my mailbox will flood with new post notifications of the "I want my point back" type. (kicks self)
OK, so I am not going crazy then. I calculated that the data size would be over 1 GB, realized that there was no way that a reindex on that would work, and picked the error. When I saw it was wrong, I scratched my head. Oh, well, mistakes happen, and in principle this is a very useful exercise: DBAs, developers, infrastructure, and production staff all need to be able to calculate consumed space. So consider this an encouraging vote.
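For readers who want to see the effect both posters describe, here is a minimal sketch (it assumes the QotD's QOD database and dbo.DemoTable already exist; the exact column definitions are not repeated here). The key point is that a rebuild writes a complete new copy of the index before the old one is released, so the filegroup briefly needs free space roughly equal to the index itself:

```sql
-- Sketch only: assumes the QotD's QOD database and dbo.DemoTable exist.
USE QOD;
GO

-- A rebuild builds the new copy of the index before dropping the old one,
-- so the PRIMARY filegroup temporarily needs about double the index size.
ALTER INDEX ALL ON dbo.DemoTable REBUILD;
GO

-- SORT_IN_TEMPDB only moves the sort work to tempdb; the new copy of the
-- index is still allocated in the index's own filegroup, so the doubling
-- effect on data-file space remains.
ALTER INDEX ALL ON dbo.DemoTable REBUILD WITH (SORT_IN_TEMPDB = ON);
GO
```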
November 16, 2011 at 8:00 am
I'd like more questions like this. One of the biggest technical leaps I made as a DBA was reading the "Inside" book series and learning HOW SQL Server does its storage magic. It really helped me think better about the impact of my designs and queries, even for simple things like why using appropriate datatypes will (or may) result in fewer pages, and why this matters when saving and querying.
It's a big topic, but it's something I think is really helpful to know.
These types of questions are a great refresher for me on the calculations and thought process, and I think they will help others understand the guts of SQL Server better.
November 16, 2011 at 8:37 am
jeff.mason (11/16/2011)
Hugo Kornelis (11/16/2011)
vk-kirov (11/16/2011)
...but the question has a minor issue. When I created the QOD database with the DemoTable table and tried to rebuild the index, I got the following error message: The statement has been terminated.
Msg 1101, Level 17, State 12, Line 1
Could not allocate a new page for database 'QOD' because of insufficient disk space in filegroup 'PRIMARY'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
#$%#$^& !!!
You won't believe the time I spent going over my calculations again and again, to make sure I didn't mess this one up - and then I forgot that rebuilding the index will temporarily require double the size. In a few hours, my mailbox will flood with new post notifications of the "I want my point back" type. (kicks self)
OK, so I am not going crazy then. I calculated that the data size would be over 1 GB, realized that there was no way that a reindex on that would work, and picked the error. When I saw it was wrong, I scratched my head. Oh, well, mistakes happen, and in principle this is a very useful exercise: DBAs, developers, infrastructure, and production staff all need to be able to calculate consumed space. So consider this an encouraging vote.
Ditto on what Jeff said! 😉
Hugo - thank you for this question. Like many others here this morning, I enjoyed it, and although I too selected the ERROR answer, I totally get what your intentions were and have benefited from the experience today.
Cheers mate! 🙂
November 16, 2011 at 8:42 am
Hugo Kornelis (11/16/2011)
I do seriously like the spirit of the question. I was just surprised at the lack of detail in certain aspects, considering how much attention to detail Hugo usually pays to others' QotDs. Others have pointed out that some of your statements contradict the articles they reference, as do the numbers in your math. I can see past that, though, because a SQL data page has been the same size for a long time and your table rows fit.
The primary thing this question is missing from a real-world size-calculation perspective is whether the DB was created with FULL or BULK as the option for logging. Was the log file set to auto-grow? How often is the tran log backed up?
The DB recovery option, the tran log backup policy, and the size of the logged transactions are even more important than the calculations Hugo brings up when determining the amount of space needed to support an OLTP database with 2 million rows in a single table, a char column of random values for a primary key (instead of finding a candidate key in the data), and other items that would cause this DB to be costly.
Number one disk space abuser in my world is Databases created with Full recovery model and no Tran log backup or DB backup jobs. It is amazing to me that I have yet to find an environment that does not have at least one database like this.
Also, one last note: I am not certain whether Auto Create Statistics was left on or off, so I will assume the default, which is on. The table's Primary Key Clustered Index is not at all related to the data stored in the table or to the 2 million new rows being inserted into it. The stats that would exist before and after the index rebuild would require more disk space than a good nonclustered index. A proper table Primary Key created from two or more candidate key columns would also create a Clustered Index that could reduce the true space required by this table and all of its data structures. The sys.dm_db_missing_index_details management view can be used after the initial data insert is completed to find out exactly what a good nonclustered index would need to be. If you want to find what the table candidate keys should be, so you can have the best of both worlds, use sp_special_columns to find information about the columns that uniquely identify rows in the table. This would only work if you had a version of the table with data in it that did not have the current random char(10) value as a Primary Key.
November 16, 2011 at 9:01 am
Again, thanks for all the nice and encouraging comments. I have now decided to continue with the series. Sorry for those who did not like it and would rather not see it continue; based on the feedback so far, you are in the minority.
SanDroid (11/16/2011)
The main thing this question is missing is if the DB was created with FULL or BULK as the option for logging.
How exactly is that relevant for the answer? No errors due to a full log are possible, as the log file is specified to be "sufficiently large".
Some of your statements contradict ones the articles they reference.
Ouch, that is a very embarrassing mistake. Can you please elaborate on which of my statements you mean, and which articles they contradict? I thought I checked everything very carefully; if I was wrong, I'd like to know before I start working on the next similar question.
November 16, 2011 at 9:29 am
Hugo Kornelis (11/16/2011)
SanDroid (11/16/2011)
The main thing this question is missing is if the DB was created with FULL or BULK as the option for logging.
How exactly is that relevant for the answer? No errors due to a full log are possible, as the log file is specified to be "sufficiently large".
Some of your statements contradict ones the articles they reference.
Ouch, that is a very embarrassing mistake. Can you please elaborate on which of my statements you mean, and which articles they contradict? I thought I checked everything very carefully; if I was wrong, I'd like to know before I start working on the next similar question.
I was editing my original post while you read and responded to it. The original post did not communicate what I intended.
EDIT: With that being said: you cannot create a Primary Key on a table without a CLUSTERED INDEX being created for it. This is AutoMagic in SQL Server. You allude in your explanation that the index is clustered because it is not specified as NONCLUSTERED. There is a link to this information in your reference on primary key constraints: http://msdn.microsoft.com/en-us/library/ms175132.aspx
November 16, 2011 at 10:04 am
I like the question but the answer says:
Since 200,474 pages equates to about 1.53 GB, you will not get error 1105.
I ran this code and I got error 1105 which I would not have received had I allocated more than 2G or allowed autogrowth but we were specifically told not to do this. The page count only goes down once the data is compressed.
Francis
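For reference, the page-to-size conversion in the quoted answer can be checked directly. SQL Server data pages are 8 KB (8,192 bytes), so the arithmetic is just a multiplication (a throwaway query; the literal numbers come from the answer under discussion):

```sql
-- 200,474 pages at 8,192 bytes each, expressed in GB (2^30 bytes).
SELECT 200474 * 8192.0 / POWER(1024.0, 3) AS size_gb;  -- ~1.53
```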
November 16, 2011 at 10:27 am
fhanlon (11/16/2011)
I like the question but the answer says: Since 200,474 pages equates to about 1.53 GB, you will not get error 1105.
I ran this code and I got error 1105 which I would not have received had I allocated more than 2G or allowed autogrowth but we were specifically told not to do this. The page count only goes down once the data is compressed.
I found this also. There are several other settings the question ignored that can change your outcome. 😎
November 16, 2011 at 10:41 am
Reposting Edit as new post.
Hugo Kornelis (11/16/2011)
Ouch, that is a very embarassing mistake. Can you please elaborate on which of my statements you mean, and which articles they contradict? I thought I checked everything very carefully; if I was wrong, I'd like to know before I start working on the next similar question.
You cannot create a Primary Key on a table without a CLUSTERED INDEX being created for it. This is AutoMagic in SQL Server. You allude in your explanation that the index is clustered because it is not specified as NONCLUSTERED, and that is not correct. There is a link to this information in your reference on primary key constraints. Here is the direct link: http://msdn.microsoft.com/en-us/library/ms175132.aspx
Here is the quote:
When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index. The primary key column cannot allow NULL values.
When you create a UNIQUE constraint, a unique nonclustered index is created to enforce a UNIQUE constraint by default. You can specify a unique clustered index if a clustered index on the table does not already exist.
I think your explanation confuses what happens when you create a PRIMARY KEY constraint with what can happen if a UNIQUE constraint has been created first.
This is important to consider when sizing new data tables, as you should use a UNIQUE constraint until you find a proper candidate for the PRIMARY KEY, or create a set of normalized tables with Row_Ids as primary keys.
November 16, 2011 at 10:54 am
SanDroid, you have me totally confused. Hugo said that if you don't specify NONCLUSTERED on creation of a Primary Key on a table that has no clustered index, the index that supports the primary key will be clustered. The article you post says exactly the same thing. Why do you object? I don't even understand what your objection is, and Hugo's explanation of the creation of a clustered index on a primary key seemed basic and accurate to me.
November 16, 2011 at 11:05 am
jeff.mason (11/16/2011)
SanDroid, you have me totally confused. Hugo said that if you don't specify NONCLUSTERED on creation of a Primary Key on a table that has no clustered index, the index that supports the primary key will be clustered. The article you post says exactly the same thing. Why do you object? I don't even understand what your objection is, and Hugo's explanation of the creation of a clustered index on a primary key seemed basic and accurate to me.
What is in the referenced material:
When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index.
What was stated by Hugo:
Since the PRIMARY KEY constraint is created without a CLUSTERED or NONCLUSTERED specification, the supporting index will be created as a clustered index
By design, creating a NONCLUSTERED index for a Primary Key constraint is not something that can be done just by specifying NONCLUSTERED in the initial CREATE TABLE statement of a table with only one index. I was asked today whether creating a NONCLUSTERED index for a PRIMARY KEY on a table with no indexes was possible, after someone else read this.
November 16, 2011 at 11:25 am
If I understand your question, this should address it. Run the first two batches first, investigate the indexes, and then run the third batch. No errors.
CREATE TABLE test1
(
    test1 int IDENTITY (1,1) PRIMARY KEY NONCLUSTERED,
    test1text varchar(255)
);
GO
INSERT test1 VALUES ('test1');
INSERT test1 VALUES ('test2');
INSERT test1 VALUES ('test3');
INSERT test1 VALUES ('test4');
INSERT test1 VALUES ('test5');
GO
CREATE CLUSTERED INDEX test2 ON test1 (test1text);
GO
In other words, you can create a nonclustered primary key without a clustered index. Nothing stops a heap from having a primary key. The point is that if you start with a heap and don't specify how the PK index should be created, the default will be clustered. That was Hugo's point, and the article you link agrees. So unless you are thinking of something else and I still misunderstand, this should settle that point.
November 16, 2011 at 12:31 pm
jeff.mason (11/16/2011)
If I understand your question, this should address it. Run the first two batches first, investigate the indexes, and then run the third batch. No errors.........
So unless you are thinking of something else and I still misunderstand, this should settle that point.
I think it was me who had misunderstood. Since I have always created a table PK as a unique set of values, and as the first index, I never thought that you could make a PK constraint nonclustered unless a clustered index already existed.
I see now that really this is just how the defaults work when you do not specify CLUSTERED or NONCLUSTERED.
:w00t:
Thanks for being patient and pointing that out. Awesome!
November 16, 2011 at 12:39 pm
SanDroid (11/16/2011)
I was editing my original post while you read and responded to it. The original post did not communicate what I intended.
No problem, that can happen. I'll go back to the revised version of your message.
Others have pointed out that some of your statements also contradict the articles they reference, as do the numbers you have in your math. I can see past that, though, because a SQL data page has been the same size for a long time and your table rows fit.
Even after rereading the entire discussion, I fail to see where others point out errors that relate to the data page size. Can you elaborate, please?
The primary thing this question is missing from a real world size calculations perspective (...)
You are completely right. But I never intended the question to be a real-world size calculation. In the real world, one would never calculate the data size to the exact page number. And one would never simply specify the log file size as "sufficiently large".
The intention of my question was to test understanding of (an aspect of) how data is stored on disk. Your post mentions a lot of considerations that are all very relevant for estimating disk space requirements in a real-world scenario, but are not relevant for this QotD. I hope readers take note of what you say, but for this discussion, I won't respond to those points.
a char row of random values for a primary key (instead of a finding a candidate key in the data)
The only place where I mention using random values for a primary key is in a prior message in this discussion, not in the question itself. The context of that remark was a calculation of the number of theoretically possible key values, to show that it's easy to generate unique values for a million rows - so easy that even a random value would have an extremely low likelihood of producing a duplicate.
Since the table and columns are completely fictional, there is no way to assess if the char(10) primary key is a surrogate key or a key from the business (though people who know me would know that I would never use a surrogate key without also adding a unique constraint on the business key!)
FYI, there definitely are cases where a char(10) column from the business data is the perfect candidate key. This could be one of them.
Not certain if Auto create Statistics was left on or off so I will believe it was the default or on.
Good point. I should probably have added in the question that statistics should not be included in the calculation.
The tables Primary Key Clustered Index is not at all related to the data stored in the table or to the 2 Million new rows being inserted into it.
That's just an assumption. The question contains no information to confirm or deny this theory.
The stats that would exist before and after the Index rebuild would require more disc space than a good Non-Clustered index.
I was unable to find information on the size of statistics, but since they contain a lot less information than indexes, I fail to see how any stats could ever require more disk space than any index.
That being said, this also feels like comparing apples to oranges to me.
A proper Table Primary Key created from two or more candidate key columns would also create a Clustered Index that could reduce the true space required by this table and all the data structures that it has.
Only if the total size of all columns in that primary key is less than 10 bytes. Otherwise, it'll take more space than the current clustered index. (And if the DemoTableKey is then declared as UNIQUE, which it should be, the actual space usage will in fact go up instead of down!)
The sys.dm_db_missing_index_details management view can be used after the initial data insert is completed to find out exactly what a good non-clustered index would need to be.
Not at all! Just after the data insert, the information from this DMV would be highly skewed. In a real-world system (which this is not!), you could indeed get some useful information from the missing indexes DMVs - but only after letting it run for a while, or running a representative test. This view is added to when the optimizer compiles a query that could benefit from an index that doesn't exist. These suggestions are only for a single query. If you plan to use this DMV for real tuning, try to combine the various suggestions into a single index that helps lots of queries. And never forget that these suggestions give an indication of estimated saving for a single query, but do not take into account how the extra index might harm performance of inserts, updates, deletes, and merges.
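A commonly used shape for that kind of query is sketched below, with the same caveat as above: the suggestions are only meaningful after the server has seen a representative workload, and the benefit figure is a rough estimate, not a guarantee:

```sql
-- Missing-index suggestions, ranked by a rough estimate of potential benefit.
-- Only meaningful after a representative query workload has run.
SELECT  d.statement           AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.avg_user_impact,
        s.user_seeks * s.avg_total_user_cost * (s.avg_user_impact / 100.0)
                              AS estimated_benefit
FROM    sys.dm_db_missing_index_details     AS d
JOIN    sys.dm_db_missing_index_groups      AS g
        ON g.index_handle = d.index_handle
JOIN    sys.dm_db_missing_index_group_stats AS s
        ON s.group_handle = g.index_group_handle
ORDER BY estimated_benefit DESC;
```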
With that being said: you cannot create a Primary Key on a table without a CLUSTERED INDEX being created for it.
You can. Just specify the optional NONCLUSTERED keyword when specifying the primary key. See the sample code posted by Jeff Mason, which will create a heap (a table without a clustered index), plus a nonclustered index to support the primary key.
The Books Online article you link to does try to explain this, but does not really phrase this well. There is a much better explanation in the CREATE TABLE article at http://msdn.microsoft.com/en-us/library/ms174979.aspx, where it reads:
"CLUSTERED | NONCLUSTERED
Indicate that a clustered or a nonclustered index is created for the PRIMARY KEY or UNIQUE constraint. PRIMARY KEY constraints default to CLUSTERED, and UNIQUE constraints default to NONCLUSTERED.
In a CREATE TABLE statement, CLUSTERED can be specified for only one constraint. If CLUSTERED is specified for a UNIQUE constraint and a PRIMARY KEY constraint is also specified, the PRIMARY KEY defaults to NONCLUSTERED."
As far as I see, this is exactly what I say in my explanation - in the absence of an explicit CLUSTERED or NONCLUSTERED keyword, the index for the primary key defaults to clustered (unless another index is specified as clustered - an edge case that I chose not to include in the explanation as it does not apply here and seldom -if ever!- occurs in real world situations).
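The cases from that CREATE TABLE passage can be demonstrated side by side. The table and column names below are made up for illustration; inspecting sys.indexes afterwards shows which index each constraint produced:

```sql
-- Case 1: no keyword - the PRIMARY KEY's index defaults to CLUSTERED.
CREATE TABLE Demo1 (id char(10) NOT NULL PRIMARY KEY);

-- Case 2: explicit NONCLUSTERED - a primary key on a heap, perfectly legal.
CREATE TABLE Demo2 (id char(10) NOT NULL PRIMARY KEY NONCLUSTERED);

-- Case 3: a UNIQUE constraint claims CLUSTERED, so the PRIMARY KEY
-- defaults to NONCLUSTERED.
CREATE TABLE Demo3
(
    id  char(10) NOT NULL PRIMARY KEY,
    alt int      NOT NULL UNIQUE CLUSTERED
);

-- index_id 1 = clustered, > 1 = nonclustered, 0 = heap.
SELECT OBJECT_NAME(object_id) AS table_name, name, index_id, type_desc
FROM   sys.indexes
WHERE  object_id IN (OBJECT_ID('Demo1'), OBJECT_ID('Demo2'), OBJECT_ID('Demo3'));
```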
November 16, 2011 at 12:42 pm
Hugo - great question.
As for more questions like this - why not, though that means no cheap points!