April 5, 2007 at 7:57 am
Hi all,
I was given a database with a table that has 60,000,000 rows; the total mdf file size is 16 GB.
The first thing I have to fix is that this huge table has no ID column, primary key, or index.
I tried to add an ID column, make it the primary key, and index it, but unfortunately it wasn't successful.
What should I do?
Thanks lots.
April 5, 2007 at 8:15 am
You should be able to add an integer column to this table with the Identity property set to true. Setting the identity property will not only generate the arbitrary IDs, it will also add a unique clustered index and set it as the primary key (provided there is not already a PK or clustered index on the table).
If it failed when you tried, it was most likely due to an Enterprise Manager timeout. It will take quite a bit of time to create this column and index. Perhaps 30-40 minutes depending on the speed of your system.
If your EM times out, use an ALTER TABLE statement from Query Analyzer to do the job.
April 5, 2007 at 8:16 am
A few questions:
- Can you post the schema?
- What did you try, and what was the error?
You need an index and a PK... probably more than one index. You should look at which columns are queried, particularly in WHERE clauses, and which might be joined to other tables.
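For example, once you know which column gets filtered most often, the index would be shaped something like this (the table and column names here are just placeholders until you post the schema):
-- hypothetical names; substitute your real table and most-queried column
CREATE NONCLUSTERED INDEX IX_bigtable_logdate ON dbo.bigtable (logdate)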
April 5, 2007 at 8:25 am
Thanks for the reply. Here is the table:
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[IISlog]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[IISlog]
GO
CREATE TABLE [dbo].[IISlog] (
[date] [datetime] NULL ,
[time] [datetime] NULL ,
[c-ip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[cs-username] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[s-ip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[s-port] [smallint] NULL ,
[cs-method] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[cs-uri-stem] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[cs-uri-query] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[sc-status] [smallint] NULL ,
[cs(User-Agent)] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[cs(Referer)] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
April 5, 2007 at 8:29 am
"If it failed when you tried, it was most likely due to an Enterprise Manager timeout. It will take quite a bit of time to create this column and index. Perhaps 30-40 minutes depending on the speed of your system.
If your EM times out, use an ALTER TABLE statement from Query Analyzer to do the job."
The table has over 63 million rows; it took hours to add a new column and it still failed. A time-out? I don't know.
What do you mean by "an ALTER TABLE statement from Query Analyzer"? Does that add a new column immediately? I doubt it, because either way the system needs to modify over 63 million records.
April 5, 2007 at 8:33 am
Sorry, I didn't keep the error message. I was trying to export the data from the old table to a new table with an indexed primary ID column; it took hours and still failed. The new table was created but has 0 rows in it.
Regardless of how I will use the table, the first thing I need to do to fix it is, I believe, to add an index and a PK. Basically the table needs a new ID column, which doesn't exist in the current schema. (How s** it is!)
April 5, 2007 at 9:53 am
This could take hours depending on your hardware. It will add an int column, set its identity property to true, make it a primary key, and create a unique clustered index.
ALTER TABLE IISLog ADD IISLog_id INT IDENTITY CONSTRAINT column_IISLog_id_pk PRIMARY KEY
April 5, 2007 at 9:58 am
If you create a unique clustered index on this table you will have a serious impact on the server, and probably gain nothing. The identity value you put on will have no bearing on any information in there beyond being a row number. I would personally avoid it, and certainly not cluster it - someone may infer that the row number has some relationship to the order of the data or something silly like that. You will also find that your transaction logs will grow dramatically - especially if you cluster it...
What kind of queries are you running against this table?
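If you do end up adding one anyway, at least keep the primary key nonclustered so the whole 16 GB heap doesn't get rewritten; a sketch (the constraint name is just an example):
-- nonclustered PK: enforces uniqueness without re-sorting the table
ALTER TABLE dbo.IISlog
ADD IISlog_id INT IDENTITY
CONSTRAINT PK_IISlog_id PRIMARY KEY NONCLUSTERED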
April 5, 2007 at 12:54 pm
After two hours, it finished and returned:
Warning: The table 'IISLog' has been created but its maximum row size (13625) exceeds the maximum number of bytes per row (8060). INSERT or UPDATE of a row in this table will fail if the resulting row length exceeds 8060 bytes.
(63008970 row(s) affected)
Original file sizes:
mdf: 16,860,096
ldf: 1,008,000
New file sizes:
mdf: 37,697,856
ldf: 28,333,632
April 5, 2007 at 2:25 pm
Executing the following query against the indexed 63 million rows took me 8 minutes. Is that OK or not good? The hardware is an Intel Xeon 2.4 GHz with 1 GB of RAM (Task Manager / Performance shows me four CPU Usage History windows; does that mean it has four CPUs?)
SELECT MAX(LEN([cs-uri-query])) AS [max-cs-uri-query],
       MAX(LEN([cs(User-Agent)])) AS [max-cs-User-Agent],
       MAX(LEN([cs(Referer)])) AS [max-cs-Referer]
FROM iislog
If the performance is not good, what should I do to optimize it?
Thanks.
April 5, 2007 at 4:56 pm
Here's the schema you posted...
CREATE TABLE [dbo].[IISlog] (
[date] [datetime] NULL ,
[time] [datetime] NULL ,
[c-ip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[cs-username] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[s-ip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[s-port] [smallint] NULL ,
[cs-method] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[cs-uri-stem] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[cs-uri-query] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[sc-status] [smallint] NULL ,
[cs(User-Agent)] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[cs(Referer)] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
Let's talk optimization...
Your goal should be to get the row size down to 8K, 4K, or something else that comes close to packing a whole number of rows into an 8K page (about 8,060 bytes are usable for data; the other 60 or so bytes get used for row overhead, nullability markers, etc.). This will help a bit with speed when indexes are used and will certainly help with storage requirements... you just did a MAX on several columns... pick a slightly higher even number and make each column that wide.
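As a rough sketch, the narrowing could be done in place like this (the widths below are placeholders; substitute your measured maxima plus a growth margin):
-- placeholder widths; replace with your measured max lengths plus some headroom
ALTER TABLE dbo.IISlog ALTER COLUMN [cs-uri-query] varchar(1000) NULL
ALTER TABLE dbo.IISlog ALTER COLUMN [cs(User-Agent)] varchar(500) NULL
ALTER TABLE dbo.IISlog ALTER COLUMN [cs(Referer)] varchar(1000) NULL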
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2007 at 6:23 pm
Hi Jeff,
I totally agree with you that the table is not well designed at all. I am just trying to fix the problem instead of finding out who designed the database.
But my question is: what should I do before I shrink the database? Considering the table has millions of rows, I really want to make sure that any change to the database won't lose data and can be finished successfully.
Thanks.
April 5, 2007 at 9:16 pm
Ok, I can tell you're just a little nervous about this whole thing and I don't blame you one little bit... so, here's the plan and why I asked the questions I did...
The first problem to fix, towards increasing the performance, is the design so far as column widths go. You've already taken the first step to ensure no data is lost... determining the max width of the data currently in the table. You need to post those maximum widths for all the variable-length columns before we take another step.
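Extending the MAX(LEN()) query you already ran to cover all the variable-length columns would look something like this:
SELECT MAX(LEN([c-ip])) AS [max c-ip],
       MAX(LEN([cs-username])) AS [max cs-username],
       MAX(LEN([s-ip])) AS [max s-ip],
       MAX(LEN([cs-method])) AS [max cs-method],
       MAX(LEN([cs-uri-stem])) AS [max cs-uri-stem],
       MAX(LEN([cs-uri-query])) AS [max cs-uri-query],
       MAX(LEN([cs(User-Agent)])) AS [max cs(User-Agent)],
       MAX(LEN([cs(Referer)])) AS [max cs(Referer)]
FROM dbo.IISlog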
The second step would be to try those widths (plus a bit of a growth margin) by copying the table to a new one using SELECT/INTO... we would add the IDENTITY column at the same time. If it fails or truncates something... no biggee, because we've done absolutely nothing to the original data. We'll simply drop the new table and try again with larger column widths (it won't fail the first time if we do our homework).
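That copy step would look something like this sketch (every varchar width below is a placeholder until we see your maxima):
-- SELECT/INTO builds the new table and numbers the rows in one pass
SELECT IDENTITY(INT, 1, 1) AS IISlog_id,
       [date],
       [time],
       CAST([c-ip] AS varchar(50)) AS [c-ip],
       CAST([cs-username] AS varchar(50)) AS [cs-username],
       CAST([s-ip] AS varchar(50)) AS [s-ip],
       [s-port],
       CAST([cs-method] AS varchar(20)) AS [cs-method],
       CAST([cs-uri-stem] AS varchar(400)) AS [cs-uri-stem],
       CAST([cs-uri-query] AS varchar(1000)) AS [cs-uri-query],
       [sc-status],
       CAST([cs(User-Agent)] AS varchar(500)) AS [cs(User-Agent)],
       CAST([cs(Referer)] AS varchar(1000)) AS [cs(Referer)]
INTO dbo.IISlog_new
FROM dbo.IISlog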
Once we get the new table fully populated, complete with an IDENTITY column, properly sized datatypes, and proper indexes, THEN we can put it in service in about 65 milliseconds by doing a snap rename of the old and new tables. We will NEVER overwrite the original table structure or data until we're 2000% sure that the new table is correct. Any data added to the old table between the time we copy it and the time we do the snap rename can easily be added to the new table from the old one... there won't be much.
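The snap rename itself is just a metadata change, which is why it's nearly instant; for example:
-- swap the tables; only system catalog entries change, no data moves
EXEC sp_rename 'dbo.IISlog', 'IISlog_old'
EXEC sp_rename 'dbo.IISlog_new', 'IISlog'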
And THEN, once we're sure the new table is correct and we've validated that all rows and columns are present, accounted for, and correct, you're gonna get your DBA to do a full backup 'cause, right after that, we're gonna delete the old table.
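A minimal row-count sanity check before that DROP might be:
-- both counts must match (plus any stragglers copied over) before dropping anything
SELECT COUNT(*) AS old_rows FROM dbo.IISlog_old
SELECT COUNT(*) AS new_rows FROM dbo.IISlog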
And, I'm going to help you write the code to do it. That's what I really had in mind... but I was trying to get some info from you, first.
Now, about those questions I asked...
"We're all in this together and I'm pullin' for ya!" --Red Green
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2007 at 9:53 pm
Oh yeah... almost forgot... you need a complete list and script of any triggers that may exist on the table, as well as a complete list of views that use the table as a source. We'll need to reinstantiate the triggers and recompile all the views... EM normally takes care of all that, but we're being super careful... I'd do that even if I were using EM for this whole job.
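A quick way to round those up on SQL 2000 (a sketch using the system tables):
-- triggers defined on the table
SELECT name FROM sysobjects
WHERE type = 'TR' AND parent_obj = OBJECT_ID('dbo.IISlog')
-- views, procs, and functions that reference the table
EXEC sp_depends 'IISlog'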
--Jeff Moden
Change is inevitable... Change for the better is not.
April 6, 2007 at 7:30 am
Jeff's got great advice here and I agree with him. If this server is only used by a few people it might be OK, but 1 GB of RAM is pretty low. Even if I were the only one who needed to access it, I'd be looking at 2 GB to get any good performance out of things.