October 3, 2012 at 5:57 am
ScottPletcher (10/2/2012)
Even turning on certain options / features lengthens rows in SQL Server.
I've never heard of such a thing, Scott. Do you have an example of this?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2012 at 6:10 am
foxxo (10/3/2012)
What about adding a new BIGINT column with a NULL default so there's no locking, and populating it with values via script in batches. Then have a short outage while the new column is marked as IDENTITY(1billion,1), then rename the old and new Id columns. OldId will remain for reference, or you could try dropping the column during the outage, but it will take some time.
I believe that would cause massive page splitting to make room for the new column. Dropping the clustered index probably wouldn't help here either because the resulting heap would still need to expand the rows.
I'd have to do some testing to make sure it would work correctly, but I would try creating the new table as an empty table with the IDENTITY seed on the BIGINT column set larger than the largest value in the old table. Then, combine the two tables using a partitioned view. This new view would be named the same as the old table and, of course, the old table would be renamed. Then, create an INSTEAD OF trigger to intercept new inserts and force them into the new table rather than the old. Correctly done, the partitioned view would work for UPDATEs, DELETEs, and SELECTs without further complication.
Except for possibly a new constraint on the old and new tables, the whole shootin' match could be done online in about 65 milliseconds.
Again, this all is just a thought and should be tested prior to actually trying to implement it. And, yeah.... it'll take a bit of planning to do it right the first time.
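Just to make the idea concrete, here's an untested sketch with made-up table, column, and constraint names (the seed value is made up too, just something above the old INT range):
-- New, empty table seeded above the largest value in the old (renamed) table.
-- CHECK constraints on the partitioning column (on both tables) allow partition elimination.
CREATE TABLE dbo.MyTable_New
(
    Id   BIGINT IDENTITY(2147483648,1) NOT NULL,
    Col2 INT NOT NULL,
    CONSTRAINT PK_MyTable_New PRIMARY KEY CLUSTERED (Id),
    CONSTRAINT CK_MyTable_New_Id CHECK (Id >= 2147483648)
);
GO
-- The old table has been renamed to dbo.MyTable_Old and given CHECK (Id <= 2147483647).
-- The view takes over the original table name so the application code doesn't change.
CREATE VIEW dbo.MyTable
AS
SELECT Id, Col2 FROM dbo.MyTable_Old
UNION ALL
SELECT Id, Col2 FROM dbo.MyTable_New;
GO
-- Send all new INSERTs to the new table so its IDENTITY does the numbering.
CREATE TRIGGER trg_MyTable_Insert ON dbo.MyTable
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.MyTable_New (Col2)
    SELECT Col2 FROM inserted;
END;
GO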
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2012 at 8:13 am
Also, if you are running on an old server with I/O bottlenecks, are you running SQL 2008? An earlier version may affect the answer.
It's not an area I am familiar with, but would partitioning the table help him out here?
October 3, 2012 at 8:15 am
Another thing: if you are going to create a new table and manually insert the records into it, don't forget to turn identity insert on; otherwise you will potentially destroy the key sequence. Not good if it is used as a FK on other tables 😉
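For example (made-up object names; the target column must be declared as an IDENTITY for this to apply):
-- Preserve the existing key values while copying
SET IDENTITY_INSERT dbo.MyNewTable ON;

INSERT INTO dbo.MyNewTable (Id, Col2, Col3)  -- explicit column list is required here
SELECT Id, Col2, Col3
FROM dbo.MyOldTable;

SET IDENTITY_INSERT dbo.MyNewTable OFF;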
October 3, 2012 at 8:29 am
Eugene Elutin (10/2/2012)
ScottPletcher (10/2/2012)
sql-lover (10/2/2012)
bteraberry (10/2/2012)
ScottPletcher (10/2/2012)
Maybe I'm missing something. Why not just ALTER the column to be a bigint instead of an int?
He said these are very big tables. Altering the column means that every single record needs more storage space to accommodate the larger data type. Tons of downtime is likely to result because in most environments the extra space won't be available without shuffling everything around.
You nailed it! ... that is correct ...
Can you show the results that demonstrate that claim? You only need an additional 4 bytes per row. Did you pack the table to 99-100% rather than 98%?
Now, you might have done something dopey and put the identity in your clustered key, in which case you cannot just ALTER it. And dropping and recreating the clus index would indeed be much more overhead than a simple ALTER column.
The OP did mention that this column participates in a relationship, so I think it is at least a PK. Could you please clarify why putting the identity into the clustered key is "dopey"? Do you, somehow, know what this table holds? I want the same crystal ball ;-).
Actually, I can easily believe that even without an index on this column, it may be much faster to re-insert into a new table than to ALTER the existing one. It may depend on the position of this column (let me guess, it's the first one) and the width of the table. Also, the OP cannot allow the long downtime that would be required by ALTER.
I guess the best way would be the one suggested by the OP. Maybe it needs to be batched.
Why would the relative position of the column make the slightest bit of difference at all?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 3, 2012 at 8:53 am
foxxo (10/3/2012)
What about adding a new BIGINT column with a NULL default so there's no locking, and populating it with values via script in batches. Then have a short outage while the new column is marked as IDENTITY(1billion,1), then rename the old and new Id columns. OldId will remain for reference, or you could try dropping the column during the outage, but it will take some time.
You can't change a column to identity after you create it.
October 3, 2012 at 8:54 am
aaron.reese (10/3/2012)
Another thing: if you are going to create a new table and manually insert the records into it, don't forget to turn identity insert on; otherwise you will potentially destroy the key sequence. Not good if it is used as a FK on other tables 😉
True! I thought about that, and it's already implemented in my script. Thanks for the advice.
Regarding your other comment, I can't use partitioning (it would be awesome, assuming I have the proper indexes) because this is not Enterprise edition.
Here's what I've done so far...
-Exported the tables via BCP and moved the flat files to a different server.
-Created a dummy database there with an empty schema and tables, and I'm now importing the flat files using bcp in. It is taking a while... the old server is even older, but it's the same SQL flavor.
Once the second step is done, I will duplicate the source table (I've created another one already) and insert the rows via bcp into the second table. Then I can create a script to drop the FKs, etc., on the source... let's see...
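Roughly the commands I'm using (server and file names changed, typed from memory, so double-check the switches):
REM Export from the source in native format, using a trusted connection (placeholder names)
bcp SourceDB.dbo.MyTable out D:\export\MyTable.dat -n -S OldServer -T

REM Import into the dummy database (same schema, so the native types line up),
REM keeping identity values (-E) and committing in batches (-b)
bcp DummyDB.dbo.MyTable in D:\export\MyTable.dat -n -S TestServer -T -E -b 50000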
October 3, 2012 at 8:58 am
We recently encountered the same issue in multiple production db's.
When re-inserting the data, use BULK INSERT and remove the indexes; it is significantly faster.
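Something along these lines (placeholder object and file names):
-- Disable the nonclustered indexes first, bulk load, then rebuild them afterwards
ALTER INDEX IX_MyTargetTable_Col2 ON dbo.MyTargetTable DISABLE;

BULK INSERT dbo.MyTargetTable
FROM 'D:\export\MyTable.dat'
WITH (DATAFILETYPE = 'native', KEEPIDENTITY, BATCHSIZE = 50000, TABLOCK);

ALTER INDEX IX_MyTargetTable_Col2 ON dbo.MyTargetTable REBUILD;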
October 3, 2012 at 9:10 am
So I guess there's no issue with rows being UPDATEd / DELETEd while the new table is being created?
Or with the potential lock outs/delays to current modifications during the export of the table?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
October 3, 2012 at 9:42 am
ScottPletcher (10/2/2012)
sql-lover (10/2/2012)
bteraberry (10/2/2012)
ScottPletcher (10/2/2012)
Maybe I'm missing something. Why not just ALTER the column to be a bigint instead of an int?
He said these are very big tables. Altering the column means that every single record needs more storage space to accommodate the larger data type. Tons of downtime is likely to result because in most environments the extra space won't be available without shuffling everything around.
You nailed it! ... that is correct ...
Can you show the results that demonstrate that claim? You only need an additional 4 bytes per row. Did you pack the table to 99-100% rather than 98%?
Now, you might have done something dopey and put the identity in your clustered key, in which case you cannot just ALTER it. And dropping and recreating the clus index would indeed be much more overhead than a simple ALTER column.
Probably won't matter what the fill factor is.
First, the fill factor only affects build/rebuild on indexes. It doesn't affect subsequent inserts and updates.
If the Identity column is the leading edge of the clustered index, and that index was originally built or has been rebuilt (recently) with a fill factor that allows for average row-size to increase by 4 bytes, then fill factor will matter. But, if the cluster is on an ID value, and the table isn't subject to a lot of size-increasing updates, you would normally have the fill factor at 100, because inserts will be at the end of the index anyway, and a lower fill factor just means you're using disk space to hold nothing at all in that case.
If the ID column isn't the leading edge of the clustered index, but is the leading edge of other critical indexes, then you'll still have a problem with page-splits, just not in the clustered index. That means I/O, CPU, RAM, page-locks (and/or heavy tempdb use), possibly extent/index locks, etc. So you may still have problems with this even if it's not the leading edge.
Even if it's not the leading edge, if it's anywhere in the b-tree except the leaf level, you'll still potentially face page splits because of it. If it's only at the leaf level (rare for an ID, since there's not generally much use for one in that position), it might potentially be possible to avoid splits, theoretically. I doubt it's a realistic scenario, but I concede it could happen.
Then there's the fact that, at the page level, fixed-width columns, like Int and BigInt, are stored first, and variable width (varchar, varbinary, sparse [if I'm not mistaken]) ones are stored after. That means, depending on table structure and how the engine decided to do the actual physical storage, you're likely going to get page-rewrites and splits anyway, just because of the size increase. It's not like increasing the maximum size of a varchar column, which can often be done simply through a metadata change with a DDL command, without having to actually do anything at the storage level of the data. The only way you'd avoid that, theoretically, on an Int to BigInt modification, is if the column happens to be stored last, and row-size is such that it's not an even 8k page division, so there might be room after each row for the 4-byte increase. You'd still get rewrites of all the pages, but not splits. The rewrites aren't likely to take as much time as inserts, but are likely to lock the table for whatever duration they do take, while inserting into a separate table via BCP is unlikely to result in that kind of lock escalation.
There are people who know the storage internals better than I do, and if any of them want to correct what I've written here, please do. I've tested the Alter Table version, and it's not what you'd expect if you're used to doing that on varchar or varbinary columns. Try it yourself, you'll see what I mean.
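If you want to see it for yourself, this is the kind of test I mean (small scale, made-up table; scale the row count up for realistic timings):
-- Made-up test table: a heap with no index on the Id column, so the only cost
-- measured is the data-size change itself.
CREATE TABLE dbo.AlterTest
(
    Id  INT NOT NULL,
    Val CHAR(100) NOT NULL
);

INSERT INTO dbo.AlterTest (Id, Val)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
FROM sys.all_columns a CROSS JOIN sys.all_columns b;

-- Page count and page fullness before the change
SELECT index_level, page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.AlterTest'), NULL, NULL, 'DETAILED');

-- Watch the duration and the log activity of this statement
ALTER TABLE dbo.AlterTest ALTER COLUMN Id BIGINT NOT NULL;

-- Page count and page fullness after the change
SELECT index_level, page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.AlterTest'), NULL, NULL, 'DETAILED');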
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 3, 2012 at 9:47 am
Faced 1st challenge (dumb error)
I can't use "n" option with BCP or re-inserted from source table. Source has "int" on that column. Target table already has the new "bigint" data type. I am getting this error:
Starting copy...
SQLState = 42000, NativeError = 7339
Error = [Microsoft][SQL Native Client][SQL Server]OLE DB provider 'STREAM' for linked server '(null)' returned invalid data for column '[!BulkInsert].DOB'.
Can someone refresh my mind and tell me what BCP option I need to use? I already have the flat file from the source. I just need to import the information into the new table, which already has "bigint" there. This is on my testing environment, of course.
The remaining rows, or any discrepancy, are something I was planning to insert during the short offline period, using plain SELECT and INSERT commands with IDENTITY_INSERT.
Anyone? ...
*** EDIT ***
I honestly don't remember if I can import data via BCP and change or specify the data type somewhere. But I can actually alter the data type on that table now in my testing environment and export again via BCP. Still, I wonder if I can skip that extra step.
October 3, 2012 at 10:51 am
sql-lover (10/3/2012)
Faced 1st challenge (dumb error). I can't use the "-n" (native) option with BCP to re-insert from the source table. The source has "int" for that column; the target table already has the new "bigint" data type. I am getting this error:
Starting copy...
SQLState = 42000, NativeError = 7339
Error = [Microsoft][SQL Native Client][SQL Server]OLE DB provider 'STREAM' for linked server '(null)' returned invalid data for column '[!BulkInsert].DOB'.
Can someone refresh my mind and tell me what BCP option I need to use? I already have the flat file from the source. I just need to import the information into the new table, which already has "bigint" there. This is on my testing environment, of course.
The remaining rows, or any discrepancy, are something I was planning to insert during the short offline period, using plain SELECT and INSERT commands with IDENTITY_INSERT.
Anyone? ...
*** EDIT ***
I honestly don't remember if I can import data via BCP and change or specify the data type somewhere. But I can actually alter the data type on that table now in my testing environment and export again via BCP. Still, I wonder if I can skip that extra step.
Since you didn't post the table structure or the BCP commands you used, it is a little hard to troubleshoot.
On the assumption that you are trying to export and then import in native mode, you will need to modify the export BCP to export the identity column as BIGINT instead of INT. One way to do that is by exporting a query where you cast the ID column to BIGINT, instead of exporting the table.
I recommend that you test with a small number of rows until you get everything working.
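For example, something along these lines (placeholder names, typed from memory, so verify the switches):
REM Export a query instead of the table, casting the identity column to BIGINT so the
REM native-format file matches the new table's data type
bcp "SELECT CAST(col1 AS BIGINT), col2, col3, col4, col5 FROM SourceDB.dbo.MySourceTable" queryout D:\export\MyTable.dat -n -S SourceServer -T

REM Then load it into the BIGINT table, keeping the original identity values
bcp TargetDB.dbo.MyTargetTable in D:\export\MyTable.dat -n -S TargetServer -T -E -b 50000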
October 3, 2012 at 10:56 am
Michael,
Good catch!
Here's the simple schema...
CREATE TABLE [dbo].[MyTargetTable](
[col1] [bigint] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[col2] [int] NOT NULL,
[col3] [smallint] NOT NULL,
[col4] [smalldatetime] NOT NULL,
[col5] [real] NULL
) ON [FG2]
col1 is INT on the source... I also discovered I can use BCP with an XML format file to specify the data type, but I'm still getting some weird errors.
October 3, 2012 at 10:59 am
This is the kind of code I was referring to if you choose to go with a chunking process rather than BCP. We've had projects similar to yours, and this is how we normally do it.
declare @i bigint, @cnt int, @blockSize int = 5000; -- @i as bigint to match the new key's data type
set identity_insert dbo.YourNewTable on;
while 1 = 1
begin
begin try
begin transaction;
-- recalculate each time (3-4 reads so lightweight) since there may be gaps in ID's ...
-- otherwise, just calculate before loop and add the @blockSize to @i each time
select @i = coalesce((select top (1) PrimaryKeyID from dbo.YourNewTable order by PrimaryKeyID desc), 0);
insert into dbo.YourNewTable (PrimaryKeyID, Col2, Col3, Col4)
select top (@blockSize) PrimaryKeyID, Col2, Col3, Col4
from dbo.YourOldTable
where PrimaryKeyID > @i
order by PrimaryKeyID -- without this, TOP gives no guarantee of taking the lowest remaining IDs
option (maxdop 4); -- I'm pretending there are 16 processor cores
set @cnt = @@ROWCOUNT;
commit transaction;
if @cnt < @blockSize break;
waitfor delay '00:00:00.500';
end try
begin catch
print error_message();
if @@trancount > 0 rollback transaction;
break; -- stop retrying on a persistent error
end catch
end
set identity_insert dbo.YourNewTable off;
The block size here is 5000. What will work in your situation really depends, but on critical systems I try to err on the low side. Each iteration is wrapped explicitly in its own transaction to prevent the whole thing from being a single transaction, which would kill your log. And then, as I mentioned before, I'm using both a MAXDOP hint and a delay of half a second to make 100% sure the query cannot interfere with normal activity on the server.
I would use the same script one last time after your old table has been renamed: write a final script that renames the old table, runs this same script above against the renamed table to do the final population, and then renames the new table. Your downtime should be a matter of seconds or less.
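The cutover itself would be something like this (untested outline; object names are placeholders):
-- Assumes the live table is dbo.YourTable and the chunking script above has been
-- copying its rows into dbo.YourNewTable.
EXEC sp_rename 'dbo.YourTable', 'YourTable_Old';   -- application inserts will error for these few seconds

-- Run the chunking script one last time against dbo.YourTable_Old to pick up
-- any rows that arrived since the previous pass.

EXEC sp_rename 'dbo.YourNewTable', 'YourTable';    -- the BIGINT table takes over the production name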
I'm not trying to say that you shouldn't use BCP, I'm just saying how we have handled similar issues in the past.
** edit: fixed an error of when @@rowcount was referenced within the script **
October 3, 2012 at 12:18 pm
bt,
That's a very nice piece of code, thank you so much. However, I wonder how many locks (if any) it will create. Like I said, the production server has serious IO issues and a very bad drive layout. I'm actually in the process of migrating to a new environment I designed.
But I may try it and test on my ITG box. Also, how do I know when it's complete? The live table gets new rows sporadically. I guess the script will outpace the insert rate.
I still wonder how I can make BCP work with an XML format file.