February 1, 2010 at 4:47 am
**Scenario 1**: I have a table of train bookings with around 5 million reservations per month, which comes to roughly 60 million over 12 months, and I need to change the schema of the table.
Any DDL operation against the schema takes ages.
I tried a BCP out, dropping and re-creating the table, and a BCP back in,
but the index re-creation takes time; not as much as the straight ALTER TABLE did, though.
**Scenario 2**: If I partition the table into twelve months, is it possible to apply the DDL, and would it be any faster given that it can run in parallel? A test table of 1 million records with three INT columns returned promising results.
Then I thought I could use a partition switch to move the data into a different table, bearing the lock escalation process in mind and switching escalation off.
Then issue the DDL on the table.
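Roughly what I have in mind for the switch (a sketch only; the table name and partition numbers are illustrative):
-- Switch one month out to a staging table: a metadata-only operation.
-- The staging table must match the source structure exactly and sit
-- on the same filegroup as the source partition.
ALTER TABLE dbo.Reservations
SWITCH PARTITION 3 TO dbo.Reservations_Staging;
-- Switching back only works while the two structures still match:
ALTER TABLE dbo.Reservations_Staging
SWITCH TO dbo.Reservations PARTITION 3;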
1. Is there an easy way to copy data across partitions in parallel?
2. I am not considering OPENROWSET: as it has to go through the OLE DB layer, it won't be as fast as native SQL Server. Or are there any stats available, if anyone has measured this?
Scenario 2 is more appealing to me, as I can rebuild the indexes in parallel, and if the data for Dec is not required immediately I can index it later, i.e. I can hold it back for some time.
**Scenario 3**: Is there anything else used in the industry or by the community?
Regards
Vinay
February 1, 2010 at 3:36 pm
What change are you trying to implement? Are you scripting it or running it through Management Studio?
Andrew
February 1, 2010 at 4:57 pm
First, table partitioning is an "Enterprise Edition" option. If you don't have the "Enterprise Edition", it's not an option.
If you don't have the "Enterprise Edition", you could build a "Partitioned View" over correctly date-named underlying tables, with a bit of code to automatically create and move data from the "current month" table to a date-named table, and a bit of code to automatically update the partitioned view to include the new table.
There are some uncomplicated but strict rules for creating good partitioned views in Books Online.
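Just to give the shape of the thing, here's a minimal sketch (all the names are made up, and the Books Online rules still apply; for example, the partitioning column must be part of the primary key for the view to be updatable):
-- Two date-named member tables with CHECK constraints that tell the
-- optimizer which table holds which month
CREATE TABLE dbo.Bookings_2010_01
(
    booking_id INTEGER NOT NULL,
    booking_date DATETIME NOT NULL
        CHECK (booking_date >= '20100101' AND booking_date < '20100201'),
    CONSTRAINT PK_Bookings_2010_01 PRIMARY KEY (booking_id, booking_date)
);
CREATE TABLE dbo.Bookings_2010_02
(
    booking_id INTEGER NOT NULL,
    booking_date DATETIME NOT NULL
        CHECK (booking_date >= '20100201' AND booking_date < '20100301'),
    CONSTRAINT PK_Bookings_2010_02 PRIMARY KEY (booking_id, booking_date)
);
GO
-- The partitioned view just unions the members; a query that filters
-- on booking_date only touches the relevant table
CREATE VIEW dbo.Bookings
AS
SELECT booking_id, booking_date FROM dbo.Bookings_2010_01
UNION ALL
SELECT booking_id, booking_date FROM dbo.Bookings_2010_02;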
The advantage here is that once a date-named table has been created and provided that no data is added or modified, the indexes on those tables shouldn't need any maintenance at all. Only the "current month" table will need that.
The disadvantage is, of course, that you'll need to write some code to create the tables and populate them instead of just "sliding" a window. If you write the code correctly, though, it'll become a monthly "no brainer".
Also, be aware that even correctly set-up partitioned views may not operate as fast as advertised. Like everything else, "It Depends".
Of course, if you have the "Enterprise Edition", partitioned tables would definitely be the way to go.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 3, 2010 at 9:03 am
The issue is not quite what you are all thinking!
Well, to explain it:
------------------------------------------------------------------------------------
Partition 1    | Partition 2    | Partition 3    | Partition 4    | Partition 5
42 Mil Records | 42 Mil Records | 42 Mil Records | 42 Mil Records | 42 Mil Records
------------------------------------------------------------------------------------
Now I want to alter this table to add a new NOT NULL column defaulting to some value.
The estimated time to alter this table, adding a NOT NULL column across 7 years' worth of history, is around 7 hours.
So what is the fastest way available to us to make the alter happen quickly?
That's the reason I wanted to switch the partitions out to a mirror table, alter the now-empty table, and copy the data back into the primary table; I cannot simply switch the partitions back into the main table, because the structure is no longer the same.
What is the quick way to alter the table to add NOT NULL columns?
Currently, what we do is:
1. Remove all the foreign keys.
2. Rename the table to an 'old' table.
3. Create a new table with the new schema.
4. Insert all the data back into the original Reservations table, in a loop, to limit transaction log growth (see the sketch after this list).
5. Re-create all the foreign keys.
6. Truncate the old table.
7. Drop the old table.
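A simplified sketch of the insert loop in step 4 (the table, columns, and batch size are all illustrative):
-- Copy in batches so each transaction (and the log) stays small;
-- walking the key range avoids re-scanning rows already copied
DECLARE @BatchSize INT, @LastId INT, @Rows INT;
SELECT @BatchSize = 1000000, @LastId = 0, @Rows = 1;
WHILE @Rows > 0
BEGIN
    INSERT dbo.Reservations (reservation_id, booking_date, new_column)
    SELECT TOP (@BatchSize)
           reservation_id, booking_date, 0 -- default for the new column
    FROM dbo.Reservations_Old
    WHERE reservation_id > @LastId
    ORDER BY reservation_id;
    SELECT @Rows = @@ROWCOUNT;
    -- cheap, as it only reads one end of the primary key index
    SELECT @LastId = MAX(reservation_id) FROM dbo.Reservations;
END;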
What I really need is something that behaves like a bcp, without the overhead of casting and converting.
Any ideas on this?
Regards
Vinay
February 21, 2010 at 10:03 am
Why does the schema of such a huge table have to change? Please explain the logic behind this as it seems highly unlikely to be sensible. What are you doing that requires the addition of an extra column, and why does it have to be NOT NULL with a default value?
It is generally a bad idea to add or remove columns from a table at the best of times. Doing so on a huge partitioned table, while preventing SQL Server from performing it as a meta-data operation seems close to masochism.
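For reference, the distinction looks like this on SQL Server 2005/2008 (the table and column names are hypothetical, and the two statements are alternatives):
-- Metadata-only: existing rows simply read the new column as NULL,
-- so no data pages are touched and the ALTER is effectively instant
ALTER TABLE dbo.Reservations ADD audit_flag INTEGER NULL;
-- By contrast, size-of-data: every existing row has to be rewritten to
-- physically hold the default, which is what takes hours on 200+ million rows
ALTER TABLE dbo.Reservations ADD audit_flag INTEGER NOT NULL
    CONSTRAINT DF_Reservations_audit_flag DEFAULT (0);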
If you can describe a sensible and reasonable business or technical justification for the requirement, I'll give it some thought.
Paul
February 21, 2010 at 11:20 am
It is not at all uncommon to have a business change in the current business climate: we get new upstream and downstream systems, and our OLTP has to cope with the data we are processing, the types of customers we are taking on, the prioritisation mechanism, and many other things that change as the business does.
That is the story here: the acquisition of a different company makes it difficult to incorporate their schema and everything else that comes with it.
So ultimately the schema has to change to accommodate the data coming in and going out,
and we need 7 years' worth of history to stay online and available.
I hope I have put that the right way. Trust me, there are scenarios where this is really necessary.
And NOT NULL is the only way, as the column must have data. What do you do in such a situation? I am close to giving up, as I have nothing left in hand.
But if somebody has a clean idea, that would be interesting.
Regards
Vinay
February 21, 2010 at 6:19 pm
Vinay,
I've not read all of the posts on this thread so let me ask... can the system be down while you do this or not? If so, for how long?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 21, 2010 at 9:45 pm
The reason I asked for details was because the best solution often depends on the details.
All you've done really is ask me to trust you, and that there really is no alternative.
I have no reason to disbelieve you, of course, it's just that without knowing the details, I cannot be sure I would agree with your assessment.
I have a feeling Jeff is about to suggest bulk copying the data out and using a BCP format file on the way in.
If so, this is indeed an option.
Another option is to add the column as a metadata change (adding it as NULLable). You could then go through and update small batches from NULL to the default value over time.
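A minimal sketch of that option (the names, default value, and batch size are illustrative):
-- Instant, metadata-only change
ALTER TABLE dbo.Reservations ADD new_column INTEGER NULL;
GO
-- Trickle the default value in, in small transactions, at quiet times
WHILE 1 = 1
BEGIN
    UPDATE TOP (5000) dbo.Reservations
    SET new_column = 12345
    WHERE new_column IS NULL;
    IF @@ROWCOUNT = 0 BREAK;
END;
-- Once no NULLs remain, fit the default and make the column NOT NULL
-- (this last step still checks every row, but writes nothing)
ALTER TABLE dbo.Reservations
    ADD CONSTRAINT DF_Reservations_new_column DEFAULT (12345) FOR new_column;
ALTER TABLE dbo.Reservations ALTER COLUMN new_column INTEGER NOT NULL;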
There are many, many methods. The correct one depends on the details, which we don't have.
Paul
February 22, 2010 at 4:32 am
I would like to offer some ideas based on my experience (though this approach may change depending on your business requirements); a sketch of steps 5 to 7 follows the list.
1. Rename the source table (the history data table) as OLD and take it out of use (make sure no process or application can use this table).
2. Create a NEW table with the additional column, partitioned on the date range or month value.
3. Add the indexes on the NEW table.
4. Create a NEW processing table, without any partitions but otherwise matching the NEW structure. No indexes at this point.
5. BULK load (use TABLOCK, etc.) the data required for a single partition (i.e. one month's data) from the OLD history table.
6. Create indexes similar to those required on the NEW table, and complete the other tasks required for a partition switch.
7. Switch the partition from the NEW processing table into the target.
8. Validate the row count on the target.
9. Drop all the indexes on the processing table.
Repeat steps 5 through 9 for each of the months. That's all.
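Here is a sketch of steps 5 to 7 for a single month (all object names, dates, and the partition number are illustrative):
-- 5. Bulk load one month from the OLD history table; with TABLOCK and
-- the right recovery model this can be minimally logged on SQL 2008
INSERT INTO dbo.Reservations_Staging WITH (TABLOCK)
    (reservation_id, booking_date, new_column)
SELECT reservation_id, booking_date, 0 -- value for the new column
FROM dbo.Reservations_OLD
WHERE booking_date >= '20100101'
AND booking_date < '20100201';
-- 6. Add the matching indexes, plus the CHECK constraint that proves
-- the staging data fits the target partition's range
ALTER TABLE dbo.Reservations_Staging WITH CHECK
    ADD CONSTRAINT CK_Staging_Month
    CHECK (booking_date >= '20100101' AND booking_date < '20100201');
-- 7. Metadata-only switch into the matching partition of the NEW table
-- (the partition number can be found with the $PARTITION function)
ALTER TABLE dbo.Reservations_Staging
    SWITCH TO dbo.Reservations_NEW PARTITION 1;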
February 22, 2010 at 4:51 am
Jeff Moden (2/21/2010)
Vinay, I've not read all of the posts on this thread so let me ask... can the system be down while you do this or not? If so, for how long?
The system can be down for an hour or two, as we apply these tweaks in the weekend jobs.
The window could be increased to 3 hours max, which gives us time to make sure the patch applied correctly and to run a few small tests on top of it.
Thank you in advance.
Regards
Vinay
February 22, 2010 at 4:58 am
Paul White (2/21/2010)
The reason I asked for details was because the best solution often depends on the details. All you've done really is ask me to trust you, and that there really is no alternative.
I have no reason to disbelieve you, of course, it's just that without knowing the details, I cannot be sure I would agree with your assessment.
I have a feeling Jeff is about to suggest bulk copying the data out and using a BCP format file on the way in.
If so, this is indeed an option.
Trust me, it took a long time to mock up 7 years of data, and it brought the database to a grinding halt. Next I made sure I had one partition per year,
then did a parallel bcp into the database: 7 clients, each hitting a different partition, bcp-ing data into the table. That turned out to be slower than a
single-threaded bcp into the partitioned table; maybe I was maxing out on memory.
Another option is to add the column as a metadata change (adding it as NULLable). You could then go through and update small batches from NULL to the default value over time.
There are many, many methods. The correct one depends on the details, which we don't have.
Paul
This is the other thing I thought should be straightforward: adding the column as NULL and then updating it to the default value the business requires. But then again I run out of the 3-hour window allowed, and it makes me feel like a loser.
I thought I would finally give up on this.
Regards
Vinay
February 22, 2010 at 6:44 am
bhushanvinay (2/22/2010)
This is the other thing I thought should be straightforward: adding the column as NULL and then updating it to the default value the business requires. But then again I run out of the 3-hour window allowed, and it makes me feel like a loser. I thought I would finally give up on this.
Before you give up, have you considered using a view?
You would need to test carefully, but I have used this approach successfully before.
Rather than get too much into the detail, here is a full demonstration script...
USE tempdb;
GO
IF OBJECT_ID(N'dbo.HugeTable', N'V') IS NOT NULL DROP VIEW dbo.HugeTable;
IF OBJECT_ID(N'dbo.HugeTable', N'U') IS NOT NULL DROP TABLE dbo.HugeTable;
IF OBJECT_ID(N'dbo.HiddenHugeTable', N'U') IS NOT NULL DROP TABLE dbo.HiddenHugeTable;
GO
CREATE TABLE dbo.HugeTable
(
    row_id INTEGER IDENTITY PRIMARY KEY,
    data BIGINT NOT NULL
);
GO
-- Row number generator
WITH L0 AS (SELECT 1 AS c UNION ALL SELECT 1),
L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Num AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L5)
-- Add 250K rows to the test table (takes around one second)
INSERT dbo.HugeTable WITH (TABLOCK)
(data)
SELECT TOP (250000)
N
FROM Num
ORDER BY
N ASC;
GO
-- Add a new column, as a purely metadata operation (instant)
ALTER TABLE dbo.HugeTable ADD new_column INTEGER NULL;
GO
-- Rename the table (instant if it succeeds)
EXECUTE sp_rename N'dbo.HugeTable', N'HiddenHugeTable', 'OBJECT';
GO
-- Optional: add an index to allow us to find NULL values quickly later on
-- Takes around 1 second to execute
-- Index could be created later, possibly ONLINE
CREATE NONCLUSTERED INDEX nc1 ON dbo.HiddenHugeTable (new_column)
WITH (FILLFACTOR = 50, PAD_INDEX = ON, SORT_IN_TEMPDB = ON, ONLINE = OFF, MAXDOP = 4)
ON [PRIMARY];
GO
-- Create a view over the renamed table
-- The view returns a default value if it finds a NULL in the new column
-- To clients, it seems as if the new column exists, and has the required value
CREATE VIEW dbo.HugeTable
WITH SCHEMABINDING
AS
SELECT HHT.row_id,
HHT.data,
new_column = ISNULL(HHT.new_column, 12345) -- 12345 is the DEFAULT value
FROM dbo.HiddenHugeTable HHT;
GO
-- Create INSTEAD OF triggers on the view to allow changes to the data
-- INSERTs, UPDATEs, and DELETEs are simply performed against the
-- renamed real table. The triggers are written separately for clarity.
-- INSERT
CREATE TRIGGER dbo.[trg dbo.HugeTable IOI]
ON dbo.HugeTable
WITH EXECUTE AS CALLER
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
SET ROWCOUNT 0;
IF NOT EXISTS (SELECT * FROM inserted) RETURN;
INSERT dbo.HiddenHugeTable
(data, new_column)
SELECT data, new_column
FROM inserted;
END;
GO
-- DELETE
CREATE TRIGGER dbo.[trg dbo.HugeTable IOD]
ON dbo.HugeTable
WITH EXECUTE AS CALLER
INSTEAD OF DELETE
AS
BEGIN
SET NOCOUNT ON;
SET ROWCOUNT 0;
IF NOT EXISTS (SELECT * FROM deleted) RETURN;
DELETE HHT
FROM deleted DEL
JOIN dbo.HiddenHugeTable HHT
ON HHT.row_id = DEL.row_id;
END;
GO
-- UPDATE
CREATE TRIGGER dbo.[trg dbo.HugeTable IOU]
ON dbo.HugeTable
WITH EXECUTE AS CALLER
INSTEAD OF UPDATE
AS
BEGIN
SET NOCOUNT ON;
SET ROWCOUNT 0;
IF NOT EXISTS (SELECT * FROM inserted) RETURN;
UPDATE HHT
SET data = INS.data,
new_column = INS.new_column
FROM inserted INS
JOIN dbo.HiddenHugeTable HHT
ON HHT.row_id = INS.row_id
END;
GO
-- Tests
-- 1. Show the first ten rows from the view - the new column is there, with the default value
SELECT TOP (10) * FROM dbo.HugeTable ORDER BY row_id ASC;
-- 2. Show the first ten rows from the hidden table - the new column is there, but full of NULLs
SELECT TOP (10) * FROM dbo.HiddenHugeTable ORDER BY row_id ASC;
-- 3. Add two new rows, through the view
INSERT dbo.HugeTable (data, new_column)
SELECT -1, 10 UNION ALL
SELECT -2, 20;
-- 4. Show that the new rows exist
SELECT * FROM dbo.HugeTable WHERE row_id > 250000;
-- 5. Update the new rows
UPDATE dbo.HugeTable
SET data = data - 100,
new_column = new_column + 50
WHERE row_id > 250000;
-- 6. Show the updated rows
SELECT * FROM dbo.HugeTable WHERE row_id > 250000;
-- 7. Delete the new rows
DELETE dbo.HugeTable WHERE row_id > 250000;
-- 8. Show that the rows have gone
SELECT * FROM dbo.HugeTable WHERE row_id > 250000;
-- 9. An example query to update a section of the underlying NULL values
UPDATE TOP (500)
dbo.HiddenHugeTable
SET new_column = 12345
WHERE new_column IS NULL;
-- 10. Once all the NULL values have been updated, we don't need the view any more
BEGIN TRANSACTION;
-- Drop the view
IF OBJECT_ID(N'dbo.HugeTable', N'V') IS NOT NULL DROP VIEW dbo.HugeTable;
-- Rename the table
EXECUTE sp_rename N'dbo.HiddenHugeTable', N'HugeTable', 'OBJECT';
-- Drop the index we used to help the update (note the table name has changed)
DROP INDEX nc1 ON dbo.HugeTable;
COMMIT TRANSACTION;
-- 11. Tidy up
IF OBJECT_ID(N'dbo.HugeTable', N'V') IS NOT NULL DROP VIEW dbo.HugeTable;
IF OBJECT_ID(N'dbo.HugeTable', N'U') IS NOT NULL DROP TABLE dbo.HugeTable;
IF OBJECT_ID(N'dbo.HiddenHugeTable', N'U') IS NOT NULL DROP TABLE dbo.HiddenHugeTable;
-- End of demo script
Paul
February 22, 2010 at 11:38 am
Hi Paul,
Seems indeed like a very clever idea.
I am going to save this post for later use; it could come in handy.
PS: Didn't you also miss deleting the triggers once the view isn't needed any more? They will reference a non-existent object (HiddenHugeTable).
Wkr,
Eddy
February 22, 2010 at 10:14 pm
eddy-644184 (2/22/2010)
Seems like a very clever idea. Im going to save this post for later use, could come in handy.PS: Didn't you miss deleting the triggers also once the view isn't needed any more? They will reference a non existing object (HiddenHugeTable)
Hey Eddy,
Thanks. It can be made to work well, but thorough testing for compatibility with the old structure is absolutely essential.
When the view is deleted, the triggers are automatically deleted by SQL Server. There's no need to drop them explicitly.
Paul