Archieving Data

  • Hi,

    We have one large table with 6880875 records. It has created date. The min created date is a year ago but really we doesn't required to keep the data more than 2-3 months in this table.

    We want to archive this data from Prod to some other database in different server.

    What is the best approach and how could we achieve this? Is it possible through Sql job

  • There are a lot of unknowns here, and some of it depends on what tools you have at your disposal. I would personally build an SSIS package to move the rows you need from one server to another. The SSIS package will manage all the connections to the separate servers, and handle extraction and movement of the data. Just write a query on the source adapter to pull out the rows you want, move them over the wire to the other server, then get rid of them at the source.

    Executive Junior Cowboy Developer, Esq.[/url]

  • ramana3327 (5/5/2015)


    Hi,

    We have one large table with 6880875 records. It has created date. The min created date is a year ago but really we doesn't required to keep the data more than 2-3 months in this table.

    We want to archive this data from Prod to some other database in different server.

    What is the best approach and how could we achieve this? Is it possible through Sql job

    Which edition of SQL Server do you have? And how much free space do you have on the data drive?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • For the sake of discussion we'll just say that we want to archive stuff older than three months.

    The most basic approach would be to use a stored proc and a SQL agent job.

    Using this sample data:

    -- setup sample data

    IF OBJECT_ID('tempdb.dbo.livetable') IS NOT NULL DROP TABLE dbo.livetable;

    IF OBJECT_ID('tempdb.dbo.archiveTable') IS NOT NULL DROP TABLE dbo.archiveTable;

    -- your live data here

    CREATE TABLE dbo.liveTable

    (

    ld_id int primary key,

    c1 varchar(10),

    create_date datetime not null

    );

    -- archive data here (this would be located somewhere else)

    CREATE TABLE dbo.archiveTable

    (

    ld_id int,

    c1 varchar(10),

    create_date datetime not null

    );

    -- sample data with 2 old records

    INSERT dbo.liveTable VALUES

    (10,'ok','5/1/2015'),

    (9,'ok','4/1/2015'),

    (8,'too old','1/1/2015'),-- more than 3 months old

    (7,'too old','12/1/2014');-- more than 3 months old

    GO

    -- Before archive

    SELECT * FROM dbo.liveTable;

    SELECT * FROM dbo.archiveTable;

    You could create a stored procedure ran this:

    BEGIN TRANSACTION arch1

    -- Get ID's of stuff > 3mo's old

    SELECT ld_id

    INTO #tmp

    FROM dbo.liveTable

    WHERE create_date < DATEADD(MONTH,-3,CONVERT(date,getdate()));

    -- insert it into #acrchiveTable

    INSERT dbo.archiveTable

    SELECT *

    FROM dbo.liveTable

    WHERE ld_id IN (SELECT ld_id FROM #tmp);

    -- Delete from #livetable

    DELETE FROM dbo.liveTable

    WHERE ld_id IN (SELECT ld_id FROM #tmp);

    COMMIT WORK

    GO

    DROP TABLE #tmp;

    and create a SQL Agent Job that runs that stored proc regularly. It would run slowest the first time of course. I run it all as one transaction so as not to delete data that has not been archived.

    This is by no means a complete solution and excludes dealing with things like preventing duplicates in the archive data. It's an example of the more basic ways I have approached archiving data.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • We are using Sql 20082 Standard edition. Data drive has enough free space but that table stores the errors that we don't required to store more than 3 months.

    To do this data archiving we need to have linked server otherwise we can't able to do that process b/n servers

  • ramana3327 (5/6/2015)


    We are using Sql 20082 Standard edition. Data drive has enough free space but that table stores the errors that we don't required to store more than 3 months.

    Understood. I wanted to know how much space you had so that we can do this as easily and with as little downtime/major blocking time as possible. From the sounds of it, we should be able to limit any interference to about 65ms. 🙂

    I need just one more thing, please. Please post the DDL for the table including any constraints and indexes so I can check for anything that might prevent what I'm going to suggest and maybe give you a couple of clues to keep things real simple.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ramana3327 (5/6/2015)


    We are using Sql 20082 Standard edition. Data drive has enough free space but that table stores the errors that we don't required to store more than 3 months.

    To do this data archiving we need to have linked server otherwise we can't able to do that process b/n servers

    Going between servers I would use SSIS to archive the data on another server. Wouldn't require the use of linked servers.

    Jeff, I know, you would not use SSIS as you have trouble even spelling it. 😀 😉

  • Space means in the destination server or source server?

    Here is the table structure

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[ERRORS](

    [Id] [uniqueidentifier] NOT NULL,

    [InnerErrorId] [uniqueidentifier] NULL,

    [CreatorFK] [uniqueidentifier] NOT NULL,

    [Created] [datetime] NOT NULL,

    [Trace] [ntext] NOT NULL,

    [Message] [nvarchar](max) NOT NULL,

    [IsInnerError] [bit] NOT NULL CONSTRAINT [DF_Error_IsInnerError] DEFAULT ((0)),

    [ErrorApplicationId] [int] NULL,

    [IsHidden] [bit] NOT NULL CONSTRAINT [DF_Error_IsHidden] DEFAULT ((1)),

    [IsTerminating] [bit] NOT NULL CONSTRAINT [DF_Error_IsTerminating] DEFAULT ((1)),

    [Information] [nvarchar](2000) NOT NULL CONSTRAINT [DF_Error_Information] DEFAULT (''),

    CONSTRAINT [PK_Error] PRIMARY KEY CLUSTERED

    (

    [Id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[ERRORS] WITH CHECK ADD CONSTRAINT [FK_Error_User] FOREIGN KEY([CreatorFK])

    REFERENCES [dbo].[User] ([ID])

    GO

    ALTER TABLE [dbo].[ERRORS] CHECK CONSTRAINT [FK_Error_User]

    GO

    ALTER TABLE [dbo].[ERRORS] WITH CHECK ADD CONSTRAINT [FK_ERROR_ERRORApplication] FOREIGN KEY([ERRORApplicationId])

    REFERENCES [dbo].[ErrorApplication] ([Id])

    GO

    ALTER TABLE [dbo].[ERRORS] CHECK CONSTRAINT [FK_ERROR_ERRORApplication]

    GO

  • Lynn Pettis (5/6/2015)


    ramana3327 (5/6/2015)


    We are using Sql 20082 Standard edition. Data drive has enough free space but that table stores the errors that we don't required to store more than 3 months.

    To do this data archiving we need to have linked server otherwise we can't able to do that process b/n servers

    Going between servers I would use SSIS to archive the data on another server. Wouldn't require the use of linked servers.

    Jeff, I know, you would not use SSIS as you have trouble even spelling it. 😀 😉

    I might not know how to spell it but I do know that it's a 4 letter word in my book. 😉 I also love it when folks say they would use SSIS to do such-and-such but don't show how to do it. "It's easy to get to the Moon... just build a rocket ship!" :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ramana3327 (5/6/2015)


    Space means in the destination server or source server?

    Here is the table structure

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[ERRORS](

    [Id] [uniqueidentifier] NOT NULL,

    [InnerErrorId] [uniqueidentifier] NULL,

    [CreatorFK] [uniqueidentifier] NOT NULL,

    [Created] [datetime] NOT NULL,

    [Trace] [ntext] NOT NULL,

    [Message] [nvarchar](max) NOT NULL,

    [IsInnerError] [bit] NOT NULL CONSTRAINT [DF_Error_IsInnerError] DEFAULT ((0)),

    [ErrorApplicationId] [int] NULL,

    [IsHidden] [bit] NOT NULL CONSTRAINT [DF_Error_IsHidden] DEFAULT ((1)),

    [IsTerminating] [bit] NOT NULL CONSTRAINT [DF_Error_IsTerminating] DEFAULT ((1)),

    [Information] [nvarchar](2000) NOT NULL CONSTRAINT [DF_Error_Information] DEFAULT (''),

    CONSTRAINT [PK_Error] PRIMARY KEY CLUSTERED

    (

    [Id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[ERRORS] WITH CHECK ADD CONSTRAINT [FK_Error_User] FOREIGN KEY([CreatorFK])

    REFERENCES [dbo].[User] ([ID])

    GO

    ALTER TABLE [dbo].[ERRORS] CHECK CONSTRAINT [FK_Error_User]

    GO

    ALTER TABLE [dbo].[ERRORS] WITH CHECK ADD CONSTRAINT [FK_ERROR_ERRORApplication] FOREIGN KEY([ERRORApplicationId])

    REFERENCES [dbo].[ErrorApplication] ([Id])

    GO

    ALTER TABLE [dbo].[ERRORS] CHECK CONSTRAINT [FK_ERROR_ERRORApplication]

    GO

    Ok. Thanks.

    I guess the first thing that I'd do is make the realization that this is an "ERROR LOG" of sorts and that you don't actually want anything to prevent you from writing errors like the FK's on this table theoretically could. They slow things down a bit and you should take anything and everything that get's thrown at this table.

    The fact that your clustered index is based on UNIQUEIDENTIFIER column is pretty scary but we'll need to change the clustered index for reasons of performance anyway.

    You also have an NTEXT column, which has been deprecated for nearly a decade now. We'll change that to NVARCHAR(MAX) on the way through this. Unless you're updating that column after the error, I strongly suspect it won't matter to whatever is populating this table.

    I just got done building my normal million row test table using the DDL you provided. Still working on this but I'm out of time for the night. I'll have to pick it up again tomorrow night.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/6/2015)


    Lynn Pettis (5/6/2015)


    ramana3327 (5/6/2015)


    We are using Sql 20082 Standard edition. Data drive has enough free space but that table stores the errors that we don't required to store more than 3 months.

    To do this data archiving we need to have linked server otherwise we can't able to do that process b/n servers

    Going between servers I would use SSIS to archive the data on another server. Wouldn't require the use of linked servers.

    Jeff, I know, you would not use SSIS as you have trouble even spelling it. 😀 😉

    I might not know how to spell it but I do know that it's a 4 letter word in my book. 😉 I also love it when folks say they would use SSIS to do such-and-such but don't show how to do it. "It's easy to get to the Moon... just build a rocket ship!" :hehe:

    It's easier to post T-SQL than SSIS packages. Plus, not enough information to actually build a package for the OP. I would recommend getting a book on SSIS and work through it to build the package, that's how I learned SSIS 10 years ago, and I built some moderately complex SSIS packages while working at the school district. I haven't written any since I left the district other than some simple import/export packages for data ingests in Afghanistan that I just used the wizard to accomplish.

  • Thanks for the help.

    I don't think this table has child tables. When I check for dependencies I didn't see any other tables that depend on the error table.

    I run the below query to makesure it doesn't have child tables. It gives me only two of parent tables User & error application tables. I think we can directly delete the records from this

    DECLARE @TableName SYSNAME

    SET @TableName = 'error'

    SELECT 'Parent tables' AS TableConnectionType,

    OBJECT_SCHEMA_NAME(referenced_object_id) AS obj_schema_name,

    OBJECT_NAME(referenced_object_id) AS obj_name

    FROM sys.foreign_keys

    WHERE parent_object_id = OBJECT_ID(@TableName)

    UNION

    SELECT 'Child tables' AS TableConnectionType,

    OBJECT_SCHEMA_NAME(parent_object_id) AS obj_schema_name,

    OBJECT_NAME(parent_object_id) AS obj_name

    FROM sys.foreign_keys

    WHERE referenced_object_id = OBJECT_ID(@TableName)

  • Lynn Pettis (5/7/2015)


    Jeff Moden (5/6/2015)


    Lynn Pettis (5/6/2015)


    ramana3327 (5/6/2015)


    We are using Sql 20082 Standard edition. Data drive has enough free space but that table stores the errors that we don't required to store more than 3 months.

    To do this data archiving we need to have linked server otherwise we can't able to do that process b/n servers

    Going between servers I would use SSIS to archive the data on another server. Wouldn't require the use of linked servers.

    Jeff, I know, you would not use SSIS as you have trouble even spelling it. 😀 😉

    I might not know how to spell it but I do know that it's a 4 letter word in my book. 😉 I also love it when folks say they would use SSIS to do such-and-such but don't show how to do it. "It's easy to get to the Moon... just build a rocket ship!" :hehe:

    It's easier to post T-SQL than SSIS packages. Plus, not enough information to actually build a package for the OP. I would recommend getting a book on SSIS and work through it to build the package, that's how I learned SSIS 10 years ago, and I built some moderately complex SSIS packages while working at the school district. I haven't written any since I left the district other than some simple import/export packages for data ingests in Afghanistan that I just used the wizard to accomplish.

    Heh... you're preaching to the choir about the difficultly in posting an SSIS solution. As for getting a book and actually using SSIS, probably/thankfully never going to happen for me. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/7/2015)


    Lynn Pettis (5/7/2015)


    Jeff Moden (5/6/2015)


    Lynn Pettis (5/6/2015)


    ramana3327 (5/6/2015)


    We are using Sql 20082 Standard edition. Data drive has enough free space but that table stores the errors that we don't required to store more than 3 months.

    To do this data archiving we need to have linked server otherwise we can't able to do that process b/n servers

    Going between servers I would use SSIS to archive the data on another server. Wouldn't require the use of linked servers.

    Jeff, I know, you would not use SSIS as you have trouble even spelling it. 😀 😉

    I might not know how to spell it but I do know that it's a 4 letter word in my book. 😉 I also love it when folks say they would use SSIS to do such-and-such but don't show how to do it. "It's easy to get to the Moon... just build a rocket ship!" :hehe:

    It's easier to post T-SQL than SSIS packages. Plus, not enough information to actually build a package for the OP. I would recommend getting a book on SSIS and work through it to build the package, that's how I learned SSIS 10 years ago, and I built some moderately complex SSIS packages while working at the school district. I haven't written any since I left the district other than some simple import/export packages for data ingests in Afghanistan that I just used the wizard to accomplish.

    Heh... you're preaching to the choir about the difficultly in posting an SSIS solution. As for getting a book and actually using SSIS, probably/thankfully never going to happen for me. 🙂

    I did it because I was NOT going to modify multiple DTS packages every quarter to import test data into a data mart. It was easier to develop a small database to store test information and require the Assessment department to follow a strict file naming convention that actually provided detailed information regarding the contents of the file for processing purposes. Only thing I never got to build was the small web page that they could use to enter the test information into the database instead of me doing it quarterly.

    I wrote the SSIS packages once, and didn't have to modify them after that. They were data driven.

    The data I entered (wanted them to) was meta data basically about the assessment tests given to the kids.

  • Understood. I'd have just done it all in T-SQL. The more I see about SSIS, the less I like it. Just a personal preference for me. People that like it do well with it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply