How to Insert Million of records in less time ?

  • I need to insert between 1 Million to 4 million of rows into a table. Part of my process is pretty fast, but the insertion of this rows take aprox 6 minutes. I already did some modifications in order to improve the time.

    I need an advice of what is the best way to do it. My destination table is already big (aprox 49,061,939 rows). This table has 3 indexes because they are used from another process in order to speed as well. The destination table basically looks like:

    StbID int

    VersionID int

    SegId int

    Value varchar(10)

    The four fields are my primary key. What I'm inserting is exactly all those values (btw 1 to 4 million rows). So my statement looks like:

    INSERT dbo.STB with(tablock) -- 5:20 minutes

    SELECT * FROM #TempSTB WITH(NOLOCK)

    Any advice?

  • So the problem here is that the server becomes unavailable for 4-6 minutes whenever you do this?

    You could do it in batches of 30 - 50k rows which would lock the tables for only seconds at a time.

    Then wait a couple seconds in between batches to let the server catch up.

  • So the problem here is that the server becomes unavailable for 4-6 minutes whenever you do this?

    i dont really care that much if the server becomes unavailable for 4-6minutes, because they are internal processes. what I really need its to speed the insertion.

  • How long does it take to run the select alone (without insert)?

    5 mins for a couple million rows is not lightning fast but it's not crawling either by most standards.

  • actually, it takes only 22 seconds.. the problem is the insert 🙁

  • How long does it take to do a select into?

    How long does it take to insert in the base table if you disable all NC indexes?

    Does you have triggers, foreign keys, check constraints?

    From http://www.sqlskills.com/BLOGS/PAUL/post/Survey-what-is-the-highest-wait-on-your-system.aspx

    What is your biggest wait type?

    WITH Waits AS

    (SELECT

    wait_type,

    wait_time_ms / 1000.0 AS WaitS,

    (wait_time_ms - signal_wait_time_ms) / 1000.0 AS ResourceS,

    signal_wait_time_ms / 1000.0 AS SignalS,

    waiting_tasks_count AS WaitCount,

    100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage,

    ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS RowNum

    FROM sys.dm_os_wait_stats

    WHERE wait_type NOT IN (

    'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',

    'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',

    'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',

    'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',

    'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',

    'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')

    )

    SELECT

    W1.wait_type AS WaitType,

    CAST (W1.WaitS AS DECIMAL(14, 2)) AS Wait_S,

    CAST (W1.ResourceS AS DECIMAL(14, 2)) AS Resource_S,

    CAST (W1.SignalS AS DECIMAL(14, 2)) AS Signal_S,

    W1.WaitCount AS WaitCount,

    CAST (W1.Percentage AS DECIMAL(4, 2)) AS Percentage

    FROM Waits AS W1

    INNER JOIN Waits AS W2

    ON W2.RowNum <= W1.RowNum

    GROUP BY W1.RowNum, W1.wait_type, W1.WaitS, W1.ResourceS, W1.SignalS, W1.WaitCount, W1.Percentage

    HAVING SUM (W2.Percentage) - W1.Percentage < 95; -- percentage threshold

    GO

  • Start with the answers to Ninja's last questions.

    I am curious to know how well it performs without your query hints?

    Have you constructed out the entire Insert / Select statement using actual column names in both parts in lieu of the * ?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • How long does it take to do a select into?

    it takes 2 seconds

    How long does it take to insert in the base table if you disable all NC indexes?

    It takes 4:20 min plus 32 seconds to rebuild indexes, almost 5 minutes. (not big difference)

    Does you have triggers, foreign keys, check constraints?

    no, I dont. I have 1 primary key and 2 Nc indexes

    primary key:

    StbID int

    VersionID int

    SegId int

    Value varchar(10)

    index one:

    VersionId

    index two:

    StbId, SegID

    there is not difference if I indicate the columns to be inserted (instead the *)

  • This probably isn't it, but what is the size of the database and/or log file? Is it growing the files every time you run this insert?

  • MTY-1082557 (11/24/2010)


    How long does it take to do a select into?

    it takes 2 seconds

    How long does it take to insert in the base table if you disable all NC indexes?

    It takes 4:20 min plus 32 seconds to rebuild indexes, almost 5 minutes. (not big difference)

    Does you have triggers, foreign keys, check constraints?

    no, I dont. I have 1 primary key and 2 Nc indexes

    primary key:

    StbID int

    VersionID int

    SegId int

    Value varchar(10)

    index one:

    VersionId

    index two:

    StbId, SegID

    there is not difference if I indicate the columns to be inserted (instead the *)

    What's the results of the wait types query I posted? The top 4-5 rows will give us an idea of what might be wrong on the server.

    Other things that may slow you down... unlikely but possible suspects :

    db file growth as already mentionned?

    What's the autogrow setting for the datafile (% / mb and how much)

    change tracking?

    Row version store??

    auto_clause or auto_shrink DB on?

    Is tempdb short on ressources?

    Do you have a big job or backup that is running at the same time you are testing this (yeah I know it's been 8 hours... but just in case).

    Is this adhoc statement or SP? Have you tried flushing the proc cache?

    Are the stats up to date?

    Anti-virus scanning the db files?

    Screensaver or anything else hitting the server's CPU at 100% (yes true story).

    Did you show us the actual code or a modified version? Maybe we're not seeing something there.

    When doing select into, are you perharps creating the table on a different filegroup on a different disk subsystem?

  • MTY-1082557 (11/24/2010)


    How long does it take to do a select into?

    it takes 2 seconds

    How long does it take to insert in the base table if you disable all NC indexes?

    It takes 4:20 min plus 32 seconds to rebuild indexes, almost 5 minutes. (not big difference)

    Does you have triggers, foreign keys, check constraints?

    no, I dont. I have 1 primary key and 2 Nc indexes

    primary key:

    StbID int

    VersionID int

    SegId int

    Value varchar(10)

    index one:

    VersionId

    index two:

    StbId, SegID

    there is not difference if I indicate the columns to be inserted (instead the *)

    Is the table heavily fragmented?

    Does the new data insert at the end of the table / indexes or all over the place?

    What's the fill factor for all indexes? and have you tried changing it to something less aggressive?

  • MTY-1082557 (11/24/2010)


    I need to insert between 1 Million to 4 million of rows into a table. Part of my process is pretty fast, but the insertion of this rows take aprox 6 minutes. I already did some modifications in order to improve the time.

    I need an advice of what is the best way to do it. My destination table is already big (aprox 49,061,939 rows). This table has 3 indexes because they are used from another process in order to speed as well. The destination table basically looks like:

    StbID int

    VersionID int

    SegId int

    Value varchar(10)

    The four fields are my primary key. What I'm inserting is exactly all those values (btw 1 to 4 million rows). So my statement looks like:

    INSERT dbo.STB with(tablock) -- 5:20 minutes

    SELECT * FROM #TempSTB WITH(NOLOCK)

    Any advice?

    I just noticed that you have the TABLOCK hint on the target table. That will make the code wait until it can get the TABLOCK. The only way that it can get the TABLOCK is when no other users are using the table. It could wait for 2 weeks under the right conditions.

    Obviously, my recomendation will be to remove the TABLOCK hint.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 1. i am second to ninja'a first suggestion to do it in batch format

    2. how much data will be migrated ? huge data.....longer time.

    3. also backup the log on small periods.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Remove indexes before insertion, then recreate indexes.

  • wu.wuqiang (11/26/2010)


    Remove indexes before insertion, then recreate indexes.

    That frequently does nothing for the amount of time it takes because reindexing can take a lot of time. In fact, if you look back in this thread, the OP tried that and it made no significant improvement.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply