Bulk Insert

  • Hi,

I want to insert millions of records into a table that already has records in it. The table has a composite primary key made up of 4 columns. I have a 2nd table, and I want to insert its data into the 1st table, but I have to skip any records whose composite key values already exist there. When I try this with bulk insert it gives me a primary key violation error and ends my script. Can anyone help me resolve this? Basically I want to avoid duplicate records in a table when using a bulk insert script.

    Any help will be appreciated....

    Thanks in Advance !!

    Pravin

  • Hi

Create a temporary table and insert the records into this table...

compare both tables and remove the duplicates from the temporary table,

then do the insert from the temporary table into your destination table, along the lines of the sketch below.
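A minimal sketch of that approach, assuming the destination is FirstTable, the source is SecondTable, and the composite key is four columns (all names here are placeholders):

    -- Stage the incoming rows in a temporary table
    SELECT *
    INTO #staging
    FROM SecondTable;

    -- Remove rows whose composite key already exists in the destination
    DELETE s
    FROM #staging s
    JOIN FirstTable d
      ON d.Key1 = s.Key1
     AND d.Key2 = s.Key2
     AND d.Key3 = s.Key3
     AND d.Key4 = s.Key4;

    -- The remaining rows can be inserted without violating the primary key
    INSERT INTO FirstTable
    SELECT * FROM #staging;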

    thanks

    jaypee.s

  • Hi Pravin,

I hope the following steps will help you.

1. Write a BULK INSERT script to load the data into the 2nd table (assuming this table is a heap).

2. Create a non-unique clustered/non-clustered index on the key columns to make sure you get an optimized execution plan when you perform step 3 (see the index sketch after step 4). Clean the data as per your business needs; this is just to make sure your 2nd table can be merged with the 1st table.

3.

    INSERT INTO [1stTable]
    SELECT b.*  -- column list elided here
    FROM [2ndTable] b
    LEFT JOIN [1stTable] a
      ON a.Key1 = b.Key1
     AND a.Key2 = b.Key2
     AND a.Key3 = b.Key3
     AND a.Key4 = b.Key4
    WHERE a.Key1 IS NULL  -- no match in the 1st table, so the row is new

(The key columns are part of the primary key and so cannot be NULL; testing one of them for NULL is enough to detect a non-match.)

    4. Truncate 2ndTABLE.
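For step 2, a minimal index sketch (the index name is an assumption; the brackets are needed because the table name starts with a digit):

    CREATE NONCLUSTERED INDEX IX_2ndTable_Keys
        ON [2ndTable] (Key1, Key2, Key3, Key4);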

Please let me know if you need the exact script for resolving this issue.

    Regards
    Shrikant Kulkarni

I will just modify step 3 of the previous post.

Since you are inserting a million records, and if your server is hard pressed for resources, you can insert the rows in batches. This will also reduce lock contention.

Either modify the SELECT query and run it multiple times, or use BULK INSERT, where you can specify the batch size; see the sketch below. The batch size will have to be determined with regard to server activity.
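A minimal BULK INSERT sketch showing the batch size option (the file path, terminators, and the 10,000-row figure are placeholders to adjust for your environment):

    BULK INSERT [2ndTable]
    FROM 'C:\data\records.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        BATCHSIZE = 10000  -- commit every 10,000 rows to limit log growth and lock time
    );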

    "Keep Trying"

  • Hi Shrikant,

    Thanks for your reply.

I have already used that logic for merging the records. The problem is that it is utilising 100% CPU.

But I think we can use a bulk insert query with the check constraints and do the same thing much faster.

I am trying that out. If I find an appropriate solution I will post it here... Please reply if you have any ideas about bulk insert, or any other way to get faster execution for this.

    Once again Thanks...

    Pravin

    (India)

  • Hi Chirag,

Can you post an example of how we can use bulk insert?

    Thanks for the reply...

    Best Regards,

    --Pravin

  • Pravin Vetal (6/10/2008)


    Hi Shrikant,

    Thanks for your reply.

I have already used that logic for merging the records. The problem is that it is utilising 100% CPU.

But I think we can use a bulk insert query with the check constraints and do the same thing much faster.

I am trying that out. If I find an appropriate solution I will post it here... Please reply if you have any ideas about bulk insert, or any other way to get faster execution for this.

    Once again Thanks...

    Pravin

    (India)

One of the ways to improve your inserts is to divide your large file into multiple files and then run multiple instances of BULK INSERT in pipeline/parallel (if possible). We took this approach in one of our projects and it improved INSERT performance a lot. I would still suggest the same approach: since your table is non-empty, the cost of re-validating the constraints against the whole table may exceed the cost of applying CHECK constraints to just the incremental data. UNIQUE, PRIMARY KEY, and NOT NULL constraints are always enforced during a bulk load, so you have no way to stop duplicates from raising errors through BULK INSERT alone (check out BOL). About the INSERT statement: there are many possible reasons for CPU being as high as 100% when you run that query. Can you please post the T-SQL and the execution plan? A sketch of the per-file load follows.
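For illustration, loading one of the split files with constraint checking turned on might look like the following (the file name is a placeholder; each file can be loaded from its own session to get the parallelism):

    BULK INSERT [2ndTable]
    FROM 'C:\data\records_part01.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        CHECK_CONSTRAINTS  -- check CHECK and FOREIGN KEY constraints during the load
    );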

    Regards
    Shrikant Kulkarni

I can use bulk insert, but for that we need to create a data file (.txt, .csv)

using the bcp command... and then split the data into different files... that is not a very easy way for me. Our LEFT JOIN logic is really awesome... I will check it again and find out the reason for the CPU utilisation. The main condition is that we already have data in one table and need to insert data into the same table after checking the composite keys....

Please send me some examples if you have any, so I can get a better idea of how you used bulk insert for this....

I also tried a UNION but ran into some problems......

Thanks for your help, Shrikant....

    With Regards,

    Pravin

You can use a simple Perl script to break the file into multiple files: create a new file every time the number of lines read exceeds 10,000 or so, and run this logic in a loop.

    Regards
    Shrikant Kulkarni

  • Hi Pravin

There are many examples of BULK INSERT in BOL and on the net.

The option I was talking about is BATCHSIZE.

I think Shrikant has some good suggestions, but I am not so sure about creating different files.

    "Keep Trying"

  • Hi Shrikant,

We have 2 different servers, each with a database of the same name [assume RECORD is the DB name]. Ideally the data in the log_data table of the RECORD database should be the same on both servers, but sometimes some records are missing on the local server yet present on the remote node, and vice versa. In this case I want to transfer only the missing records from server to server. I have linked the 2 servers and used the query below, but it is taking a long time as we have millions of records in the table. I tried batching it by time but that didn't improve the performance.

    Please suggest me regarding this.

    Let me know if we can have some other way for doing this.

    Thanks in advance !!!

SELECT @EXECSQL = 'INSERT INTO ' + @DBNAME + '..LOG_DATA
    SELECT PR.*
    FROM ' + @REMOTE_NODE_NAME + '.' + @DBNAME + '.dbo.LOG_DATA PR
    LEFT JOIN ' + @DBNAME + '..LOG_DATA P
        ON P.ld_sequence_number  = PR.ld_sequence_number
       AND P.ld_source_proc_node = PR.ld_source_proc_node
       AND P.ld_rec_datetime     = PR.ld_rec_datetime
       AND P.ld_cpu_number       = PR.ld_cpu_number
    WHERE P.ld_cpu_number       IS NULL
      AND P.ld_sequence_number  IS NULL
      AND P.ld_rec_datetime     IS NULL
      AND P.ld_source_proc_node IS NULL
      AND PR.ld_rec_datetime >= ''' + CAST(@start_date_time AS varchar) + '''
      AND PR.ld_rec_datetime <  ''' + CAST(DATEADD(second, @BATCH_JOB_PERIOD_IN_SECS, @start_date_time) AS varchar) + ''''

PRINT @EXECSQL
EXEC (@EXECSQL)

    Thanks and Regards,

    Pravin Vetal.
