Inserting Millions Of records

  • Hi,

    We have a requirement to Merge six 12millions data tables into single Table.All the tables are having duplicate Emails.So We put the Primary key for email column using EnterPrise Manager and We r trying to merge by writing the following insert query, But it's taking more than 20hrs...:-(

    Insert into Total( Email,FirstName,LastName,Address,City,State,Zipcode,Gender,DOB ,

    Phone,WebAddress1,Webaddress2,IPAddress,DateTime,Interest )

    Select Email,FirstName,LastName,Address,City,State,Zipcode,Gender,DOB ,

    Phone,WebAddress1,Webaddress2,IPAddress,DateTime,Interest

    From [Set4]

    Where

    (not exists(select email from Total

    where Total.email=[Set4].email))

    How can we do this most efficiently??? Please Help Us......

    Thanks in advance...

  • One solution is to make use of SSIS data flow tasks.

    Use your select queries as source and in the destination connection specify the commit size as 100 thousand, then after every 100 thousand rows it will commit.

    -Vikas Bindra

  • Hi,

    Could u pls tell us detailed procedure for that as we r new to SQL SERVER..

    Thnx in advance...

  • A data flow task in SSIS can have multiple data sources (you multiple tables/queries), transformation and then single or multiple data destinations.

    Read the about "data flow" in BOL, Google it,

    Read the creating section and try creating SSIS package in visual studio. Read the OLE DB source and OLE DB destination in specific.

    And let us know the issues you face

    -Vikas Bindra

  • One more advise, don't have any indexes on the destination table when you are loading the data. Once you done with loading create the constraints and indexes you want.

    This will speedup you load.

    -Vikas Bindra

  • You may want to try this, and you may also want to be sure that you have an index on each of the table on the column email.

    insert into Total(

    Email,

    FirstName,

    LastName,

    [Address],

    City,

    [State],

    Zipcode,

    Gender,

    DOB,

    Phone,

    WebAddress1,

    Webaddress2,

    IPAddress,

    [DateTime],

    Interest

    )

    Select

    s.Email,

    s.FirstName,

    s.LastName,

    s.[Address],

    s.City,

    s.[State],

    s.Zipcode,

    s.Gender,

    s.DOB,

    s.Phone,

    s.WebAddress1,

    s.Webaddress2,

    s.IPAddress,

    s.[DateTime],

    s.Interest

    from

    [Set4] s

    left outer join [Total] t

    on (s.email = t.email)

    where

    t.email is null;

    Edit: Changed inner join to left outer join.

  • Lynn Pettis (11/17/2009)


    You may want to try this, and you may also want to be sure that you have an index on each of the table on the column email.

    insert into Total(

    Email,

    FirstName,

    LastName,

    [Address],

    City,

    [State],

    Zipcode,

    Gender,

    DOB,

    Phone,

    WebAddress1,

    Webaddress2,

    IPAddress,

    [DateTime],

    Interest

    )

    Select

    s.Email,

    s.FirstName,

    s.LastName,

    s.[Address],

    s.City,

    s.[State],

    s.Zipcode,

    s.Gender,

    s.DOB,

    s.Phone,

    s.WebAddress1,

    s.Webaddress2,

    s.IPAddress,

    s.[DateTime],

    s.Interest

    from

    [Set4] s

    inner join [Total] t

    on (s.email = t.email)

    where

    t.email is null;

    Did you wanted to say LEFT OUTER JOIN instead of INNER JOIN? as INNER JOIN is not making sense to me or I am misunderstanding your query.

    -Vikas Bindra

  • I think that is left outer join instead of inner join

  • Good catch, yes that should be a left outer join. Not enough "coffee" this morning. Getting into work at 6:00 AM local time starting this week and obviously not fully awake yet.

  • If it is b'coz of index

    Can I recreate like this and insert the rows

    alter table dbo.Total add constraint

    pk_test primary key nonclustered (Email)

    Pls suggest me...

  • DBTeam (11/17/2009)


    If it is b'coz of index

    Can I recreate like this and insert the rows

    alter table dbo.Total add constraint

    pk_test primary key nonclustered (Email)

    Pls suggest me...

    If you use the Lynn method then yes you need index on email columns on all source tables and on the destination table as suggested by Lynn.

    And yes you can create the index with the query that you have specified above.

    -Vikas Bindra

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply