Bulk Insert and Inserts so slow! Any suggestions?

  • Hi All,

    I have a situation, described below:

    A bulk insert job runs with the code below:

    BULK INSERT [#TempHolder] FROM '\\MyPath' WITH ( FIELDTERMINATOR = '~', DATAFILETYPE = 'CHAR', ROWTERMINATOR = '\n', KEEPIDENTITY, TABLOCK )

    The table is created: MedicService20060601

    A constraint is created as shown below:

    alter table MedicService20060601 add constraint Test_Const Primary Key Clustered (StartDate, ServiceType, CDRFingeUnique_Service_ID)

    Then I run an Insert into the table with the code below:

    INSERT MedicService20060601(Medic_ID,
       Unique_Service_ID,
       MedicType,
       StartDate,
       StartTime,
       ServiceType,
       ServiceStartDate,
       Location_id, Contact_ID, Controller_ID, ServiceMeth, Client_id,
       Rate_Card_ID,
       NseMedic_ID,
       Invoice_id,
       Status,
       Comp_Cost,
       Charge_Benefit)
      SELECT
       p.Medic_ID,
       p.Unique_Service_ID,
       p.MedicType,
       p.StartDate,
       p.StartTime,
       p.ServiceType,
       Convert(DateTime, p.ServiceStartDate),
       p.spLocation_id,
       p.Contact_ID,
       p.Controller_ID,
       p.spServiceMeth,
       p.spClient_id,
       p.spRate_Card_ID,
       p.spNseMedic_ID,
       p.spInvoice_id,
       p.spStatus,
       p.spComp_Cost,
       p.spCharge_Benefit
      From #TempHolder P
      Left join MedicService20060601 MyInsert ON p.ServiceStartDate = MyInsert.ServiceStartDate
             and p.MedicType = MyInsert.MedicType
             and p.Unique_Service_ID = MyInsert.Unique_Service_ID
      where MyInsert.ServiceStartDate IS NULL
      and p.MedicType & 8 <> 0
      and p.Medic_ID = 948193
      and RTRIM(Convert(Char(10), p.ServiceStartDate, 112)) = '20060601'
      ORDER BY p.ServiceStartDate, p.MedicType, p.Unique_Service_ID

     

    MedicService20060601 has the constraint shown above.

     

    In this scenario, almost 15 million rows will get inserted into MedicService20060601, but it's terribly slow!

     

    Any suggestions are welcome.

     


    Kindest Regards,

    John Burchel (Trainee Developer)

  • I'm sure the problem is not the Bulk Insert, but the Insert..Select/join statement.  That statement is joining and inserting 15 million rows in a single transaction!  I suggest that you look at two options. 

    First, it appears you are doing the Insert..Select/join because you want to eliminate duplicates based on ServiceStartDate, MedicType, and Unique_Service_ID.  How about removing those duplicate rows from the temp table by deleting them using the join?  Then you can insert more cleanly by doing a straight Insert..Select without the join on 15 million rows.

    But you will still have 15 million inserts in a single transaction.  How about batching the inserts?  Perhaps there is something in the data you can use to make quick 10,000-row sets, and insert those rows wrapping BEGIN TRAN and COMMIT TRAN around them.  If no appropriate data field exists to do that, add an identity column to the temp table after the import (and after the delete above), then use that to insert in 10k-row transaction blocks.

    Edited: I noticed your other similar posts.  If this is the same issue, the problem is not the BULK INSERT into a #temp table, but the logged INSERT..SELECT/join.  Bulk Insert into a temp table will not be affected by primary keys on another table.  Very large Bulk Inserts can be made faster using the BATCHSIZE parameter to break them into smaller batches.  INSERTs will be logged no matter what the recovery model is; the inserts are written to the tlog until the transaction completes, and then, if the database uses Simple recovery, the log records are removed from the tlog.  But the tlog still has to grow to hold those 15 million inserts, so if you have been shrinking the log, SQL Server has to grow it again, extent by extent.  Best is to leave the log large enough to hold all the transactions.
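    For illustration only, BATCHSIZE would be added to your original statement roughly like this (50000 is just an example value to tune against your data):

    BULK INSERT [#TempHolder] FROM '\\MyPath'
    WITH ( FIELDTERMINATOR = '~', DATAFILETYPE = 'CHAR', ROWTERMINATOR = '\n',
           KEEPIDENTITY, TABLOCK,
           BATCHSIZE = 50000 )  -- each 50,000-row batch is committed as its own transaction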

    Hope this helps



    Mark

    Thanks for your post, Mark.

    It's true that I don't want duplicates getting into the table, but the thing is that the bulk insert goes straight into the temp table, which leaves me with no room to remove duplicates.

    Also, if I were to do a transaction that inserts 10,000 rows at a time, how does that work? Can you provide some sample code?

    How about the clustered primary key index? Does that stay or go? Does it have any effect?

    Thanks in advance


    Kindest Regards,

    John Burchel (Trainee Developer)

  • Re: deleting duplicates.  I mean to delete the duplicates from the temp table after the bulk insert, by performing the join.

    DELETE P
      From #TempHolder P INNER join MedicService20060601 MyInsert 
      ON p.ServiceStartDate = MyInsert.ServiceStartDate 
             and p.MedicType = MyInsert.MedicType
             and p.Unique_Service_ID = MyInsert.Unique_Service_ID 
      where p.MedicType & 8 <> 0 
      and p.Medic_ID = 948193    
      and RTRIM(Convert(Char(10), p.ServiceStartDate , 112 )) = '20060601' 
    ALTER TABLE #TempHolder
      ADD rowid int IDENTITY(1,1)
    DECLARE @range_begin int
    DECLARE @range_end int
    
    DECLARE @rowcnt int
    SET @range_begin = 1
    SET @range_end = 10000
    SELECT @rowcnt = COUNT(*) FROM #TempHolder
    IF (@range_end > @rowcnt) SET @range_end = @rowcnt
    WHILE @range_begin <= @rowcnt
      BEGIN
        BEGIN TRAN
         INSERT MedicService20060601(Medic_ID,
           Unique_Service_ID,
           MedicType,
           StartDate, 
           StartTime,
           ServiceType,
           ServiceStartDate,
           Location_id,Contact_ID,Controller_ID,ServiceMeth,Client_id,
           Rate_Card_ID,
           NseMedic_ID,
           Invoice_id,
           Status,
           Comp_Cost,
           Charge_Benefit)
          SELECT 
           p.Medic_ID,
           p.Unique_Service_ID,
           p.MedicType,
           p.StartDate,
           p.StartTime, 
           p.ServiceType,
           Convert(DateTime,p.ServiceStartDate),
           p.spLocation_id,
           p.Contact_ID,
           p.Controller_ID,
           p.spServiceMeth,
           p.spClient_id,
           p.spRate_Card_ID,
           p.spNseMedic_ID,
           p.spInvoice_id,
           p.spStatus,
           p.spComp_Cost,
           p.spCharge_Benefit
          From #TempHolder P
          WHERE rowid BETWEEN @range_begin AND @range_end
        COMMIT TRAN
        SET @range_begin = @range_end + 1
        SET @range_end = @range_end + 10000
        IF (@range_end > @rowcnt) SET @range_end = @rowcnt
      END

    This way, your INSERT is not performing a joined query just to get a set of data to insert; your #TempHolder table is already prepared and ready to go.  And the range variables will break the insert down into 10,000-row batches.

    RE Clustered Index: It depends on what the clustered index is.  If the clustered index is on an ID field or a date, so that all 15 million rows are added to the end of the table, then there will likely be no effect beyond what is necessary for allocating table space.  If the clustered index is on something like last name, where the 15 million rows will get added into the middle of the existing rows, then you will likely have a lot of page splits, page data fragmentation, and very poor performance.  This is certainly something to consider.



    Mark

  • Mark,

    We have been dealing with the batch loading issue too.  I like your method.

    Regarding the BEGIN/COMMIT TRAN piece, somebody suggested to us that we simply issue a CHECKPOINT statement after each INSERT..SELECT instead.  Any thoughts regarding that?
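    For context, the suggestion looks roughly like this (a sketch only, using hypothetical #Staging and dbo.TargetTable names rather than our real schema, and assuming the database is in SIMPLE recovery):

    DECLARE @range_begin int, @range_end int, @rowcnt int
    SELECT @rowcnt = COUNT(*) FROM #Staging
    SET @range_begin = 1
    SET @range_end = 10000
    IF (@range_end > @rowcnt) SET @range_end = @rowcnt
    WHILE @range_begin <= @rowcnt
      BEGIN
        -- each INSERT runs as its own implicit transaction (no explicit BEGIN/COMMIT TRAN)
        INSERT dbo.TargetTable (id, val)
          SELECT id, val
          FROM #Staging
          WHERE rowid BETWEEN @range_begin AND @range_end
        -- under SIMPLE recovery, the checkpoint lets the committed log records be truncated,
        -- so the tlog stays small between batches
        CHECKPOINT
        SET @range_begin = @range_end + 1
        SET @range_end = @range_end + 10000
        IF (@range_end > @rowcnt) SET @range_end = @rowcnt
      END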

    Thanks much.
