August 7, 2014 at 3:42 am
Bulk Insert works in most cases, but if your operation is based on sequence you can't rely on bulk insert, as it loads the data from the file regardless of sequence. We had quite a few issues with this.
August 7, 2014 at 6:28 am
shah.muhammad (8/7/2014)
Bulk Insert works in most cases, but if your operation is based on sequence you can't rely on bulk insert, as it loads the data from the file regardless of sequence. We had quite a few issues with this.
That's good to know. I wonder why that is?
August 7, 2014 at 7:08 am
Stan Kulp-439977 (8/7/2014)
shah.muhammad (8/7/2014)
Bulk Insert works in most cases, but if your operation is based on sequence you can't rely on bulk insert, as it loads the data from the file regardless of sequence. We had quite a few issues with this.
That's good to know. I wonder why that is?
That's because relational databases don't care about physical order.
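Put differently, the only ordering contract SQL Server offers a reader is an explicit ORDER BY; a minimal sketch with hypothetical table and column names, purely to illustrate the contract:
-- Hypothetical names; the point is only the ordering contract.
SELECT name FROM dbo.some_table;                      -- order is undefined
SELECT name FROM dbo.some_table ORDER BY line_number; -- order is guaranteed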
August 7, 2014 at 7:43 am
Luis Cazares (8/7/2014)
Stan Kulp-439977 (8/7/2014)
shah.muhammad (8/7/2014)
Bulk Insert works in most cases, but if your operation is based on sequence you can't rely on bulk insert, as it loads the data from the file regardless of sequence. We had quite a few issues with this.
That's good to know. I wonder why that is?
That's because relational databases don't care about physical order.
Would adding an identity field to the temporary table cause the records to be inserted in order?
August 7, 2014 at 10:04 pm
shah.muhammad (8/7/2014)
Bulk Insert works in most cases, but if your operation is based on sequence you can't rely on bulk insert, as it loads the data from the file regardless of sequence. We had quite a few issues with this.
If you set BATCHSIZE to 1, the import may be slower than with a larger setting, but the rows will be inserted into your table in the same order in which they are read from the data file. You can also use a VIEW that selects all columns except the IDENTITY column from a staging table to act as a façade for BULK INSERT.
Demo:
Open the attached zip file and save the 32MB text file inside as C:\@\bi.tbl. The first four lines of the text file are below; they are repeated 980,514 times, making a file with ~3.9 million lines in it. You'll see in the code sample below where I check the identity assigned to each line to validate that the file was inserted in order.
Joe|10
John|12
Mary|8
Jane|14
USE YourDatabaseName;
GO

-- Recreate the target table. The IDENTITY column records the order in
-- which rows arrive, so we can check it against the file's line order.
IF EXISTS ( SELECT *
            FROM sys.objects
            WHERE object_id = OBJECT_ID(N'dbo.bi')
                  AND type IN (N'U') )
    DROP TABLE dbo.bi;
GO

CREATE TABLE dbo.bi
(
    line_number INT NOT NULL IDENTITY(1, 1),
    name        VARCHAR(100),
    age         INT
);
GO

-- Recreate the façade view. It exposes every column except the IDENTITY
-- column, so the two-column data file maps straight onto it.
IF EXISTS ( SELECT *
            FROM sys.views
            WHERE object_id = OBJECT_ID(N'dbo.bi_facade') )
    DROP VIEW dbo.bi_facade;
GO

CREATE VIEW dbo.bi_facade
AS
SELECT name, age FROM dbo.bi;
GO

-- Load the file through the view; SQL Server assigns line_number values
-- as the rows are inserted.
BULK INSERT YourDatabaseName.dbo.bi_facade
FROM 'C:\@\bi.tbl'
WITH
(
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n'
);
GO

-- The file repeats Joe, John, Mary, Jane, so line_number modulo 4 must be
-- 1, 2, 3 and 0 respectively. This should return 0 rows if the lines in
-- the file were inserted into the table in order.
SELECT TOP 10 *
FROM dbo.bi
WHERE ( name = 'Joe'  AND line_number % 4 != 1 )
   OR ( name = 'John' AND line_number % 4 != 2 )
   OR ( name = 'Mary' AND line_number % 4 != 3 )
   OR ( name = 'Jane' AND line_number % 4 != 0 );
GO
Note: the script consistently takes about 15 seconds to run on my home machine.
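If you would rather rely on batch size alone than on the façade view, here is a minimal sketch of the BATCHSIZE = 1 variant mentioned above, assuming the same file and view as in the demo. Expect it to run far slower, since every row is committed as its own batch.
-- Minimal sketch, assuming the demo objects above already exist.
-- BATCHSIZE = 1 commits each row as its own batch, preserving file order
-- at a significant cost in speed.
BULK INSERT YourDatabaseName.dbo.bi_facade
FROM 'C:\@\bi.tbl'
WITH
(
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n',
    BATCHSIZE = 1
);
GO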
As I mentioned in an earlier post, T-SQL's BULK INSERT, the SQL Server command-line tool bcp.exe, the .NET Base Class Library class System.Data.SqlClient.SqlBulkCopy and SSIS's OLE DB Destination Component all connect to SQL Server's bulk copy API, so it stands to reason that a batch-size option would be exposed through all of these tools, and it is. In BULK INSERT it is BATCHSIZE. In bcp.exe it is the -b option. In SqlBulkCopy it is the BatchSize property, and in SSIS's OLE DB Destination Component it is the fast-load option FastLoadMaxInsertCommitSize.
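For example, a bcp.exe run equivalent to the demo load above might look like the following; the server name is a placeholder, and -b 1 sets the single-row batch size:
REM Hypothetical invocation; YourServerName is a placeholder.
REM -c = character data, -t / -r = field / row terminators,
REM -b 1 = one-row batches, -T = Windows authentication.
bcp YourDatabaseName.dbo.bi_facade in "C:\@\bi.tbl" -S YourServerName -T -c -t "|" -r "\n" -b 1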
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
August 7, 2014 at 10:08 pm
Stan Kulp-439977 (8/7/2014)
shah.muhammad (8/7/2014)
Bulk Insert works in most cases, but if your operation is based on sequence you can't rely on bulk insert, as it loads the data from the file regardless of sequence. We had quite a few issues with this.
That's good to know. I wonder why that is?
It is because once a batch of rows is submitted to SQL Server, it is up to the database engine to decide how that data is applied to the table. If you submit a batch of 1,000 rows, the engine may split the batch and send rows to different CPUs for insert, committing one part of the batch before another. The only way to keep the data in your batches from being committed to your table in an order other than the one in which it was read from the file is to submit batches containing a single row each.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato