January 5, 2013 at 6:30 am
Hello,
I am importing a flat file from an external data source that does not have an identity column or key of any sort. The first row in the flat file is always the column header and I want to ensure that I can identify that row after importing. The order of insertion for the remaining rows does not matter. Speed is not a primary consideration here.
My understanding is that there is no guarantee that a bulk insert would always insert the records in the same order as the flat file and future SQL server engine updates or unusual database design choices could cause rows to be inserted in a different order.
If I set the 'FIRSTROW' and 'LASTROW' arguments to 1, is it guaranteed that the single record inserted will be the first row in the flat file?
If so, the code I have in mind to grab just the header data and then keep track of it would be as follows (the rest of the data would be imported in a separate step):
--Bulk insert the column header data using Firstrow and Lastrow arguments set to 1
BULK INSERT
MyDatabase..MyDestinationTable
FROM 'C:\SourceFolder\FlatFile.txt'
WITH
(
FIRSTROW = 1
,LASTROW = 1
)
-- Add identity column to bulk inserted table.
ALTER TABLE MyDestinationTable
ADD [IdentityColumn] INT IDENTITY
GO
I would hope that after this insert, since there is only 1 record in the table which is the row that contains column headers, I can identify this later by selecting from the table the record which has IdentityColumn = 1.
My only concern is that since I am relatively new to T-SQL, there may be something I am missing about how the Firstrow and Lastrow arguments work and I may pick up a different record from the flat file than the first one during the bulk insert. The bit that worries me in the bulk insert syntax is:
The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows
January 5, 2013 at 1:32 pm
If you set the BATCHSIZE to 1 on BULK INSERT you'll get the rows from the file into the database in the order they appear in the file. It'll be slower than it could potentially be but you said speed is not a primary concern.
Another technique I have used is to BULK INSERT into a VIEW where the VIEW has all columns in the file allowing me to have columns in the underlying table that will not affect the BULK INSERT, i.e. up to everything in the staging minus the IDENTITY.
Like this:
CREATE TABLE dbo.Staging1
(
StagingId INT IDENTITY(1, 1),
Something VARCHAR(100),
);
GO
CREATE VIEW dbo.Staging1Loader
AS
SELECT Something
FROM Staging1;
GO
-- bulk insert into the view
BULK INSERT
dbo.Staging1Loader
FROM 'C:\@\FlatFile.txt'
WITH (
BATCHSIZE = 1 -- guarantees the lines in the file will be committed to the table as they are read, one at a time
);
GO
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 11, 2013 at 4:35 am
Okay great, thanks for your help opc.three, I appreciate it.
January 11, 2013 at 4:50 am
You're quite welcome.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply