bulk insert text file into two tables

  • I have a text file and I want to use BULK INSERT. I have header records and detail records; how can I tell them apart so I can insert into both tables? The text file can have a lot of headers and details, and "^" is the delimiter character. Which is the better way to do it?

    example text file

    H^MP^319267228^17-FEB-06^^^^GXB -- header row (invoices)

    D^GXB10269^US^23^EA^^^^.45632^^MX^^^^ -- first of two detail rows (inv_detail)

    D^GXB10269^US^23^EA^^^^.45632^^MX^^^^ -- second detail row

    H^MP^319310332^27-FEB-06^^^^GBX -- another header row

    D^GBX10269^US^4^EA^^^^.07936^^MX^^^^ -- detail row

  • Brenda,

    Sorry for the delay... it's a tough question and deserves a good answer.  Undoubtedly, you've had a thousand people tell you that this can't or shouldn't be done in SQL.  This is why folks like XML so much... it handles header/detail so very easily.  That would be the subject of another post, but I've found that you can do just about everything you need, so far as data manipulation in SQL goes, and this is no exception.

    Neither BCP nor BULK INSERT deals very well with mixed records where each record type has a different number of delimiters.  SQL Server also doesn't have a record "splitter", so we'll need to make one.  To make a record splitter, we also need a "Tally" table, which is nothing more than a relatively short table with a single column of sequential numbers.

    So here's what we'll do...

    1. Create a "Tally" table. (ONE TIME CODE. Don't include in stored proc)
    2. Create a record splitter based on the Tally table. (ONE TIME CODE. Don't include in stored proc)
    3. Import all of the records into a "flat" staging table. (Include in stored proc)
    4. Parse the Header records into a Header table. (Include in stored proc)
    5. Parse the Detail records into a Detail table. (Include in stored proc)

    Item 1, create the Tally table.  Here's the code and it runs very quickly.  If it doesn't run nearly instantaneously, something is wrong and you need to stop the query.  I make it go from 1 to 9999, even though VARCHAR (the thing to be split) has a limit of 8000 characters, because the extra numbers come in handy for other uses like generating 4-digit random numbers and the like.

    --===== Create a Tally table of numbers

     SELECT TOP 9999

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a clustered primary key for speed

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N

                PRIMARY KEY CLUSTERED (N)

    --===== Grant access to the table

      GRANT SELECT ON dbo.Tally TO PUBLIC

    That wasn't so painful.  Notice that the cross-join in the code was intentional... it saves a lot of time and code doing it that way.  The code makes a "basis" for building larger temporary tables to solve all sorts of other problems, including finding missing dates, etc.  It'll make a million-row table in about 4 seconds.  9999 rows is about 27 years' worth of days, though, and most people don't need more than that.  Again, possible subjects of other posts.
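    As a quick illustration of that "basis" idea (an example only, not part of Brenda's problem), the same Tally table will crank out a run of dates with no loop at all:

    ```sql
    --===== Example only: list the 30 days starting 17-Feb-2006, no loop required
     SELECT DATEADD(dd, N-1, '17-FEB-2006') AS TheDate
       FROM dbo.Tally
      WHERE N <= 30
      ORDER BY N
    ```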

    Item 2, Create a record splitter.  This is in the form of a convenient reusable function.  There are faster methods but they take way more code... sometimes convenience, readability of code, and consistency of use make functions the way to go even though they are sometimes slower than dedicated code.  Again, subject of another post.  It would take too long to explain what's going on in this puppy so I'll leave the analysis of the code up to you.  Here's the fully documented production code I use (and it's the ONLY place where I have to use a (yech!) table variable)...

    CREATE FUNCTION dbo.fnSplit

    /****************************************************************************************

     Purpose:

     This function splits names and strings of words based on a  delimiter of up to 255

     characters.

     Notes:

     1. Max length of string that can be split is 8000-(Length of delimiter X 2)

     2. Designed for use on MS-SQL Server 2000 and MSDE 2000.

     3. Will error under the following conditions:

        a. Length of delimiter <1 or >255

        b. Length of string + (Length of delimiter X 2) > 8000

     4. Unexpected results will occur if a leading or trailing delimiter is included in the

        string to be parsed

     5. Returns a blank (space) when multiple delimiters are encountered (i.e. 'Jeff,,Moden'

        would return a blank as the 2nd word).

     Usage:

     dbo.fnSplit(StringToBeSplit,SegmentToReturn,DelimiterString)

    -----------------------------------------------------------------------------------------

     Revisions:

     Rev 00 - 06/05/2005 - Jeff Moden - Initial creation and test (single space delimiter)

                                 EMAIL: jbmoden@ameritech.net (has nasty spam filters)

     Rev 01 - 06/06/2005 - Jeff Moden - Add delimiter character as an input

     Rev 02 - 08/17/2005 - Jeff Moden - Increase possible size of delimiter to 10 characters

     Rev 03 - 09/03/2005 - Jeff Moden - Increase possible size of delimiter to 255 characters

    ****************************************************************************************/

    --===== Define the input parameters

           (

            @pString VARCHAR(8000), --String to be split

            @pWordNum SMALLINT,     --Number of the word to return

            @pDelim VARCHAR(255)    --Delimiter to base the split on (max is 255 or BOOM!)

            )

    --===== Define the return type

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    --======================================================================================

    --      Body of function

    --======================================================================================

    --===== Declare the local variables

    DECLARE @LenDelim TINYINT --Holds length of delimiter (max is 255 or BOOM!)

    DECLARE @Words TABLE

            (

            WordNum INT IDENTITY(1,1), --The number of the word that has been split out

            Word    VARCHAR(8000)      --The word that has been split out

            )

    --===== Determine length of delimiter.  Had to do it this way because LEN

         -- ignores trailing spaces (a space delimiter would otherwise measure as 0)

        SET @LenDelim = LEN('|'+@pDelim+'|')-2

    --===== Prepare the string to be split for parsing by adding a delimiter to each end

        SET @pString = @pDelim+@pString+@pDelim

    --===== Do the parsing and store the parsed words in the table variable

     INSERT INTO @Words (Word)

     SELECT SUBSTRING(@pString,    --What to split

                      N+@LenDelim, --Where to start

                      CHARINDEX(@pDelim,@pString,N+@LenDelim)-@LenDelim-N --Length

                     )

       FROM dbo.Tally

      WHERE SUBSTRING(@pString,N,@LenDelim) = @pDelim --Finds leading delimiters

        AND N < LEN(@pString)-@LenDelim               --except the last 1

      ORDER BY N --Force the order of words or segments to be correct

    --===== Return the word indicated by the input parameter

     RETURN (SELECT Word

               FROM @Words

              WHERE WordNum = @pWordNum)

    --======================================================================================

    --      End of function

    --======================================================================================

    END

    GO
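    Before wiring the function into anything, a quick smoke test against one of the sample records doesn't hurt (this assumes the function was created as dbo.fnSplit, the name the rest of the code below uses):

    ```sql
    --===== Smoke test the splitter with one of the sample header records
     SELECT dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',1,'^') AS Seg1, --Expect 'H'
            dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',2,'^') AS Seg2, --Expect 'MP'
            dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',8,'^') AS Seg8  --Expect 'GXB'
    ```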

    Item 3, Import all the records into a "flat" staging table.  This is where most folks raise an eyebrow at me.  Why a "flat" single-column table and then parse the records?  Because it's (1) nasty fast, (2) very easy to write and maintain, and (3) it allows me to do almost everything set-based.  This BULK INSERT will import a million records in just a couple of seconds.  I use a temp table as the staging table just because it's my nature and my DBAs have made TempDB nice and large just to handle such things (another possible subject for another post).  You could certainly use a permanent table.  DO NOT use a table variable for this... table variables don't have statistics, which makes them damned slow for large record sets.  All the other bad rumors you've heard about the use of temp tables are simply not true.  Don't take my word for it, though... see the following URL...

    http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k

    In order for this to work, we have to have an IDENTITY column on the Staging table to preserve the order that the records are inserted.  In order to do that, BULK INSERT is going to need a "BCP FORMAT" file so we can ignore the identity column.  Here's what the file should look like...

    8.0

    2

    1    SQLCHAR 0 0   ""    0    IgnoreMe ""

    2    SQLCHAR 0 400 "\n"  2    RawData  ""

    ... and it should probably be stored in the same directory as your raw import files...

    ...and here's the code for Item 3...

    --===== If the temporary "staging" table already exists, drop it

         IF OBJECT_ID('TempDB..#Staging') IS NOT NULL

            DROP TABLE #Staging

    --===== Create a temporary "staging" table to hold the flat import

     CREATE TABLE #Staging

            (

            RowNum  INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

            RawData VARCHAR(400)

            )

    --===== Import the raw data from the file into the staging table

       BULK INSERT #Staging

       FROM 'C:\Temp\BulkInsertTest.txt' --<< CHANGE THIS TO MATCH YOUR SITUATION!

       WITH (

            DATAFILETYPE = 'CHAR',

            FORMATFILE   = 'C:\Temp\BulkInsertTest.fmt', --<< CHANGE THIS TO MATCH YOUR SITUATION!

            BATCHSIZE    = 50000

            )
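    After the load, a quick peek confirms the rows arrived and that the IDENTITY column preserved the file order:

    ```sql
    --===== Sanity check: the records should appear in file order
     SELECT TOP 10 RowNum, RawData
       FROM #Staging
      ORDER BY RowNum
    ```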

    Items 4 and 5 are nearly identical.  Since I don't know what your Header or Detail tables look like, I made my own for this little experiment just to make sure the code works correctly... this code is NOT part of anything you'd need to do as your tables already exist...

    --===== Create a temp table to hold header info for this experiment

         -- Added a HeaderID column to ensure uniqueness of records

         -- Does not include the "H" record type

         IF OBJECT_ID('TempDB..#Header') IS NOT NULL

            DROP TABLE #Header

     CREATE TABLE #Header

            (

            HeaderID   INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,

            HdrCol02   VARCHAR(10),

            PhoneNum   CHAR(10), --A guess on my part

            XactDate   DATETIME, --A guess on my part

            HdrCol05   VARCHAR(10),

            HdrCol06   VARCHAR(10),

            HdrCol07   VARCHAR(10),

            HdrCol08   VARCHAR(10)

            )

    --===== Create a temp table to hold detail info for this experiment

         -- Added a DetailID column to ensure uniqueness of records

         -- Does not include the "D" record type

         IF OBJECT_ID('TempDB..#Detail') IS NOT NULL

            DROP TABLE #Detail

     CREATE TABLE #Detail

            (

            DetailID    INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,

            ServiceType VARCHAR(10),   --Guess on my part

            Country     CHAR(2),       --Guess on my part

            Minutes     INT,           --Guess on my part

            TimeZone    VARCHAR(10),   --Guess on my part

            DtlCol06    VARCHAR(10),

            DtlCol07    VARCHAR(10),

            DtlCol08    VARCHAR(10),

            Cost        DECIMAL(10,5), --Guess on my part

            DtlCol10    VARCHAR(10),

            DtlCol11    VARCHAR(10),

            DtlCol12    VARCHAR(10),

            DtlCol13    VARCHAR(10),

            DtlCol14    VARCHAR(10),

            DtlCol15    VARCHAR(10),

            HeaderID    INT            --<<Looky, looky! MAKE SURE YOU HAVE THIS!!

            )

    Of course, yours is probably different... just bear with me for the experiment...

    The following code is for Items 4 and 5, which do the actual import... the big problem to overcome with the data you posted is that we have to have some way to identify which detail records go with which header records once they're parsed and packed away in a table.  That means we have to find each header record, post it to the header table, get the HeaderID we just posted, find all the detail records for that header, and post the detail records with that HeaderID... the following code does ALL of that... of course, you would need to change some column names and table names to match your Header and Detail tables...

    --=====Declare local variables

    DECLARE @CurrentHeader INT

    DECLARE @NextHeader INT

    DECLARE @MaxDetail INT

    DECLARE @HeaderID INT

    --===== Suppress the auto-display of rowcounts for appearance and speed

        SET NOCOUNT ON

    --===== Find the max detail record RowNum

     SELECT @MaxDetail = MAX(RowNum)

       FROM #Staging

      WHERE dbo.fnSplit(RawData,1,'^') = 'D'

    --===== Find the first header

        SET @CurrentHeader = 0

     SELECT @NextHeader = MIN(RowNum)

       FROM #Staging

      WHERE dbo.fnSplit(RawData,1,'^') = 'H'

        AND RowNum > @CurrentHeader

    --===== Parse the data by finding each header and the

         -- detail that goes with it

      WHILE @NextHeader >0

      BEGIN

            --===== Make the previous next header, the current header

                SET @CurrentHeader = @NextHeader

            --===== Parse the header info into the Header table

             INSERT INTO #Header

                    (HdrCol02,PhoneNum,XactDate,HdrCol05,HdrCol06,HdrCol07,HdrCol08)

             SELECT NULLIF(dbo.fnSplit(RawData,2,'^'),'') AS HdrCol02,

                    NULLIF(dbo.fnSplit(RawData,3,'^'),'') AS PhoneNum,

                    NULLIF(dbo.fnSplit(RawData,4,'^'),'') AS XactDate,

                    NULLIF(dbo.fnSplit(RawData,5,'^'),'') AS HdrCol05,

                    NULLIF(dbo.fnSplit(RawData,6,'^'),'') AS HdrCol06,

                    NULLIF(dbo.fnSplit(RawData,7,'^'),'') AS HdrCol07,

                    NULLIF(dbo.fnSplit(RawData,8,'^'),'') AS HdrCol08

               FROM #Staging

              WHERE RowNum = @CurrentHeader

           

            --===== Remember the ID of the header record we just inserted

                SET @HeaderID = SCOPE_IDENTITY()

           

            --===== Find the next header

                SET @NextHeader = NULL --Forces a NULL if we don't find anything left

             SELECT @NextHeader = MIN(RowNum)

               FROM #Staging

              WHERE dbo.fnSplit(RawData,1,'^') = 'H'

                AND RowNum > @CurrentHeader

           

            --===== Parse and insert the detail records for the current header

                 -- including the ID of the related header record

             INSERT INTO #Detail

                    (

                    ServiceType, Country,  Minutes,  TimeZone, DtlCol06,

                    DtlCol07,    DtlCol08, Cost,     DtlCol10, DtlCol11,

                    DtlCol12,    DtlCol13, DtlCol14, DtlCol15, HeaderID

                    )

             SELECT NULLIF(dbo.fnSplit(RawData, 2,'^'),'') AS ServiceType,

                    NULLIF(dbo.fnSplit(RawData, 3,'^'),'') AS Country,

                    NULLIF(dbo.fnSplit(RawData, 4,'^'),'') AS Minutes,

                    NULLIF(dbo.fnSplit(RawData, 5,'^'),'') AS TimeZone,

                    NULLIF(dbo.fnSplit(RawData, 6,'^'),'') AS DtlCol06,

                    NULLIF(dbo.fnSplit(RawData, 7,'^'),'') AS DtlCol07,

                    NULLIF(dbo.fnSplit(RawData, 8,'^'),'') AS DtlCol08,

                    NULLIF(dbo.fnSplit(RawData, 9,'^'),'') AS Cost,

                    NULLIF(dbo.fnSplit(RawData,10,'^'),'') AS DtlCol10,

                    NULLIF(dbo.fnSplit(RawData,11,'^'),'') AS DtlCol11,

                    NULLIF(dbo.fnSplit(RawData,12,'^'),'') AS DtlCol12,

                    NULLIF(dbo.fnSplit(RawData,13,'^'),'') AS DtlCol13,

                    NULLIF(dbo.fnSplit(RawData,14,'^'),'') AS DtlCol14,

                    NULLIF(dbo.fnSplit(RawData,15,'^'),'') AS DtlCol15,

                    @HeaderID AS HeaderID

               FROM #Staging

              WHERE RowNum BETWEEN @CurrentHeader+1

                AND ISNULL(@NextHeader-1,@MaxDetail)

        END --End of While loop
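    For what it's worth, the loop above only iterates once per header, not once per row, so it's not true RBAR... but if you wanted to dodge the loop entirely, here's one possible set-based sketch (my assumption, untested against your real data): match each detail row to the greatest header RowNum that precedes it, which could then be joined back to the Header table to pick up the HeaderID.

    ```sql
    --===== Set-based sketch: pair each detail row with the nearest
         -- preceding header row by staging RowNum
     SELECT dtl.RowNum AS DetailRowNum,
            (SELECT MAX(hdr.RowNum)
               FROM #Staging hdr
              WHERE dbo.fnSplit(hdr.RawData,1,'^') = 'H'
                AND hdr.RowNum < dtl.RowNum) AS HeaderRowNum
       FROM #Staging dtl
      WHERE dbo.fnSplit(dtl.RawData,1,'^') = 'D'
    ```

    You'd pay for that convenience with a correlated subquery per detail row, so test both ways before committing to either.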

    ...And now you know why some folks spend lots of money on 3rd party solutions to do this type of stuff.  Some also use DTS (comes with SQL Server) only to turn around and write some VBScript to do most of what was done in the code above.  Others spend pot wads of money on general-purpose ETL (Extract, Transform, Load) programs and they really don't run any faster.  They are, however, a little easier to use for non-SQL-Developers.

    Any questions?

    Send beer, I already have enough pretzels

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ya know, Brenda... a little feedback on the extent of this type of help would have been appreciated.  At least let me know how it worked out for you.

    --Jeff Moden


  • Jeff, that answer was detailed enough that I'll buy you a beer, if you're ever in the Atlanta area, even though it wasn't my question.

  • Heh... Thanks David...

    I've been doing a lot of BCP work lately and ran into this ol' post of mine... Yeah, I know it's a forum and we shouldn't expect it... but it just struck me as a bit wrong that folks cough up detailed answers like this and never even get an acknowledgment that the original poster even read it.

    Thanks, again.

    --Jeff Moden

