April 21, 2006 at 5:56 pm
I have a text file and I want to use BULK INSERT. The file has both header records and detail records; how can I tell them apart so I can insert into both tables? The file can contain many headers and details, and "^" is the field delimiter. What is the best way to do this?
example text file
H^MP^319267228^17-FEB-06^^^^GXB -- header row (invoices)
D^GXB10269^US^23^EA^^^^.45632^^MX^^^^ -- first of two detail rows (inv_detail)
D^GXB10269^US^23^EA^^^^.45632^^MX^^^^ -- second detail row
H^MP^319310332^27-FEB-06^^^^GBX -- another header row
D^GBX10269^US^4^EA^^^^.07936^^MX^^^^ -- detail row
April 23, 2006 at 12:02 pm
Brenda,
Sorry for the delay... it's a tough question and deserves a good answer. Undoubtedly, you've had a thousand people tell you that this can't or shouldn't be done in SQL. This is why folks like XML so much... it handles header/detail so very easily. That would be the subject of another post, but I've found that you can do just about everything you need, so far as data manipulation goes, in SQL, and this is no exception.
Neither BCP nor BULK INSERT deals very well with mixed records when the record types have different numbers of delimiters. SQL Server also doesn't have a record "splitter", so we'll need to make one. To make a record splitter, we also need a "Tally" table, which is nothing more than a relatively short table with a single column of sequential numbers.
So here's what we'll do...
Item 1, create the Tally table. Here's the code, and it runs very quickly; if it doesn't run nearly instantaneously, something is wrong and you should stop the query. I make it go from 1 to 9999 even though VARCHAR (the thing being split) has a limit of 8000 characters, because the extra numbers come in handy for other uses like generating 4-digit random numbers and the like.
--===== Create a Tally table of numbers
SELECT TOP 9999
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a clustered primary key for speed
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N)
--===== Grant access to the table
GRANT SELECT ON dbo.Tally TO PUBLIC
That wasn't so painful. Notice that the cross-join in the code was intentional... it saves a lot of time and code to do it that way. The code makes a "basis" for building larger temporary tables to solve all sorts of other problems, including finding missing dates, etc. It'll make a million-row table in about 4 seconds. 9999 rows covers about 27 years' worth of days, though, and most people don't need more than that. Again, possible subjects for other posts.
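Just to show the "missing dates" idea, here's a minimal sketch (dbo.SomeLog and its LogDate column are made-up names, purely for illustration)... the Tally table supplies the calendar and NOT EXISTS finds the holes:
--===== Example only: find dates missing from a hypothetical dbo.SomeLog table
     -- (the table and column names here are assumptions, not part of the solution)
DECLARE @StartDate DATETIME
    SET @StartDate = '20060101'
 SELECT DATEADD(dd,t.N-1,@StartDate) AS MissingDate
   FROM dbo.Tally t
  WHERE t.N <= 365
    AND NOT EXISTS (SELECT 1
                      FROM dbo.SomeLog l
                     WHERE l.LogDate = DATEADD(dd,t.N-1,@StartDate))
  ORDER BY MissingDate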
Item 2, create a record splitter. This is in the form of a convenient, reusable function. There are faster methods, but they take way more code... sometimes convenience, readability, and consistency of use make functions the way to go even though they're sometimes slower than dedicated code. Again, subject of another post. It would take too long to explain everything that's going on in this puppy, so I'll leave the analysis of the code up to you. Here's the fully documented production code I use (and it's the ONLY place where I have to use a (yech!) table variable)...
CREATE FUNCTION dbo.fnSplit
/****************************************************************************************
Purpose:
This function splits names and strings of words based on a delimiter of up to 255
characters.
Notes:
1. Max length of string that can be split is 8000-(Length of delimiter X 2)
2. Designed for use on MS-SQL Server 2000 and MSDE 2000.
3. Will error under the following conditions:
a. Length of delimiter <1 or >255
b. Length of string + (Length of delimiter X 2) > 8000
4. Unexpected results will occur if a leading or trailing delimiter is included in the
string to be parsed
5. Returns a blank (space) when multiple delimiters are encountered (ie. 'Jeff,,Moden'
would return a blank as the 2nd word).
Usage:
dbo.fnSplit(StringToBeSplit,SegmentToReturn,DelimiterString)
-----------------------------------------------------------------------------------------
Revisions:
Rev 00 - 06/05/2005 - Jeff Moden - Initial creation and test (single space delimiter)
EMAIL: jbmoden@ameritech.net (has nasty spam filters)
Rev 01 - 06/06/2005 - Jeff Moden - Add delimiter character as an input
Rev 02 - 08/17/2005 - Jeff Moden - Increase possible size of delimiter to 10 characters
Rev 03 - 09/03/2005 - Jeff Moden - Increase possible size of delimiter to 255 characters
****************************************************************************************/
--===== Define the input parameters
(
@pString VARCHAR(8000), --String to be split
@pWordNum SMALLINT, --Number of the word to return
@pDelim VARCHAR(255) --Delimiter to base the split on (max is 255 or BOOM!)
)
--===== Define the return type
RETURNS VARCHAR(8000)
AS
BEGIN
--======================================================================================
-- Body of function
--======================================================================================
--===== Declare the local variables
DECLARE @LenDelim TINYINT --Holds length of delimiter (max is 255 or BOOM!)
DECLARE @Words TABLE
(
WordNum INT IDENTITY(1,1), --The number of the word that has been split out
Word VARCHAR(8000) --The word that has been split out
)
--===== Determine length of delimiter. Had to do this way because spaces have 0 for a
-- value of LEN
SET @LenDelim = LEN('|'+@pDelim+'|')-2
--===== Prepare the string to be split for parsing by adding a delimiter to each end
SET @pString = @pDelim+@pString+@pDelim
--===== Do the parsing and store the parsed words in the table variable
INSERT INTO @Words (Word)
SELECT SUBSTRING(@pString, --What to split
N+@LenDelim, --Where to start
CHARINDEX(@pDelim,@pString,N+@LenDelim)-@LenDelim-N --Length
)
FROM dbo.Tally
WHERE SUBSTRING(@pString,N,@LenDelim) = @pDelim --Finds leading delimiters
AND N < LEN(@pString)-@LenDelim --except the last 1
ORDER BY N --Force the order of words or segments to be correct
--===== Return the word indicated by the input parameter
RETURN (SELECT Word
FROM @Words
WHERE WordNum = @pWordNum)
--======================================================================================
-- End of function
--======================================================================================
END
GO
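Before moving on, it's worth giving the function a quick test against one of the sample header records from the original post (the column aliases use the names I guess at further down; expected return values are in the comments):
--===== Quick test of the splitter against one of the sample header records
SELECT dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',1,'^') AS RecType,  --'H'
       dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',3,'^') AS PhoneNum, --'319267228'
       dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',4,'^') AS XactDate  --'17-FEB-06'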
Item 3, import all the records into a "flat" staging table. This is where most folks raise an eyebrow at me. Why a "flat" single-column table and then parse the records? Because it's (1) nasty fast, (2) very easy to write and maintain, and (3) it allows me to do almost everything set-based. This BULK INSERT will import a million records in just a couple of seconds. I use a temp table as the staging table just because it's my nature and my DBAs have made TempDB nice and large just to handle such things (another possible subject for another post). You could certainly use a permanent table. DO NOT use a table variable for this... table variables don't have statistics, which makes them damned slow for large record sets. All the other bad rumors you've heard about the use of temp tables are simply not true. Don't take my word for it, though... see the following URL...
http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k
In order for this to work, we have to have an IDENTITY column on the staging table to preserve the order in which the records are inserted. To do that, BULK INSERT is going to need a BCP format file so it can skip over the identity column. Here's what the file should look like...
8.0
2
1 SQLCHAR 0 0 "" 0 IgnoreMe ""
2 SQLCHAR 0 400 "\n" 2 RawData ""
... and it should probably be stored in the same directory as your raw import files...
...and here's the code for Item 3...
--===== If the temporary "staging" table already exists, drop it
IF OBJECT_ID('TempDB..#Staging') IS NOT NULL
DROP TABLE #Staging
--===== Create a temporary "staging" table to hold the flat import
CREATE TABLE #Staging
(
RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
RawData VARCHAR(400)
)
--===== Import the raw data from the file into the staging table
BULK INSERT #Staging
FROM 'C:\Temp\BulkInsertTest.txt' --<< CHANGE THIS TO MATCH YOUR SITUATION!
WITH (
DATAFILETYPE = 'CHAR',
FORMATFILE = 'C:\Temp\BulkInsertTest.fmt', --<< CHANGE THIS TO MATCH YOUR SITUATION!
BATCHSIZE = 50000
)
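If you want a quick sanity check that the raw records actually landed in the staging table (entirely optional), something like this will do:
--===== Optional: peek at the first few raw records in the staging table
 SELECT TOP 10 RowNum, RawData
   FROM #Staging
  ORDER BY RowNum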
Items 4 and 5 are nearly identical. Since I don't know what your Header or Detail tables look like, I made my own for this little experiment just to make sure the code works correctly... this code is NOT part of anything you'd need to do as your tables already exist...
--===== Create a temp table to hold header info for this experiment
-- Added a HeaderID column to ensure uniqueness of records
-- Does not include the "H" record type
IF OBJECT_ID('TempDB..#Header') IS NOT NULL
DROP TABLE #Header
CREATE TABLE #Header
(
HeaderID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
HdrCol02 VARCHAR(10),
PhoneNum CHAR(10), --A guess on my part
XactDate DATETIME, --A guess on my part
HdrCol05 VARCHAR(10),
HdrCol06 VARCHAR(10),
HdrCol07 VARCHAR(10),
HdrCol08 VARCHAR(10)
)
--===== Create a temp table to hold detail info for this experiment
-- Added a DetailID column to ensure uniqueness of records
-- Does not include the "D" record type
IF OBJECT_ID('TempDB..#Detail') IS NOT NULL
DROP TABLE #Detail
CREATE TABLE #Detail
(
DetailID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
ServiceType VARCHAR(10), --Guess on my part
Country CHAR(2), --Guess on my part
Minutes INT, --Guess on my part
TimeZone VARCHAR(10), --Guess on my part
DtlCol06 VARCHAR(10),
DtlCol07 VARCHAR(10),
DtlCol08 VARCHAR(10),
Cost DECIMAL(10,5), --Guess on my part
DtlCol10 VARCHAR(10),
DtlCol11 VARCHAR(10),
DtlCol12 VARCHAR(10),
DtlCol13 VARCHAR(10),
DtlCol14 VARCHAR(10),
DtlCol15 VARCHAR(10),
HeaderID INT --<<Looky, looky! MAKE SURE YOU HAVE THIS!!
)
Of course, yours is probably different... just bear with me for the experiment...
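One optional extra... if your real Header and Detail tables are permanent tables, you might also want to make the header/detail relationship official with a foreign key. This is only a sketch; dbo.InvHeader and dbo.InvDetail are made-up names standing in for your actual tables:
--===== Example only: enforce the header/detail relationship on permanent tables
     -- (dbo.InvHeader and dbo.InvDetail are assumed names... substitute your own)
ALTER TABLE dbo.InvDetail
  ADD CONSTRAINT FK_InvDetail_InvHeader
      FOREIGN KEY (HeaderID) REFERENCES dbo.InvHeader (HeaderID)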
The following code is for Items 4 and 5, which do the actual import. The big problem to overcome with the data you posted is that we have to have some way to identify which detail records go with which header records once they're parsed and packed away in a table. That means we have to find each header record, post it to the header table, get the HeaderID we just posted, find all the detail records for that header, and post those detail records with that HeaderID. The following code does ALL of that... of course, you'll need to change some column names and table names to match your Header and Detail tables...
--=====Declare local variables
DECLARE @CurrentHeader INT
DECLARE @NextHeader INT
DECLARE @MaxDetail INT
DECLARE @HeaderID INT
--===== Suppress the auto-display of rowcounts for appearance and speed
SET NOCOUNT ON
--===== Find the max detail record RowNum
SELECT @MaxDetail = MAX(RowNum)
FROM #Staging
WHERE dbo.fnSplit(RawData,1,'^') = 'D'
--===== Find the first header
SET @CurrentHeader = 0
SELECT @NextHeader = MIN(RowNum)
FROM #Staging
WHERE dbo.fnSplit(RawData,1,'^') = 'H'
AND RowNum > @CurrentHeader
--===== Parse the data by finding each header and the
-- detail that goes with it
WHILE @NextHeader >0
BEGIN
--===== Make the previous next header, the current header
SET @CurrentHeader = @NextHeader
--===== Parse the header info into the Header table
INSERT INTO #Header
(HdrCol02,PhoneNum,XactDate,HdrCol05,HdrCol06,HdrCol07,HdrCol08)
SELECT NULLIF(dbo.fnSplit(RawData,2,'^'),'') AS HdrCol02,
NULLIF(dbo.fnSplit(RawData,3,'^'),'') AS PhoneNum,
NULLIF(dbo.fnSplit(RawData,4,'^'),'') AS XactDate,
NULLIF(dbo.fnSplit(RawData,5,'^'),'') AS HdrCol05,
NULLIF(dbo.fnSplit(RawData,6,'^'),'') AS HdrCol06,
NULLIF(dbo.fnSplit(RawData,7,'^'),'') AS HdrCol07,
NULLIF(dbo.fnSplit(RawData,8,'^'),'') AS HdrCol08
FROM #Staging
WHERE RowNum = @CurrentHeader
--===== Remember the ID of the header record we just inserted
SET @HeaderID = SCOPE_IDENTITY()
--===== Find the next header
SET @NextHeader = NULL --Forces a null if we don't find anything left
SELECT @NextHeader = MIN(RowNum)
FROM #Staging
WHERE dbo.fnSplit(RawData,1,'^') = 'H'
AND RowNum > @CurrentHeader
--===== Parse and insert the detail records for the current header
-- including the ID of the related header record
INSERT INTO #Detail
(
ServiceType, Country, Minutes, TimeZone, DtlCol06,
DtlCol07, DtlCol08, Cost, DtlCol10, DtlCol11,
DtlCol12, DtlCol13, DtlCol14, DtlCol15, HeaderID
)
SELECT NULLIF(dbo.fnSplit(RawData, 2,'^'),'') AS ServiceType,
NULLIF(dbo.fnSplit(RawData, 3,'^'),'') AS Country,
NULLIF(dbo.fnSplit(RawData, 4,'^'),'') AS Minutes,
NULLIF(dbo.fnSplit(RawData, 5,'^'),'') AS TimeZone,
NULLIF(dbo.fnSplit(RawData, 6,'^'),'') AS DtlCol06,
NULLIF(dbo.fnSplit(RawData, 7,'^'),'') AS DtlCol07,
NULLIF(dbo.fnSplit(RawData, 8,'^'),'') AS DtlCol08,
NULLIF(dbo.fnSplit(RawData, 9,'^'),'') AS Cost,
NULLIF(dbo.fnSplit(RawData,10,'^'),'') AS DtlCol10,
NULLIF(dbo.fnSplit(RawData,11,'^'),'') AS DtlCol11,
NULLIF(dbo.fnSplit(RawData,12,'^'),'') AS DtlCol12,
NULLIF(dbo.fnSplit(RawData,13,'^'),'') AS DtlCol13,
NULLIF(dbo.fnSplit(RawData,14,'^'),'') AS DtlCol14,
NULLIF(dbo.fnSplit(RawData,15,'^'),'') AS DtlCol15,
@HeaderID AS HeaderID
FROM #Staging
WHERE RowNum BETWEEN @CurrentHeader+1
AND ISNULL(@NextHeader-1,@MaxDetail)
END --End of While loop
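Once the loop finishes, a quick join will show whether everything got married up correctly... each header with the number of detail rows tied to it (again, just an optional check against the experimental temp tables):
--===== Optional check: each header and the number of detail rows tied to it
 SELECT h.HeaderID, h.PhoneNum, h.XactDate, COUNT(d.DetailID) AS DetailRows
   FROM #Header h
   LEFT OUTER JOIN #Detail d
     ON d.HeaderID = h.HeaderID
  GROUP BY h.HeaderID, h.PhoneNum, h.XactDate
  ORDER BY h.HeaderID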
...And now you know why some folks spend lots of money on 3rd-party solutions for this type of stuff. Some also use DTS (comes with SQL Server) only to turn around and write some VBScript to do most of what was done in the code above. Others spend wads of money on general-purpose ETL (Extract, Transform, Load) programs, and they really don't run any faster. They are, however, a little easier to use for non-SQL developers.
Any questions?
Send beer, I already have enough pretzels
--Jeff Moden
Change is inevitable... Change for the better is not.
April 3, 2007 at 7:29 pm
Ya know, Brenda... a little feedback on this kind of help would have been appreciated. At least let me know how it worked out for you.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 3, 2007 at 8:01 pm
Jeff, that answer was detailed enough that I'll buy you a beer, if you're ever in the Atlanta area, even though it wasn't my question.
April 4, 2007 at 6:02 am
Heh... Thanks David...
I've been doing a lot of BCP work lately and ran into this ol' post of mine... Yeah, I know it's a forum and we shouldn't expect it... but it just struck me as a bit wrong that folks cough up detailed answers like this and never even get an acknowledgment that the original poster read it.
Thanks, again.
--Jeff Moden
Change is inevitable... Change for the better is not.