August 16, 2008 at 4:53 pm
Hello,
I am trying to load text files (comma delimited) into a table in SQL Server 2005 Management Studio using BULK INSERT. I am able to load the file into the table, but the bulk insert process isn't identifying the correct row delimiter. My text file has a different number of columns on each row:
row1: col1,col2,col3
row2: col1,col2
row3: col1,col2,col3,col4
row4: col1,col2,col,3,col4,col5
Here is the exact script I am using:
BULK
INSERT ve_load2
FROM 'C:\ve\files\Copy of VACE.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = ';'
)
GO
Here is what I am getting in the table:
row1: col1,col2,col3,row2: col1,col2
row3: col1,col2,col3,col4,row4: col1,col2,col,3,col4,col5
Can someone help me in loading this text file?
Thanks
Ro
August 16, 2008 at 8:47 pm
Heh... that's known as a "Headerless Ragged Right" file and they're a bugger to import correctly. BCP and BULK insert require that all the rows have the same number of delimiters so you can't do an import of such a file with either.
However... with a little help from a couple of DOS commands and a little prestidigitation with a text based linked server, it can be done with very little pain.
First, I assume you have a file, like in your original post, in C:\Temp called Test01.txt and it looks like this...
col1,col2,col3
col1,col2
col1,col2,col3,col4
col1,col2,col,3,col4,col5
Next, let's setup a linked server and give it the necessary login privs...
--===== Create a linked server to the drive and path you desire.
EXEC dbo.sp_AddLinkedServer TxtSvr,
'Jet 4.0',
'Microsoft.Jet.OLEDB.4.0',
'C:\Temp',
NULL,
'Text'
--===== Set up login mappings.
EXEC dbo.sp_AddLinkedSrvLogin TxtSvr, FALSE, NULL, Admin, NULL
Here comes the fun part... if we just read the file directly...
--===== Query one of the files by using a four-part name.
SELECT *
FROM TxtSvr...[Test01#txt]
... we get an awful mess that looks like this...
F4 F5 col1 col2 col3
col1 col2
col4 col1 col2 col3
col4 col5 col1 col2 col3
(3 row(s) affected)
Notice that the first row was used as a header, that the data has been rearranged, AND that two of the headers have been made up! We can fix that with a couple of simple DOS commands...
--===== Create a header that identifies the columns we want... (any col names will do)
EXEC Master.dbo.xp_CmdShell 'ECHO HdrCol1,HdrCol2,HdrCol3,HdrCol4,HdrCol5 > C:\Temp\Header.txt'
--===== Create a new working file that has the header and the original file as one
EXEC Master.dbo.xp_CmdShell 'COPY C:\Temp\Header.txt+C:\Temp\Test01.txt C:\Temp\MyWork.txt'
Then, accessing the data in the file is a breeze... do with it what you want because it can now be referenced as a table with the correct column names...
--===== Read the csv text file as if it were a table
SELECT *
FROM TxtSvr...[MyWork#txt]
HdrCol1 HdrCol2 HdrCol3 HdrCol4 HdrCol5
col1 col2 col3
col1 col2
col1 col2 col3 col4
col1 col2 col3 col4 col5
(4 row(s) affected)
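From there you can land the rows wherever you like. For instance, here's a minimal sketch of copying them into a staging table (the table name dbo.VACE_Staging is made up for illustration):
--===== Sketch only: the staging table name is hypothetical... change it to suit
SELECT HdrCol1, HdrCol2, HdrCol3, HdrCol4, HdrCol5
INTO dbo.VACE_Staging
FROM TxtSvr...[MyWork#txt]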
If you need to drop the linked server after that, the following command will do nicely...
EXEC dbo.sp_DropServer 'TxtSvr', 'DropLogins'
😛
--Jeff Moden
Change is inevitable... Change for the better is not.
August 19, 2008 at 12:14 pm
Jeff Moden (8/16/2008)
Heh... that's known as a "Headerless Ragged Right" file and they're a bugger to import correctly. BCP and BULK insert require that all the rows have the same number of delimiters so you can't do an import of such a file with either. However... with a little help from a couple of DOS commands and a little prestidigitation with a text based linked server, it can be done with very little pain.
...
I have tried to use this solution but could not get it to work. It looks like it could solve the problem I've been having this morning importing data with a varying number of columns, but I get the following error:
OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "txtsrv" returned message "Cannot start your application. The workgroup information file is missing or opened exclusively by another user.".
Msg 7399, Level 16, State 1, Procedure sp_tables_ex, Line 41
The OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "txtsrv" reported an error. Authentication failed.
Msg 7303, Level 16, State 1, Procedure sp_tables_ex, Line 41
Cannot initialize the data source object of OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "txtsrv".
Here is the code I used; it's pretty much the same code you posted:
EXEC sp_addlinkedserver txtsrv, 'Jet 4.0',
'Microsoft.Jet.OLEDB.4.0',
'c:\SEND\SPF',
NULL,
'Text'
--Set up login mappings.
EXEC sp_addlinkedsrvlogin txtsrv, false, NULL,'sa','*******'
--List the tables in the linked server.
EXEC sp_tables_ex txtsrv
--Query one of the tables: file1#txt
--using a four-part name.
SELECT *
FROM txtsrv...[merge123#xmt]
EXEC dbo.sp_DropServer 'txtsrv', 'DropLogins'
...
Is there some problem with my authentication? Does it need a Windows account to access the file, or does 'sa' actually have the rights to load the file?
Any help would be appreciated.
Cheers,
J-F
August 19, 2008 at 8:47 pm
A couple of things may be happening here... first off, where is 'c:\SEND\SPF'? If it's not on the server, then you have to change the path to a UNC. If you do change it to a UNC, then the SQL Server "Service" must be logged in as a user that can actually see that path.
Even if that path IS on the server, it still might not be able to see that path. Here's a way to check...
EXEC Master.dbo.xp_DirTree 'c:\SEND\SPF',1,1
If you get an error, then your server can't see the path and the login the service runs under has to be changed. I believe you start that process by right-clicking on the server icon in the Registered Servers browser and selecting Properties...
--Jeff Moden
Change is inevitable... Change for the better is not.
August 20, 2008 at 6:11 am
Hi Jeff, thanks for your answer.
I still get the problem. I used the procedure to see if the server had access, and it gave me this result:
subdirectory      depth file
1VND2028.GEN      1     1
merge.xmt         1     1
merge123.xmt      1     1
NEWPRD2028.GEN    1     1
PRD2028.GEN       1     1
schema.ini        1     1
TestImport.gen    1     1
VND2028.GEN       1     1
VNDCATEG2028.GEN  1     1
I've searched a lot on the web to find answers to that error, but they keep giving me "Access errors": fix the Access file, etc., etc. That doesn't apply here, since it is a plain text file being accessed through the Jet engine driver.
Even when I try to edit the linked server in Management Studio, it gives me the same error when I click OK. What is causing this error? Since I can access the path on the server and I gave it the 'sa' credentials, it should have full access...
If it can help anyone help me, here are the properties for the linked server :
/****** Object: LinkedServer [txtsrv] Script Date: 08/20/2008 08:18:27 ******/
EXEC master.dbo.sp_addlinkedserver @server = N'txtsrv', @srvproduct=N'Jet 4.0', @provider=N'Microsoft.Jet.OLEDB.4.0', @datasrc=N'c:\SEND\SPF', @location=N'it020', @provstr=N'Text'
/* For security reasons the linked server remote logins password is changed with ######## */
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'txtsrv',@useself=N'False',@locallogin=NULL,@rmtuser=N'sa',@rmtpassword='########'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'collation compatible', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'data access', @optvalue=N'true'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'dist', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'pub', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'rpc', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'rpc out', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'sub', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'connect timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'collation name', @optvalue=null
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'lazy schema validation', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'query timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'use remote collation', @optvalue=N'true'
Hope to have some quick feedback, as I am stuck right now... 😉
Thanks in advance!
JF
Cheers,
J-F
August 20, 2008 at 1:23 pm
Finally got it to work! There was an error with my login: the parameters were not in the right order, and I needed to put 'sa' as the third parameter, not the fourth!
Here is the code I use now:
EXEC sp_addlinkedserver txtsrv, 'Jet 4.0',
'Microsoft.Jet.OLEDB.4.0',
'c:\SEND\SPF',
NULL,
'Text'
GO
--Set up login mappings.
EXEC sp_addlinkedsrvlogin txtsrv, FALSE, 'sa', NULL
GO
--List the tables in the linked server.
EXEC sp_tables_ex txtsrv
GO
--Query one of the tables: file1#txt
--using a four-part name.
SELECT *
FROM txtsrv...[Merge123#txt]
EXEC dbo.sp_DropServer 'txtsrv', 'DropLogins'
But I found this solution to be slower than using the OPENROWSET command (25 seconds for 80k rows, compared to 15 seconds for OPENROWSET). And since I will have about 1 million rows for the big updates, I might as well work with the faster command! Here is the code, if it can help anyone:
USE master
GO
-- This is needed to enable Ad Hoc Distributed Queries,
-- so we can load files with a different number of columns in them.
sp_configure 'Show Advanced Options', 1
GO
RECONFIGURE WITH OVERRIDE
GO
sp_configure 'Ad Hoc Distributed Queries', 1
GO
RECONFIGURE WITH OVERRIDE
GO
USE "Your Database"
GO
SELECT *
FROM OPENROWSET(
'MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:\SEND\SPF;',
'SELECT *
FROM merge123.txt')
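If it helps, here's a minimal sketch of pulling those rows into a temp table first so you can work on them (the temp table name #Merge123 is just for illustration):
--===== Sketch only: #Merge123 is an illustrative name
SELECT *
INTO #Merge123
FROM OPENROWSET(
'MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:\SEND\SPF;',
'SELECT *
FROM merge123.txt')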
So, as I said, this procedure is faster for me, maybe because of the driver that is used to load the data. I can do any data manipulation from there, import the rows into temp tables, or update my data. You might need a schema.ini file to import the data correctly; here is the script I built for that:
EXEC Master.dbo.xp_CmdShell
'ECHO [Merge123.txt] > C:\Send\SPF\Schema.ini'
EXEC Master.dbo.xp_CmdShell
'ECHO ColNameHeader=true >> C:\Send\SPF\Schema.ini'
EXEC Master.dbo.xp_CmdShell
'ECHO Format=TabDelimited >> C:\Send\SPF\Schema.ini'
EXEC Master.dbo.xp_CmdShell
'ECHO MaxScanRows=25 >> C:\Send\SPF\Schema.ini'
EXEC Master.dbo.xp_CmdShell
'ECHO CharacterSet=ANSI >> C:\Send\SPF\Schema.ini'
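For reference, that script should leave a C:\Send\SPF\Schema.ini that looks like this:
[Merge123.txt]
ColNameHeader=true
Format=TabDelimited
MaxScanRows=25
CharacterSet=ANSI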
For more information on how to build a schema file, use this link
http://puna.net.nz/archives/Technical/schemaini.htm
Hope it helps!
Thanks a lot for the help Jeff,
JF
Cheers,
J-F
August 25, 2008 at 2:51 pm
Hi,
I have an issue with using openrowSet:
When using it on real data, I got several products with double quotes in the description, and that caused me problems.
Here is an example:
ProductID  Description  UPC
1          Wire 10"     12345678901234
OPENROWSET imports the data and works fine, but the description is cut to "Wire 10" (with no ending double quote), and the UPC value is not present in the field; it is NULL.
I googled the problem and found no solution. Is there a parameter I can use to keep the double quotes in the description?
Thanks in advance,
Cheers,
J-F
August 25, 2008 at 10:33 pm
jfbergeron (8/20/2008)
Finally got it to work! There was an error with my login: the parameters were not in the right order, and I needed to put 'sa' as the third parameter, not the fourth! ... Hope it helps!
Thanks a lot for the help Jeff,
JF
Outstanding! Thanks for posting your solutions and the bit of test info about the linked server being slower. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
August 25, 2008 at 10:48 pm
jfbergeron (8/25/2008)
...is there a parameter I can use to keep the double quotes in the description?
Not that I know of... but no real problem either... just use BULK INSERT with a comma delimiter, instead.
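Something like the following is all I mean... a rough sketch only (the staging table, path, and terminators are placeholders you'd adjust to your file). BULK INSERT has no text qualifier, so the embedded double quote comes through as ordinary data:
--===== Sketch only: table name, path, and terminators are placeholders
BULK INSERT dbo.Product_Staging
FROM 'C:\SEND\SPF\merge123.txt'
WITH
(
FIELDTERMINATOR = ',', --or '\t' if the file is tab delimited
ROWTERMINATOR = '\n',
FIRSTROW = 2 --skip the header row if the file has one
)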
--Jeff Moden
Change is inevitable... Change for the better is not.
August 31, 2008 at 1:32 pm
This is extremely helpful. One question: is there a way to populate the file name in the schema.ini file using a LIKE-style pattern? For example, I want to say "pick up any and all files (one at a time) where the file name starts with 'MD'". Can we do that in schema.ini?
October 1, 2008 at 3:15 pm
Wow, I was facing a similar scenario and tried this proposed solution, and it worked just perfect :). Thanks, this saved me lots of time!
October 2, 2008 at 6:23 am
I wished for that ability also (wildcards in [fileName*]), but it doesn't seem to be supported.
But you can always start with some token [FileName#SuffixGoesHere#] and do a string replace in the Schema.ini file when you know what the real suffix is before you invoke the transfer that uses the Schema.ini file.
Alternately, if you have a (small) number of possible suffixes, you can simply copy the entire file specification multiple times, each with the appropriate [fileNamexxx] header in the single Schema.ini file.
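If you go that route from T-SQL, here's a rough sketch of a simpler variant that just regenerates the whole Schema.ini with the real file name plugged in, rather than doing a string replace on a token (the file name, path, and settings below are illustrative, not from this thread):
--===== Sketch only: @FileName and the path are made up for illustration
DECLARE @FileName VARCHAR(128)
DECLARE @Cmd VARCHAR(500)
SET @FileName = 'MD20080901.txt' --e.g. a name picked up from xp_DirTree output
SET @Cmd = 'ECHO [' + @FileName + '] > C:\Send\SPF\Schema.ini'
EXEC Master.dbo.xp_CmdShell @Cmd
EXEC Master.dbo.xp_CmdShell 'ECHO ColNameHeader=true >> C:\Send\SPF\Schema.ini'
EXEC Master.dbo.xp_CmdShell 'ECHO Format=TabDelimited >> C:\Send\SPF\Schema.ini'
EXEC Master.dbo.xp_CmdShell 'ECHO MaxScanRows=25 >> C:\Send\SPF\Schema.ini'
EXEC Master.dbo.xp_CmdShell 'ECHO CharacterSet=ANSI >> C:\Send\SPF\Schema.ini'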
(I have noticed that Jet 4.0 ODBC access to a text file is not bothered, as SSIS is, by some garbage comma-less preamble rows prior to the comma-delimited data rows; the "missing" columns simply come through as NULL.)