March 3, 2005 at 5:49 am
I want to use Bulk Insert to load data. The data is a CSV file, with text fields being surrounded by "" to prevent errors due to commas appearing in the field value.
I am unable to work out how to achieve this using the T-SQL Bulk Insert command. Is it possible or is using a DTS package the way to go?
March 3, 2005 at 7:34 am
Bulk Insert (BCP) will not honour quotes and will treat them like normal data.
Your best bet is to use DTS, which lets you specify the double quotes as a text qualifier and handle them correctly.
Far away is close at hand in the images of elsewhere.
Anon.
March 4, 2005 at 6:11 am
Hello,
you may want to read about format files. Both bcp and BULK INSERT let you use so-called "format files", in which you can specify that the delimiter between fields is ",".
The only problem here is that you would need to write a format file for each file you want to import.
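For illustration, a minimal format file might look something like this (a sketch for a hypothetical three-column table on SQL Server 2000; the column names and the 50-character lengths are placeholders):
8.0
3
1   SQLCHAR   0   50   ","      1   Col1   ""
2   SQLCHAR   0   50   ","      2   Col2   ""
3   SQLCHAR   0   50   "\r\n"   3   Col3   ""
You would then point at it with something like BULK INSERT dbo.MyTable FROM 'c:\temp\test.csv' WITH (FORMATFILE = 'c:\temp\test.fmt'). Note this only describes the plain comma delimiter; the quotes themselves still need handling, as discussed further down the thread.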
Andrey
March 4, 2005 at 7:01 am
You would have to strip the double quotes after the data is loaded, as in the following example:
CREATE TABLE #tblTest
(
id int IDENTITY(1,1)
, data varchar(70)
)
SET NOCOUNT ON
INSERT #tblTest (data) VALUES ('"This is line 1."')
INSERT #tblTest (data) VALUES ('"This is line 2."')
INSERT #tblTest (data) VALUES ('Since this line ends with a ", we want to keep the "')
INSERT #tblTest (data) VALUES ('This is line four (4)')
SET NOCOUNT OFF
SELECT * FROM #tblTest
UPDATE #tblTest
SET data = CASE
WHEN data = '""' THEN ''
WHEN Left(data,1) = '"' AND Right(data,1) = '"' THEN Substring(data,2,Len(data)-2)
ELSE data
END
SELECT * FROM #tblTest
DROP TABLE #tblTest
March 4, 2005 at 7:12 am
Thanks for the responses. It looks like a plain T-SQL script is a no-no. Stripping the quotes afterwards is OK, except that I can't get the data into the table in the first place using a SQL script.
March 4, 2005 at 7:53 am
If you want to do this in T-SQL then you could do it this way:
CREATE TABLE test (F1 varchar(100))
BULK INSERT test FROM 'c:\temp\test.csv' WITH ( ROWTERMINATOR = '\r\n' )
-- Loop until no row contains a double quote: on each pass, find the next
-- ,"..." group, replace its embedded commas with spaces and strip its quotes.
DECLARE @updct int
SET @updct = 1
WHILE (@updct > 0)
BEGIN
    UPDATE test
    SET F1 = STUFF(F1,
        PATINDEX('%,"%', F1) + 1,
        PATINDEX('%",%', F1) - PATINDEX('%,"%', F1),
        REPLACE(REPLACE(SUBSTRING(F1, PATINDEX('%,"%', F1) + 1, PATINDEX('%",%', F1) - PATINDEX('%,"%', F1)), '"', ''), ',', ' '))
    WHERE CHARINDEX('"', F1) > 0
    SET @updct = @@ROWCOUNT
END
This will replace any comma within a set of quotes with a space and remove the quotes.
You can then write a function (there are some on this site) to parse the comma-separated data into columns, along the lines of the sketch below.
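For illustration only, here is a minimal sketch of such a function (the name dbo.fnGetCsvField is made up, not one of the functions already posted on the site); it returns the Nth comma-separated value from a line, which works on F1 once the loop above has removed the quoted commas:
CREATE FUNCTION dbo.fnGetCsvField
(
    @line  varchar(8000),   -- the full comma-separated row
    @field int              -- 1-based position of the value to return
)
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @pos int, @next int, @i int
    SET @pos = 1
    SET @i = 1
    -- skip over the first (@field - 1) commas
    WHILE @i < @field
    BEGIN
        SET @pos = CHARINDEX(',', @line, @pos)
        IF @pos = 0 RETURN NULL      -- fewer fields than requested
        SET @pos = @pos + 1
        SET @i = @i + 1
    END
    -- the value runs from @pos to just before the next comma (or end of line)
    SET @next = CHARINDEX(',', @line, @pos)
    IF @next = 0 SET @next = LEN(@line) + 1
    RETURN SUBSTRING(@line, @pos, @next - @pos)
END
You could then do something like this (the target table and column names are just placeholders):
INSERT INTO dbo.TargetTable (Col1, Col2, Col3)
SELECT dbo.fnGetCsvField(F1, 1), dbo.fnGetCsvField(F1, 2), dbo.fnGetCsvField(F1, 3)
FROM test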
Far away is close at hand in the images of elsewhere.
Anon.
March 4, 2005 at 7:58 am
Thanks for that suggestion. I'm going to have a play with this, because I think this might be just what I need 🙂
March 4, 2005 at 8:33 am
You are a genius - a bit of tweaking and I think we have a winner 🙂
Thank you.
March 6, 2005 at 8:13 pm
I have used code similar to what follows to import comma-and-quote-delimited text. The number of rows and columns is usually small.
bulk insert dbname.dbo.tablename from '\\servername\directoryname\filename.csv'
with
(
DATAFILETYPE = 'char',
FIELDTERMINATOR = '","', -- uses "," as the field delimiter
ROWTERMINATOR = '"\n'    -- uses " plus CRLF as the end of row
)
The foregoing addresses all of the comma-quote delimiters except the leading quote on each row, which stays attached to the first field. A simple update such as the one that follows will rectify that.
update tablename
set field1 = replace(field1,'"','')
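If it helps to see the effect, here is a minimal sketch with made-up table, column and file names. Suppose the file contains the single line "1","Smith, John","NY":
CREATE TABLE dbo.ImportTest (field1 varchar(20), field2 varchar(50), field3 varchar(20))

BULK INSERT dbo.ImportTest FROM '\\servername\directoryname\test.csv'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR = '","', ROWTERMINATOR = '"\n')

-- field1 = '"1' (leading quote still attached), field2 = 'Smith, John', field3 = 'NY'
SELECT * FROM dbo.ImportTest

-- strip the leftover leading quote, as described above
UPDATE dbo.ImportTest SET field1 = REPLACE(field1, '"', '')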
March 7, 2005 at 6:20 am
All good suggestions, and I like the simplicity of Fred Wadley's solution. However, sometimes you can't use BULK INSERT because the DBA won't give you either "SA" permissions or "Bulk Admin" permissions, which are required to be able to use BULK INSERT. DTS is almost always slower than either BULK INSERT or BCP, and there's that permissions thing, again.
With all of that in mind, I usually use BCP, especially when loading millions of records. If you read up on all of the nifty parameters in Books Online, BCP can be set up to capture "failed" records to a separate file for further examination and possible correction, as well as other things.
As most have found, and some have pointed out in the above postings, BCP would require a format file to handle fields wrapped in double quotes (which is actually a very common format). Here's an example format file to do such a thing...
8.0
10
1    SQLCHAR   0   1     ""         0     FirstTerm      ""
2    SQLCHAR   0   LLL   "\",\""    NNN   FirstField     ""
3    SQLCHAR   0   LLL   "\",\""    NNN   SecondField    ""
4    SQLCHAR   0   LLL   "\",\""    NNN   ThirdField     ""
5    SQLCHAR   0   LLL   "\",\""    NNN   FourthField    ""
6    SQLCHAR   0   LLL   "\",\""    NNN   FifthField     ""
7    SQLCHAR   0   LLL   "\",\""    NNN   SixthField     ""
8    SQLCHAR   0   LLL   "\",\""    NNN   SeventhField   ""
9    SQLCHAR   0   LLL   "\",\""    NNN   EighthField    ""
10   SQLCHAR   0   LLL   "\"\r\n"   NNN   NinthField     ""
There are a couple of things about this format file that you need to know: the LLL entries are placeholders for each field's maximum length in the data file, the NNN entries are placeholders for the order of the corresponding column in the target table, and the first one-character "field" (with a server column order of 0) exists only to throw away the leading double quote on each row.
Of course, you can add as many intermediate fields as you need. And the BCP batch job can be scheduled using the Windows Task Scheduler (or any other scheduler) instead of having to bug the DBAs to set up a job in SQL Server.
It takes a minute or two per field (including figuring out the mapping of each field) but, in the long run, it's really worth it. It will be the fastest bulk insert you can write (especially if you temporarily set the "Select Into/Bulk Copy" option to "TRUE"), it provides error logging and error-record capture, and it really doesn't require much in the way of permissions, especially if the target table (which should usually be a staging table) has been blessed by the DBAs.
Even if you do have the necessary permissions to use BULK INSERT, it is almost always better to define the input file with a format file so you can do things like ignore the first double quote and set up logging of incorrectly formatted input lines, or of input lines that don't meet the datatype requirements of each column of the target table. Since I've done the "hard work" for you, all you have to do is a little CPR (cut, paste, replace) on the format file I posted above.
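For what it's worth, the command line ends up looking something like this (the server, database, table, and path names below are just placeholders, and -T assumes a trusted connection; substitute -U and -P if you use SQL logins):
bcp MyDatabase.dbo.StagingTable in "\\servername\share\datafile.csv" -S MyServer -T -f "c:\formats\datafile.fmt" -e "c:\logs\datafile.err" -m 50 -b 10000
Here -f points at the format file, -e writes the rejected rows to a separate file for later examination, -m caps the number of errors bcp will tolerate before it quits, and -b commits the load in batches. The "Select Into/Bulk Copy" option I mentioned can be flipped temporarily with EXEC sp_dboption 'MyDatabase', 'select into/bulkcopy', 'TRUE' (again, the database name is a placeholder).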
--Jeff Moden
Change is inevitable... Change for the better is not.
March 7, 2005 at 6:53 am
Thanks to everyone who has responded: I've got a lot of info now (and one working prototype).
March 8, 2005 at 2:29 am
Jeff,
Nice solution, if only it were possible. As stated in BOL (SQL2K SP3), and as it turns out in practice, escaping double quotes (\") is not allowed with BCP.
Far away is close at hand in the images of elsewhere.
Anon.
March 8, 2005 at 4:05 am
Hi David,
The example I gave was modified to be generic from production code that I currently have working in the same fashion as offered. Dunno what to say about the note in BOL; I'd never read that before, so maybe it only works because I didn't know it couldn't be done, kinda like the bumble-bee that isn't supposed to be able to fly.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 8, 2005 at 5:13 am
I can't get it to work at all.
Far away is close at hand in the images of elsewhere.
Anon.
March 8, 2005 at 11:40 am
I'll find the production code and see if I did anything different... sorry for the confusion.
--Jeff Moden
Change is inevitable... Change for the better is not.