August 19, 2004 at 9:24 pm
My ultimate goal is to import some mix of Chinese and English from
a set of files into a database. Since I have multiple files and since
I'm not good at writing iterations in DTS, I'd like to use stored
procs with BULK INSERT. The input format will be csv with quoted
strings and there is no guarantee that there are no commas in the
strings. Therefore I'll need a format file to strip the quotes. But
for now, I'm trying very simple things...
To make the long story short: as long as I am using ASCII files, I
don't have any problem, but if I save the input file as UNICODE,
things start falling apart. Here are the details:
/* Contents of the the input files xx-a.txt and xx-u.txt: */
one,two,three
un,deux,troix
eins,zwei,drei
/* Contents of the the input files yy-a.txt and yy-u.txt: */
one two three
un deux troix
eins zwei drei
/* Input files were written in Notepad,
-a saved as ANSI and -u saved as UNICODE */
/* Contents of format file xx.fmt: */
8.0
3
1 SQLCHAR 0 10 "," 1 a SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 10 "," 2 b SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 10 "\r\n" 3 c SQL_Latin1_General_CP1_CI_AS
/* Contents of format file yy.fmt: */
8.0
3
1 SQLCHAR 0 10 "" 1 a SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 10 "" 2 b SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 10 "" 3 c SQL_Latin1_General_CP1_CI_AS
/* the test table */
CREATE TABLE [dbo].[xx] (
[a] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[c] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
)
/* test 1: ASCII input with FORMATFILE */
BULK INSERT xx
FROM 'E:\test\xx-a.txt'
WITH (
FORMATFILE = 'e:\test\xx.fmt'
)
/* test 2: UNICODE input with FORMATFILE */
BULK INSERT xx
FROM 'E:\test\xx-u.txt'
WITH (
DATAFILETYPE = 'widechar',
FORMATFILE = 'e:\test\xx.fmt'
)
/* test 3: UNICODE input with FIELDTERMINATOR */
BULK INSERT xx
FROM 'E:\test\xx-u.txt'
WITH (
DATAFILETYPE = 'widechar',
FIELDTERMINATOR = ','
)
/* test 4: ASCII input, fixed length */
BULK INSERT xx
FROM 'E:\test\yy-a.txt'
WITH (
FORMATFILE = 'e:\test\yy.fmt'
)
/* test 5: UNICODE input, fixed length */
BULK INSERT xx
FROM 'E:\test\yy-u.txt'
WITH (
DATAFILETYPE = 'widechar',
FORMATFILE = 'e:\test\yy.fmt'
)
/*------------------------------------------------------------------*/
Results:
Test 1, Test3 and Test 4 work fine.
Test 2 gives the following error message:
Bulk insert data conversion error (truncation) for row 1, column 2 (b).
Test 5 produces a checker board:
line 1: a='one', b='', c='two'
line 2: a='', b='three', c=''
etc
System used: English Windows 2000 AS SP4, SQL Server 2000 EE 8.00.760 (SP3)
/*------------------------------------------------------------------*/
August 23, 2004 at 8:00 am
This was removed by the editor as SPAM
September 6, 2004 at 10:30 pm
For the archive:
This here is not a solution either, but it maybe a hint that what I want is simply not possible:
I tried Test 2 (that's the only relevant one anyway) in bcp, and I got the following:
--------------------------------------------------
E:\test>bcp testDB..xx in xx-u.txt -f xx.fmt -Psecret -w
Warning: -w overrides -f.
Starting copy...
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC SQL Server Driver]Unexpected EOF encountered in BCP dat
a-file
0 rows copied.
--------------------------------------------------
That warning clearly tells us that in bcp you can't have Unicode data and format files at the same time. That leads me to the next question: Are BULK INSERT and bcp two different implementations or just different interfaces for the same thing? If it is the latter, then I can give up. In DTS, on the other hand, I don't have any problem doing the imports...
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply