December 20, 2013 at 3:25 pm
Hi all,
I'm migrating some DTS packages into SPs. I managed to load the information using BULK INSERT and format files to remove the double quotes (") that delimit character values.
However, I've run into a problem when a string value is supposed to contain a double quote: the file uses two double quotes ("") to represent a single one ("). e.g.:
1,"This is some text","Some other text"
2,"This is a ""quoted"" text", ,"Some other text"
And I should end with values like this:
Col1 Col2 Col3
1 This is some text Some other text
2 This is a "quoted" text Some other text
Do I need a staging table, or is there a better solution?
December 20, 2013 at 5:53 pm
Luis Cazares (12/20/2013)
Hi all, I'm migrating some DTS packages into SPs. I managed to load the information using BULK INSERT and [font="Arial Black"]format files[/font] to remove the double quotes (") that delimit character values.
However, I've run into a problem when a string value is supposed to contain a double quote: the file uses two double quotes ("") to represent a single one ("). e.g.:
1,"This is some text","Some other text"
2,"This is a ""quoted"" text", ,"Some other text"
And I should end with values like this:
Col1 Col2 Col3
1 This is some text Some other text
2 This is a "quoted" text Some other text
Do I need a staging table, or is there a better solution?
Are you actually using a BCP Format File? If so, please post it and I'll show you how to modify it to do this. If not, lemme know and I'll build one for you.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 22, 2013 at 8:33 pm
Thank you, Jeff.
I'm using XML format files as described here: http://technet.microsoft.com/en-us/library/ms187833.aspx
But I really shouldn't have posted so late on a Friday, because I'll be back in the office after Christmas and I don't have an example with me. I'll repost in a few days.
Enjoy the holidays!
December 22, 2013 at 9:35 pm
Luis Cazares (12/22/2013)
Thank you, Jeff. I'm using XML format files as described here: http://technet.microsoft.com/en-us/library/ms187833.aspx
But I really shouldn't have posted so late on a Friday, because I'll be back in the office after Christmas and I don't have an example with me. I'll repost in a few days.
Enjoy the holidays!
Heh... XML... "Et tu, Brute?"
Looking forward to your return. Merry Christmas, Luis.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 26, 2013 at 10:22 am
I'm back.
Somehow, I expected a reaction like that from you.
I'm attaching a sample of the file I'm loading and the format file I'm using. I had to edit the file names, but you can work out the correct names from the code.
My problem is on line 3, char 292, Column 16
Here's the DDL:
CREATE TABLE [dbo].[PEDT001_load](
[NUM_CUS] [char](8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[COD_ENTITY] [char](4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[COD_ID] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[KEY_ID] [char](15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[CDG_DIGID] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[SEQID] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[COD_ID2] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[KEY_ID2] [char](15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[CDG_DIGID2] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[SURNAME] [char](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[SCDSURNAME] [char](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NAME] [char](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[CARDNAME] [char](26) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[TYP_CNTAADR] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[TXT_ADR] [char](90) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DES_COMPRCTY] [char](65) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NUM_TPH] [char](15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NUM_TELEX] [char](15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[MRST] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[GENDER] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[SUBJGRO] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[SUBJCPT] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DAT_BIRTH] [char](10) NULL,
[NCBA] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NCO] [char](3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ACSLEVEL] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[COD_LNG] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NUM_LADR] [decimal](3, 0) NOT NULL,
[NUM_NTFLAST] [decimal](3, 0) NOT NULL,
[NUM_CONFLAST] [decimal](3, 0) NOT NULL,
[DAT_SNR] [char](10) NULL,
[DAT_LASTANN] [char](10) NULL,
[DAT_CUSREG] [char](10) NULL,
[BRN_LMODP] [char](4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DATE_STP] [char](26) NOT NULL,
[SEGMENTP] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[PREFCHAN] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[CUS_STATUS] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[BRN_REG] [char](4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[BONUS] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[REGCHAN] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[TYP_LEGAL] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[SUBSEG] [char](3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[NAME2] [char](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[FAS] [char](30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
And here's the code:
DECLARE @Date AS CHAR(6),
        @BulkInsert VARCHAR(300)
SET @Date = RIGHT( CONVERT( char(8), DATEADD(DAY, -1, GETDATE()), 112), 6)
SET @BulkInsert = 'BULK INSERT PERDWREP.dbo.PEDT001_LOAD
FROM ''\\SomePath\PEDT001.D' + @Date + '''
WITH( BATCHSIZE = 500000, FORMATFILE = ''\\SomePath\PEDT001_format.xml'', TABLOCK)'
EXEC( @BulkInsert)
December 31, 2013 at 7:42 pm
Crud. My apologies. I misread the question.
Yes, you could use a staging table. That way you can do an update to "singularize" the embedded double quotes and to get rid of the trailing spaces that are going to take up space even in a VARCHAR.
That takes this conversation on a bit of a tangent, though. From the sounds of it, you aren't currently using a staging table. A lot of people will think that I'm a bit obsessive about it but, to date, I've always loaded data into a staging table so that I can do other checks and validations. I can even mark the rejected rows with reasons for their failure and then send them back to the originator in hopes of getting better loads in the future. It also turns out to be really handy for pre-marking rows for inserts and updates as well as giving me the opportunity to easily conduct any data repairs prior to moving the rows to their final resting place. Last, but not least, I can take all of the marked rows and do a little reporting on them saving the results of those reports (in the form of a table, of course) for batch historical purposes.
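For example, the "singularizing" update on a staging table might look something like this (a sketch only; the staging table and column names are just placeholders, not from your actual tables):

```sql
-- Sketch only: table and column names are placeholders.
-- "Singularize" the embedded doubled quotes and trim the trailing
-- spaces left over from the fixed-width CHAR columns.
UPDATE dbo.MyStagingTable
   SET Col2 = RTRIM(REPLACE(Col2, '""', '"')),
       Col3 = RTRIM(REPLACE(Col3, '""', '"'));
```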
--Jeff Moden
Change is inevitable... Change for the better is not.
December 31, 2013 at 8:47 pm
As a bit of a sidebar, I've been looking at your format file and have a couple of suggestions.
First, the XML format requires that you set the length in both the FIELD and the COLUMN sections. My recommendation is to set the length for both to 9999 and let the table do the checking during the import. The reason for this suggestion is that I've seen a lot of people update a length in a table and then totally forget about the format file, which, of course, can lead to some serious problems. The other problem is that I've also seen the provider of the data just flat out decide to send wider data. Yes, you'll get a nice truncation error, but then you have to go find out where the truncation is and, if it turns out to be OK, you then have to update both sections of the format file AND the table.
You have also taken the opportunity to define the datatype of each column in the format file. I'll recommend that you just use SQLCHAR everywhere and, once again, let the table do the work. This is, again, because I've seen people change a datatype in their table and forget to make the same changes in their format files. Yes, discrepancies will be uncovered by either the format file or the table but wouldn't it be nice if you only needed to maintain one of them?
I also leave collation off (just use double quotes) unless some of the columns use a collation that's different than the server default. That way, there's less clutter and anything that is different will stick out like a sore thumb. Yeah... I know... you auto-magically built yours. That's how I start out, as well but then I customize it with all the bullet-proofing that I've been talking about.
Here's what I would use for the BCP format file, and not just because I have a genuine hatred of the unnecessary bloat that XML causes.
8.0
46
1 SQLCHAR 0 1 "" 0 LeadingQuote ""
2 SQLCHAR 0 9999 "\",\"" 1 NUM_CUS ""
3 SQLCHAR 0 9999 "\",\"" 2 COD_ENTITY ""
4 SQLCHAR 0 9999 "\",\"" 3 COD_ID ""
5 SQLCHAR 0 9999 "\",\"" 4 KEY_ID ""
6 SQLCHAR 0 9999 "\",\"" 5 CDG_DIGID ""
7 SQLCHAR 0 9999 "\",\"" 6 SEQID ""
8 SQLCHAR 0 9999 "\",\"" 7 COD_ID2 ""
9 SQLCHAR 0 9999 "\",\"" 8 KEY_ID2 ""
10 SQLCHAR 0 9999 "\",\"" 9 CDG_DIGID2 ""
11 SQLCHAR 0 9999 "\",\"" 10 SURNAME ""
12 SQLCHAR 0 9999 "\",\"" 11 SCDSURNAME ""
13 SQLCHAR 0 9999 "\",\"" 12 NAME ""
14 SQLCHAR 0 9999 "\",\"" 13 CARDNAME ""
15 SQLCHAR 0 9999 "\",\"" 14 TYP_CNTAADR ""
16 SQLCHAR 0 9999 "\",\"" 15 TXT_ADR ""
17 SQLCHAR 0 9999 "\",\"" 16 DES_COMPRCTY ""
18 SQLCHAR 0 9999 "\",\"" 17 NUM_TPH ""
19 SQLCHAR 0 9999 "\",\"" 18 NUM_TELEX ""
20 SQLCHAR 0 9999 "\",\"" 19 MRST ""
21 SQLCHAR 0 9999 "\",\"" 20 GENDER ""
22 SQLCHAR 0 9999 "\",\"" 21 SUBJGRO ""
23 SQLCHAR 0 9999 "\",\"" 22 SUBJCPT ""
24 SQLCHAR 0 9999 "\",\"" 23 DAT_BIRTH ""
25 SQLCHAR 0 9999 "\",\"" 24 NCBA ""
26 SQLCHAR 0 9999 "\",\"" 25 NCO ""
27 SQLCHAR 0 9999 "\",\"" 26 ACSLEVEL ""
28 SQLCHAR 0 9999 "\",\"" 27 COD_LNG ""
29 SQLCHAR 0 9999 "\",\"" 28 NUM_LADR ""
30 SQLCHAR 0 9999 "\",\"" 29 NUM_NTFLAST ""
31 SQLCHAR 0 9999 "\",\"" 30 NUM_CONFLAST ""
32 SQLCHAR 0 9999 "\",\"" 31 DAT_SNR ""
33 SQLCHAR 0 9999 "\",\"" 32 DAT_LASTANN ""
34 SQLCHAR 0 9999 "\",\"" 33 DAT_CUSREG ""
35 SQLCHAR 0 9999 "\",\"" 34 BRN_LMODP ""
36 SQLCHAR 0 9999 "\",\"" 35 DATE_STP ""
37 SQLCHAR 0 9999 "\",\"" 36 SEGMENTP ""
38 SQLCHAR 0 9999 "\",\"" 37 PREFCHAN ""
39 SQLCHAR 0 9999 "\",\"" 38 CUS_STATUS ""
40 SQLCHAR 0 9999 "\",\"" 39 BRN_REG ""
41 SQLCHAR 0 9999 "\",\"" 40 BONUS ""
42 SQLCHAR 0 9999 "\",\"" 41 REGCHAN ""
43 SQLCHAR 0 9999 "\",\"" 42 TYP_LEGAL ""
44 SQLCHAR 0 9999 "\",\"" 43 SUBSEG ""
45 SQLCHAR 0 9999 "\",\"" 44 NAME2 ""
46 SQLCHAR 0 9999 "\r\ n" 45 FAS ""
[font="Arial Black"]EDIT... [/font]I had to add a space between the "\" and the "n" on line 46 because the forum software consumed it. You need to remove that space if you use this format file.
Last but not least, you should set MAXERRORS to 2 billion and identify an ERRORFILE. That way, if you do get an error, you'll at least have some way of finding it. Actually, this will allow you to find ALL of the errors in a file at once, and the bad rows will be sequestered in the ERRORFILE so you don't actually have to go looking for them. I would also set the CODEPAGE to RAW just in case someone changes the default code page for some reason. As with setting COLLATION to some binary format, it also seems to help performance a bit. Of course, you should be using a staging table for this unless missing data doesn't cause a problem or you always have perfect data files and no one ever changes anything on you and there were no transmission errors and... etc, etc, etc.
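Put together, it might look something like this (a sketch only; the paths, file names, and the date suffix are placeholders based on your earlier code):

```sql
-- Sketch only: paths, file names, and the date suffix are placeholders.
BULK INSERT dbo.PEDT001_load
FROM '\\SomePath\PEDT001.D131226'
WITH (
    BATCHSIZE  = 500000,
    FORMATFILE = '\\SomePath\PEDT001_format.fmt',
    MAXERRORS  = 2000000000,                -- keep going past bad rows
    ERRORFILE  = '\\SomePath\PEDT001.err',  -- bad rows are sequestered here
    CODEPAGE   = 'RAW',                     -- no code page conversion
    TABLOCK
);
```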
--Jeff Moden
Change is inevitable... Change for the better is not.
January 1, 2014 at 9:15 am
Thank you for all of your recommendations. I will try all of them (I might not be able to use the error file because of disk-write permissions).
I will try working with text format files again, as you made them look so simple. I had some trouble before because SQL Server couldn't read them and I couldn't figure out why.
Happy New Year and thanks for all your help.
January 1, 2014 at 10:13 am
Luis Cazares (1/1/2014)
Thank you for all of your recommendations. I will try all of them (I might not be able to use the error file because of disk-write permissions). I will try working with text format files again, as you made them look so simple. I had some trouble before because SQL Server couldn't read them and I couldn't figure out why.
Happy New Year and thanks for all your help.
You may have to set CODEPAGE = 'RAW' in the BULK INSERT statement.
--Jeff Moden
Change is inevitable... Change for the better is not.