BULK INSERT and CSVs

  • Sure:

    "ID"|"Name"|"URL"

    443|Sally Mae|http://www.google.com/privacypolicy/ours.asp?myval1=12345_Standard&secondaryopt=abc|123

    123|Apples|http://123.456.789.123/amazing_g?hl=en&groups=dogs|cats&referer=http://www.cnn.com/International/Russia

    456|Apples|http://www.mywebsite.com/stupidvar=abc|123

    789|Oranges|http://www.mywebsite.com/approach=north|west&a=2&b=4&c=23

    In this case, the source of the problem is a pipe in a URL, in a pipe-delimited field. When generating these files, we would be able to escape the pipe however necessary, for example we could place a backslash before every pipe. However, it doesn't seem like that will work.

    Also, that's just one example. I will have to parse other data files that include pipes in other fields (e.g. the first field, or a field in the middle).

    I could avoid the problem if I used a different character separator, but it's not clear that there are any characters that are prohibited. I think some of the fields contain arbitrary textual data.

    I could also avoid the problem by using a multi-character separator, but that would increase datafile size by a lot. Also, there'd still be the remote possibility of a clash.

  • I have modified your file to use CHAR(7) delimiters which seems to solve the problem. You can then BULK INSERT this file into a table as illustrated in the example below

    IF NOT OBJECT_ID('tempdb.dbo.#t', 'U') IS NULL DROP TABLE #t;

    CREATE TABLE #t (ID VARCHAR(50), Name VARCHAR(255), URL VARCHAR(8000));

    BULK INSERT #t FROM 'C:\BellDelimitedFile.txt'

    WITH

    (

    FIELDTERMINATOR = '', FIRSTROW = 2, TABLOCK

    );

    SELECT * FROM #t;

    BellDelimitedFile.txt

    "ID""Name""URL"

    443Sally Maehttp://www.google.com/privacypolicy/ours.asp?myval1=12345_Standard&secondaryopt=abc|123

    123Appleshttp://123.456.789.123/amazing_g?hl=en&groups=dogs|cats&referer=http://www.cnn.com/International/Russia

    456Appleshttp://www.mywebsite.com/stupidvar=abc|123

    789Orangeshttp://www.mywebsite.com/approach=north|west&a=2&b=4&c=23

  • Cool... so if I understand you correctly, the change you're proposing is just to use a non-textual ASCII character (in this case ASCII 7), since it wouldn't show up in any textual data I am trying to import?

    For the part where you say, FIELDTERMINATOR = '', are you proposing that I copy and paste the BEL character into my actual command? Or is there maybe an escape sequence I can use?

    And just to clarify, this probably wouldn't work on binary data, right, because that could contain a matching byte, right? I don't think I will use binary data anyway. (EDIT: Or SHA1 etc output I suppose, too?)

  • The idea of using the BELL character as a separator is to minimise the risk of a clash with any real characters in the file. The BELL character won't appear in the table because it is being used as the separator in the BULK INSERT. If you copy and paste my code snippet, you will see what I mean. Another possibility, to obviate the requirement of a separator altogether, would be to use a fixed width file. Conversely, this may increase the file size unacceptably.

  • Ok -- but just to be extra clear, you're saying that I should actually use a literal , instead of say \b (or whatever the escape sequence is for bel)?

  • It depends how you're using it. For instance in the format file it manifests itself as "\7"

    SELECT * FROM OPENROWSET (BULK 'C:\BellDelimitedFile.txt', FORMATFILE = 'C:\Bell.fmt', FIRSTROW = 2) AS Z

    Bell.fmt

    7.0

    3

    1 SQLCHAR 0 0 "\7" 1 ID

    2 SQLCHAR 0 0 "\7" 2 Name

    3 SQLCHAR0 0 "\r\n" 3 URL

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply