September 10, 2010 at 8:41 am
Sure:
"ID"|"Name"|"URL"
443|Sally Mae|http://www.google.com/privacypolicy/ours.asp?myval1=12345_Standard&secondaryopt=abc|123
123|Apples|http://123.456.789.123/amazing_g?hl=en&groups=dogs|cats&referer=http://www.cnn.com/International/Russia
456|Apples|http://www.mywebsite.com/stupidvar=abc|123
789|Oranges|http://www.mywebsite.com/approach=north|west&a=2&b=4&c=23
In this case, the source of the problem is a pipe in a URL, in a pipe-delimited field. When generating these files, we would be able to escape the pipe however necessary, for example we could place a backslash before every pipe. However, it doesn't seem like that will work.
Also, that's just one example. I will have to parse other data files that include pipes in other fields (e.g. the first field, or a field in the middle).
I could avoid the problem if I used a different character separator, but it's not clear that there are any characters that are prohibited. I think some of the fields contain arbitrary textual data.
I could also avoid the problem by using a multi-character separator, but that would increase datafile size by a lot. Also, there'd still be the remote possibility of a clash.
September 10, 2010 at 11:23 am
I have modified your file to use CHAR(7) delimiters which seems to solve the problem. You can then BULK INSERT this file into a table as illustrated in the example below
IF NOT OBJECT_ID('tempdb.dbo.#t', 'U') IS NULL DROP TABLE #t;
CREATE TABLE #t (ID VARCHAR(50), Name VARCHAR(255), URL VARCHAR(8000));
BULK INSERT #t FROM 'C:\BellDelimitedFile.txt'
WITH
(
FIELDTERMINATOR = '', FIRSTROW = 2, TABLOCK
);
SELECT * FROM #t;
BellDelimitedFile.txt
"ID""Name""URL"
443Sally Maehttp://www.google.com/privacypolicy/ours.asp?myval1=12345_Standard&secondaryopt=abc|123
123Appleshttp://123.456.789.123/amazing_g?hl=en&groups=dogs|cats&referer=http://www.cnn.com/International/Russia
456Appleshttp://www.mywebsite.com/stupidvar=abc|123
789Orangeshttp://www.mywebsite.com/approach=north|west&a=2&b=4&c=23
September 10, 2010 at 12:06 pm
Cool... so if I understand you correctly, the change you're proposing is just to use a non-textual ASCII character (in this case ASCII 7), since it wouldn't show up in any textual data I am trying to import?
For the part where you say, FIELDTERMINATOR = '', are you proposing that I copy and paste the BEL character into my actual command? Or is there maybe an escape sequence I can use?
And just to clarify, this probably wouldn't work on binary data, right, because that could contain a matching byte, right? I don't think I will use binary data anyway. (EDIT: Or SHA1 etc output I suppose, too?)
September 10, 2010 at 1:06 pm
The idea of using the BELL character as a separator is to minimise the risk of a clash with any real characters in the file. The BELL character won't appear in the table because it is being used as the separator in the BULK INSERT. If you copy and paste my code snippet, you will see what I mean. Another possibility, to obviate the requirement of a separator altogether, would be to use a fixed width file. Conversely, this may increase the file size unacceptably.
September 10, 2010 at 1:34 pm
Ok -- but just to be extra clear, you're saying that I should actually use a literal , instead of say \b (or whatever the escape sequence is for bel)?
September 10, 2010 at 1:56 pm
It depends how you're using it. For instance in the format file it manifests itself as "\7"
SELECT * FROM OPENROWSET (BULK 'C:\BellDelimitedFile.txt', FORMATFILE = 'C:\Bell.fmt', FIRSTROW = 2) AS Z
Bell.fmt
7.0
3
1 SQLCHAR 0 0 "\7" 1 ID
2 SQLCHAR 0 0 "\7" 2 Name
3 SQLCHAR0 0 "\r\n" 3 URL
Viewing 6 posts - 16 through 20 (of 20 total)
You must be logged in to reply to this topic. Login to reply