August 29, 2008 at 1:58 am
Here is what I did.
create table data(data varchar(max));
INSERT INTO data(data)
SELECT * FROM
OPENROWSET(BULK 'D:\docs\DB\oracle.txt', SINGLE_BLOB) as f ;
declare @cmd varchar(512);
SET @cmd =
'bcp "SELECT data FROM woof.dbo.data" queryout c:\1.txt -T -c -C 1251';
EXEC master..xp_cmdshell @cmd;
Even the Russian characters were preserved perfectly whithin the text file and no jarbled ones.
August 29, 2008 at 2:00 am
Errrrrr this line is too long. You could've shown me at least the code
of the create table I would've believed you.
I've just tried even to dump the mp3 and pdf files with this code
I haven't noticed any corruption.
Show me please the output of your
xp_msver.
August 29, 2008 at 3:48 am
Errrrrr this line is too long. You could've shown me at least the code
of the create table I would've believed you.
I've just tried even to dump the mp3 and pdf files with this code
I haven't noticed any corruption.
Show me please the output of your
xp_msver.
OK - the Create table and insert is as follows:
[highlight="#FFFFCC"]CREATE TABLE TestFiles (
FileId bigint IDENTITY(1,1),
FileDesc nvarchar(100),
FileData varbinary(max)
)
INSERT INTO TestFiles(FileDesc, FileData)
SELECT
'This is my uploaded file',
(SELECT * FROM OPENROWSET(BULK '\\NTSERVER3\Users\Robert\Data\Dump1.txt', SINGLE_BLOB) as f);[/highlight]
When I bcp using your latest suggestion the output file contains ascii characters that are the hex representation of those ascii characters. So if the original file began with the characters 'pti', the output file begins '707469'. Hence the output file is exactly twice as long as the source file except that it has a carriage return line feed combination immediately after the strangely formatted data.
Output from xp_msver is as follows:
[highlight="#FFFFCC"]IndexNameInternal_ValueCharacter_Value
1ProductNameNULLMicrosoft SQL Server
2ProductVersion5898249.00.1399.06
3Language1033English (United States)
4PlatformNULLNT INTEL X86
5CommentsNULLNT INTEL X86
6CompanyNameNULLMicrosoft Corporation
7FileDescriptionNULLSQL Server Windows NT
8FileVersionNULL2005.090.1399.00
9InternalNameNULLSQLSERVR
10LegalCopyrightNULL© Microsoft Corp. All rights reserved.
11LegalTrademarksNULLMicrosoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation
12OriginalFilenameNULLSQLSERVR.EXE
13PrivateBuildNULLNULL
14SpecialBuild91684864NULL
15WindowsVersion2483819575.2 (3790)
16ProcessorCount44
17ProcessorActiveMask150000000f
18ProcessorType586PROCESSOR_INTEL_PENTIUM
19PhysicalMemory81918191 (8588959744)
20Product IDNULLNULL[/highlight]
And yes I know I'm running this on a SQL installation that hasn't had the service packs installed yet.
August 30, 2008 at 2:45 am
Robert (8/29/2008)
Errrrrr this line is too long. You could've shown me at least the code
of the create table I would've believed you.
I've just tried even to dump the mp3 and pdf files with this code
I haven't noticed any corruption.
Show me please the output of your
xp_msver.
OK - the Create table and insert is as follows:
[highlight="#FFFFCC"]CREATE TABLE TestFiles (
FileId bigint IDENTITY(1,1),
FileDesc nvarchar(100),
FileData varbinary(max)
)[/highlight]
First of all I would strongly reccomend you to use
the varchar(max) type for the text files.
[highlight="#FFFFCC"]INSERT INTO TestFiles(FileDesc, FileData)
SELECT
'This is my uploaded file',
(SELECT * FROM OPENROWSET(BULK '\\NTSERVER3\Users\Robert\Data\Dump1.txt', SINGLE_BLOB) as f);[/highlight]
Try to use (SELECT * FROM OPENROWSET(BULK '\\NTSERVER3\Users\Robert\Data\Dump1.txt', SINGLE_CLOB) in conjunction with the varchar(max)
When I bcp using your latest suggestion the output file contains ascii characters that are the hex representation of those ascii characters. So if the original file began with the characters 'pti', the output file begins '707469'. Hence the output file is exactly twice as long as the source file except that it has a carriage return line feed combination immediately after the strangely formatted data.
Ok looks like by default the bcp dumps the file using the UTF8 that what makes your
file twice as big - for each character it takes two bytes although you have to use -w
option in order to dump the text file using Unicode characters instead of -N for binaries.
So try to use e.g.
bcp "select field from yourdb.table" -T -c -C 1250.
Output from xp_msver is as follows:
[highlight="#FFFFCC"]IndexNameInternal_ValueCharacter_Value
1ProductNameNULLMicrosoft SQL Server
2ProductVersion5898249.00.1399.06
3Language1033English (United States)
4PlatformNULLNT INTEL X86
5CommentsNULLNT INTEL X86
6CompanyNameNULLMicrosoft Corporation
7FileDescriptionNULLSQL Server Windows NT
8FileVersionNULL2005.090.1399.00
9InternalNameNULLSQLSERVR
10LegalCopyrightNULL© Microsoft Corp. All rights reserved.
11LegalTrademarksNULLMicrosoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation
12OriginalFilenameNULLSQLSERVR.EXE
13PrivateBuildNULLNULL
14SpecialBuild91684864NULL
15WindowsVersion2483819575.2 (3790)
16ProcessorCount44
17ProcessorActiveMask150000000f
18ProcessorType586PROCESSOR_INTEL_PENTIUM
19PhysicalMemory81918191 (8588959744)
20Product IDNULLNULL[/highlight]
And yes I know I'm running this on a SQL installation that hasn't had the service packs installed yet.
Nasty 🙂
September 1, 2008 at 3:22 am
First of all I would strongly reccomend you to use the varchar(max) type for the text files.
For the purposes of experimentation I did so but really I'm looking for a generic solution to store and retrieve any kind of file to and from a database.
Try to use (SELECT * FROM OPENROWSET(BULK '\\NTSERVER3\Users\Robert\Data\Dump1.txt', SINGLE_CLOB) in conjunction with the varchar(max)
I did this with the amended table structure.
Ok looks like by default the bcp dumps the file using the UTF8 that what makes your
file twice as big - for each character it takes two bytes although you have to use -w
option in order to dump the text file using Unicode characters instead of -N for binaries.
So try to use e.g.
bcp "select field from yourdb.table" -T -c -C 1250.
The output file was 2 bytes larger than the input file and those two bytes were an extra carriage return line feed combination on the end of the output file.
I'm beginning to wish I'd never started this!
September 4, 2008 at 2:27 pm
Another option is to use CLR, even though not everyone is a fan of that option. The SQL# library that is available at http://www.SQLsharp.com/ has functions to do this. Since those functions are not available in the free version, I can say that you can accomplish this by using the BinaryReader and BinaryWriter classes and the SqlBytes datatype to map to VARBINARY(MAX).
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
September 5, 2008 at 1:24 am
Another option is to use CLR, even though not everyone is a fan of that option. The SQL# library that is available at http://www.SQLsharp.com/ has functions to do this. Since those functions are not available in the free version, I can say that you can accomplish this by using the BinaryReader and BinaryWriter classes and the SqlBytes datatype to map to VARBINARY(MAX).
Thanks I'll try that.
Regards
Robert
September 15, 2008 at 4:28 pm
Robert (8/28/2008)
I'm also having trouble implementing this without corruption of the file after it's undergone a round trip. I am importing a text file using OPENROWSET and as far as I can see that arrives in the varbinary field perfectly. The query:SELECT DATALENGTH(FileData) FROM TestFiles
returns a number exactly equal to the test file size.
However when I output it back to a file using the options you specify for bcp, I get a file that is 4 bytes longer than the original. The 4 extra bytes are right at the beginning of the file before the expected data starts and are as follows (in hex):
B5 42 00 00
Any ideas?
OK. Here is the answer for this I've done some research during the spare time.
The research is called RTFM 🙂
The article here called
http://msdn.microsoft.com/en-us/library/ms191212.aspx
The MSSQL2005-2008 create the 8 byte prefix at the beginning of the file
during the bulk export. To avoid it - use the .fmt file.
My bcp.fmt file looks like this:
9.0
1
1 SQLBINARY 0 0 "" 1 data ""
Then the bcp command :
bcp "SELECT data FROM testdb.dbo.data WHERE file_id=1" queryout 0.dat -T -f bcp.fmt
That would be the solution.
September 22, 2008 at 2:42 am
Great - that was indeed the answer EXCEPT that I had to have 8.0 at the top of the format file rather than 9.0 (my output file had a 4 byte prefix rather than 8 bytes).
Thanks for your help.
Robert
April 7, 2010 at 10:24 am
EDIT: Wow... This was boneheaded of me. I didn't see there were two other pages after the first, in which DE's questioned were eventually answered. Leaving the rest of my post here since I can't delete it outright...
Apologies for the necroposting, but this thread came up a handful of times when I was googling around trying to find a solution to the very same problem.
In short, DeafEater's problem is real, not imagined. I've had the same problem, and it doesn't matter whether I'm connecting to SQL Server 2000, 2005, or 2008. It doesn't matter what version of the client tools I'm using.
In my case, I'm importing and exporting binary files, mostly ZIPs. I'm importing files into the table like so:
CREATE TABLE FileStore (
[id] [int] IDENTITY(1,1) NOT NULL,
[filename] [varchar](255) NULL,
[created] [datetime] NULL,
[filedata] [varbinary](max) NULL,
)
insert FileStore (filename, created, filedata) select '$(_FILENAME)', GETDATE(), BulkColumn from Openrowset ( Bulk '$(_FILENAME)', Single_Blob) as filedata
Originally, I exported ZIP files from the database with:
bcp "select top 1 filedata from dbiutil.dbo.FileStore order by created desc" queryout test01.zip -S fssrv -T -n
Which appeared to work fine, but test01.zip was always four bytes larger than it should be- and WinZip wasn't able to open it. Later, I figured out that Info-ZIP (zip.exe and unzip.exe) and 7zip (7za.exe) had no problems extracting the ZIP. However, Info-ZIP gives this warning:
Archive: test0.zip
warning [test0.zip]: 4 extra bytes at beginning or within zipfile
(attempting to process anyway)
creating: test01/
inflating: test01/a
inflating: test01/b
Naturally, this stumped me. After running into a few forum threads here and elsewhere, I tried doing the BCP without the -n (native type) switch...
c:\bcptesting>bcp "select top 1 filedata from dbiutil.dbo.FileStore order by created desc" queryout test02.zip -S fssrv -T
Enter the file storage type of field filedata [image]:
Enter prefix-length of field filedata [4]: 0
Enter length of field filedata [0]:
Enter field terminator [none]:
Do you want to save this format information in a file? [Y/n] y
Host filename [bcp.fmt]:
Starting copy...
1 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.): total 6000
This time, test01.zip had no extra bytes, and I was able ot open it in WinZip, Info-ZIP, and 7zip without errors or warnings. Indeed, the md5sum hash of the test02.zip that I exported with this method was the same as the original file.
It should be obvious to anyone reading this that those four extra bytes is the "prefix-length of field filedata" mentioned above. By specifically telling bcp to use 0 things are imported and exported without any corruption or added bytes. Since running through this interactively is hardly what we want, just take the resulting bcp.fmt and use it for any future exports...
c:\cbtesting>bcp "select top 1 filedata from dbiutil.dbo.FileStore order by created desc" queryout test03.zip -S fssrv -T -f bcp.fmt
Starting copy...
1 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.): total 5922
Again, test03.zip has the same md5sum hash as the original file and test02.zip.
It's a bit tedious to have to ship an extra file like this, but at least there's some solution. It works the same way no matter what version of SQL Server you're using from 2000 on; same goes for the client tools. Microsoft forgot to add a parameter for bcp.exe for specifying the "prefix-length of field filedata," forcing you to run through the manual process once first.
BCP doesn't have a bug, it's just poorly designed. 😛
Aaron
May 9, 2013 at 6:56 am
Hey,
I'm having the exact same problem and just posted a detailed description at MSDN forums yesterday [/url] Unfortunately, I haven't gotten any answers there yet.
I found a posting this morning that suggested the problem is that the extra characters are because the BCP QUERYOUT utility is adding the file length at the beginning of the VARBINARY field contents before reading it in the external file. The solution is supposed to be creating format files, though I don't know if you need a different file for each file type (PDF, XLS, DOC, etc.) Here's the posting
I'll post something back if I can get this to work.
Good luck,
Bill
May 9, 2013 at 7:21 am
Hi,
I'm having the exact same problem. Here's my post on it over in MSDN forums [/url]
I found some postings elsewhere that suggested the problem is the BCP QUERYOUT is adding the file length to the beginning of the contents--those are the extra characters [/url]
Apparently the solution is to specify a format file during your output that stops the utility from adding the file length data. I'm going to play with that in the next day or two and will post back if I find something.
Good luck,
Bill
May 9, 2013 at 9:41 am
You should read the whole thread first because it has been answered years ago.
May 9, 2013 at 10:27 am
Hey, thanks. Yes, the interface was very confusing. I responded twice because I was Replying to the post at the bottom of Page 1 and it kept refreshing without showing my reply appended the post I was replying to. I will study the second and third pages carefully. Cheers, Bill
January 16, 2018 at 9:48 am
Off topic, but I've been playing with R on and off for the last 6 months and it's a lot better for this type of thing than SQL. You can write a function to read every file in a directory or one at a time and analyze them.
Last week I even found it easier to use R to analyze some MSDB data than doing it through SQL
Viewing 15 posts - 16 through 29 (of 29 total)
You must be logged in to reply to this topic. Login to reply