August 15, 2006 at 12:03 pm
Any one has any script or knows how to create a fixed width file from a table using BCP? I have about 300 mill rows in the table so I need to use BCP.
I am creating the format file as follows:
master..xp_cmdshell 'bcp DB.dbo.myTable format nul -c -f c:\ABC.fmt -T'
and using the BCP out as:
exec master..xp_cmdshell 'bcp "SELECT top 10000 * FROM myServer.DB.dbo.myTable WHERE col1 is not null" queryout "c:\test\ABC.txt" -f"c:\ABC.fmt" -SmyServer -T'
The format file seems good. One of the columns in the table is a Comments column which could have line feeds or carriage returns in the field values which is causing the values to skip the formatting.
sample data:
----------------------------------------------------------------------------
2272237Claassen, White and Asociates, P.C.Joliet36-4129374
2286837Mahoney, Silverman and Cross, LtdJoliet36-4201623
2288737Nadelhoffer, Kuhn, Mitche, Moss, SalWarrenville36-3539212
Deactivated per Colleen Ritter - See ID AA1159. 9/3/2004 11:27:43 AM
2345337Sachnoff & Weaver, Ltd.Chicago36-2735711
----------------------------------------------------------------------------
The value "Deactivated..." is in a separate line because of a carriage return. How do we fix these.
Thanks in advance.
******************
Dinakar Nethi
Life is short. Enjoy it.
******************
August 15, 2006 at 12:21 pm
Use native format instead of text
or
Update the data before exporting, replacing line breaks
August 15, 2006 at 3:22 pm
The text files I create have to be passed through some 3rd party ETL tool for encryption. So the files should be readable. If I use native format, the files come out in a non-readable format.
Also, I cannot modify existing data.
any other ideas?
******************
Dinakar Nethi
Life is short. Enjoy it.
******************
August 17, 2006 at 4:49 am
If you can't touch the existing data, you may scrub it while retrieving.
You can use a UDF to strip out CR/LF chars, something like below.
create function dbo.stripCRLF(@s as Nvarchar(4000))
returns Nvarchar(4000)
as
begin
while patindex('%' + char(10) + '%', @s-2) > 0
begin
set @s-2 = replace(@s, substring(@s, patindex('%' + char(10) + '%', @s-2), 1), '')
end
while patindex('%' + char(13) + '%', @s-2) > 0
begin
set @s-2 = replace(@s, substring(@s, patindex('%' + char(13) + '%', @s-2), 1), '')
end
return @s-2
end
go
Optionally you can create a view of the table with the 'stripped' column.
create view dbo.strippedFoo
as
select col1, col2, dbo.stripCRLF(someText) as col3, col4
from dbo.foo
go
Then you can bcp out either through the view or by query
bcp "select col1, col2, dbo.stripCRLF(someText), col4 from foo" queryout c:\myfile.txt -c -t; -T
bcp mydb.dbo.strippedFoo out c:\myfile.txt -c -t; -T
=;o)
/Kenneth
August 17, 2006 at 11:48 pm
Dinakar,
Kenneth's suggestion is an excellent one but, if the Comments column is VARCHAR or NVARCHAR, you don't need to create the UDF... just create the view with a small modification.... should run quite a bit faster...
create view dbo.strippedFoo
as
select col1, col2, REPLACE(REPLACE(REPLACE(someText,CHAR(13),' '),CHAR(10),' '),' ',' ') as col3, col4
from dbo.foo
go
--Jeff Moden
Change is inevitable... Change for the better is not.
August 21, 2006 at 12:37 pm
The view idea seems more feasible. I have tables with 200 mil rows. Need to see how it works out when I batch the bcp util. Thanks for the suggestion.
I also have issues with listing out column names as there are about 40+ columns and the length of the string is exceeding the limit of 1k. So the view might help there too.
Also I am having issues with NULL and empty values disturbing the alignment. Anyone has any ideas on how to fix that?
******************
Dinakar Nethi
Life is short. Enjoy it.
******************
August 21, 2006 at 7:46 pm
Rather than trying to BCP directly from a 200 Million row table (with or without the view for the mod), why don't you simply do a SELECT/INTO with the correct criteria (and mod formula, by the way) into a new table and BCP from that?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 21, 2006 at 11:21 pm
We are encrypting some of our columns in the table. The encrypting tool needs a fixed width file. So I need to create multiple smaller fixed width files which will be fed into the tool. The tool returns another text file with the columns encrypted. Then I would need to BCP IN or BULK INSERT the text files into a new table. Finally, add indexes, constraints, drop the old table, rename the new table to old.
******************
Dinakar Nethi
Life is short. Enjoy it.
******************
August 22, 2006 at 6:32 am
Wouldn't it be easier to spend a little extra money to buy a tool that does it in SQL Server instead of playing around with text files and all?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2006 at 9:50 am
I agree. But thats not my call. But this is a one time ordeal for encrypting existing data. For any new data it will be encrypted through the application layer.
******************
Dinakar Nethi
Life is short. Enjoy it.
******************
August 22, 2006 at 11:14 pm
Ok... so why can't you use the application to encrypt the data on a "one time basis"? Sorry, Dinakar, I just don't understand why this is a problem.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2006 at 7:31 am
Use a view as Jeff decribes to strip out cr/lf and format data
use -c and -t parameters instead of format file
exec master..xp_cmdshell 'bcp "SELECT top 10000 * FROM myServer.DB.dbo.myView WHERE col1 is not null" queryout "c:\test\ABC.txt" -c -t "" -SmyServer -T'
output layout from using -c and -t parameters will not be affected by nulls or empty columns just make sure column widths are correct and the data is formatted correctly where necessary
Far away is close at hand in the images of elsewhere.
Anon.
August 23, 2006 at 2:55 pm
Jeff
How would I encrypt existing data? I would need to write a tool to read each record, then encrypt it and update the record back in the table which will take forever.
David
I am trying out the view option. for a set of 5000 records the file turned out to be 80 MB. I was thinking of creating files of 500,000 records each. I had created flat files earlier using BCP + format files, and for 500,000 record batch the file size was around 150 MB. I just realized the ETL tool provided by the encrypting company also accepts flat files with delimiters too and not just fixed width files. So I would like to try that option. That will greatly reduce the file size.
******************
Dinakar Nethi
Life is short. Enjoy it.
******************
August 23, 2006 at 5:30 pm
I thought you said you have an app that will do the encryption of new data that goes into the table... I thought that maybe you could "trick" it into thinking that all the data was new data without having to do an export/encrypt/import.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2006 at 8:01 pm
yes it would but it would take a very long time to feed all the data through the app and let it encrypt and put the data in the table. Our web application will actually make a call to a separate server that is hosting the encryption passing in the SSN/other info that needs to be encrypted. The encrypting server will return an encrypted string. The string along with other info is fed into the table. So to simulate thie for millions of records would be almost impossible. The company providing the encryption has provide us an ETL tool that will take a fixed width file and return the file encrypting the columns that needed to be.
******************
Dinakar Nethi
Life is short. Enjoy it.
******************
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply