December 1, 2018 at 5:06 am
I'm importing a bunch of CSVs. They are TAB delimited and most of the char columns are quoted (I haven't looked yet to see what happens with any embedded quotes ... might be doubled-up and need sorting out too).
I don't particularly want to use FORMAT files because of the hassle of maintaining them - external to the normal code base.
I could knock up a pre-processor that strips out the chaff before the files are imported, but that's going to take me longer than "a bit of SQL".
However, my concern is that the "bit of SQL" might be a performance/TLog/etc. hog? They are multi-million row tables. I'm looking at either using STUFF/REVERSE or SUBSTRING/LEN:
UPDATE MyTable
SET
[MyCharCol1] = NullIf(
CASE WHEN [MyCharCol1] LIKE '"%"'
THEN REVERSE(STUFF(
REVERSE(STUFF([MyCharCol1], 1, 1, ''))
, 1, 1, '')
)
ELSE [MyCharCol1]
END
, '')
, [MyCharCol2] = NullIf(CASE WHEN [MyCharCol2] LIKE '"%"'
THEN SUBSTRING([MyCharCol2], 2, LEN([MyCharCol2])-2)
ELSE [MyCharCol2]
END
, '')
, ...
Any suggestions as to which would perform better? I suppose it depends on whether LEN has to count characters, or has that metadata available anyway. Or whether there are any other (i.e. undesirable) side-effects of SUBSTRING / LEN.
Do I need to use DATALENGTH() rather than LEN()? I've got a mixture of NVarchar and Varchar columns to process, and probably some of them will have trailing spaces ... probably ought to RTrim() those too
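For what it's worth, LEN and DATALENGTH answer different questions, which a quick sketch shows (illustrative values only): LEN returns the character count and ignores trailing spaces, while DATALENGTH returns bytes, so it doubles up on NVarchar.

```sql
DECLARE @v VarChar(20)  = '"abc" ';   -- 6 characters including a trailing space
DECLARE @n NVarChar(20) = N'"abc" ';

SELECT  LEN(@v)        AS LenV,       -- 5: trailing space not counted
        DATALENGTH(@v) AS DataLenV,   -- 6: bytes, one per character
        LEN(@n)        AS LenN,       -- 5: same trailing-space rule for NVarChar
        DATALENGTH(@n) AS DataLenN;   -- 12: 2 bytes per character
```

So for the SUBSTRING arithmetic, LEN is the one to use (DATALENGTH would come back double the character count on the NVarchar columns) — though note that a value with a trailing space like '"abc" ' would fail the LIKE '"%"' test anyway, which is another argument for doing the RTrim() first.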
December 1, 2018 at 5:47 am
I think BCP needs a format file for this.
I tend to use PowerShell for this type of processing with the Out-DataTable function:
https://gallery.technet.microsoft.com/scriptcenter/4208a159-a52e-4b99-83d4-8048468d29dd
I then use something like:
. "YourPath\Out-DataTable.ps1"
$Folder = 'YourFolder'
$NameDate = "$((Get-Date -Format s).substring(0,13).replace('T', '-'))"
$ConnectionString = "Data Source=YourDS; Database=YourDB; Trusted_Connection=True;";
$Files = Get-ChildItem -Path $Folder | Where-Object{!($_.PSIsContainer) -and ($_.Name -like "$NameDate*.csv")} | Sort-Object Name
$bulkCopy = New-Object ("System.Data.SqlClient.SqlBulkCopy") $ConnectionString
$connection = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
$connection.Open()
$sqlCommand = $connection.CreateCommand()
foreach ($File in $Files)
{
$Table = $File.Name.Substring(19).replace('.csv','')
#echo $Table
$sqlCommand.CommandText = "TRUNCATE TABLE dbo." + $Table
$result = $sqlCommand.ExecuteNonQuery()
$csvDataTable = Import-CSV -Path $File.FullName | Out-DataTable
$bulkCopy.DestinationTableName = $Table
$bulkCopy.WriteToServer($csvDataTable)
}
$bulkCopy.Close()
$connection.Close()
This works well enough for me as my maximum file size is 65K lines.
December 1, 2018 at 9:25 am
Thanks Ken, I'll give it a go 🙂
December 1, 2018 at 9:34 am
As your file is tab-delimited, you need to add -Delimiter "`t"
to the Import-Csv command.
Also, if volumes are high, you will want to set a batch size on it:
$bulkCopy.BatchSize = 50000  # 50k rows per batch
If for some reason your file is not fully CSV-compliant, you may need to look at other methods, such as Microsoft.VisualBasic.FileIO.TextFieldParser (example at https://www.powershellgallery.com/packages/dbatools/0.7.9.8/Content/functions%5CCsvSqlimport.ps1), or a combination of the above plus regex.
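One more option, assuming the server is SQL Server 2017 or later (table and file names below are placeholders): BULK INSERT gained FORMAT = 'CSV' support in that version, which makes it honour FIELDQUOTE, so quoted fields (including doubled-up embedded quotes) are unwrapped during the load itself — no format file and no UPDATE pass afterwards:

```sql
-- Sketch only: assumes SQL Server 2017+ and a file visible to the server
BULK INSERT dbo.MyTable
FROM 'C:\Import\MyFile.csv'
WITH (
    FORMAT          = 'CSV',   -- RFC 4180 parsing; enables FIELDQUOTE
    FIELDQUOTE      = '"',
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 50000    -- commit in chunks to keep TLog growth in check
);
```

If TLog growth is the worry, TABLOCK plus a simple or bulk-logged recovery model is the usual route to getting minimally logged inserts.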