December 1, 2018 at 5:06 am
I'm importing a bunch of CSVs. They are TAB delimited and most of the char columns are quoted (I haven't looked yet to see what happens with any embedded quotes ... might be doubled-up and need sorting out too).
I don't particularly want to use FORMAT files because of the hassle of maintaining them - external to the normal code base.
I could knock up a pre-processor that strips out the chaff before the files are imported, but that's going to take me longer than "a bit of SQL".
However, my concern is that the "bit of SQL" might be a performance/TLog/etc. hog? They are multi-million row tables. I'm looking at either using STUFF/REVERSE or SUBSTRING/LEN:
UPDATE MyTable
SET
[MyCharCol1] = NullIf(
CASE WHEN [MyCharCol1] LIKE '"%"'
THEN REVERSE(STUFF(
REVERSE(STUFF([MyCharCol1], 1, 1, ''))
, 1, 1, '')
)
ELSE [MyCharCol1]
END
, '')
, [MyCharCol2] = NullIf(CASE WHEN [MyCharCol2] LIKE '"%"'
THEN SUBSTRING([MyCharCol2], 2, LEN([MyCharCol2])-2)
ELSE [MyCharCol2]
END
, '')
, ...
Any suggestions as to which would perform better? I suppose it depends on whether LEN has to count characters, or has that metadata available anyway. Or whether there are any other (i.e. undesirable) side-effects of SUBSTRING / LEN.
Do I need to use DATALENGTH() rather than LEN()? I've got a mixture of NVarchar and Varchar columns to process, and probably some of them will have trailing spaces ... probably ought to RTrim() those too
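For what it's worth, LEN and DATALENGTH answer different questions, which a quick sketch shows (illustrative values only): LEN returns the character count and ignores trailing spaces, while DATALENGTH returns bytes, so it doubles up on NVarchar.

```sql
DECLARE @v VarChar(20)  = '"abc" ';   -- 6 characters including a trailing space
DECLARE @n NVarChar(20) = N'"abc" ';

SELECT  LEN(@v)        AS LenV,       -- 5: trailing space not counted
        DATALENGTH(@v) AS DataLenV,   -- 6: bytes, one per character
        LEN(@n)        AS LenN,       -- 5: same trailing-space rule for NVarChar
        DATALENGTH(@n) AS DataLenN;   -- 12: 2 bytes per character
```

So for the SUBSTRING arithmetic, LEN is the one to use (DATALENGTH would come back double the character count on the NVarchar columns) — though note that a value with a trailing space like '"abc" ' would fail the LIKE '"%"' test anyway, which is another argument for doing the RTrim() first.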
December 1, 2018 at 5:47 am
I think BCP needs a format file for this.
I tend to use PowerShell for this type of processing with the Out-DataTable function:
https://gallery.technet.microsoft.com/scriptcenter/4208a159-a52e-4b99-83d4-8048468d29dd
I then use something like:
. "YourPath\Out-DataTable.ps1"
$Folder = 'YourFolder'
$NameDate = "$((Get-Date -Format s).substring(0,13).replace('T', '-'))"
$ConnectionString = "Data Source=YourDS; Database=YourDB; Trusted_Connection=True;";
$Files = Get-ChildItem -Path $Folder | Where-Object{!($_.PSIsContainer) -and ($_.Name -like "$NameDate*.csv")} | Sort-Object Name
$bulkCopy = New-Object ("System.Data.SqlClient.SqlBulkCopy") $ConnectionString
$connection = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
$connection.Open()
$sqlCommand = $connection.CreateCommand()
foreach ($File in $Files)
{
$Table = $File.Name.Substring(19).replace('.csv','')
#echo $Table
$sqlCommand.CommandText = "TRUNCATE TABLE dbo." + $Table
$result = $sqlCommand.ExecuteNonQuery()
$csvDataTable = Import-CSV -Path $File.FullName | Out-DataTable
$bulkCopy.DestinationTableName = $Table
$bulkCopy.WriteToServer($csvDataTable)
}
$bulkCopy.Close()
$connection.Close()
This works well enough for me as my maximum file size is 65K lines.
December 1, 2018 at 9:25 am
Thanks Ken, I'll give it a go 🙂
December 1, 2018 at 9:34 am
As your file is tab-delimited, you need to add -Delimiter "`t"
to the Import-Csv command.
Also, if volumes are high, you will want to set a batch size on it:
$bulkCopy.BatchSize = 50000  # 50k rows per batch
If for some reason your file is not fully CSV-compliant, you may need to look at other methods, such as Microsoft.VisualBasic.FileIO.TextFieldParser (example at https://www.powershellgallery.com/packages/dbatools/0.7.9.8/Content/functions%5CCsvSqlimport.ps1), or a combination of the above plus regex.
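One more option, assuming the server is SQL Server 2017 or later (table and file names below are placeholders): BULK INSERT gained FORMAT = 'CSV' support in that version, which makes it honour FIELDQUOTE, so quoted fields (including doubled-up embedded quotes) are unwrapped during the load itself — no format file and no UPDATE pass afterwards:

```sql
-- Sketch only: assumes SQL Server 2017+ and a file visible to the server
BULK INSERT dbo.MyTable
FROM 'C:\Import\MyFile.csv'
WITH (
    FORMAT          = 'CSV',   -- RFC 4180 parsing; enables FIELDQUOTE
    FIELDQUOTE      = '"',
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 50000    -- commit in chunks to keep TLog growth in check
);
```

If TLog growth is the worry, TABLOCK plus a simple or bulk-logged recovery model is the usual route to getting minimally logged inserts.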