July 2, 2018 at 11:04 pm
Comments posted to this topic are about the item Importing Tricky CSV Files with PowerShell
____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”
July 3, 2018 at 1:48 am
This is great, exactly what I was looking for....
July 3, 2018 at 2:43 am
This has been a personal annoyance for years.
Surely, the import engine should properly respect the double quotes (or whatever specified character) as string delimiters, if so directed. We would not then have to hack at the user's text by removing commas or any such workarounds.
July 3, 2018 at 3:58 am
Interesting to see PowerShell used for this. (Must add "Learn PowerShell" to my @todo list!) To circumvent the issue where there are pipes already in the data, you could add something to replace them in the data line variable before processing char-by-char, then put them back in before writing.
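The swap-and-restore idea is language-agnostic; here is a minimal sketch in Python (the placeholder character and the simplified comma-to-pipe step are illustrative assumptions, not the article's actual script):

```python
# Hide pipes that are already in the data, convert delimiter commas to
# pipes, then restore the original pipes after splitting.
PLACEHOLDER = "\x07"  # assumed never to occur in the raw data

line = 'A|B,second field'                   # a data value already contains "|"
safe = line.replace("|", PLACEHOLDER)       # step 1: hide existing pipes
piped = safe.replace(",", "|")              # step 2: (simplified) delimiter swap
fields = [f.replace(PLACEHOLDER, "|")       # step 3: put the real pipes back
          for f in piped.split("|")]
# fields == ['A|B', 'second field']
```

In the real script, step 2 would be the char-by-char pass that only swaps commas sitting outside quotes; the hide/restore steps wrap around it unchanged.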
We did something similar to this using C# in SSIS, including directly importing the files. I solved the CSV format issues by pre-processing the CSV files using a little routine I wrote in Go (@golang). Go, PHP and several other languages do not suffer from the crippling flaw in Microsoft products of being unable to correctly import CSV files unless they are of the very simplest form. Many, including Go, can correctly process files containing the separator, carriage returns, linefeeds, tabs, backspaces, etc. embedded within quoted fields.
It would be great for all of us if Microsoft addressed that!
p.s. I wrote, compiled and tested my Go programme on my Mac. I then compiled a Windows Server executable, still on my Mac, and copied the (single, stand-alone) executable to the server; it ran perfectly. That is one of Go's party tricks! It is also fast, typically processing a 1.5 GB CSV file with 3.6M rows in around 90s on our server.
July 3, 2018 at 5:28 am
This is cool!
If a quote-delimited string looks like this:
"1313 Mockingbird Lane, Detroit, MI 48690"
(two commas), does this logic still work?
Same question for quote-delimited strings like this:
"This is a ""test"" of parsing"
July 3, 2018 at 6:19 am
SADSAC - Tuesday, July 3, 2018 1:48 AM
This is great, exactly what I was looking for....
Glad to have been of some assistance
July 3, 2018 at 6:20 am
Jerry Kelk - Tuesday, July 3, 2018 2:43 AM
This has been a personal annoyance for years.
Surely, the import engine should properly respect the double quotes (or whatever specified character) as string delimiters, if so directed. We would not then have to hack at the user's text by removing commas or any such workarounds.
Agree wholeheartedly...
July 3, 2018 at 6:24 am
ssc 21649 - Tuesday, July 3, 2018 5:28 AM
This is cool!
If a quote-delimited string looks like this:
"1313 Mockingbird Lane, Detroit, MI 48690"
(two commas), does this logic still work?
Same question for quote-delimited strings like this:
"This is a ""test"" of parsing"
It does not matter how many commas are encountered within the quotes; the logic will still handle them correctly.
Even doubled quote delimiters ("") will work correctly.
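For anyone who wants to verify those two cases outside PowerShell: the same quoting rules (commas inside quotes, "" as an escaped quote) are what any RFC 4180-style parser implements. A quick Python check, using the examples from the question above:

```python
import csv
import io

sample = '"1313 Mockingbird Lane, Detroit, MI 48690","This is a ""test"" of parsing"\n'
row = next(csv.reader(io.StringIO(sample)))
# row[0] == '1313 Mockingbird Lane, Detroit, MI 48690'  (commas preserved)
# row[1] == 'This is a "test" of parsing'               ("" collapsed to ")
```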
July 3, 2018 at 6:52 am
Stewart "Arturius" Campbell - Tuesday, July 3, 2018 6:24 AM
ssc 21649 - Tuesday, July 3, 2018 5:28 AM
This is cool!
If a quote-delimited string looks like this:
"1313 Mockingbird Lane, Detroit, MI 48690"
(two commas), does this logic still work?
Same question for quote-delimited strings like this:
"This is a ""test"" of parsing"
It does not matter how many commas are encountered within the quotes; the logic will still handle them correctly.
Even doubled quote delimiters ("") will work correctly.
Cool, thanks for sharing this!
July 3, 2018 at 7:04 am
I used a VB script inside an SSIS package to accomplish the same functionality. This is a smart way to take care of this issue.
With the data I needed to import, the pipe ended up not being a good delimiter. So, especially in data fields full of user input, it's worth keeping different alternatives in mind. My solution was using a triple semicolon, which you should be hard pressed to find in normal user input, but I'd be interested to hear whether anyone else has had similar issues with different workaround delimiters.
July 3, 2018 at 7:11 am
Brian F - Tuesday, July 3, 2018 7:04 AM
I used a VB script inside an SSIS package to accomplish the same functionality. This is a smart way to take care of this issue. With the data I needed to import, the pipe ended up not being a good delimiter. So, especially in data fields full of user input, it's worth keeping in mind different alternatives. My solution was using a triple semicolon, which you should be hard pressed to find in normal user input, but I'd be interested to hear if anyone else had similar issues with different workaround delimiters.
In the past, the trick I have often used is replacing the delimiters with ASCII 7 (the bell character). I have NEVER encountered an ASCII 7 in normal text data, so it has worked well for me (although, that's not to say it won't ever happen). 😉
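That bell-character trick is easy to sketch. Here's a hypothetical Python version of the idea (not the article's script): treat ASCII 7 as the working delimiter, swapping it in only for commas that sit outside quotes:

```python
BELL = chr(7)  # ASCII 7, the bell character; assumed absent from real text

def to_bell_delimited(line: str) -> str:
    """Replace delimiter commas (those outside quotes) with ASCII 7."""
    out, in_quotes = [], False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "," and not in_quotes:
            out.append(BELL)
        else:
            out.append(ch)
    return "".join(out)

line = '1,"Mockingbird Lane, Detroit",open'
fields = to_bell_delimited(line).split(BELL)
# fields == ['1', '"Mockingbird Lane, Detroit"', 'open']
```

Doubled quotes ("") toggle the in-quotes flag twice with no character in between, so they don't disturb the comma handling.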
July 3, 2018 at 7:31 am
How do you invoke the script? Thanks.
July 3, 2018 at 7:32 am
FYI, the Import-Csv cmdlet handles embedded quotes.
e.g.
$Header = "ProductKey","Code","Description","Narrative","Comment"
$J = Import-Csv test.csv -Header $Header
$J | Format-Table
Here's another version, as an example, that works too (TMTOWTDI):
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")
$Date = $args[0]
$ConnectionString = $args[1]
$Directory = $args[2]
$Table = $args[3]
$file="$Directory\$Table.csv" #This could be anything, the file just happens to be the same as the table name in this instance
$FirstRowColumnNames = $true
$batchsize = 75000 #This could, theoretically, be any number - I found that this has the least drain on CPU & resources
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($Connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $Table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $batchsize
$datatable = New-Object System.Data.DataTable
# USE Import-Csv -- BEGIN
$CSV_CONTENT = Import-Csv $file
# Get Headers from first line
$headers = $CSV_CONTENT[0].psobject.Properties.Name
foreach ($hdr in $headers) {
    $datatable.Columns.Add($hdr) | Out-Null
}
# Values (rows)
foreach ($csvRow in $CSV_CONTENT) {
    $row = $datatable.NewRow()
    foreach ($c in $headers) {
        $row[$c] = $csvRow.$c
    }
    $datatable.Rows.Add($row) | Out-Null
}
# USE Import-Csv -- END
if ($datatable.Rows.Count -gt 0) {
    $bulkcopy.WriteToServer($datatable)
    $datatable.Clear()
}
$bulkcopy.Close()
$bulkcopy.Dispose()
$datatable.Dispose()
July 3, 2018 at 7:45 am
ssc 21649 - Tuesday, July 3, 2018 7:11 AM
Brian F - Tuesday, July 3, 2018 7:04 AM
I used a VB script inside an SSIS package to accomplish the same functionality. This is a smart way to take care of this issue. With the data I needed to import, the pipe ended up not being a good delimiter. So, especially in data fields full of user input, it's worth keeping in mind different alternatives. My solution was using a triple semicolon, which you should be hard pressed to find in normal user input, but I'd be interested to hear if anyone else had similar issues with different workaround delimiters.
In the past, the trick I have often used is replacing the delimiters with ASCII 7 (the bell character). I have NEVER encountered an ASCII 7 in normal text data, so it has worked well for me (although, that's not to say it won't ever happen). 😉
Ding-ding! Same here!
July 3, 2018 at 7:52 am
Bruin - Tuesday, July 3, 2018 7:31 AM
How do you invoke the script? Thanks.
Using the following setup:
1) CSV file = c:\tmp\ProductsTest.csv
2) PS script: c:\myscripts\csvLoad.ps1
3) Create table ProductsTest
create table ProductsTest (
ProductKey int
,Code varchar(50)
,Description varchar(100)
,Narrative varchar(500)
,Comment varchar(500)
)
Open PowerShell in the c:\myscripts directory and execute (replace MyDB with your database name):
PS C:\myscripts > .\csvLoad.ps1 20180703 "Data Source=(local);Initial Catalog=MyDB;Integrated Security=true" C:\tmp ProductsTest