July 7, 2018 at 12:44 pm
Ali M - Saturday, July 7, 2018 7:03 AM
Bruin - Friday, July 6, 2018 12:04 PM
Does the order of the CSV have to match the SQL schema, and what about if the schema has 8 fields and the CSV has 4 fields?
Many thanks again!!!
- The order of the columns in the CSV must match that of the table schema
- The number of columns in the CSV must also match the table schema.
- If, as previously discussed, you use a staging table for the import, then the two constraints above are satisfied
- If the CSV is without headers and you modify my script as detailed, then it should still work fine. The DataTable columns are populated by the select top 1 * on the table you're importing to
- Note also that if you want a delimiter other than a comma, there is a -Delimiter argument to Import-Csv to achieve this
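For example, importing a semicolon-delimited file is just (a minimal sketch; the file path is hypothetical):

# -Delimiter handles non-comma separators
Import-Csv -Path 'C:\data\input.csv' -Delimiter ';'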
Thanks!!!!
July 8, 2018 at 11:34 pm
Been away (out of mobile range) for a few days. This has been an extremely enlightening discussion.
Thanks, everyone, for contributing.
____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
"libera tute vulgaris ex"
July 9, 2018 at 6:30 am
Stewart "Arturius" Campbell - Sunday, July 8, 2018 11:34 PMbeen away (out of mobile range) for a few days. This has been an extremely enlightening discussion.
thanks everyone for contributing
Good Sir, thank you for kicking it off with a great article!
--Jeff Moden
Change is inevitable... Change for the better is not.
July 9, 2018 at 9:16 am
If it's not too much trouble, could you post the entire script with the piece for headers added in? I tried to add the extra code myself, but I have a syntax problem (maybe a copy/paste issue).
Thanks again.
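In the meantime, the headers piece on its own boils down to this minimal sketch (the file path and column count are made up): generate as many dummy names as there are columns and pass them via -Header:

# Dummy headers for a headerless file
$headings = for ($i = 0; $i -lt 4; $i++) { "Dummy$($i)" }
Import-Csv -Path 'C:\data\noheader.csv' -Header $headings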
July 9, 2018 at 11:28 am
I was testing: if I use the -TruncateLongValues switch, I get this error:
ForEach-Object : Cannot index into a null array.
July 9, 2018 at 11:42 am
Thanks for sharing this. It reminded me how much I hate comma-delimited files, arguably the worst idea ever to take off in data processing. I mean, seriously, of all the delimiters to choose from, someone chose commas?? And now the years of inertia behind it continue to slow down many an IT worker.
July 9, 2018 at 11:56 am
HighPlainsDBA - Monday, July 9, 2018 11:42 AM
Thanks for sharing this. It reminded me how much I hate comma-delimited files, arguably the worst idea ever to take off in data processing. I mean, seriously, of all the delimiters to choose from, someone chose commas?? And now the years of inertia behind it continue to slow down many an IT worker.
Nope... not the worst idea. That would be XML. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
July 9, 2018 at 11:59 am
How would the script handle input files where the absence of a value is indicated by an empty string? We get raw files that look like:
"Some String";value1;value2;;;;;value7;value8
Thanks again.
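(For reference, PowerShell's CSV cmdlets render empty fields as empty strings, which is easy to verify with made-up data:)

# Made-up sample matching the format above; the empty fields arrive as empty strings
$row = '"Some String";v1;v2;;;;;v7;v8' |
    ConvertFrom-Csv -Delimiter ';' -Header (1..9 | ForEach-Object { "Col$_" })
$row.Col4 -eq ''   # True - an absent value is an empty string, not $null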
July 9, 2018 at 2:10 pm
Jeff Moden - Monday, July 9, 2018 11:56 AM
HighPlainsDBA - Monday, July 9, 2018 11:42 AM
Thanks for sharing this. It reminded me how much I hate comma-delimited files, arguably the worst idea ever to take off in data processing. I mean, seriously, of all the delimiters to choose from, someone chose commas?? And now the years of inertia behind it continue to slow down many an IT worker.
Nope... not the worst idea. That would be XML. 😛
That's a good one too! I wonder how many countless hours of productivity have been lost to that one as well? What's even worse about XML, IMO, is all the hype that came with it. I think the Segway scooter was only slightly more anticipated at the time, but I don't know... the years all start to blur together after a while. 😛
July 9, 2018 at 2:36 pm
HighPlainsDBA - Monday, July 9, 2018 2:10 PM
Jeff Moden - Monday, July 9, 2018 11:56 AM
HighPlainsDBA - Monday, July 9, 2018 11:42 AM
Thanks for sharing this. It reminded me how much I hate comma-delimited files, arguably the worst idea ever to take off in data processing. I mean, seriously, of all the delimiters to choose from, someone chose commas?? And now the years of inertia behind it continue to slow down many an IT worker.
Nope... not the worst idea. That would be XML. 😛
That's a good one too! I wonder how many countless hours of productivity have been lost to that one as well? What's even worse about XML, IMO, is all the hype that came with it. I think the Segway scooter was only slightly more anticipated at the time, but I don't know... the years all start to blur together after a while. 😛
Heh... while we're on the subject... EDI has to be the 2nd worst, and Adobe PostScript has to be tied with PDF for the worst. And, yeah, I agree... I've seen a whole lotta whiz-bang stuff come and go over the decades and wish some of the rest of it would go. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
July 9, 2018 at 2:58 pm
Bruin - Monday, July 9, 2018 11:59 AM
How would the script handle input files where the absence of a value is indicated by an empty string? We get raw files that look like:
"Some String";value1;value2;;;;;value7;value8
Thanks again.
Bear in mind that the more logic goes into the PowerShell, the more it will affect performance!
The updated script below should now handle this:
param
(
    [string]$Path,
    [string]$ConnectionString,
    [string]$Table,
    [string]$Delimiter = ',',
    [int]$BatchSize = 75000,
    [switch]$StopOnError,
    [switch]$TruncateLongValues,
    [switch]$NoHeaders,
    [switch]$EmptyValueIsNull
)

$ErrorActionPreference = 'Stop'

$columnLengthSql = @"
select
    c.[max_length]
from sys.columns c
inner join sys.objects o
    on c.object_id = o.object_id and o.type = 'U'
where o.Name = N'{0}'
order by c.column_id
"@

try
{
    # Connect to SQL Server
    $conn = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
    $conn.Open()

    # Read the column widths, which we will use to check string (char type) columns
    $columnWidths = Invoke-Command -NoNewScope -ScriptBlock {

        try
        {
            $cmd = $conn.CreateCommand()
            $cmd.CommandText = [string]::Format($columnLengthSql, $Table)
            $cmd.CommandType = 'Text'
            $rdr = $cmd.ExecuteReader()

            while ($rdr.Read())
            {
                # Emit width into array
                [int]$rdr[0]
            }
        }
        finally
        {
            ($rdr, $cmd) |
                Where-Object { $_ -ne $null } |
                ForEach-Object {
                    $_.Dispose()
                }
        }
    }

    # Number of columns in the target table
    $columnCount = $columnWidths.Count

    # Set up BCP stuff
    $bcpOption = ([System.Data.SqlClient.SqlBulkCopyOptions]::TableLock -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction)
    $bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($conn, $bcpOption, $null)
    $bulkcopy.DestinationTableName = $Table
    $bulkcopy.BulkCopyTimeout = 0
    $bulkcopy.BatchSize = $BatchSize
    $datatable = New-Object System.Data.DataTable

    # Get the column data for the given table
    # Sneakily selecting 1 row from the table puts the schema into the datatable
    try
    {
        $sql = 'select top 1 * from [' + $Table + ']'
        $dad = New-Object System.Data.SqlClient.SqlDataAdapter($sql, $conn)
        [void]$dad.Fill($datatable)
    }
    finally
    {
        $dad.Dispose()
    }

    # If we read a row, clear it.
    $datatable.Clear()

    # Init row counter
    $i = 0

    # Headers, or not?
    $headerArgument = @{}

    if ($NoHeaders)
    {
        # If the user specifies -NoHeaders, generate dummy headers - as many as there are columns
        $headings = Invoke-Command -NoNewScope -ScriptBlock {
            for ($i = 0; $i -lt $columnWidths.Length; $i++)
            {
                "Dummy$($i)"
            }
        }

        $headerArgument = @{ Header = $headings }
    }

    # Let Import-Csv deal with delimiter nonsense!
    Import-Csv -Path $Path -Delimiter $Delimiter @headerArgument |
        ForEach-Object {

            try
            {
                # Validate input column count.
                # Import-Csv *ignores* extra columns in the input, so we will never know about them!
                # Empty columns, e.g. blah,,blah are rendered as empty strings.
                # If there are too few values, the remaining columns are rendered as $null (.NET null, not database null)
                $populatedColumnCount = ($_.PSObject.Properties.Value | Where-Object { $_ -ne $null } | Measure-Object).Count

                if ($populatedColumnCount -ne $columnCount)
                {
                    throw "Incorrect number of columns in input. Got $($populatedColumnCount), expected $columnCount"
                }

                if ($TruncateLongValues -or $EmptyValueIsNull)
                {
                    # Check columns - this will slow things down somewhat
                    for ($col = 0; $col -lt $datatable.Columns.Count; ++$col)
                    {
                        $inputValue = $_.PSObject.Properties.Value[$col].ToString()

                        if ($EmptyValueIsNull -and [string]::IsNullOrEmpty($inputValue))
                        {
                            $_."$($_.PSObject.Properties.Name[$col])" = [System.DBNull]::Value
                        }
                        elseif ($datatable.Columns[$col].DataType.FullName -eq 'System.String' -and $inputValue.Length -gt $columnWidths[$col])
                        {
                            Write-Warning "Row $($i + 1), Col $($col + 1): Value truncated"
                            $_."$($_.PSObject.Properties.Name[$col])" = $inputValue.Substring(0, $columnWidths[$col])
                        }
                    }
                }

                [void]$datatable.Rows.Add($_.PSObject.Properties.Value)
            }
            catch
            {
                # Column datatype mismatch
                if ($StopOnError)
                {
                    # Stop immediately
                    throw
                }

                # Warn the user a row didn't go in and continue
                Write-Warning "Row $($i + 1): $($_.Exception.Message)"
            }

            if ((++$i % $BatchSize) -eq 0)
            {
                # Write batch
                $bulkcopy.WriteToServer($datatable)
                $datatable.Clear()
            }
        }

    if ($datatable.Rows.Count -gt 0)
    {
        # Write remaining rows
        $bulkcopy.WriteToServer($datatable)
        $datatable.Clear()
    }
}
catch
{
    # A good script will handle exceptions here,
    # e.g. failure to connect to SQL Server, incorrect/un-castable datatype for column etc.
    # For now, just re-throw
    throw
}
finally
{
    # Clean up in finally block - ensures resources are released even in the event of errors
    ($bulkcopy, $datatable, $conn) |
        Where-Object { $_ -ne $null } |
        ForEach-Object {
            $_.Dispose()
        }
}
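For anyone following along, a hypothetical invocation might look like this (the script name, file path, connection string and table name are all assumptions):

# Assumes the script above was saved as Import-CsvTable.ps1
.\Import-CsvTable.ps1 -Path 'C:\data\input.csv' `
    -ConnectionString 'Server=.;Database=Staging;Integrated Security=SSPI' `
    -Table 'CsvStaging' `
    -Delimiter ';' `
    -TruncateLongValues -EmptyValueIsNull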
July 9, 2018 at 5:45 pm
Awesome!!!
Will run some test cases.
Thanks again.
July 9, 2018 at 7:34 pm
I created a view so my CSV matches my SQL schema (staging table). I used these switches:
-TruncateLongValues -EmptyValueIsNull
The script gave me this message for all rows:
WARNING: Row 1: Incorrect number of columns in input. Got 168, expected 0
The actual staging table has 169 fields, but the view has 168 and matches the CSV.
Have I got something set up wrong?
Thanks for your help
The script is working great!!! The error above happened because the query that finds the column lengths was looking for a Table ('U').
I just changed it to look for a View ('V'):
select
    c.[max_length]
from sys.columns c
inner join sys.objects o
    on c.object_id = o.object_id and o.type = 'V' -- View instead of Table 'U'
where o.Name = N'{0}'
order by c.column_id
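For what it's worth, a small tweak (a sketch, not from the thread's author) would make the same lookup work for either a table or a view, so the query needn't be edited per object type:

select
    c.[max_length]
from sys.columns c
inner join sys.objects o
    on c.object_id = o.object_id and o.type in ('U', 'V') -- Table or View
where o.Name = N'{0}'
order by c.column_id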
Many Thanks again!!!
July 9, 2018 at 10:42 pm
Jeff Moden - Monday, July 9, 2018 11:56 AM
HighPlainsDBA - Monday, July 9, 2018 11:42 AM
Thanks for sharing this. It reminded me how much I hate comma-delimited files, arguably the worst idea ever to take off in data processing. I mean, seriously, of all the delimiters to choose from, someone chose commas?? And now the years of inertia behind it continue to slow down many an IT worker.
Nope... not the worst idea. That would be XML. 😛
I must definitely agree with you here, Jeff
____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
"libera tute vulgaris ex"
July 10, 2018 at 10:47 am
When using -TruncateLongValues or -EmptyValueIsNull, the process slows way down. I have a 168-column CSV that loads perfectly, but is there anything that could speed up the process?
Thanks again.
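One possibility, offered as an untested sketch rather than the author's fix: much of the extra per-row cost comes from re-evaluating $_.PSObject.Properties several times for every column, so capturing the property collection once per row and indexing into it may help, e.g. inside the ForEach-Object block:

# Untested sketch: capture the property collection once per row
$props = @($_.PSObject.Properties)

for ($col = 0; $col -lt $datatable.Columns.Count; ++$col)
{
    $inputValue = $props[$col].Value.ToString()

    if ($EmptyValueIsNull -and [string]::IsNullOrEmpty($inputValue))
    {
        # Setting the note property's Value updates the row object directly
        $props[$col].Value = [System.DBNull]::Value
    }
    elseif ($datatable.Columns[$col].DataType.FullName -eq 'System.String' -and $inputValue.Length -gt $columnWidths[$col])
    {
        Write-Warning "Row $($i + 1), Col $($col + 1): Value truncated"
        $props[$col].Value = $inputValue.Substring(0, $columnWidths[$col])
    }
}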