July 5, 2018 at 7:34 am
Bruin - Thursday, July 5, 2018 4:05 AM "Creating specific datatypes for the datatable's columns is another exercise altogether, though I would do it by querying the schema of the table." Can you drop an example of how this is done?
Another issue I always run into is field lengths in the CSV versus the SQL column definition. How can you
1) detect the mismatch and write it to an error log, or
2) truncate on insert and log an error showing the offending data row? I was testing using your example and ran into this condition:
Exception calling "WriteToServer" with "1" argument(s): "Received an invalid column length from the bcp client
Thanks. Very interested in how people handle these conditions without affecting the performance of a large import.
I tell BCP or BULK INSERT to use the built-in features to sequester bad rows to a separate file. The bad part is that the built-in error indicators that are stored are rather cryptic and, even if you understand them, you come to find out that the error messages being stored are rather useless for troubleshooting. They are nowhere near as helpful as when you run things manually, which tells you the row, the column number, the column name, and what the error is.
My workaround is to have the machine "pretend" that a human is running it. I use xp_CmdShell to call SQLCMD and capture the output from the xp_CmdShell call. It not only identifies the errors in a human-readable form that's easy to find and "parse", but it also provides confirmation of the row counts, number of errors, and duration.
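A minimal sketch of that capture step, with placeholder server, database, and file names (it assumes xp_CmdShell is enabled and that you have EXECUTE rights on it), written as a T-SQL batch inside a PowerShell here-string to match the script style used elsewhere in this thread:
$captureSql = @"
DECLARE @Output TABLE (Line nvarchar(4000));
-- Run the load exactly as a person would at a command prompt and keep every line it prints
INSERT INTO @Output (Line)
EXEC master..xp_CmdShell 'SQLCMD -S MyServer -E -d MyDb -i "C:\Import\BulkLoad.sql"';
-- Row counts, duration and the full human-readable error text are now queryable
SELECT Line FROM @Output WHERE Line IS NOT NULL;
"@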
As for those that think using xp_CmdShell for such things constitutes a security risk, you really need to reconsider because, when done properly, it's not and it brings a whole 'nuther world of functionality into play that most only dream of or have to write some pretty nasty external code to do some of the same things.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 5, 2018 at 11:23 am
Jeff Moden - Thursday, July 5, 2018 7:34 AM As for those that think using xp_CmdShell for such things constitutes a security risk, you really need to reconsider because, when done properly, it's not and it brings a whole 'nuther world of functionality into play that most only dream of or have to write some pretty nasty external code to do some of the same things.
I agree, if a single entity owns both the database and the server. The risk comes into play with a scenario like: you have multiple customers in separate databases on one server. In that case you may want a customer to have control over their database, but not over the server (such that they could compromise security or stability of other databases). In such a case, granting the customer xp_CmdShell does constitute a security risk. Same thing is true for TRUSTWORTHY databases, "unsafe" CLR assemblies, etc.
This would be a topic worthy of its own discussion / article / etc.
July 5, 2018 at 4:02 pm
@bruin - here you go. It's somewhat longer now, but it does type checking and (optionally, with a command-line switch) truncation of over-length character column values.
param
(
[string]$Path,
[string]$ConnectionString,
[string]$Table,
[int]$BatchSize = 75000,
[switch]$StopOnError,
[switch]$TruncateLongValues
)
$ErrorActionPreference = 'Stop'
$columnLengthSql = @"
select
-- max_length is reported in bytes; halve it for nchar/nvarchar so we compare character counts
case when t.name in (N'nchar', N'nvarchar') and c.max_length > 0
then c.max_length / 2 else c.max_length end as [max_length]
from sys.columns c
inner join sys.objects o
on c.object_id = o.object_id and o.type = 'U'
-- join sys.types on user_type_id; joining on system_type_id can return duplicate rows for alias types
inner join sys.types t on c.user_type_id = t.user_type_id
where o.Name = N'{0}'
order by c.column_id
"@
try
{
# Connect to SQL Server
$conn = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
$conn.Open()
# Read the column widths, which we will use to check string (char type) columns
$columnWidths = Invoke-Command -NoNewScope -ScriptBlock {
try
{
$cmd = $conn.CreateCommand()
$cmd.CommandText = [string]::Format($columnLengthSql, $Table)
$cmd.CommandType = 'Text'
$rdr = $cmd.ExecuteReader()
while ($rdr.Read())
{
# Emit width into array
[int]$rdr[0]
}
}
finally
{
# $rdr may be null if ExecuteReader() threw, so filter out nulls before disposing
($rdr, $cmd) |
Where-Object { $null -ne $_ } |
ForEach-Object {
$_.Dispose()
}
}
}
$bcpOption = ([System.Data.SqlClient.SqlBulkCopyOptions]::TableLock -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction)
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($conn, $bcpOption, $null)
$bulkcopy.DestinationTableName = $Table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $BatchSize
$datatable = New-Object System.Data.DataTable
# Get the column data for the given table
# Sneakily selecting 1 row from the table puts the schema into the datatable
try
{
$sql = 'select top 1 * from [' + $Table + ']'
$dad = New-Object System.Data.SqlClient.SqlDataAdapter($sql, $conn)
[void]$dad.Fill($datatable) # suppress the row count that Fill() returns
}
finally
{
$dad.Dispose()
}
# If we read a row, clear it.
$datatable.Clear()
# Init row counter
$i = 0
# Let Import-Csv deal with delimiter nonsense!
Import-Csv -Path $Path |
ForEach-Object {
if ($TruncateLongValues)
{
# Check column widths - this will slow things down somewhat
for ($col = 0; $col -lt $datatable.Columns.Count; ++$col)
{
# Skip (n)varchar(max) columns, which report a max_length of -1
if ($datatable.Columns[$col].DataType.FullName -eq 'System.String' -and $columnWidths[$col] -gt 0 -and $_.PSObject.Properties.Value[$col].ToString().Length -gt $columnWidths[$col])
{
Write-Warning "Row $($i + 1), Col $($col + 1): Value truncated"
$_."$($_.PSObject.Properties.Name[$col])" = $_.PSObject.Properties.Value[$col].ToString().Substring(0, $columnWidths[$col])
}
}
}
try
{
[void]$datatable.Rows.Add($_.PSObject.Properties.Value)
}
catch
{
# Column datatype mismatch
if ($StopOnError)
{
# Stop immediately
throw
}
# Warn the user a row didn't go in and continue
Write-Warning "Row $($i + 1): $($_.Exception.Message)"
}
if ((++$i % $BatchSize) -eq 0)
{
# Write batch
$bulkcopy.WriteToServer($datatable)
$datatable.Clear()
}
}
if ($datatable.Rows.Count -gt 0)
{
# Write remaining rows
$bulkcopy.WriteToServer($datatable)
$datatable.Clear()
}
}
catch
{
# A good script will handle exceptions here,
# e.g. failure to connect to SQL server, incorrect/un-castable datatype for column etc.
# for now, just re-throw
throw
}
finally
{
# Clean up in finally block - ensures resources are released even in event of errors.
($bulkcopy, $datatable, $conn) |
Where-Object { $null -ne $_ } |
ForEach-Object {
# SqlBulkCopy implements IDisposable explicitly, so cast before calling Dispose()
([System.IDisposable]$_).Dispose()
}
}
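A hypothetical invocation, assuming the script above is saved as Import-CsvToTable.ps1 (the file name, connection string and table name below are all placeholders):
# Bulk-load orders.csv into dbo.Orders, truncating over-length string values instead of failing
$cs = 'Server=.;Database=MyDb;Integrated Security=SSPI'
.\Import-CsvToTable.ps1 -Path .\orders.csv -ConnectionString $cs -Table 'Orders' -BatchSize 50000 -TruncateLongValues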
July 5, 2018 at 7:25 pm
Very cool. I'll run some tests and report back with results.
Thanks again.
July 5, 2018 at 7:46 pm
Is there a way to force a value (e.g. N/A) into a SQL field defined as NOT NULL if the CSV field is empty?
The error is:
does not allow DBNull.Value.
Thanks.
July 5, 2018 at 9:55 pm
David Rueter - Thursday, July 5, 2018 11:23 AM Jeff Moden - Thursday, July 5, 2018 7:34 AM As for those that think using xp_CmdShell for such things constitutes a security risk, you really need to reconsider because, when done properly, it's not and it brings a whole 'nuther world of functionality into play that most only dream of or have to write some pretty nasty external code to do some of the same things.
I agree, if a single entity owns both the database and the server. The risk comes into play with a scenario like: you have multiple customers in separate databases on one server. In that case you may want a customer to have control over their database, but not over the server (such that they could compromise security or stability of other databases). In such a case, granting the customer xp_CmdShell does constitute a security risk. Same thing is true for TRUSTWORTHY databases, "unsafe" CLR assemblies, etc.
This would be a topic worthy of its own discussion / article / etc.
That's what I mean. People don't know how to use xp_CmdShell correctly. You NEVER grant a customer privs to execute xp_CmdShell directly, nor provide them with the ability to create/modify a proc that does (and, yes... that does mean being careful about who or what you make the owner of the database). You only grant such privs to stored procedures that they can use but cannot change (which is sometimes a bit of a trick). And, yes... I totally agree that the use of TRUSTWORTHY in such a case is a bad thing to do because it is a mostly hidden security risk that people frequently miss.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 5, 2018 at 9:58 pm
Bruin - Thursday, July 5, 2018 7:46 PM Is there a way to force a value (e.g. N/A) into a SQL field defined as NOT NULL if the CSV field is empty? The error is:
does not allow DBNull.Value.
Thanks.
Yes... stop trying to insert into the final table. I ALWAYS use a staging table so I can pre-validate and delouse the data I've imported. Then I'll move the validated data to the final table. You would do such a thing just before you do the move.
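A minimal sketch of that last step, with hypothetical staging and final table and column names, forcing N/A into values that arrived empty (written as a T-SQL here-string to match the script's style):
$moveSql = @"
-- Move pre-validated rows from staging to the final table, defaulting empty values to N/A
INSERT INTO dbo.FinalTable (CustomerName, Region)
SELECT
    ISNULL(NULLIF(LTRIM(RTRIM(CustomerName)), ''), 'N/A'),
    ISNULL(NULLIF(LTRIM(RTRIM(Region)), ''), 'N/A')
FROM dbo.StagingTable;
TRUNCATE TABLE dbo.StagingTable;
"@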
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2018 at 12:37 am
Bruin - Thursday, July 5, 2018 7:46 PM Is there a way to force a value (e.g. N/A) into a SQL field defined as NOT NULL if the CSV field is empty? The error is:
does not allow DBNull.Value.
Thanks.
If the DB column is NOT NULL and the CSV field is empty, then it probably should error. You'd need more logic (slowing down the import even more) to check for empty values and then supply a suitable default value for the column's data type - which would vary from case to case.
I'd agree with Jeff's point above in that you bulk import to a staging table that has fewer constraints and massage the data to the final table using SQL i.e. a stored proc designed for the task.
And - you're not taking out a table lock on your production table.
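That said, if you did want to supply a default on the client side, here's a minimal sketch of the extra check, slotted in just before the $datatable.Rows.Add(...) call (the N/A default, and the assumption that every empty field belongs to a string column, are purely illustrative):
foreach ($prop in $_.PSObject.Properties)
{
    # Replace empty CSV values so NOT NULL string columns get something to store
    if ([string]::IsNullOrWhiteSpace([string]$prop.Value))
    {
        $prop.Value = 'N/A'
    }
}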
July 6, 2018 at 3:33 am
Agreed. A staging table is the preferred method before loading the live (prod) table.
What's the best solution for files with no headers?
Thanks again for the script and comments. Just getting started with PS, and it's very cool what you created to load CSV to SQL.
July 6, 2018 at 3:52 am
We need to add the -Header argument to Import-Csv and give it dummy column names; when -Header is supplied, Import-Csv treats every input row as data. The actual column names in the datatable are now determined from the table schema so it doesn't matter what the headers are in the CSV.
We use argument splatting so that the optional -Header parameter is only passed to Import-Csv when it's needed.
Add a new argument to the script to indicate you want to load a file without headers
[switch]$NoHeaders
Since we know how many columns to expect because we've read the column widths, we can do this (after the bit that gets the column widths)
$headerArgument = @{}
if ($NoHeaders)
{
$headings = Invoke-Command -NoNewScope -ScriptBlock {
for($i = 0; $i -lt $columnWidths.Length; $i++)
{
"Dummy$($i)"
}
}
$headerArgument = @{ Header = $headings }
}
And now change the Import-Csv line to read
Import-Csv -Path $Path @headerArgument |
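For example, a hypothetical call against a header-less extract (the script and file names are placeholders again):
# -NoHeaders makes the script feed dummy names to Import-Csv so every row is treated as data
.\Import-CsvToTable.ps1 -Path .\extract_noheader.csv -ConnectionString $cs -Table 'Staging' -NoHeaders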
July 6, 2018 at 6:13 am
many thanks!!!
From this reply
"The actual column names in the datatable are now determined from the table schema so it doesn't matter what the headers are in the CSV"
So it really doesn't need headers?
Thanks.
July 6, 2018 at 12:04 pm
Does the column order of the CSV have to match the SQL schema, and what happens if the schema has 8 fields but the CSV has only 4?
Many thanks again!!!
July 6, 2018 at 1:46 pm
Brian F - Tuesday, July 3, 2018 7:04 AM I used a VB script inside an SSIS package to accomplish the same functionality. This is a smart way to take care of this issue. With the data I needed to import, the pipe ended up not being a good delimiter. So, especially in data fields full of user input, it's worth keeping in mind different alternatives. My solution was using a triple semicolon, which you should be hard pressed to find in normal user input, but I'd be interested to hear if anyone else had similar issues with different workaround delimiters.
What about input files where the absence of a value is indicated by an empty string? We get raw files that look like:
"Some String";value1;value2;;;;;value7;value8
July 7, 2018 at 7:03 am
Bruin - Friday, July 6, 2018 12:04 PM Does the column order of the CSV have to match the SQL schema, and what happens if the schema has 8 fields but the CSV has only 4? Many thanks again!!!
To this, and the other questions...
July 7, 2018 at 9:41 am
Bruin - Friday, July 6, 2018 12:04 PM Does the column order of the CSV have to match the SQL schema, and what happens if the schema has 8 fields but the CSV has only 4? Many thanks again!!!
Yes, but there is a workaround: import to a view that does the mapping between the table and the data being imported.
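A rough sketch of that idea with hypothetical names: the view exposes only (and in the order of) the columns the CSV actually supplies, and the columns it leaves out must be NULLable or have defaults on the base table. Note that the PowerShell script's column-width query earlier in the thread filters on o.type = 'U', so its truncation check would need a small tweak before pointing -Table at a view:
$viewSql = @"
-- The CSV carries 4 of the table's 8 columns; the view maps just those 4, in CSV order
CREATE VIEW dbo.OrdersImport
AS
SELECT OrderId, OrderDate, CustomerName, Amount
FROM dbo.Orders;
"@
# Then import into the view instead of the base table:
# .\Import-CsvToTable.ps1 -Path .\orders.csv -ConnectionString $cs -Table 'OrdersImport'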
--Jeff Moden
Change is inevitable... Change for the better is not.