September 15, 2017 at 9:25 pm
This solution works fine for all the columns except the one with 10,000+ characters in it. I think the final solution is going to add the CommandBehavior.SequentialAccess to the data reader and then subdivide the value somehow. Maybe the script needs to separate the raw string into two or more [DT_STR] output columns? Or maybe it can assign the full value to a text stream [DT_TEXT] output column a few characters at a time?
If someone out there has done this and can explain how to code it, I'd be greatly appreciative.
September 18, 2017 at 8:33 am
I have no clue how to fix this because I've never run into it before. I am interested, though, so this will act as a "bump" for your post.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 18, 2017 at 9:34 am
What is the format of your destination column, the input from Excel is Unicode, if your output is varchar then you will have truncation issues.
For a test a created an Excel 97-2003 workbook with one worksheet containing a single row with mixed values and one column with 10,000 chars.
I created an SSIS (SQL Server 2008 R2) package with an Excel data connection pointing to the workbook
It identified the large column as DT_NTEXT
I add a conversion transformation to change DT_NTEXT to DT_TEXT
I then added an output flat file connection (using the converted column)
Ran the package and it successfully read the workbook and output the correct values including the 10,000 char column.
Far away is close at hand in the images of elsewhere.
Anon.
September 18, 2017 at 11:40 am
If any of the first eight rows of data in the Excel sheet contains the extra-long string of more than 8,000 characters, then the Excel Source will apply the DT_NTEXT data type, but if not it applies DT_STRING. Most of the rows in the source document are well below 8,000 characters; only a few exceed that maximum. So rather than rely on some sort of artificial manipulation of the Excel document or on some arbitrary "template" file to sort of "trick" the Excel Source into using DT_NTEXT, I wanted to proactively control the data type, and the Script Component is the only way to do that, or so I've concluded at this point.
After many hours of struggling with my own ignorance of using the SSIS Script Component, of handling BLOBs in general, and the C# language in particular, I have finally been able to work out the solution. It begins with the approach outlined in the link from my original post, with these variations and additions:
Here is the key portion of my modified script showing the process of iterating through the BLOB text in 8K chunks and using .AddBlobData to put it into the output column value. The first column of the SQL SELECT clause is the one with BLOB text, so it is referenced as the "0" column of the data because the .GetChars method only allows referencing the column by position and not by name. The other two columns are non-BLOB data and are referenced by name rather than by position.
public override void CreateNewOutputRows()
{
int buffer_size = 8192;
char[] blob_chars = new char[buffer_size];
long start_index = 0;
long ret_value = 0;
string blob_chunk;
while (reader.Read())
{
Sheet1Buffer.AddRow();
start_index = 0;
ret_value = reader.GetChars(0, start_index, blob_chars, 0, buffer_size);
blob_chunk = new string(blob_chars);
blob_chunk = blob_chunk.Replace('\u2018', '\u0027').Replace('\u2019', '\u0027').Replace('\u201c', '\u0022').Replace('\u201d', '\u0022');
while (ret_value == buffer_size)
{
Sheet1Buffer.MajorDesc.AddBlobData(Encoding.UTF8.GetBytes(blob_chunk));
start_index += buffer_size;
blob_chars = new char[buffer_size];
ret_value = reader.GetChars(0, start_index, blob_chars, 0, buffer_size);
blob_chunk = new string(blob_chars);
blob_chunk = blob_chunk.Replace('\u2018', '\u0027').Replace('\u2019', '\u0027').Replace('\u201c', '\u0022').Replace('\u201d', '\u0022');
}
Sheet1Buffer.MajorDesc.AddBlobData(Encoding.UTF8.GetBytes(blob_chunk));
Sheet1Buffer.Title = (string)reader["Title"];
Sheet1Buffer.Code = (string)reader["Code"];
}
}
September 18, 2017 at 12:13 pm
Jeff Moden - Monday, September 18, 2017 8:33 AMI have no clue how to fix this because I've never run into it before. I am interested, though, so this will act as a "bump" for your post.
Thanks for the "bump." I have used your "numbers table" solutions to avoid RBAR on many occasions, so I'm honored that you took an interest in my problem and its solution.
By the way, you may be interested to know I plan on extending the script in order to add a [RowNumber] column to the output columns of the Script Component as a way to avoid RBAR assignment of an iterating uniqueness key column value in the database
September 19, 2017 at 4:10 pm
Thanks for the feedback both on your fix and on nonRBAR solutions, Geoff. I appreciate it.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 7, 2021 at 9:31 pm
I found this topic via a Google search, and it kinda-sorta relates to my issue. I need a Script Component in SSIS to serve as a data source for an inbound text file. The component reads the data into a variable using the StreamReader objects' ReadLine() method in Visual Basic. The problem I run into is exactly the same... I am not allowed to assign a value to the BLOB column I specified for the Output, as it complains that the property is "Read Only". My data is a string, but I can't be sure that the row I get will be 8,000 characters or less, so my staging table to hold the data uses VARCHAR(MAX) as the data type. I'm not sure that the method being used here can actually work for me, as I need this to be fairly simple instead of code that is going to be troublesome to modify in the future. Also, the data from the file is coming into a String data type variable named RowData. Might there be a way to use a loop to just grab 8,000 characters from the string at a time and then put them into a BLOB somehow? That is what your solution does, but it also uses C# as well as it's reading the data from the reader object, and I don't need to do that. Just not sure how to translate that code, or if the methods will be available if the reader isn't being used. Anyone?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 7, 2021 at 10:03 pm
I found this topic via a Google search, and it kinda-sorta relates to my issue. I need a Script Component in SSIS to serve as a data source for an inbound text file. The component reads the data into a variable using the StreamReader objects' ReadLine() method in Visual Basic. The problem I run into is exactly the same... I am not allowed to assign a value to the BLOB column I specified for the Output, as it complains that the property is "Read Only". My data is a string, but I can't be sure that the row I get will be 8,000 characters or less, so my staging table to hold the data uses VARCHAR(MAX) as the data type. I'm not sure that the method being used here can actually work for me, as I need this to be fairly simple instead of code that is going to be troublesome to modify in the future. Also, the data from the file is coming into a String data type variable named RowData. Might there be a way to use a loop to just grab 8,000 characters from the string at a time and then put them into a BLOB somehow? That is what your solution does, but it also uses C# as well as it's reading the data from the reader object, and I don't need to do that. Just not sure how to translate that code, or if the methods will be available if the reader isn't being used. Anyone?
Instead of using SSIS or Visual Basic, why not just do a BULK INSERT from the file, which would handle the whole 9 yards quite nicely.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 8, 2021 at 1:42 pm
sgmunson wrote:I found this topic via a Google search, and it kinda-sorta relates to my issue. I need a Script Component in SSIS to serve as a data source for an inbound text file. The component reads the data into a variable using the StreamReader objects' ReadLine() method in Visual Basic. The problem I run into is exactly the same... I am not allowed to assign a value to the BLOB column I specified for the Output, as it complains that the property is "Read Only". My data is a string, but I can't be sure that the row I get will be 8,000 characters or less, so my staging table to hold the data uses VARCHAR(MAX) as the data type. I'm not sure that the method being used here can actually work for me, as I need this to be fairly simple instead of code that is going to be troublesome to modify in the future. Also, the data from the file is coming into a String data type variable named RowData. Might there be a way to use a loop to just grab 8,000 characters from the string at a time and then put them into a BLOB somehow? That is what your solution does, but it also uses C# as well as it's reading the data from the reader object, and I don't need to do that. Just not sure how to translate that code, or if the methods will be available if the reader isn't being used. Anyone?
Instead of using SSIS or Visual Basic, why not just do a BULK INSERT from the file, which would handle the whole 9 yards quite nicely.
I need a row number based on the actual order in the original file, plus I need to include other metadata that I'm supplying via SSIS variables. That row number might not be reproducible some other way, and it's not something we can do without. It's an essential part of the process. If you're aware of a way to do that via BULK INSERT, I'm all ears. Unfortunately, there's nothing in the file itself that would provide that order beyond the order of the rows in the file.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 8, 2021 at 2:23 pm
It now appears I may have found the solution:
Row.MyVARCHARMAXColumn.AddBlobData(System.Text.Encoding.UTF8.GetBytes("something")) or
Row.MyVARCHARMAXColumn.AddBlobData(System.Text.Encoding.ASCII.GetBytes("something"))
At least now my script doesn't generate an error for compilation. We'll see what happens at runtime...
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 8, 2021 at 3:13 pm
Jeff Moden wrote:sgmunson wrote:I found this topic via a Google search, and it kinda-sorta relates to my issue. I need a Script Component in SSIS to serve as a data source for an inbound text file. The component reads the data into a variable using the StreamReader objects' ReadLine() method in Visual Basic. The problem I run into is exactly the same... I am not allowed to assign a value to the BLOB column I specified for the Output, as it complains that the property is "Read Only". My data is a string, but I can't be sure that the row I get will be 8,000 characters or less, so my staging table to hold the data uses VARCHAR(MAX) as the data type. I'm not sure that the method being used here can actually work for me, as I need this to be fairly simple instead of code that is going to be troublesome to modify in the future. Also, the data from the file is coming into a String data type variable named RowData. Might there be a way to use a loop to just grab 8,000 characters from the string at a time and then put them into a BLOB somehow? That is what your solution does, but it also uses C# as well as it's reading the data from the reader object, and I don't need to do that. Just not sure how to translate that code, or if the methods will be available if the reader isn't being used. Anyone?
Instead of using SSIS or Visual Basic, why not just do a BULK INSERT from the file, which would handle the whole 9 yards quite nicely.
I need a row number based on the actual order in the original file, plus I need to include other metadata that I'm supplying via SSIS variables. That row number might not be reproducible some other way, and it's not something we can do without. It's an essential part of the process. If you're aware of a way to do that via BULK INSERT, I'm all ears. Unfortunately, there's nothing in the file itself that would provide that order beyond the order of the rows in the file.
Just double checking... what's the target here... An Excel spreadsheet or a table? I'm pretty sure that it's not possible to do in a single blob in Excel.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 8, 2021 at 3:56 pm
Jeff Moden wrote:sgmunson wrote:I found this topic via a Google search, and it kinda-sorta relates to my issue. I need a Script Component in SSIS to serve as a data source for an inbound text file. The component reads the data into a variable using the StreamReader objects' ReadLine() method in Visual Basic. The problem I run into is exactly the same... I am not allowed to assign a value to the BLOB column I specified for the Output, as it complains that the property is "Read Only". My data is a string, but I can't be sure that the row I get will be 8,000 characters or less, so my staging table to hold the data uses VARCHAR(MAX) as the data type. I'm not sure that the method being used here can actually work for me, as I need this to be fairly simple instead of code that is going to be troublesome to modify in the future. Also, the data from the file is coming into a String data type variable named RowData. Might there be a way to use a loop to just grab 8,000 characters from the string at a time and then put them into a BLOB somehow? That is what your solution does, but it also uses C# as well as it's reading the data from the reader object, and I don't need to do that. Just not sure how to translate that code, or if the methods will be available if the reader isn't being used. Anyone?
Instead of using SSIS or Visual Basic, why not just do a BULK INSERT from the file, which would handle the whole 9 yards quite nicely.
I need a row number based on the actual order in the original file, plus I need to include other metadata that I'm supplying via SSIS variables. That row number might not be reproducible some other way, and it's not something we can do without. It's an essential part of the process. If you're aware of a way to do that via BULK INSERT, I'm all ears. Unfortunately, there's nothing in the file itself that would provide that order beyond the order of the rows in the file.
I know you mentioned other metadata, but to get the line numbers could you run the file through a simple powershell script to append these to each line in the source file before using Jeff's suggestion? Or is that going to cause you other grief?
"Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
— Samuel Johnson
I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?
October 8, 2021 at 6:59 pm
sgmunson wrote:Jeff Moden wrote:sgmunson wrote:I found this topic via a Google search, and it kinda-sorta relates to my issue. I need a Script Component in SSIS to serve as a data source for an inbound text file. The component reads the data into a variable using the StreamReader objects' ReadLine() method in Visual Basic. The problem I run into is exactly the same... I am not allowed to assign a value to the BLOB column I specified for the Output, as it complains that the property is "Read Only". My data is a string, but I can't be sure that the row I get will be 8,000 characters or less, so my staging table to hold the data uses VARCHAR(MAX) as the data type. I'm not sure that the method being used here can actually work for me, as I need this to be fairly simple instead of code that is going to be troublesome to modify in the future. Also, the data from the file is coming into a String data type variable named RowData. Might there be a way to use a loop to just grab 8,000 characters from the string at a time and then put them into a BLOB somehow? That is what your solution does, but it also uses C# as well as it's reading the data from the reader object, and I don't need to do that. Just not sure how to translate that code, or if the methods will be available if the reader isn't being used. Anyone?
Instead of using SSIS or Visual Basic, why not just do a BULK INSERT from the file, which would handle the whole 9 yards quite nicely.
I need a row number based on the actual order in the original file, plus I need to include other metadata that I'm supplying via SSIS variables. That row number might not be reproducible some other way, and it's not something we can do without. It's an essential part of the process. If you're aware of a way to do that via BULK INSERT, I'm all ears. Unfortunately, there's nothing in the file itself that would provide that order beyond the order of the rows in the file.
Just double checking... what's the target here... An Excel spreadsheet or a table? I'm pretty sure that it's not possible to do in a single blob in Excel.
Data source is the Script Component, which must populate the column in the output buffer. Data destination is a SQL Server 2019 database table.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 8, 2021 at 7:01 pm
sgmunson wrote:Jeff Moden wrote:sgmunson wrote:I found this topic via a Google search, and it kinda-sorta relates to my issue. I need a Script Component in SSIS to serve as a data source for an inbound text file. The component reads the data into a variable using the StreamReader objects' ReadLine() method in Visual Basic. The problem I run into is exactly the same... I am not allowed to assign a value to the BLOB column I specified for the Output, as it complains that the property is "Read Only". My data is a string, but I can't be sure that the row I get will be 8,000 characters or less, so my staging table to hold the data uses VARCHAR(MAX) as the data type. I'm not sure that the method being used here can actually work for me, as I need this to be fairly simple instead of code that is going to be troublesome to modify in the future. Also, the data from the file is coming into a String data type variable named RowData. Might there be a way to use a loop to just grab 8,000 characters from the string at a time and then put them into a BLOB somehow? That is what your solution does, but it also uses C# as well as it's reading the data from the reader object, and I don't need to do that. Just not sure how to translate that code, or if the methods will be available if the reader isn't being used. Anyone?
Instead of using SSIS or Visual Basic, why not just do a BULK INSERT from the file, which would handle the whole 9 yards quite nicely.
I need a row number based on the actual order in the original file, plus I need to include other metadata that I'm supplying via SSIS variables. That row number might not be reproducible some other way, and it's not something we can do without. It's an essential part of the process. If you're aware of a way to do that via BULK INSERT, I'm all ears. Unfortunately, there's nothing in the file itself that would provide that order beyond the order of the rows in the file.
I know you mentioned other metadata, but to get the line numbers could you run the file through a simple powershell script to append these to each line in the source file before using Jeff's suggestion? Or is that going to cause you other grief?
That would cause one of the basic principles I adhere to, to be grossly violated: "NEVER destroy your inputs." Not now, not ever. We retain the original files in an Archive folder, so that's not going to be allowed, even if I had wanted to.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 8, 2021 at 7:06 pm
Jeff Moden wrote:Just double checking... what's the target here... An Excel spreadsheet or a table? I'm pretty sure that it's not possible to do in a single blob in Excel.
Data source is the Script Component, which must populate the column in the output buffer. Data destination is a SQL Server 2019 database table.
Ok... what does that script look like? Is it pulling from another table, a file, or ???? Can you post that script, please?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply