March 4, 2008 at 9:55 pm
Comments posted to this topic are about the item Simple Method for Importing Ragged files
Paul Ibison
Paul.Ibison@replicationanswers.com
March 4, 2008 at 11:50 pm
Would it be possible for you to mail me the package, please?
March 5, 2008 at 2:38 am
*sigh* ANOTHER misleading title. This is NOT how to import a ragged-right file. It's how to remove unwanted header & trailer records from a perfectly formed NON-ragged-right file. I was hoping for the import of the classic ragged-right file - a variable number of fields in the records you actually want.
BTW Using the usual defaults, record sets can blow up once they reach about 200MB (of input data) with multiple fields of variable text.
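To make the distinction concrete, here is a minimal Python sketch (not SSIS; the sample data and column count are hypothetical) of what a classic ragged-right file looks like - records with a variable number of fields - and the padding step an importer has to do:

```python
# Sketch: parse a "ragged-right" file, where records carry a variable
# number of fields, and pad every record out to the full column count.
import csv
import io

# Hypothetical sample data standing in for a real file.
raw = io.StringIO(
    "1,Smith,London,UK\n"
    "2,Jones,Leeds\n"   # short record: no country field
    "3,Brown\n"         # shorter still: surname only
)

NUM_COLS = 4  # assumed full column count

rows = []
for record in csv.reader(raw):
    # Pad short records with empty strings so every row is rectangular.
    rows.append(record + [""] * (NUM_COLS - len(record)))
```

This is exactly the case the article does not cover: every record is wanted, but the records themselves vary in width.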
March 5, 2008 at 3:07 am
True - the title is a little misleading. The problem is that there isn't a 'correct' term for such files and this is the nearest commonly understood one. Anyway, what you need is this link: http://www.sql-server-performance.com/article_print.aspx?id=1056&type=art
Rgds,
Paul Ibison
March 5, 2008 at 5:30 am
Re: " I didn't try using the identity method, as I'm always concerned (probably needlessly) that the order of inserts might not necessarily be guaranteed to be identical to that of the text file..."
Not needlessly at all. I've been bitten by that, even after the same import routine had preserved the order for some 3-4 runs prior. When you are warned that relational tables have no order, you are completely right to take the warning seriously!
March 5, 2008 at 5:44 am
Hi Jim,
that's reassuring, in the sense that my cautious approach was OK :)
BTW we're talking about reading from a text file rather than a table.
Even so, I'm still a little mystified by this - if I did this in .NET, the sequence of reading would be consistent. My guess is that in SSIS it is read into a sequence of separate buffers and somehow it's the order of the buffers which is not sequential, but I'm not sure why this logically should be so.
Rgds,
Paul Ibison
March 5, 2008 at 5:49 am
I don't think it is SSIS, it is the SQL engine. Even though you read the file sequentially, where the rows wind up in the destination table is not guaranteed.
(I got caught by it doing imports into MS Access from .txt files.)
March 5, 2008 at 6:54 am
OK - I was hooked into thinking that the buffers were changing the order, but as you rightly point out, in the absence of a clustered index on the table, the insertion order is not consistent/guaranteed.
Cheers,
Paul Ibison
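Since the engine guarantees nothing about physical insertion order, the safe pattern is to stamp each record with its source position while the file is still being read sequentially. A minimal Python sketch of the idea (the sample lines are hypothetical):

```python
# Sketch: tag each record with its line number at read time, so file
# order survives no matter how the database stores the rows.
import io

# Hypothetical file contents standing in for the real import source.
raw = io.StringIO("header\nrow A\nrow B\ntrailer\n")

numbered = [
    (line_no, line.rstrip("\n"))
    for line_no, line in enumerate(raw, start=1)
]

# The first tuple element can become an explicit sequence column in the
# staging table; ORDER BY on that column then restores file order.
```

This is the same effect as the article's row-number column, achieved before the data ever reaches the table.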
March 5, 2008 at 7:01 am
I think a combination of omitting the first row and doing a conditional split would do just fine. This is the method I use, since I receive a number of flat files from legacy systems having header and trailer rows. The conditional split needs to evaluate only one column to see if it's a valid row. Maybe I'm missing something, but I don't understand the reference to manually defining the column names and the use of substrings. You don't have to rename the columns, and using a substring seems a heck of a lot easier than all of the manipulations as described.
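The single-column test the conditional split performs can be sketched in plain Python (the record layout and validity test here are hypothetical, not from the article):

```python
# Sketch of the conditional-split idea: keep only rows whose first
# field parses as a numeric ID, which drops header and trailer records.
lines = [
    "HDR20080305",        # header record
    "1001,Smith,London",
    "1002,Jones,Leeds",
    "TRL0000002",         # trailer record carrying a row count
]

def is_data_row(line: str) -> bool:
    # The one-column test: a valid data row starts with a numeric ID.
    first_field = line.split(",")[0]
    return first_field.isdigit()

data_rows = [line for line in lines if is_data_row(line)]
```

In SSIS the equivalent is a single expression on one column in the Conditional Split transformation, with the non-matching rows routed away.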
March 5, 2008 at 7:48 am
If the goal is to skip the "incorrect" rows, the simplest way is to set up a DTS package for the data import and set the maximum error count to 10 (assuming fewer than ten short rows, as in your example).
March 5, 2008 at 7:58 am
Laura Meyerovich (3/5/2008)
If the goal is to skip the "incorrect" rows, the simplest way is to set up a DTS package for the data import and set the maximum error count to 10 (assuming fewer than ten short rows, as in your example).
This also assumes that the data rows you want to omit actually create an error. In some cases, the package may just import the row values into the defined data columns without error.
March 5, 2008 at 8:03 am
I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means defining the names and writing 50 substring clauses. If you have extracted the data to a CSV staging flat file, with the column headers as the first row, this part is all done automatically.
Cheers,
Paul Ibison
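Paul's point about the header row doing the work can be illustrated in Python (the column names here are hypothetical): when the staging CSV carries its headers in the first row, the columns define themselves, with no hand-written names or 50 substring expressions.

```python
# Sketch: a CSV with a header row yields named columns automatically.
import csv
import io

# Hypothetical staging file with column headers in row 1.
staging = io.StringIO(
    "id,surname,city\n"
    "1001,Smith,London\n"
    "1002,Jones,Leeds\n"
)

reader = csv.DictReader(staging)  # column names come from the first row
records = list(reader)            # each record is a name->value mapping
```

The same principle applies in SSIS: a flat file source configured with "column names in the first data row" saves the manual column definitions.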
March 5, 2008 at 8:15 am
paul.ibison (3/5/2008)
I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means defining the names and writing 50 substring clauses. If you have extracted the data to a CSV staging flat file, with the column headers as the first row, this part is all done automatically.
Cheers,
Paul Ibison
OK. I misread your statement about defining the columns. I generally follow a supplied file layout to define the columns. It's not been my experience that a file would come with column names as well as header and trailer rows.
March 5, 2008 at 8:51 am
paul.ibison (3/5/2008)
True - the title is a little misleading. The problem is that there isn't a 'correct' term for such files and this is the nearest commonly understood one. Anyway, what you need is this link: http://www.sql-server-performance.com/article_print.aspx?id=1056&type=art
Rgds,
Paul Ibison
Actually, there is... it's called a "mixed rowtype" file.
--Jeff Moden
Change is inevitable... Change for the better is not.