May 10, 2010 at 8:51 am
johnzabroski (5/10/2010)
Paul White NZ,Your solution doesn't work on edge cases where data elements themselves may contain pipes.
As for using CLR, we use CLR heavily here in many exotic ways, especially for small-size clients (<50GB databases). Most articles on SQLServerCentral target performance reasons for using the CLR, which misses the point entirely: it should also make you more productive.
I've thought of using Parsing Expression Grammars (PEG) to automatically build a bcp format file; whether it is XML or the older format file plaintext-format doesn't matter, since the constraints on the solution are the same. XML is merely simpler to read but takes slightly longer for BCP to parse. XML is also not "human type-able". With a PEG, I would also be able to input the PEG to SQL CLR and verify the PEG against the database's dictionaries. This is the cleanest solution, and nobody does it because nobody understands grammar theory. XML is fundamentally for data interchange and not for DSLs.
I've run into that with CSV files, luckily the source was internal and the data could be corrected at the source and the export file recreated. The idea behing using delimited files is to choose a delimiter that isn't in the data. If you have pipes in the data, then it doesn't make a good delimiter.
I haven't had a chance to work on this yet, been too busy getting things ready for a trip to Texas for my daughters graduation from AIT. Perhaps when I return.
May 10, 2010 at 2:36 pm
Paul White NZ,
I appreciate everyone's help. I simply haven't found a solution that fits my awkward/dumb problem.
I'll thank everyone when/if I have a solution, giving credit to the lucky solution that actually solves my problem.
I hope I didn't offend you by pointing out your XML config file is isomorphic to any bcp format file. The big differences are (1) bcp format file version number in plaintext version (2) plaintext version requires row count, whereas XML matches record description to column data description. Given a PEG description of the grammar, we can recursively backtrack through all the columns listed in the DSL example file, counting the number of columns, and shove that in the plaintext bcp format file's description.
I think you'll find that if you represent the problem using a Parsing expression grammar (PEG) then you'll get the best solution for what you appeared to be talking about: DSL definition (which seemed a bit off-topic to me). A pointer to a unique solution strategy is the best way I can say thanks for your time; most DBAs have never heard of a PEG. I am giving you input on how I would solve a problem elegantly, iff I had no weird constraints beyond my control (like the ones I described in this thread). (Technically speaking, you could probably use a Generalized Left-Right Scannerless parsing tool like the Oslo toolchain to do the same work as a PEG. DBAs might be more familiar with that since that toolchain is now being folded into SQLServer as "SQLServer Modeling Platform".)
May 10, 2010 at 2:49 pm
Lynn,
I'll agree that Paul's OPENROWSET idea in conjunction with the "don't blit, just load" trick you mentioned at the beginning of this thread can form a good solution. I would still recommend specifying the blitting declaratively using a parsing expression grammar. Whether you choose to have the parsing expression grammar emit its rules as a bcp format file (plaintext or even XML) or as an OPENROWSET is really up to you.
Unfortunately, none of those more elegant solutions are possible here. I know, it's ridiculous...
You could possibly write-up an article for SQLServerCentral.com about what I mentioned above and compare the various approaches. It would be useful for others, even if it fails to solve my problem here. I can appreciate elegance even when I desperately need a kludge.
May 10, 2010 at 3:17 pm
As a brief aside, I envy those of you who can fix these problems internally.
We recently had a customer decide for us without our input that they would hire a consultant (probably >$250/hr) to "generate" a new data feed for our software to consume, because they are upgrading their systems and can no longer support the existing data feed. The new version literally does not support the same concepts as the old version of the software, but a consultant told them they could "generate" something resembling the old data. For some reason, they assumed we did not care about that decision at all, and would happily waste probably ~100-300 man-hours debugging whatever problems resulted from such a change. Not to mention the fact our software is used to help improve price setting in contracts... that can influence gross income by about 40M for a 1B company. It's like, "Who sat down and came up with the plan for this and thought it would work?" The point being is that IT decision can literally equate to layoffs, people losing their jobs for no good reason other than somebody making a decision and not seeing the consequences of their actions.
May 10, 2010 at 3:23 pm
I've re-read this thread a few times. Your original question is the output of sed. Have you tried going to the advanced page of the job step and setting up an output file?
I just set up a test job to execute a dos command as dir y:\*.* (a directory that didn't exist). all the output that would of been in the cmd window ended up in the output file.
---------------------------------------------------------------------
Use Full Links:
KB Article from Microsoft on how to ask a question on a Forum
May 10, 2010 at 3:41 pm
johnzabroski (5/10/2010)
Lynn,I'll agree that Paul's OPENROWSET idea in conjunction with the "don't blit, just load" trick you mentioned at the beginning of this thread can form a good solution. I would still recommend specifying the blitting declaratively using a parsing expression grammar. Whether you choose to have the parsing expression grammar emit its rules as a bcp format file (plaintext or even XML) or as an OPENROWSET is really up to you.
Unfortunately, none of those more elegant solutions are possible here. I know, it's ridiculous...
You could possibly write-up an article for SQLServerCentral.com about what I mentioned above and compare the various approaches. It would be useful for others, even if it fails to solve my problem here. I can appreciate elegance even when I desperately need a kludge.
Can you post some sample data that represents the data you receive, including edge conditions that can cause problems and then what the corrected data should like before it is imported? that could help us solve your problem.
May 20, 2010 at 1:06 pm
Lynn Pettis (5/10/2010)
Can you post some sample data that represents the data you receive, including edge conditions that can cause problems and then what the corrected data should like before it is imported? that could help us solve your problem.
I am not sure what to tell you Lynn.
I solved the problem - by stuffing the sed command inside an sed.bat file and running that in CmdExec instead. There is no problem that cannot be solved with another level of indirection!
I don't know why I didn't think of this any sooner, but can't feel too bad, since a forum full of SQL Server experts didn't either.
I guess the lesson here is that when somebody just wants to run sed, figure out how to let them 🙂
BUT I do appreciate the discussion of tricks in this thread; I knew all of them except the OPENROWSET trick, which is very cool but cannot be used in this situation.
June 1, 2010 at 7:07 am
Paul White NZ (5/5/2010)
I've had a much better idea. Create an XML format file like so:<snip>
And use the BULK OPENROWSET provider to allow you to pre-process the raw data before inserting into the destination table:
INSERT dbo.zz_u_test_TEMP WITH (TABLOCK)
<trim>
FROM OPENROWSET
(
BULK 'C:\Documents and Settings\Paul\Test.data',
FORMATFILE = 'C:\Documents and Settings\Paul\format.xml',
CODEPAGE = 'RAW',
FIRSTROW = 0,
LASTROW = 0,
MAXERRORS = 0,
ROWS_PER_BATCH = 0
) RowSource;
This preserves all the benefits of minimally-logged fast bulk load, while giving you the opportunity to apply transformations in the SELECT clause. Very cool.
Paul
Paul thank you for a great example; I did not know you could transform a BULK INSERT with the OPENROWSET and XML; this is a great example I added to my snippets.
Awesome post! Thanks again !
Lowell
Lowell
June 1, 2010 at 8:06 am
Lowell (6/1/2010)
Paul thank you for a great example; I did not know you could transform a BULK INSERT with the OPENROWSET and XML; this is a great example I added to my snippets.Awesome post! Thanks again!l
Hey Lowell, thanks! It is a pretty well-kept secret - so only tell people you trust :laugh:
It would be even better if we could provide the XML format file in an XML variable - and if the file name could be parameterised for that matter. Might see that in SQL11 😉
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply