October 9, 2008 at 6:35 pm
If a good delimiter is that hard to pick up front, then first scan the file to make sure if there are any thorns in the data. If need be - have a list of possible delimiters. Non printing characters might work well also.
That must be some serious ugly data if there are "known facts" about it.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 9, 2008 at 8:32 pm
Jeff Moden (10/9/2008)
True enough... but have you ever see it used for anything other than a delimiter? 🙂
I've seen it used for emoticons-> :Þ
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
October 9, 2008 at 9:53 pm
Heh... I agree... but in a data file?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2008 at 10:34 pm
One day some developer will read a conversation like this and decide it's good to use "thorn" character as a replacement for CR/LF.
He/she will agree it with other side and they'll start to exchange data with files uploaded by my tool.
Then what?
I really don't like preprogrammed failures.
Especially in my code.
_____________
Code for TallyGenerator
October 10, 2008 at 3:01 am
Sergiy (10/9/2008)
One day some developer will read a conversation like this and decide it's good to use "thorn" character as a replacement for CR/LF.He/she will agree it with other side and they'll start to exchange data with files uploaded by my tool.
Then what?
I really don't like preprogrammed failures.
Especially in my code.
Hi chaps, this might sound a little blasphemous, but have you tried Visual Foxpro for importing the data? I worked with the product extensively in the 90's (MCP in '96) and it's a bit of a whizz for working with import files. It can handle huge strings too, I forget the size now but I think it was something like a gig in version 7. It talks to SQL Server almost seamlessly.
Most of us who worked with both VFP and VB had a distinct preference for VFP. It was, at least back in those days, far more powerful and elegant. One software house I worked for in those days had a number of projects going in both tools, almost always CS and often 3-tier. VFP projects typically rolled out with a third of the effort of VB projects.
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 10, 2008 at 5:00 am
Sergiy (10/9/2008)
One day some developer will read a conversation like this and decide it's good to use "thorn" character as a replacement for CR/LF.He/she will agree it with other side and they'll start to exchange data with files uploaded by my tool.
Then what?
I really don't like preprogrammed failures.
Especially in my code.
Then, they'll call is "SML". 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
October 10, 2008 at 5:04 am
Chris Morris (10/10/2008)
Hi chaps, this might sound a little blasphemous, but have you tried Visual Foxpro for importing the data? I worked with the product extensively in the 90's (MCP in '96) and it's a bit of a whizz for working with import files. It can handle huge strings too, I forget the size now but I think it was something like a gig in version 7. It talks to SQL Server almost seamlessly.Most of us who worked with both VFP and VB had a distinct preference for VFP. It was, at least back in those days, far more powerful and elegant. One software house I worked for in those days had a number of projects going in both tools, almost always CS and often 3-tier. VFP projects typically rolled out with a third of the effort of VB projects.
Cheers
ChrisM
I appreciate the tip. But, the problem isn't importing, nor parsing the data. Sergiy has managed to already do that using T-SQL. The problem is doing it in T-SQL in a set based fashion because of all the caveats in the data.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 10, 2008 at 5:33 am
Ahah. Thanks for the clarification, Jeff.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 10, 2008 at 6:37 am
"Thorn"
wut?
October 10, 2008 at 6:50 am
Chris Morris (10/10/2008)
Hi chaps, this might sound a little blasphemous, but have you tried Visual Foxpro for importing the data? I worked with the product extensively in the 90's (MCP in '96) and it's a bit of a whizz for working with import files. It can handle huge strings too, I forget the size now but I think it was something like a gig in version 7. It talks to SQL Server almost seamlessly.
What's the speed? Is it comparable to BCP?
See, that's the thing, there's really 3 requirements here: Speed, Size and Formatting, and nothing seems to cover all three.
BCP (and to a lesser extent, Bulk Insert) can handle size and speed, but cannot handle the formatting. Other tools, however, cannot match the speed. For instance, my custom code can handle the formatting and with some work I could handle the size, but I am stuck at only 40% of BCP's speed.
And what bothers me most about this is that I know that it is not the extra formatting that is killing my speed, but the built-in formatting that BCP already does. It has always seemed to me that it would be a simple matter for BCP or BULK INSERT to be moderately enhanced to handle the real formatting requirements of a CSV but for some reason, Microsoft has never seen fit to do it.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
October 10, 2008 at 7:16 am
Hi Barry, it's a while since I worked with the product, but back then it was blindingly fast vs the competition when importing files or splitting files as strings into rows and columns. Having imported stuff into VFP dbf's, VFP has something like a bulk import wizard to upload to SQL Server. It has an implementation of SQL which is now very similar to SQL Server TSQL.
I guess the only way to find out is to try it yourself. Last I heard it was free inside packets of cornflakes. Damn shame.
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 10, 2008 at 7:29 am
Jeff Moden (10/8/2008)
ggraber (10/8/2008)
An article on this topic would be really cool, Jeff.I just presented this concept at work using your split csv example.
I was just trying to think what other examples I could use for future presentations.
One question that came up, though. If your string is longer than the tally table the split doesn't work.
It would seem that this is a risk factor in using the tally table which you don't have when doing the split with a while loop.
How would you answer that question?
Since VARCHAR(8000) is the largest datatype (save the really unwieldy Text datatype) in SQL Server 2000, and 11,000 row (for mortgage calcs) Tally table always suffices.
For SQL Server 2005, I use a Tally "CTE" which runs just as fast and will easily create 121 million numbers much faster than any While loop can iterate with a limit set by the length of whatever is being split.
Phil Factor and I did do a test on his older box though. Same test on his machine (parsing the entire Moby Dick novel) and the While loop won. On the 4 machines I tested on with identical code, the Tally table always won quite handily.
Bottom line is, you always need to test "insitu" for maximum performance benefits.
Lemme know if you don't know what I mean by a "Tally CTE"... most folks fail performance wise when they try to make one but have still written potloads of bad examples on the internet.
Nope, I'm not 100% sure I know what you mean by a "Tally CTE."
Sounds like you're gonna create a tally table on the fly. Which sounds pretty performance intensive to me.
October 10, 2008 at 7:31 pm
J (10/10/2008)
"Thorn"wut?
"Thorn" is actually the name of a character in the ASCII character set. A good description and history of the character is found at the following URL...
http://en.wikipedia.org/wiki/%C3%9E
--Jeff Moden
Change is inevitable... Change for the better is not.
October 10, 2008 at 7:36 pm
rbarryyoung (10/10/2008)
And what bothers me most about this is that I know that it is not the extra formatting that is killing my speed, but the built-in formatting that BCP already does. It has always seemed to me that it would be a simple matter for BCP or BULK INSERT to be moderately enhanced to handle the real formatting requirements of a CSV but for some reason, Microsoft has never seen fit to do it.
I actually said the very same thing at work whilst training some folks, today. If bloody Excel and DTS can do it so easily, why can't BCP? It's been years, but I think I'm gonna have to write an EXE.
Hey... wait a minute... is there an available DLL in Excel or DTS that could be used in either a CLR or as an extended stored proc?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 10, 2008 at 7:43 pm
ggraber (10/10/2008)
Nope, I'm not 100% sure I know what you mean by a "Tally CTE." Sounds like you're gonna create a tally table on the fly. Which sounds pretty performance intensive to me.
That's exactly correct... and, compared to a While loop, it's still blazingly fast. In fact, if you hold your mouth just right while building it, it's very comparable to and sometimes beats a Tally table. Take a look at the following article...
http://www.sqlservercentral.com/articles/T-SQL/63003/
... and do a search within the article for "CTE".
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 46 through 60 (of 66 total)
You must be logged in to reply to this topic. Login to reply