October 9, 2011 at 4:03 pm
I have a SQL parsing routine for extracting information from text documents. It works ok for 100k records, about 20 minutes, but when i do 200k or more records, it goes a LOT slower.
Thanks
October 9, 2011 at 4:34 pm
No... T-SQL isn't a good parsing tool for such a thing. What are you trying to parse out of the Text documents and what's the average length of these documents?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2011 at 4:45 pm
This may be a perfect occasion to get started with Powershell or SSIS to parse your documents.
SSC has some nice articles to get you started with both.
For powershell this one may just do it: http://www.sqlservercentral.com/articles/powershell/72051/
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
October 9, 2011 at 5:53 pm
Name, Address, City, State, Zip, Phone Number
I do have an SSIS procedure that works very well at extracting emails, but this is more complicated!
eg. I need to search for 'Name' string, but get the data AFTER the 'Name' string.
I'm just not sure how to do this, probably a combination of C# and regex in a script, which is how the SSIS procedure works.
October 9, 2011 at 6:22 pm
ALZDBA (10/9/2011)
This may be a perfect occasion to get started with Powershell or SSIS to parse your documents.SSC has some nice articles to get you started with both.
For powershell this one may just do it: http://www.sqlservercentral.com/articles/powershell/72051/
I'd love to see a Powershell script do the kind of parsing I think the OP is doing. Do you have a link or code example you can point to for such parsing, Johan?
IMHO, SSIS sucks as bad at what I think of as the "parsing of text" documents as does T-SQL. In fact, I'm not sure such a capability exists in SSIS. I believe a script would need to be called. I could be wrong and look forward to any corrections you might want to make on my statements.
The only thing that I know of that works reasonable well in SQL Server for the "parsing of text documents" is FULL TEXT INDEXING. Even that pales in comparison to some of the commercially made FULL TEXT search engines such as DtSearch and the like.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 9, 2011 at 6:48 pm
Jeff Moden (10/9/2011)
ALZDBA (10/9/2011)
This may be a perfect occasion to get started with Powershell or SSIS to parse your documents.SSC has some nice articles to get you started with both.
For powershell this one may just do it: http://www.sqlservercentral.com/articles/powershell/72051/
I'd love to see a Powershell script do the kind of parsing I think the OP is doing. Do you have a link or code example you can point to for such parsing, Johan?
IMHO, SSIS sucks as bad at what I think of as the "parsing of text" documents as does T-SQL. In fact, I'm not sure such a capability exists in SSIS. I believe a script would need to be called. I could be wrong and look forward to any corrections you might want to make on my statements.
The only thing that I know of that works reasonable well in SQL Server for the "parsing of text documents" is FULL TEXT INDEXING. Even that pales in comparison to some of the commercially made FULL TEXT search engines such as DtSearch and the like.
I was thinking about using split
get-help about_split
or a combination with
Get-Help Select-String
edited:
Here's a nice ref: http://www.leeholmes.com/blog/2005/11/14/parse-textobject-%e2%80%93-awk-with-a-vengeance/
http://blogs.msdn.com/b/powershell/archive/2008/03/23/select-string-and-grep.aspx
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
October 9, 2011 at 7:38 pm
Very interesting. Thanks, Johan. Have you tested it for performance at all? And, yes... I'll check out the link you provided.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply