September 24, 2006 at 6:10 pm
>>BTW. Why do you need SQL Server? What's wrong with file system?
>>There is no way. It must be proper database solution.
You just confused the heck out of me The two things you posted above seem to be diametrically opposite from one another...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2006 at 7:39 pm
They are not opposite.
They are about different things.
For the approach we can see in Sergei's post there is no need in SQL Server. I don't see any of RDMS features used there.
If you need really quick search you need different approach involving RDMS feachures. But it has nothing to do with the code Sergei has posted.
_____________
Code for TallyGenerator
September 24, 2006 at 7:57 pm
test
September 24, 2006 at 8:04 pm
Sergiy,
to my defense :=) I can say that I was referring to a different situation, when a lot more than simple search by a predefined sting is going on. I wasn't searching for a certain string but rather for a more complex pattern like e.g. [this data analysis step of chemical names ]
Logic:
find/flag out a chemical name [in SQL table] if it has a number >=131 [as a part of its name] NOT Preceded by str 'C.I.' OR 'FD&C'; the name should not have str 'freon' Or 'antioxidant' OR 'refrigerant' appearing anywhere
the query for this is using a CLR function Regex_is_match taking 2 params:
string_to_analyze and a regular expression pattern that concisely captures the above logic and delegates all the work on finding the matches to a Regular Expression Class Library:
select InputStringToAnalyze from Names_All where dbo.Regex_is_match(InputStringToAnalyze,
'^(?!.*?(?:1[0-2]\d|13[01]))(?!.*?(C\.I\.|FD&C|freon|antioxidant|refrigerant)).*?\d{3}') = 1
This type of search (as a part of data analysis) facilitated by RegularExpression Libraries (I use .NET) involvves a lot more operations to be completed by pre-compiled .NET code that got invoked when a CLR fuction is called. This is a huge overhead, and that's why it naturally runs slower than a search by a string (even a long one) in T-SQL. Still being able to process 250k recs (not 10k as u said: those were matches returned by SQL) is a good speed for data analysis applications, where speed is not that critical ( I work as a data analyst and my raw data comes to me in the form of Access/SQL tables).
Regards,
Sergei
September 24, 2006 at 8:11 pm
sorry for the typo. I wanted to say:
*250k records in 15 sec*
when i was referring to the speed of the processing
September 24, 2006 at 8:30 pm
I'm not attacking your solution.
I just don't see any point of purchasing SQL Server licenses if you are using this kind of solutuions.
What's gonna change if we make little change to your statement:
Logic:
find/flag out a chemical name [in flat file string] if it has a number >=131 [as a part of its name] NOT Preceded by str 'C.I.' OR 'FD&C'; the name should not have str 'freon' Or 'antioxidant' OR 'refrigerant' appearing anywhere
Can you point what's gonna change in your solution?
It does not matter where you get that string from - NTEXT column in SQL Sever database or flat file. You are processing it in memory by your application code anyway.
P.S. There is timeout for creating posts. If you were writing it for too long time copy your entire post to clipboard and paste it back if it failed.
_____________
Code for TallyGenerator
September 24, 2006 at 10:34 pm
I need SQL license b/c I need to process SQL tables before they go get uploaded to a database on our site (http://www.knovel.com) And b/c the original data i'm getting in the form of Access/SQL tables, i dont see a better solution than use SQL 2005 coupled with .NET capabilities to do data processing/modifications. Would you agree?
September 25, 2006 at 1:47 am
But you don't process data in SQL Server, as I can see.
You store it as you get it in free text. And when you need to get something from it you read the whole thind and process it in external C# application.
Where is a job for SQL Server?
_____________
Code for TallyGenerator
September 25, 2006 at 6:00 am
the job for SQL Server is :
1. to keep the data in relational DB datastructure
2. to process/modify the data by using native T-SQL capabilities
3. to happily delegate some processing steps [like the one I describe above] to .NET environment, when it's easier to accomplish compared to pure T-SQL means.
September 25, 2006 at 6:04 am
i think that was more or less the idea behind MSOFT's integration of SQL 2005 and .NET platform, at least when it comes to SQL Server enhansments/improvements.
September 25, 2006 at 6:10 am
Just a note... if you take to long writing a post, this site looses it's mind and destroys your post... ALWAYS remember to COPY the post before you click POST REPLY or PREVIEW...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 25, 2006 at 6:14 am
thanks Jeff, I got it.
September 25, 2006 at 7:36 pm
Probably you have some data kept in relational DB datastructure.
Probably you process it using native T-SQL capabilities.
But it's definetely not all of the data.
At least example we discussed here has nothing in common with relational DB. It's just flat file for some freaky reason uploaded to BLOB field in database.
And you use flat file approach to treat the data dumped in that field.
I'm not sure what you mean by "it's easier to accomplish compared to pure T-SQL means", but for me it's much more easier to create 3 tables + 1 UDF + 1 VIEW than debug such an expression as you used in your code.
And performance of SQL solution will be much better than .NET one. It's guaranteed.
Not 20% faster, not 100%. As I said - under 1 seconds for 4 mil rows search.
Not to mention flexiblity and additional bonuses from getting data sorted on arrival.
_____________
Code for TallyGenerator
September 25, 2006 at 9:16 pm
Be patient with me, Serqiy... I'm confused ... I thought you were saying before that you didn't need SQL, just the file system to do this. Now you're talking an SQL solution with 3 tables, a UDF, and a View. Not sure where you're going with all this ...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 25, 2006 at 10:57 pm
Oh, no!
Again!
File system is quite enogh for rational expression solution. This solution does not take any advantage of relational model.
That's why I was asking the guy who is practicing that approach why he needs SQL Server.
I'm not practicing rational expressions. I'm doing indexes and keys.
And I use to sort my data on arrival. That's why my databases don't need NTEXT columns. Except the cases when nobody cares about content of those texts.
As you can guess search against indexed entities is really fast.
_____________
Code for TallyGenerator
Viewing 15 posts - 16 through 30 (of 39 total)
You must be logged in to reply to this topic. Login to reply