words of a string

  • >>BTW. Why do you need SQL Server? What's wrong with file system?

    >>There is no way. It must be proper database solution.

    You just confused the heck out of me The two things you posted above seem to be diametrically opposite from one another...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • They are not opposite.

    They are about different things.

    For the approach we can see in Sergei's post there is no need in SQL Server. I don't see any of RDMS features used there.

    If you need really quick search you need different approach involving RDMS feachures. But it has nothing to do with the code Sergei has posted.

    _____________
    Code for TallyGenerator

  • test

  • Sergiy,

    to my defense :=) I can say that I was referring to a different situation, when a lot more than simple search by a predefined sting is going on. I wasn't searching for a certain string but rather for a more complex pattern like e.g. [this data analysis step of chemical names ]

    Logic:

    find/flag out a chemical name  [in SQL table] if it has a number >=131 [as a part of its name] NOT Preceded by str 'C.I.' OR 'FD&C'; the name should not have str 'freon' Or 'antioxidant' OR 'refrigerant' appearing anywhere

    the query for this is using a CLR function Regex_is_match taking 2 params:

    string_to_analyze and a regular expression pattern that concisely captures the above logic and delegates all the work on finding the matches to a Regular Expression Class Library:

    select InputStringToAnalyze from Names_All where dbo.Regex_is_match(InputStringToAnalyze,

     '^(?!.*?(?:1[0-2]\d|13[01]))(?!.*?(C\.I\.|FD&C|freon|antioxidant|refrigerant)).*?\d{3}') = 1

    This type of search (as a part of data analysis) facilitated by RegularExpression Libraries (I use .NET) involvves a lot more operations to be completed by pre-compiled .NET code that got invoked when a CLR fuction is called. This is a huge overhead, and that's why it naturally runs slower than a search by a string (even a long one) in T-SQL. Still being able to process 250k recs (not 10k as u said: those were matches returned by SQL) is a good speed for data analysis applications, where speed is not that critical ( I work as a data analyst and my raw data comes to me in the form of Access/SQL tables).

    Regards,

    Sergei

  • sorry for the typo. I wanted to say:

    *250k records in 15 sec*

    when i was referring to the speed of the processing

  • I'm not attacking your solution.

    I just don't see any point of purchasing SQL Server licenses if you are using this kind of solutuions.

    What's gonna change if we make little change to your statement:

    Logic:

    find/flag out a chemical name  [in flat file string] if it has a number >=131 [as a part of its name] NOT Preceded by str 'C.I.' OR 'FD&C'; the name should not have str 'freon' Or 'antioxidant' OR 'refrigerant' appearing anywhere

    Can you point what's gonna change in your solution?

    It does not matter where you get that string from - NTEXT column in SQL Sever database or flat file. You are processing it in memory by your application code anyway.

    P.S. There is timeout for creating posts. If you were writing it for too long time copy your entire post to clipboard and paste it back if it failed.

    _____________
    Code for TallyGenerator

  • I need SQL license b/c I need to process SQL tables before they go get uploaded to a database on our site (http://www.knovel.com) And b/c the original data i'm getting in the form of Access/SQL tables, i dont see a better solution than use SQL 2005 coupled with .NET capabilities to do data processing/modifications. Would you agree?

  • But you don't process data in SQL Server, as I can see.

    You store it as you get it in free text. And when you need to get something from it you read the whole thind and process it in external C# application.

    Where is a job for SQL Server?

    _____________
    Code for TallyGenerator

  • the job for SQL Server is :

    1. to keep the data in relational DB datastructure

    2. to process/modify the data by using native T-SQL capabilities

    3. to happily delegate some processing steps [like the one I describe above] to .NET environment, when it's easier to accomplish compared to pure T-SQL means.

     

     

  • i think that was more or less the idea behind MSOFT's integration of SQL 2005 and .NET platform, at least when it comes to SQL Server enhansments/improvements.

  • Just a note... if you take to long writing a post, this site looses it's mind  and destroys your post... ALWAYS remember to COPY the post before you click POST REPLY or PREVIEW...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • thanks Jeff, I got it.

  • Probably you have some data kept in relational DB datastructure.

    Probably you process it using native T-SQL capabilities.

    But it's definetely not all of the data.

    At least example we discussed here has nothing in common with relational DB. It's just flat file for some freaky reason uploaded to BLOB field in database.

    And you use flat file approach to treat the data dumped in that field.

    I'm not sure what you mean by "it's easier to accomplish compared to pure T-SQL means", but for me it's much more easier to create 3 tables + 1 UDF + 1 VIEW than debug such an expression as you used in your code.

    And performance of SQL solution will be much better than .NET one. It's guaranteed.

    Not 20% faster, not 100%. As I said - under 1 seconds for 4 mil rows search.

    Not to mention flexiblity and additional bonuses from getting data sorted on arrival.

    _____________
    Code for TallyGenerator

  • Be patient with me, Serqiy... I'm confused ... I thought you were saying before that you didn't need SQL, just the file system to do this.  Now you're talking an SQL solution with 3 tables, a UDF, and a View.  Not sure where you're going with all this  ...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh, no!

    Again!

    File system is quite enogh for rational expression solution. This solution does not take any advantage of relational model.

    That's why I was asking the guy who is practicing that approach why he needs SQL Server.

    I'm not practicing rational expressions. I'm doing indexes and keys.

    And I use to sort my data on arrival. That's why my databases don't need NTEXT columns. Except the cases when nobody cares about content of those texts.

    As you can guess search against indexed entities is really fast.

    _____________
    Code for TallyGenerator

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply