Pulling Strings from larger string.

  • I'm trying to get parameter names and variables out of huge strings that I am importing from log files. The parameters look kind of like this:

    Parameter Name: PN.someParName

    In the file the parameter and respective value looks like this:

    PN.someParName=thisIsTheValue&PN.someParName2=anotherValue&PN.someParName3=lastValue

    ...and goes on row after row after row.

    I want to be able to return the parameter names as a row set and the values associated also. Basically I want to be able to get this...

    Parameter Name Value

    PN.someParName thisIsTheValue

    PN.someParName2 anotherValue

    PN.someParName3 lastValue

    Would the best way to parse this be with regex code? What would be a suggested way to go about this? In SSIS or in CLR compiled procedure using Regex? ...or is there an easy way to do it in regular T-SQL?

    One point of context, I'll be processing millions of rows at a time.

    Thanks,

    Adron

  • I'd think Regex using CLR integration. Specifically a CLR Table-valued function.

    If you created the right Regex, then simply ran the MATCHES method - you'd end up with an array of all parameters and values, which you'd then need to turn into a table variable.

    Here's a good running start at it:

    http://blogs.msdn.com/stuartpa/archive/2005/07/21/441468.aspx

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks for the input Matt.

    So far I just did some basic old C# string parsing using the string object and string builder object. That is a remarkable improvement over whatever is used in the previous methods.

    So next I'm going to work in some regex work and see how things go.

    So far I have the import and some parsing with the C# string parsing down to about 3 minutes for 600,000 rows of data. That is from text log file to clean well formed table.

    I'm hoping I can get that down to about 1-2 minutes with full parsing. I think with Regular expressions I might just be able to do that. 😎

  • How would you like to import 5 million rows in about a minute? Post, say, 100 rows of data as an attachment and I'll show you how to build a BCP format file and use BULK INSERT to just wail on files like this.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply