Performance issue with tally solution

  • Florian Reischl (4/25/2009)


    *ARGH!*

    Yeah, I know how you feel...!

    By the way, reading the whole data column (all rows at once) inside the CLR TVF works very well. Splitting that data also works well. However, writing the results to the database down the SqlPipe is horrible. It ends up being about as slow as the UNION ALL tally on the unique data set.

    It's a shame, because the plan was to read the CSV data into a streaming SqlDataReader, writing the results back down the context connection pipe through the streaming CLR TVF interface.

    The problem is that one is only allowed to do data operations in the InitMethod part of the CLR function. If you try to create a SqlDataReader in Init and then read it on the streaming side (while being called through IEnumerator) the SqlDataReader is closed and all attempts to access data fail. Annoying.

    It would have been great because one could just write:

    SELECT * FROM dbo.My_Streaming_TVF ('dbo.DataTable', 'KeyColumnName', 'DataColumnName')

    and you would get all the results at once, with just one call through reflection (or whatever it uses to bind to the IEnumerator). Both reading and writing would have been streaming, and therefore very scalable. Oh well, never mind.

    Another (non-CLR) solution I tried was to bulk-copy the data table out to a file in character format, using no field terminator and a comma as the record terminator. Bulk copying the file back in, using a comma as both the field terminator *and* the record terminator, results in the file being split quite nicely. It's actually quite fast, and should scale extremely well - it's just a shame I have to use character format to use the comma trick: native bcp would be much, much faster. I will post the code to do this, just for your amusement.

    One last thing. Using SELECT INTO to create the result table is a bit unfair on any method (e.g. a stored procedure) which cannot be used that way. SELECT INTO can be bulk logged if the conditions are right, whereas INSERT INTO never is, at least on 2K5.
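
    Just to make the logging difference concrete, here is a minimal sketch of the two patterns (the table and column names are placeholders for illustration only, not objects from the tests above):

    [font="Courier New"]-- SELECT INTO creates the target on the fly and can be minimally logged
    -- when the database is in SIMPLE or BULK_LOGGED recovery.
    SELECT SplitValue
      INTO dbo.Results_SelectInto
      FROM dbo.SomeSplitterOutput;

    -- INSERT INTO needs the target to exist first and is fully logged on 2005,
    -- so it pays extra log overhead in tests like these.
    CREATE TABLE dbo.Results_InsertInto (SplitValue VARCHAR(8000) NULL);

    INSERT INTO dbo.Results_InsertInto (SplitValue)
    SELECT SplitValue
      FROM dbo.SomeSplitterOutput;[/font]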

    Cheers!

    Paul

    edit: TCF != TVF :c)

  • Derek Dongray (4/24/2009)


    Bob Hovious (4/23/2009)


    With respect to testing, and I apologize for not having done more, it seems there are at least three or four variables about the test data to consider:

    1. The number of rows in the table

    2. The average size of the data in the column to be parsed

    3. The average number of delimited strings to be parsed per row

    4. The average size of the strings being parsed (really the inverse of #3 above).

    I never thought about this before, but would the optimizer choose a different execution plan based on the characteristics (RAM, number of processors) of the server (or instance) it was running on?

    I'd always been aware that all the above could affect the choice of execution plan, but usually found in practice that one plan was a clear winner in all situations so there was never any choice.

    This is the first example I've found where SQL Server changes the plan based simply on the number of rows input.

    ... and the number of delimiters

    ... and the number of characters between the delimiters

    ... and the randomness of the position of delimiters from line to line

    ... and whether or not the string goes "off page" with VARCHAR(MAX)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Florian Reischl (4/25/2009)


    @Bob, Jeff and the rest of the tally masters

    This becomes a bit esoteric, but as Bob said, "if one tally is fast, two are faster"... If my UNION ALL dual-carburetor tally is fast and Bob's double-barreled tally solution is faster, a double-barreled dual-carburetor tally is even faster. The source becomes quite large for a string-splitting requirement, but it is 10% faster and it handles the first and last elements:

    In my very brief testing (I'll eventually have time to publish it here) against 799 random numbers (0 to 1 billion) per row for only 1,000 rows, I actually had to abort the run for that method because it was so slow.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Florian Reischl (4/25/2009)


    The Article

    Sorry that the article is still pending, but it is quite hard to write an article while all the test results are still changing... I didn't forget; I just have to finish all the tests.

    Take your time, Flo... testing to make sure things are repeatable is key for this type of article. And, with so many permutations of data that affect the outcome of runs, it's gonna take some time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Flo,

    As threatened^H^H^H^H promised, here is a quick example of the BCP method (in SQLCMD mode in SSMS):

    Doing this in tempdb obviates the need to set recovery appropriately to enable fast bulk copy.

    -- Create output table to hold the split rows

    -- Column must be NULL because of leading and trailing commas

    if object_id(N'tempdb.dbo.Rows', N'u') is not null drop table tempdb.dbo.Rows;

    create table tempdb.dbo.Rows (Data varchar(8000) null);

    -- Bulk copy out to temp file (< 400ms for me)

    !!bcp tempdb.dbo.JBMTest out %TEMP%\JBMTest.bcp -c -t "" -r "," -S (local) -T

    -- Bulk copy in splitting at the commas (< 4,000ms here)

    !!bcp tempdb.dbo.Rows in %TEMP%\JBMTest.bcp -r "," -c -S (local) -T -h "TABLOCK"

    -- Tidy up

    !!del %TEMP%\JBMTest.bcp

    Just tried it with Moby Dick:

    197376 rows copied.

    Network packet size (bytes): 4096

    Clock Time (ms.) Total : 1187 Average : (166281.38 rows per sec.)

    Cheers,

    Paul

    edit: added Moby Dick results

  • Paul White (4/25/2009)


    Hey Flo!

    These new tally solutions are only fast if all the rows in dbo.JBMTest are identical.

    That's exactly what my testing was showing me. Randomness of position of the delimiters from row to row makes or breaks the double barreled Tally methods and several others.

    As I said, I've not had a huge amount of time to do any of the testing, but I did make a "heart breaker" set of data to test against. Here's the code to build the test table...

    [font="Courier New"]/**************************************************************************************************

     Purpose:

      Generate highly randomized CSV rows of random numbers for CSV Splitter testing.  It's virtually 

      impossible that this code will create any identical rows. It's not "Moby Dick", 

      but it does produce CSV's that are more likely to be seen in the real world.

      I wrote this so that it's compatible with both SQL Server 2000 and 2005 so virtually anyone

      can play along.  The only thing that needs a manual change to run in 2k is the object of the TOP

      statement in the actual table creation code below.

      Notes:

      1. The code is setup for VARCHAR(8000).  To test for VARCHAR(MAX), do a Search'n'Replace to

         change all VARCHAR(8000) to VARCHAR(MAX).

      2. If you get the following error, you've selected too many elements for the size numbers and 

         have most likely exceeded the VARCHAR(8000).  You simply need to reduce either the range of

         numbers or the number of elements, or both.  Of course, that probably won't happen with

         VARCHAR(MAX) if you've changed to that.

            Msg 8152, Level 16, State 14, Line 32

            String or binary data would be truncated.

            The statement has been terminated.

      3. The settings in the code are currently set to make 800 elements with numbers ranging from 

         1 to 1,000,000,000 (1 to 10 characters) that pretty much fill up VARCHAR(8000).

      4. Look for "Change the values here" in the code below.

    --  Jeff Moden

    **************************************************************************************************/

    --===== Change to a safe place to "play"

        USE TempDB

    GO

    --=================================================================================================

    --      Conditionally drop the special objects that we'll need to create

    --      (just to simplify reruns)

    --=================================================================================================

         IF OBJECT_ID('TempDB.dbo.RandomPositiveInt') IS NOT NULL

            DROP VIEW dbo.RandomPositiveInt

         IF OBJECT_ID('TempDB.dbo.RandomIntCSV') IS NOT NULL

            DROP FUNCTION dbo.RandomIntCSV

    GO

    --=================================================================================================

    --===== Create a view to return random positive integers starting at 0.

         -- This is necessary because you cannot yet use NEWID() in a UDF even in 2k5.

    --=================================================================================================

     CREATE VIEW dbo.RandomPositiveInt AS

     SELECT ABS(CHECKSUM(NEWID())) AS RandomPositiveInt

    GO

    --=================================================================================================

    --===== Create a function to make predictable ranges of CSV'd numbers

    --=================================================================================================

     CREATE FUNCTION dbo.RandomIntCSV(@NumberOfElements INT, @Range INT, @Offset INT)

    RETURNS VARCHAR(8000)

         AS

      BEGIN

            DECLARE @ReturnCSV VARCHAR(8000)

             SELECT @ReturnCSV = ISNULL(@ReturnCSV+',','')

                  + CAST(rpi.RandomPositiveInt%@Range+@Offset AS VARCHAR(10))

               FROM dbo.Tally t

              CROSS JOIN dbo.RandomPositiveInt rpi

              WHERE t.N <= @NumberOfElements

     RETURN @ReturnCSV

        END

    GO

    --=================================================================================================

    --      This section of code creates the test table based on the control variables below

    --=================================================================================================

    --===== Declare some control variables that we can change the values of to make the generation

         -- of different size and width tables easy

    DECLARE @NumRows       INT,     --Number of rows to create

            @NumElements   INT,     --Per row

            @ElementRange  INT,     --Number of numbers in the range

            @ElementOffSet INT,     --First number to appear in the range

            @AddDelimiters CHAR(1)  --If = 'Y', adds leading/trailing delimiters

    --===== Change the values here

     SELECT @NumRows       = 1000,

            @NumElements   = 800,

            @ElementRange  = 1000000000,

            @ElementOffSet = 1,

            @AddDelimiters = 'Y'

    --===== Conditionally drop the test table so we can rebuild it on reruns

         IF OBJECT_ID('dbo.JBMTest') IS NOT NULL

            DROP TABLE dbo.JBMTest

    --===== Create and populate the test table on the fly

     SELECT TOP (@NumRows) -- You will need to hard code the value here in 2k.

            IDENTITY(INT,1,1) AS RowNum,

            dbo.RandomIntCSV(@NumElements,@ElementRange,@ElementOffSet) AS CSV

       INTO dbo.JBMTest

       FROM dbo.Tally

    --===== Conditionally add leading and trailing delimiters

         IF @AddDelimiters = 'Y'

     UPDATE dbo.JBMTest

        SET CSV = ','+CSV+','

    --===== Add a Primary Key

      ALTER TABLE dbo.JBMTest

        ADD CONSTRAINT PK_JBMTest_RowNum

            PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100

    --===== Display the stats for the new test table just for convenience in testing

     SELECT COUNT(*)       AS MeasuredRowCount,

            @NumElements   AS DesignedNumElements,

            CASE

                WHEN @AddDelimiters = 'Y'

                THEN MAX(LEN(CSV) - LEN(REPLACE(CSV,',','')) - 1)

                ELSE MAX(LEN(CSV) - LEN(REPLACE(CSV,',','')) + 1)

            END AS MeasuredNumElements,

            @ElementOffSet AS DesignedElementOffset,

            @ElementRange  AS DesignedElementRange,

            @AddDelimiters AS DelimitersAdded,

            MIN(LEN(CSV))  AS MinCsvLength,

            AVG(LEN(CSV))  AS AvgCsvLength,

            MAX(LEN(CSV))  AS MaxCsvLength 

       FROM dbo.JBMTest

    GO

    [/font]

    And, Yes, I agree with Paul... if this stuff is in a file, using BCP to do the splits is one of the easiest and fastest ways.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (4/25/2009)


    One last thing. Using SELECT INTO to create the result table is a bit unfair on any method (e.g. a stored procedure) which cannot be used that way. SELECT INTO can be bulk logged if the conditions are right, whereas INSERT INTO never is, at least on 2K5.

    Still, I believe you'll find that SELECT INTO is faster than INSERT INTO even on a FULL recovery database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/25/2009)


    That's exactly what my testing was showing me. Randomness of position of the delimiters from row to row makes or breaks the double barreled Tally methods and several others.

    The point I was making was that the UNION ALL method and its derivatives split one row and then copy the results for all subsequent rows. The relatively good performance relies on the rows being *exact duplicates*. On more realistic data, all these methods appear slower than the original tally solution.

    The tally methods seem to dislike rows with relatively few delimiters and/or long strings. This may be because these methods do a substring at every character position - with few delimiters, much of this effort is wasted. With long strings, the substrings take longer.
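
    For reference, this is the shape of the basic tally split being described - a minimal sketch only, assuming the usual dbo.Tally table of sequential integers (covering the longest string) and the leading/trailing delimiters Jeff's test code adds, not anyone's tuned version:

    [font="Courier New"]-- Test every character position for a delimiter, then SUBSTRING out the
    -- element that starts just after it. With few delimiters, most of the
    -- per-character comparisons are wasted work.
    SELECT s.RowNum,
           SUBSTRING(s.CSV, t.N + 1, CHARINDEX(',', s.CSV, t.N + 1) - t.N - 1) AS Element
      FROM dbo.JBMTest s
      JOIN dbo.Tally t
        ON t.N < LEN(s.CSV)
     WHERE SUBSTRING(s.CSV, t.N, 1) = ','[/font]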

    Jeff Moden (4/25/2009)


    ...if this stuff is in a file, using BCP to do the splits is one of the easiest and fastest ways.

    For some large tasks, it may be worth doing the bcp out and in trick.

    Paul

  • Jeff Moden (4/25/2009)


    Paul White (4/25/2009)


    One last thing. Using SELECT INTO to create the result table is a bit unfair on any method (e.g. a stored procedure) which cannot be used that way. SELECT INTO can be bulk logged if the conditions are right, whereas INSERT INTO never is, at least on 2K5.

    Still, I believe you'll find that SELECT INTO is faster than INSERT INTO even on a FULL recovery database.

    I'm not sure that is true for a small number of rows. Maybe Flo could run 201 tests...? :laugh:

    I was thinking of tempdb really because that is where a lot of tests get run.

    TempDB always has SIMPLE recovery, which would give SELECT INTO the maximum advantage.
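
    A quick way to confirm that on any instance, just for convenience (sys.databases is 2005 and up):

    [font="Courier New"]-- Check the recovery model of tempdb and the current test database
    SELECT name, recovery_model_desc
      FROM sys.databases
     WHERE name IN (N'tempdb', DB_NAME());[/font]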

  • Paul White (4/25/2009)


    The point I was making was that the UNION ALL method and its derivatives split one row and then copy the results for all subsequent rows. The relatively good performance relies on the rows being *exact duplicates*. On more realistic data, all these methods appear slower than the original tally solution.

    Agreed for the most part. As with you, my all-too-brief testing showed that when every row has delimiters in the same spot, or very few delimiters over wide rows, the UNION ALL method runs quite fast. Randomize the position of the delimiters from row to row and it slows down quite a bit. I'm still doing some occasional testing but, right now, the UNION ALL method still appears to be faster than a flat Tally solution by a factor of about 700% (7 times faster) on the worst case of totally random delimiters like those my test code makes.

    For some large tasks, it may be worth doing the bcp out and in trick.

    Absolutely agreed on this. The only step you've left out is hunting down and force feeding a whole truck load of pork chops to the moron (or, "moroff") that originally populated the column as a CSV. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (4/25/2009)


    Maybe Flo could run 201 tests...? :laugh:

    Phew... I give up on all those static test environments. I just designed a generic test environment to cover different tests at once. Please let me know if there is anything wrong. Yes, it uses a WHILE loop, but only to ensure that any type of string can easily be created.

    @Jeff

    Sorry, your posted test environment is really good, but I need something more flexible, with the possibility to run different tests at once.

    [font="Courier New"]SET NOCOUNT ON

    GO

    -- Just create dummy table for SSMS-2k8 intelli sense

    IF (OBJECT_ID('dbo.Strings') IS NOT NULL)

       DROP TABLE dbo.Strings

    CREATE TABLE dbo.Strings (Id INT NOT NULL, Csv VARCHAR(8000))

    GO

    ----===============================================

    ---- Create a configuration table for all the CSV tests

    IF (OBJECT_ID('dbo.StringsConfig') IS NOT NULL)

       DROP TABLE dbo.StringsConfig

    CREATE TABLE dbo.StringsConfig

    (

       Id INT NOT NULL IDENTITY,

       ItemLength INT,

       ItemCount INT,

       ItemRows INT,

       ItemType NVARCHAR(128)

    )

    GO

    ----===============================================

    ---- Fill any type of configurations which shall be tested

    INSERT INTO dbo.StringsConfig

                 SELECT 10, 10, 10, N'VARCHAR(111)'

       UNION ALL SELECT 10, 10, 1000, N'VARCHAR(111)'

       --UNION ALL SELECT 10, 10, 10000, N'VARCHAR(111)'

       --UNION ALL SELECT 10, 10, 20000, N'VARCHAR(111)'

       --UNION ALL SELECT 10, 10, 100000, N'VARCHAR(111)'

       UNION ALL SELECT 10, 100, 10, N'VARCHAR(1101)'

       --UNION ALL SELECT 10, 100, 1000, N'VARCHAR(1101)'

       --UNION ALL SELECT 10, 100, 10000, N'VARCHAR(1101)'

       --UNION ALL SELECT 10, 100, 20000, N'VARCHAR(1101)'

       UNION ALL SELECT 10, 1000, 10, N'VARCHAR(MAX)'

       --UNION ALL SELECT 10, 1000, 100, N'VARCHAR(MAX)'

       --UNION ALL SELECT 10, 1000, 1000, N'VARCHAR(MAX)'

       --UNION ALL SELECT 10, 1000, 5000, N'VARCHAR(MAX)'

       UNION ALL SELECT 100, 10, 10, N'VARCHAR(1011)'

       --UNION ALL SELECT 100, 10, 100, N'VARCHAR(1011)'

       --UNION ALL SELECT 100, 10, 1000, N'VARCHAR(1011)'

       --UNION ALL SELECT 100, 10, 10000, N'VARCHAR(1011)'

       --UNION ALL SELECT 100, 10, 20000, N'VARCHAR(1011)'

    GO

    ----===============================================

    ---- Delete previous test results from PerfMon table

    DELETE FROM PerfMon

    GO

    ----===============================================

    --- This is the batch which contains all tests

    --- It will be executed several times by: GO xxx

    DECLARE @ConfigId INT

    DECLARE @ItemLength INT

    DECLARE @ItemCount INT

    DECLARE @ItemRows INT

    DECLARE @ItemType NVARCHAR(128)

    DECLARE @sql NVARCHAR(MAX)

    -- Get the next configuration for strings table

    SELECT TOP(1)

          @ConfigId = Id,

          @ItemLength = ItemLength,

          @ItemCount = ItemCount,

          @ItemRows = ItemRows,

          @ItemType = ItemType

       FROM dbo.StringsConfig

       ORDER BY Id

    -- Delete current configuration

    DELETE FROM StringsConfig WHERE Id = @ConfigId

    IF (@@ROWCOUNT = 0)

       RETURN

    ----===============================================

    ---- Client output for current configuration

    PRINT ('')

    PRINT ('')

    PRINT ('----===================================================')

    PRINT ('---- Next Run')

    PRINT ('---- ' +

             'Item Length: ' + CONVERT(VARCHAR(10), @ItemLength) +

          ' | Count Per Row: ' + CONVERT(VARCHAR(10), @ItemCount) +

          ' | Row Count: ' + CONVERT(VARCHAR(10), @ItemRows) +

          ' | Item Type: ' + @ItemType)

    ----===============================================

    ---- Create a new dbo.Strings table with current configuration

    -- Drop existing

    EXECUTE sp_executesql N'IF (OBJECT_ID(''dbo.Strings'') IS NOT NULL) DROP TABLE dbo.Strings'

    -- Create new dbo.Strings table

    SELECT @sql = 'CREATE TABLE dbo.Strings (Id INT NOT NULL, Csv ' + @ItemType + ')'

    EXECUTE sp_executesql @sql

    ----===============================================

    --- Dynamic SQL to fill the current dbo.Strings table with test data

    SELECT @sql = '

    INSERT INTO dbo.Strings (Id, Csv)

       SELECT TOP(@ItemRows)

             N, ''''

          FROM dbo.Tally

    DECLARE @ReplicationCount INT

    SELECT @ReplicationCount = CASE WHEN @ItemLength <= 36 THEN 1 ELSE @ItemLength / 36 END

    WHILE (@ItemCount > 0)

    BEGIN

       UPDATE dbo.Strings SET Csv = Csv + '','' + CONVERT(' + @ItemType + ', LEFT(REPLICATE(NEWID(), @ReplicationCount), @ItemLength))

       SELECT @ItemCount = @ItemCount - 1

    END

    UPDATE dbo.Strings SET Csv = Csv + '',''

    ALTER TABLE dbo.Strings

       ADD CONSTRAINT PK_Strings

       PRIMARY KEY CLUSTERED (Id)

    '

    EXECUTE sp_executesql

          @sql,

          N'@ItemCount INT, @ItemLength INT, @ItemRows INT',

          @ItemCount = @ItemCount,

          @ItemLength = @ItemLength,

          @ItemRows = @ItemRows

    /* CHECK TEST DATA

    SELECT TOP(100) LEN(Csv), * FROM dbo.Strings

    */

    ----===============================================

    ---- Discard all possibly existing, definitely wrong plans

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    ----================================================================

    ----

    ---- PLACE FOR ANY TESTS ON CSV-LIKE DATA

    ----

    ----================================================================

    GO 10 -- MUST BE AT LEAST THE COUNT OF ITEMS IN StringsConfig

    [/font]
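
    Just as an illustration, a candidate split could be slotted into that placeholder section like this - a sketch only, assuming the usual dbo.Tally table with enough numbers to cover the longest Csv value (dbo.SplitResults is just a throwaway name):

    [font="Courier New"]----================================================================
    ---- Example test for the placeholder: tally split written out with SELECT INTO
    IF (OBJECT_ID('dbo.SplitResults') IS NOT NULL)
       DROP TABLE dbo.SplitResults
    SELECT s.Id,
           SUBSTRING(s.Csv, t.N + 1, CHARINDEX(',', s.Csv, t.N + 1) - t.N - 1) AS Element
      INTO dbo.SplitResults
      FROM dbo.Strings s
      JOIN dbo.Tally t
        ON t.N < LEN(s.Csv)
     WHERE SUBSTRING(s.Csv, t.N, 1) = ','[/font]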

    Greets

    Flo

  • Florian Reischl (4/25/2009)


    @Jeff

    Sorry, your posted test environment is really good, but I need something more flexible, with the possibility to run different tests at once.

    Heh... no need to apologize, Flo... at least we're making some progress towards something more real-world than totally static, homogeneous, single-line testing (although single-line testing is appropriate for GUI applications of split functions).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    [p]I'm afraid I've still been plugging away at splitting chunks of Moby Dick into lines. This is to try to find which function returns a table of lines from Moby Dick the fastest.[/p]

    [p]Why Moby Dick? Because it is real text with lots of variation in the number of lines and all sorts of unpredictabilities. There is lots of it too. It is easy to get hold of.[/p]

    [p]Why find the best function to do this? Because the key to the original task is surely to cross apply with a function, if we are to get a realistic comparison between techniques. [/p]

    [p]Why persevere? Because it is revealing a lot about the comparative performance of the various techniques under the sort of heavy load you get from -say- importing a logfile from IIS, or a datafeed from a piece of test equipment.[/p]
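
    The harness being compared is essentially this - a minimal sketch, where dbo.LineSplit stands in for whichever splitter is under test and dbo.MobyDick (ChunkId, Chunk) holds the chunked source text; both names are placeholders:

    [font="Courier New"]-- Each candidate splitter is dropped in as the table-valued function here,
    -- so every technique is measured through the same CROSS APPLY pattern.
    SELECT d.ChunkId, s.LineNumber, s.LineText
      FROM dbo.MobyDick AS d
     CROSS APPLY dbo.LineSplit(d.Chunk) AS s;[/font]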

    Test run on a slow server ("The Dog", 2005); the horizontal axis is the size of the string - see the key on the lower graph. The bloody thing seems to have vanished.

    Test run on a fast server (2005)

    Best wishes,
    Phil Factor

  • So are we just saying it depends?

    Seriously, if this can be boiled down to performance characteristics tied to a matrix of variables, it would be a valuable resource to help select which technique(s) to use. Developers could look at the characteristics, including volume, of the data to be parsed and pick an appropriate technique. Phil, I've never had to parse anything like Moby Dick, but I would not bet the ranch that I will never be asked to.

    Jeff: How's the book coming along?

    Flo: I love the RE-double. I have a feeling that learning to nest Cross Apply queries is going to prove valuable in other areas, as is knowing when to TOP().

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Phil, I've never had to parse anything like Moby Dick, but I would not bet the ranch that I will never be asked to.

    Most data feeds are a bit more regular, I'll admit, but then it is great for testing text-parsing routines. I've had to parse larger strings than this before now (e.g. a daily feed from a telecommunications switch), though I've usually used BCP to do the line-splitting for me. More recently I've been interested in doing this sort of thing for website files, for fast indexing.

    Best wishes,
    Phil Factor
