Performance issue with tally solution

  • Florian Reischl (4/25/2009)


    *ARGH!*

    Yeah, I know how you feel...!

    By the way, reading the whole data column (all rows at once) inside the CLR TVF works very well. Splitting that data also works well. However, writing the results to the database down the SqlPipe is horrible. It ends up being about as slow as the UNION ALL tally on the unique data set.

    It's a shame, because the plan was to read the CSV data into a streaming SqlDataReader, writing the results back down the context connection pipe through the streaming CLR TVF interface.

    The problem is that one is only allowed to do data operations in the InitMethod part of the CLR function. If you try to create a SqlDataReader in Init and then read it on the streaming side (while being called through IEnumerator) the SqlDataReader is closed and all attempts to access data fail. Annoying.

    It would have been great because one could just write:

    SELECT * FROM dbo.My_Streaming_TVF ('dbo.DataTable', 'KeyColumnName', 'DataColumnName')

    and you would get all the results at once, with just one call through reflection (or whatever it uses to bind to the IEnumerator). Both reading and writing would have been streaming, and therefore very scalable. Oh well, never mind.

    Another (non-CLR) solution I tried was to bulk-copy the data table out to a file in character format, using no field terminator and a comma as the record terminator. Bulk copying the file back in, using a comma as both the field terminator *and* the record terminator, results in the file being split quite nicely. It's actually quite fast, and should scale extremely well - it's just a shame I have to use character format to use the comma trick: native bcp would be much, much faster. I will post the code to do this, just for your amusement.

    One last thing. Using SELECT INTO to create the result table is a bit unfair on any method (e.g. a stored procedure) which cannot be used that way. SELECT INTO can be bulk logged if the conditions are right, whereas INSERT INTO never is, at least on 2K5.
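
    Just to make the logging difference concrete, here is a minimal sketch of the two patterns (the table and column names are placeholders for illustration only, not objects from the tests above):

    [font="Courier New"]-- SELECT INTO creates the target on the fly and can be minimally logged
    -- when the database is in SIMPLE or BULK_LOGGED recovery.
    SELECT SplitValue
      INTO dbo.Results_SelectInto
      FROM dbo.SomeSplitterOutput;

    -- INSERT INTO needs the target to exist first and is fully logged on 2005,
    -- so it pays extra log overhead in tests like these.
    CREATE TABLE dbo.Results_InsertInto (SplitValue VARCHAR(8000) NULL);

    INSERT INTO dbo.Results_InsertInto (SplitValue)
    SELECT SplitValue
      FROM dbo.SomeSplitterOutput;[/font]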

    Cheers!

    Paul

    edit: TCF != TVF :c)

  • Derek Dongray (4/24/2009)


    Bob Hovious (4/23/2009)


    With respect to testing, and I apologize for not having done more, it seems there are at least three or four variables about the test data to consider:

    1. The number of rows in the table

    2. The average size of the data in the column to be parsed

    3. The average number of delimited strings to be parsed per row

    4. The average size of the strings being parsed (really the inverse of #3 above).

    I never thought about this before, but would the optimizer choose a different execution plan based on the characteristics (RAM, number of processors) of the server (or instance) it was running on?

    I'd always been aware that all the above could affect the choice of execution plan, but usually found in practice that one plan was a clear winner in all situations so there was never any choice.

    This is the first example I've found where SQL Server changes the plan based simply on the number of rows input.

    ... and the number of delimiters

    ... and the number of characters between the delimiters

    ... and the randomness of the position of delimiters from line to line

    ... and whether or not the string goes "off page" with VARCHAR(MAX)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Florian Reischl (4/25/2009)


    @Bob, Jeff and the rest of the tally masters

    This becomes a bit esoteric, but as Bob said, "if one tally is fast, two are faster"... If my UNION ALL dual-carburetor tally is fast and Bob's double-barreled tally solution is faster, a double-barreled dual-carburetor tally is even faster. The source becomes quite large for a string-splitting requirement, but it is 10% faster and it handles the first and last elements:

    In my very brief testing (I'll eventually have time to publish it here) against 799 random numbers (0 to 1 billion) per row for only 1,000 rows, I actually had to abort the run for that method because it was so slow.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Florian Reischl (4/25/2009)


    The Article

    Sorry that the article is still pending, but it is quite hard to write an article while all the test results are still changing... I didn't forget; I just have to finish all the tests.

    Take your time, Flo... testing to make sure things are repeatable is key for this type of article. And, with so many permutations of data that affect the outcome of runs, it's gonna take some time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Flo,

    As threatened^H^H^H^H promised, here is a quick example of the BCP method (in SQLCMD mode in SSMS):

    Doing this in tempdb obviates the need to set recovery appropriately to enable fast bulk copy.

    -- Create output table to hold the split rows

    -- Column must be NULL because of leading and trailing commas

    if object_id(N'tempdb.dbo.Rows', N'u') is not null drop table tempdb.dbo.Rows;

    create table tempdb.dbo.Rows (Data varchar(8000) null);

    -- Bulk copy out to temp file (< 400ms for me)

    !!bcp tempdb.dbo.JBMTest out %TEMP%\JBMTest.bcp -c -t "" -r "," -S (local) -T

    -- Bulk copy in splitting at the commas (< 4,000ms here)

    !!bcp tempdb.dbo.Rows in %TEMP%\JBMTest.bcp -r "," -c -S (local) -T -h "TABLOCK"

    -- Tidy up

    !!del %TEMP%\JBMTest.bcp

    Just tried it with Moby Dick:

    197376 rows copied.

    Network packet size (bytes): 4096

    Clock Time (ms.) Total : 1187 Average : (166281.38 rows per sec.)

    Cheers,

    Paul

    edit: added Moby Dick results

  • Paul White (4/25/2009)


    Hey Flo!

    These new tally solutions are only fast if all the rows in dbo.JBMTest are identical.

    That's exactly what my testing was showing me. Randomness of position of the delimiters from row to row makes or breaks the double barreled Tally methods and several others.

    As I said, I've not had a huge amount of time to do any of the testing, but I did make a "heart breaker" set of data to test against. Here's the code to build the test table...

    [font="Courier New"]/**************************************************************************************************

     Purpose:

      Generate highly randomized CSV rows of random numbers for CSV Splitter testing.  It's virtually 

      impossible that this code will create any identical rows. It's not "Moby Dick", 

      but it does produce CSV's that are more likely to be seen in the real world.

      I wrote this so that it's compatible with both SQL Server 2000 and 2005 so virtually anyone

      can play along.  The only thing that needs a manual change to run in 2k is the object of the TOP

      statement in the actual table creation code below.

      Notes:

      1. The code is setup for VARCHAR(8000).  To test for VARCHAR(MAX), do a Search'n'Replace to

         change all VARCHAR(8000) to VARCHAR(MAX).

      2. If you get the following error, you've selected too many elements for the size numbers and 

         have most likely exceeded the VARCHAR(8000).  You simply need to reduce either the range of

         numbers or the number of elements, or both.  Of course, that probably won't happen with

         VARCHAR(MAX) if you've changed to that.

            Msg 8152, Level 16, State 14, Line 32

            String or binary data would be truncated.

            The statement has been terminated.

      3. The settings in the code are currently set to make 800 elements with numbers ranging from 

         1 to 1,000,000,000 (1 to 10 characters) that pretty much fill up VARCHAR(8000).

      4. Look for "Change the values here" in the code below.

    --  Jeff Moden

    **************************************************************************************************/

    --===== Change to a safe place to "play"

        USE TempDB

    GO

    --=================================================================================================

    --      Conditionally drop the special objects that we'll need to create

    --      (just to simplify reruns)

    --=================================================================================================

         IF OBJECT_ID('TempDB.dbo.RandomPositiveInt') IS NOT NULL

            DROP VIEW dbo.RandomPositiveInt

         IF OBJECT_ID('TempDB.dbo.RandomIntCSV') IS NOT NULL

            DROP FUNCTION dbo.RandomIntCSV

    GO

    --=================================================================================================

    --===== Create a view to return random positive integers starting at 0.

         -- This is necessary because you cannot yet use NEWID() in a UDF even in 2k5.

    --=================================================================================================

     CREATE VIEW dbo.RandomPositiveInt AS

     SELECT ABS(CHECKSUM(NEWID())) AS RandomPositiveInt

    GO

    --=================================================================================================

    --===== Create a function to make predictable ranges of CSV'd numbers

    --=================================================================================================

     CREATE FUNCTION dbo.RandomIntCSV(@NumberOfElements INT, @Range INT, @Offset INT)

    RETURNS VARCHAR(8000)

         AS

      BEGIN

            DECLARE @ReturnCSV VARCHAR(8000)

             SELECT @ReturnCSV = ISNULL(@ReturnCSV+',','')

                  + CAST(rpi.RandomPositiveInt%@Range+@Offset AS VARCHAR(10))

               FROM dbo.Tally t

              CROSS JOIN dbo.RandomPositiveInt rpi

              WHERE t.N <= @NumberOfElements

     RETURN @ReturnCSV

        END

    GO

    --=================================================================================================

    --      This section of code creates the test table based on the control variables below

    --=================================================================================================

    --===== Declare some control variables that we can change the values of to make the generation

         -- of different size and width tables easy

    DECLARE @NumRows       INT,     --Number of rows to create

            @NumElements   INT,     --Per row

            @ElementRange  INT,     --Number of numbers in the range

            @ElementOffSet INT,     --First number to appear in the range

            @AddDelimiters CHAR(1)  --If = 'Y', adds leading/trailing delimiters

    --===== Change the values here

     SELECT @NumRows       = 1000,

            @NumElements   = 800,

            @ElementRange  = 1000000000,

            @ElementOffSet = 1,

            @AddDelimiters = 'Y'

    --===== Conditionally drop the test table so we can rebuild it on reruns

         IF OBJECT_ID('dbo.JBMTest') IS NOT NULL

            DROP TABLE dbo.JBMTest

    --===== Create and populate the test table on the fly

     SELECT TOP (@NumRows) -- You will need to hard code the value here in 2k.

            IDENTITY(INT,1,1) AS RowNum,

            dbo.RandomIntCSV(@NumElements,@ElementRange,@ElementOffSet) AS CSV

       INTO dbo.JBMTest

       FROM dbo.Tally

    --===== Conditionally add leading and trailing delimiters

         IF @AddDelimiters = 'Y'

     UPDATE dbo.JBMTest

        SET CSV = ','+CSV+','

    --===== Add a Primary Key

      ALTER TABLE dbo.JBMTest

        ADD CONSTRAINT PK_JBMTest_RowNum

            PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100

    --===== Display the stats for the new test table just for convenience in testing

     SELECT COUNT(*)       AS MeasuredRowCount,

            @NumElements   AS DesignedNumElements,

            CASE

                WHEN @AddDelimiters = 'Y'

                THEN MAX(LEN(CSV) - LEN(REPLACE(CSV,',','')) - 1)

                ELSE MAX(LEN(CSV) - LEN(REPLACE(CSV,',','')) + 1)

            END AS MeasuredNumElements,

            @ElementOffSet AS DesignedElementOffset,

            @ElementRange  AS DesignedElementRange,

            @AddDelimiters AS DelimitersAdded,

            MIN(LEN(CSV))  AS MinCsvLength,

            AVG(LEN(CSV))  AS AvgCsvLength,

            MAX(LEN(CSV))  AS MaxCsvLength 

       FROM dbo.JBMTest

    GO

    [/font]

    And, Yes, I agree with Paul... if this stuff is in a file, using BCP to do the splits is one of the easiest and fastest ways.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (4/25/2009)


    One last thing. Using SELECT INTO to create the result table is a bit unfair on any method (e.g. a stored procedure) which cannot be used that way. SELECT INTO can be bulk logged if the conditions are right, whereas INSERT INTO never is, at least on 2K5.

    Still, I believe you'll find that SELECT INTO is faster than INSERT INTO even on a FULL recovery database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/25/2009)


    That's exactly what my testing was showing me. Randomness of position of the delimiters from row to row makes or breaks the double barreled Tally methods and several others.

    The point I was making was that the UNION ALL method and its derivatives split one row and then copy the results for all subsequent rows. The relatively good performance relies on the rows being *exact duplicates*. On more realistic data, all these methods appear slower than the original tally solution.

    The tally methods seem to dislike rows with relatively few delimiters and/or long strings. This may be because these methods do a substring at every character position - with few delimiters, much of this effort is wasted. With long strings, the substrings take longer.
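
    For reference, this is the shape of the basic tally split being described - a minimal sketch only, assuming the usual dbo.Tally table of sequential integers (covering the longest string) and the leading/trailing delimiters Jeff's test code adds, not anyone's tuned version:

    [font="Courier New"]-- Test every character position for a delimiter, then SUBSTRING out the
    -- element that starts just after it. With few delimiters, most of the
    -- per-character comparisons are wasted work.
    SELECT s.RowNum,
           SUBSTRING(s.CSV, t.N + 1, CHARINDEX(',', s.CSV, t.N + 1) - t.N - 1) AS Element
      FROM dbo.JBMTest s
      JOIN dbo.Tally t
        ON t.N < LEN(s.CSV)
     WHERE SUBSTRING(s.CSV, t.N, 1) = ','[/font]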

    Jeff Moden (4/25/2009)


    ...if this stuff is in a file, using BCP to do the splits is one of the easiest and fastest ways.

    For some large tasks, it may be worth doing the bcp out and in trick.

    Paul

  • Jeff Moden (4/25/2009)


    Paul White (4/25/2009)


    One last thing. Using SELECT INTO to create the result table is a bit unfair on any method (e.g. a stored procedure) which cannot be used that way. SELECT INTO can be bulk logged if the conditions are right, whereas INSERT INTO never is, at least on 2K5.

    Still, I believe you'll find that SELECT INTO is faster than INSERT INTO even on a FULL recovery database.

    I'm not sure that is true for a small number of rows. Maybe Flo could run 201 tests...? :laugh:

    I was thinking of tempdb really because that is where a lot of tests get run.

    TempDB always has SIMPLE recovery, which would give SELECT INTO the maximum advantage.
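
    A quick way to confirm that on any instance, just for convenience (sys.databases is 2005 and up):

    [font="Courier New"]-- Check the recovery model of tempdb and the current test database
    SELECT name, recovery_model_desc
      FROM sys.databases
     WHERE name IN (N'tempdb', DB_NAME());[/font]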

  • Paul White (4/25/2009)


    The point I was making was that the UNION ALL method and its derivatives split one row and then copy the results for all subsequent rows. The relatively good performance relies on the rows being *exact duplicates*. On more realistic data, all these methods appear slower than the original tally solution.

    Agreed for the most part. As with you, my all-too-brief testing showed that when every row has delimiters in the same spot, or very few delimiters over wide rows, the UNION ALL method runs quite fast. Randomize the position of the delimiters from row to row and it slows down quite a bit. I'm still doing some occasional testing but, right now, the UNION ALL method still appears to be faster than a flat Tally solution by a factor of about 700% (7 times faster) on the worst case of totally random delimiters like those my test code makes.

    For some large tasks, it may be worth doing the bcp out and in trick.

    Absolutely agreed on this. The only step you've left out is hunting down and force feeding a whole truck load of pork chops to the moron (or, "moroff") that originally populated the column as a CSV. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (4/25/2009)


    Maybe Flo could run 201 tests...? :laugh:

    Phew... I give up on all those static test environments. I just designed a generic test environment to cover different tests at once. Please let me know if there is anything wrong. Yes, it uses a WHILE loop, but only to ensure that any type of string can easily be created.

    @Jeff

    Sorry, your posted test environment is really good, but I need something more flexible, with the possibility to run different tests at once.

    [font="Courier New"]SET NOCOUNT ON

    GO

    -- Just create dummy table for SSMS-2k8 intelli sense

    IF (OBJECT_ID('dbo.Strings') IS NOT NULL)

       DROP TABLE dbo.Strings

    CREATE TABLE dbo.Strings (Id INT NOT NULL, Csv VARCHAR(8000))

    GO

    ----===============================================

    ---- Create a configuration table for all the CSV tests

    IF (OBJECT_ID('dbo.StringsConfig') IS NOT NULL)

       DROP TABLE dbo.StringsConfig

    CREATE TABLE dbo.StringsConfig

    (

       Id INT NOT NULL IDENTITY,

       ItemLength INT,

       ItemCount INT,

       ItemRows INT,

       ItemType NVARCHAR(128)

    )

    GO

    ----===============================================

    ---- Fill any type of configurations which shall be tested

    INSERT INTO dbo.StringsConfig

                 SELECT 10, 10, 10, N'VARCHAR(111)'

       UNION ALL SELECT 10, 10, 1000, N'VARCHAR(111)'

       --UNION ALL SELECT 10, 10, 10000, N'VARCHAR(111)'

       --UNION ALL SELECT 10, 10, 20000, N'VARCHAR(111)'

       --UNION ALL SELECT 10, 10, 100000, N'VARCHAR(111)'

       UNION ALL SELECT 10, 100, 10, N'VARCHAR(1101)'

       --UNION ALL SELECT 10, 100, 1000, N'VARCHAR(1101)'

       --UNION ALL SELECT 10, 100, 10000, N'VARCHAR(1101)'

       --UNION ALL SELECT 10, 100, 20000, N'VARCHAR(1101)'

       UNION ALL SELECT 10, 1000, 10, N'VARCHAR(MAX)'

       --UNION ALL SELECT 10, 1000, 100, N'VARCHAR(MAX)'

       --UNION ALL SELECT 10, 1000, 1000, N'VARCHAR(MAX)'

       --UNION ALL SELECT 10, 1000, 5000, N'VARCHAR(MAX)'

       UNION ALL SELECT 100, 10, 10, N'VARCHAR(1011)'

       --UNION ALL SELECT 100, 10, 100, N'VARCHAR(1011)'

       --UNION ALL SELECT 100, 10, 1000, N'VARCHAR(1011)'

       --UNION ALL SELECT 100, 10, 10000, N'VARCHAR(1011)'

       --UNION ALL SELECT 100, 10, 20000, N'VARCHAR(1011)'

    GO

    ----===============================================

    ---- Delete previous test results from PerfMon table

    DELETE FROM PerfMon

    GO

    ----===============================================

    --- This is the batch which contains all tests

    --- It will be executed several times by: GO xxx

    DECLARE @ConfigId INT

    DECLARE @ItemLength INT

    DECLARE @ItemCount INT

    DECLARE @ItemRows INT

    DECLARE @ItemType NVARCHAR(128)

    DECLARE @sql NVARCHAR(MAX)

    -- Get the next configuration for strings table

    SELECT TOP(1)

          @ConfigId = Id,

          @ItemLength = ItemLength,

          @ItemCount = ItemCount,

          @ItemRows = ItemRows,

          @ItemType = ItemType

       FROM dbo.StringsConfig

       ORDER BY Id

    -- Delete current configuration

    DELETE FROM StringsConfig WHERE Id = @ConfigId

    IF (@@ROWCOUNT = 0)

       RETURN

    ----===============================================

    ---- Client output for current configuration

    PRINT ('')

    PRINT ('')

    PRINT ('----===================================================')

    PRINT ('---- Next Run')

    PRINT ('---- ' +

             'Item Length: ' + CONVERT(VARCHAR(10), @ItemLength) +

          ' | Count Per Row: ' + CONVERT(VARCHAR(10), @ItemCount) +

          ' | Row Count: ' + CONVERT(VARCHAR(10), @ItemRows) +

          ' | Item Type: ' + @ItemType)

    ----===============================================

    ---- Create a new dbo.Strings table with current configuration

    -- Drop existing

    EXECUTE sp_executesql N'IF (OBJECT_ID(''dbo.Strings'') IS NOT NULL) DROP TABLE dbo.Strings'

    -- Create new dbo.Strings table

    SELECT @sql = 'CREATE TABLE dbo.Strings (Id INT NOT NULL, Csv ' + @ItemType + ')'

    EXECUTE sp_executesql @sql

    ----===============================================

    --- Dynamic SQL to fill the current dbo.Strings table with test data

    SELECT @sql = '

    INSERT INTO dbo.Strings (Id, Csv)

       SELECT TOP(@ItemRows)

             N, ''''

          FROM dbo.Tally

    DECLARE @ReplicationCount INT

    SELECT @ReplicationCount = CASE WHEN @ItemLength <= 36 THEN 1 ELSE @ItemLength / 36 END

    WHILE (@ItemCount > 0)

    BEGIN

       UPDATE dbo.Strings SET Csv = Csv + '','' + CONVERT(' + @ItemType + ', LEFT(REPLICATE(NEWID(), @ReplicationCount), @ItemLength))

       SELECT @ItemCount = @ItemCount - 1

    END

    UPDATE dbo.Strings SET Csv = Csv + '',''

    ALTER TABLE dbo.Strings

       ADD CONSTRAINT PK_Strings

       PRIMARY KEY CLUSTERED (Id)

    '

    EXECUTE sp_executesql

          @sql,

          N'@ItemCount INT, @ItemLength INT, @ItemRows INT',

          @ItemCount = @ItemCount,

          @ItemLength = @ItemLength,

          @ItemRows = @ItemRows

    /* CHECK TEST DATA

    SELECT TOP(100) LEN(Csv), * FROM dbo.Strings

    */

    ----===============================================

    ---- Discard all possibly existing, definitely wrong plans

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    ----================================================================

    ----

    ---- PLACE FOR ANY TESTS ON CSV-LIKE DATA

    ----

    ----================================================================

    GO 10 -- MUST BE AT LEAST THE COUNT OF ITEMS IN StringsConfig

    [/font]
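
    Just as an illustration, a candidate split could be slotted into that placeholder section like this - a sketch only, assuming the usual dbo.Tally table with enough numbers to cover the longest Csv value (dbo.SplitResults is just a throwaway name):

    [font="Courier New"]----================================================================
    ---- Example test for the placeholder: tally split written out with SELECT INTO
    IF (OBJECT_ID('dbo.SplitResults') IS NOT NULL)
       DROP TABLE dbo.SplitResults
    SELECT s.Id,
           SUBSTRING(s.Csv, t.N + 1, CHARINDEX(',', s.Csv, t.N + 1) - t.N - 1) AS Element
      INTO dbo.SplitResults
      FROM dbo.Strings s
      JOIN dbo.Tally t
        ON t.N < LEN(s.Csv)
     WHERE SUBSTRING(s.Csv, t.N, 1) = ','[/font]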

    Greets

    Flo

  • Florian Reischl (4/25/2009)


    @Jeff

    Sorry, your posted test environment is really good, but I need something more flexible, with the possibility to run different tests at once.

    Heh... no need to apologize, Flo... at least we're making some progress towards something more real-world than totally static, homogeneous, single-line testing (although single-line testing is appropriate for GUI applications of split functions).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    [p]I'm afraid I've still been plugging away at splitting chunks of Moby Dick into lines. This is to try to find which function returns a table of lines from Moby Dick the fastest.[/p]

    [p]Why Moby Dick? Because it is real text with lots of variation in the number of lines and all sorts of unpredictabilities. There is lots of it too. It is easy to get hold of.[/p]

    [p]Why find the best function to do this? Because the key to the original task is surely to cross apply with a function, if we are to get a realistic comparison between techniques. [/p]

    [p]Why persevere? Because it is revealing a lot about the comparative performance of the various techniques under the sort of heavy load you get from -say- importing a logfile from IIS, or a datafeed from a piece of test equipment.[/p]
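
    The harness being compared is essentially this - a minimal sketch, where dbo.LineSplit stands in for whichever splitter is under test and dbo.MobyDick (ChunkId, Chunk) holds the chunked source text; both names are placeholders:

    [font="Courier New"]-- Each candidate splitter is dropped in as the table-valued function here,
    -- so every technique is measured through the same CROSS APPLY pattern.
    SELECT d.ChunkId, s.LineNumber, s.LineText
      FROM dbo.MobyDick AS d
     CROSS APPLY dbo.LineSplit(d.Chunk) AS s;[/font]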

    Test run on a slow server ("The Dog", 2005); the horizontal axis is the size of the string - see the key on the lower graph. The bloody thing seems to have vanished.

    Test run on a fast server (2005)

    Best wishes,
    Phil Factor

  • So are we just saying it depends?

    Seriously, if this can be boiled down to performance characteristics tied to a matrix of variables, it would be a valuable resource to help select which technique(s) to use. Developers could look at the characteristics, including volume, of the data to be parsed and pick an appropriate technique. Phil, I've never had to parse anything like Moby Dick, but I would not bet the ranch that I will never be asked to.

    Jeff: How's the book coming along?

    Flo: I love the RE-double. I have a feeling that learning to nest Cross Apply queries is going to prove valuable in other areas, as is knowing when to TOP().

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Phil, I've never had to parse anything like Moby Dick, but I would not bet the ranch that I will never be asked to.

    Most data feeds are a bit more regular, I'll admit, but then it is great for testing text-parsing routines. I've had to parse larger strings than this before now (e.g. a daily feed from a telecommunications switch), though I've usually used BCP to do the line-splitting for me. More recently I've been interested in doing this sort of thing for website files, for fast indexing.

    Best wishes,
    Phil Factor
