String splitter to table weird result

  • .."As we've seen on dozens of posts, an XML splitter would not have performed so well."..

    Jeff you have seen the yahoo .csv link I posted in this thread.

    I too investigated XML transfer of OHLCV data.

    So you Say that plain TEXT splitting is fast than XML splitting into a DateOHLCV table.

    Interesting..

  • Digs (5/2/2010)


    .."As we've seen on dozens of posts, an XML splitter would not have performed so well."..

    Jeff you have seen the yahoo .csv link I posted in this thread.

    I too investigated XML transfer of OHLCV data.

    So you Say that plain TEXT splitting is fast than XML splitting into a DateOHLCV table.

    Interesting..

    If I understand you correctly, you're asking if I think that plain text splitting is faster than shredding properly formed XML. My answer is... I don't know... I've not had to worry about XML downloads.

    What I was saying is that a Tally table splitter is usually faster than using FOR XML to do the split on CSV text for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You can also tackle this problem using OPENROWSET BULK.

    The data is first referenced using OPENROWSET BULK via the CTE cteFileData. This uses the unstructured format file DataSampleBulk.fmt which has a single field called BulkColumn defined as SQLCHAR 8000 and terminated with #

    The second CTE cteData then splits the file vertically using CROSS APPLY/Tally split method and in the same query reassembles the data into the 8 column components using PIVOT. The final SELECT on cteData resolves each column into the correct data type

    ;

    WITH cteFileData AS

    (

    SELECT BulkColumn FROM OPENROWSET (BULK 'C:\DataSample.txt', FORMATFILE = 'C:\DataSampleBulk.fmt') AS X

    )

    ,

    cteData AS

    (

    SELECT [1],[2],[3],[4],[5],[6],[7],[8] FROM cteFileData

    CROSS APPLY

    (

    SELECT [1],[2],[3],[4],[5],[6],[7],[8] FROM

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS ROW,

    SUBSTRING(BulkColumn + ',', N, CHARINDEX(',', BulkColumn + ',', N) - N) AS Value

    FROM master.dbo.Tally

    WHERE N < LEN(BulkColumn) + 2 AND SUBSTRING(',' + BulkColumn + ',', N, 1) = ','

    ) AS Z

    PIVOT

    (

    MAX(Value) FOR ROW IN

    (

    [1],[2],[3],[4],[5],[6],[7],[8]

    )

    )

    AS pvt

    )

    AS Y

    )

    SELECT

    CONVERT(VARCHAR(20), [1]) AS COL1,

    CONVERT(DATETIME, [2]) AS COL2,

    CONVERT(DECIMAL(9,2), [3]) AS COL3,

    CONVERT(DECIMAL(9,2), [4]) AS COL4,

    CONVERT(DECIMAL(9,2), [5]) AS COL5,

    CONVERT(DECIMAL(9,2), [6]) AS COL6,

    CONVERT(BIGINT, [7]) AS COL7,

    CONVERT(DECIMAL(9,2), [8]) AS COL8

    FROM cteData

  • WayneS (5/2/2010)

    Jeff must be slipping :-D... usually when he gets involved, minutes go down to milliseconds! :w00t:

    And then the $$$ to secure his services goes way up when that happens. Maybe he's simply trying to draw him into his lair before pouncing??? :w00t:

    -- You can't be late until you show up.

  • Digs (5/2/2010)


    .."As we've seen on dozens of posts, an XML splitter would not have performed so well."..

    Jeff you have seen the yahoo .csv link I posted in this thread.

    I too investigated XML transfer of OHLCV data.

    So you Say that plain TEXT splitting is fast than XML splitting into a DateOHLCV table.

    Interesting..

    Sorry for resurrecting a post that's more than a year old... I can finally answer this with some good bit of authority... please see the following article... particularly the comparison charts.

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply