Split Multiple Times

  • Hello

I have a simple problem that I can solve easily with cursors, but I want to know if anyone has a better way.

I have a large piece of text that comes from a text file. The file has several lines (newline separator); each line holds data separated by tabs, and within that data there are two "fields" that I need to split on the character "\".

I have the split function working and, as I said, I can use cursors to do the job, but I was wondering if I can achieve the same thing with a better approach.

Sample data, contents of the file (tab-separated; tabs shown here as spaces):

123 User3A\User3B TeamA Main OtherData1 SerialA\SerialB 01:10:35:256 P1

124 User4A\User4B TeamB Main OtherData2 SerialC\SerialD 01:10:36:256 P1

125 User5A\User5B TeamC MainF OtherData3 SerialE\SerialF 01:10:37:256 P1

126 User6A\User6B TeamD Elite OtherData4 SerialAAA\SerialBBBB 01:10:38:256 P1

The final result should be a table where the rows are the lines and the fields are the columns, and every line is duplicated (one row per value of the "\"-separated fields):

    123 User3A TeamA Main OtherData1 SerialA 01:10:35:256 P1

    123 User3B TeamA Main OtherData1 SerialB 01:10:35:256 P1

    ...

Thanks

  • rootfixxxer (3/14/2016)


    Hello

I have a simple problem that I can solve easily with cursors, but I want to know if anyone has a better way.

I have a large piece of text that comes from a text file. The file has several lines (newline separator); each line holds data separated by tabs, and within that data there are two "fields" that I need to split on the character "\".

I have the split function working and, as I said, I can use cursors to do the job, but I was wondering if I can achieve the same thing with a better approach.

Sample data, contents of the file (tab-separated; tabs shown here as spaces):

123 User3A\User3B TeamA Main OtherData1 SerialA\SerialB 01:10:35:256 P1

124 User4A\User4B TeamB Main OtherData2 SerialC\SerialD 01:10:36:256 P1

125 User5A\User5B TeamC MainF OtherData3 SerialE\SerialF 01:10:37:256 P1

126 User6A\User6B TeamD Elite OtherData4 SerialAAA\SerialBBBB 01:10:38:256 P1

The final result should be a table where the rows are the lines and the fields are the columns, and every line is duplicated (one row per value of the "\"-separated fields):

123 User3A TeamA Main OtherData1 SerialA 01:10:35:256 P1

123 User3B TeamA Main OtherData1 SerialB 01:10:35:256 P1

...

Thanks

For the user column, will there always be exactly two users separated by sloshes? Same question for the serial column, but I also need to know for sure that the position of the serials will always match the position of the users.

    Also, please post the code that you're using for your splitter. Most folks have one that doesn't perform so well and we can help there, as well.

    Last but not least, please post the CREATE TABLE code for your final destination table. It'll help.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

I would suggest investigating using a numbers or tally table. This search, https://www.bing.com/search?q=Number+or+Tally+table, returns a bunch of posts about what a numbers or tally table is and how to use it.
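For readers new to the idea, here is a minimal sketch of a tally-based split. It assumes a `dbo.Tally` table of sequential integers (1..N) already exists; the string and delimiter mirror the OP's "\"-separated field:

```sql
-- Minimal tally-based split sketch. Assumes dbo.Tally(N) holds
-- sequential integers 1..N (N at least LEN(@s) + 1).
DECLARE @s VARCHAR(100) = 'User3A\User3B';

SELECT
    -- Each qualifying N marks the start of an item; the item runs
    -- up to the next delimiter (a trailing "\" is appended so the
    -- last item is found too).
    SUBSTRING(@s, t.N, CHARINDEX('\', @s + '\', t.N) - t.N) AS Item
FROM dbo.Tally AS t
WHERE t.N <= LEN(@s)
  -- A position qualifies when the character *before* it is the
  -- delimiter (a leading "\" is prepended so position 1 qualifies).
  AND SUBSTRING('\' + @s, t.N, 1) = '\';
```

This is the same join-against-a-numbers-table pattern that DelimitedSplit8K refines; the set-based scan replaces the row-by-row WHILE loop entirely.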

This "works". It uses Jeff's DelimitedSplit8K function for all the splits (found here). If the answer to Jeff's question about "sloshes" 😛 is no, then this won't be sufficient.

DECLARE
    @String VARCHAR(1000) = '123' + CHAR(9) + 'User3A\User3B' + CHAR(9) + 'TeamA' + CHAR(9) + 'Main' + CHAR(9) + 'OtherData1' + CHAR(9) + 'SerialA\SerialB' + CHAR(9) + '01:10:35:256' + CHAR(9) + 'P1' + CHAR(10)
                          + '124' + CHAR(9) + 'User3A\User3B' + CHAR(9) + 'TeamA' + CHAR(9) + 'Main' + CHAR(9) + 'OtherData1' + CHAR(9) + 'SerialA\SerialB' + CHAR(9) + '01:10:35:256' + CHAR(9) + 'P1' + CHAR(10),
    @Delimiter  CHAR(1) = CHAR(10),
    @Delimiter2 CHAR(1) = CHAR(9),
    @Delimiter3 CHAR(1) = '\';

WITH unpivotedData AS
(
    SELECT
        DSK.ItemNumber,
        ROW_NUMBER() OVER (PARTITION BY DSK.ItemNumber
                           ORDER BY DSK.ItemNumber, DSK2.ItemNumber, DSK3.Item) AS ID,
        DSK3.Item
    FROM master.dbo.DelimitedSplit8K(@String, @Delimiter) AS DSK
    CROSS APPLY master.dbo.DelimitedSplit8K(DSK.Item, @Delimiter2) AS DSK2
    CROSS APPLY master.dbo.DelimitedSplit8K(DSK2.Item, @Delimiter3) AS DSK3
    WHERE DSK.Item <> ''
)
SELECT
    PivotTable.[1], PivotTable.[2], PivotTable.[3], PivotTable.[4], PivotTable.[5],
    PivotTable.[6], PivotTable.[7], PivotTable.[8], PivotTable.[9], PivotTable.[10]
FROM (SELECT * FROM unpivotedData) AS P
PIVOT (MAX(Item) FOR ID IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10])) AS PivotTable;

There probably is a better way, and I'd bet on Jeff to come back with it when you answer his question about "sloshes".

  • I would upload the file using bulk insert or something similar to avoid the pivot and the additional split. But I was waiting for a reply from the OP.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
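To make the BULK INSERT suggestion concrete, here is a hedged sketch of that route. The file paths, the staging table name `RegTimesStage`, the `TimeText` column (the raw time `01:10:35:256` won't convert to BIGINT directly), and the `12.0` format-file version header are all assumptions, not anything posted in the thread. The trick is a non-XML format file that uses "\" as the terminator for the Name1 and Serial1 fields, so the sloshes are split at load time:

```sql
/* regtimes.fmt -- hypothetical non-XML format file. Version header (12.0)
   should match your SQL Server version. Terminator escapes: "\t" = tab,
   "\\" = backslash, "\n" = newline. Columns per field: host field order,
   data type, prefix length, max length, terminator, table column order,
   table column name, collation.

12.0
10
1   SQLCHAR  0  12   "\t"   1   Num        ""
2   SQLCHAR  0  100  "\\"   2   Name1      ""
3   SQLCHAR  0  100  "\t"   3   Name2      ""
4   SQLCHAR  0  100  "\t"   4   Team       ""
5   SQLCHAR  0  100  "\t"   5   TypeG      ""
6   SQLCHAR  0  100  "\t"   6   OtherData  ""
7   SQLCHAR  0  100  "\\"   7   Serial1    ""
8   SQLCHAR  0  100  "\t"   8   Serial2    ""
9   SQLCHAR  0  30   "\t"   9   TimeText   ""
10  SQLCHAR  0  4    "\n"  10   Location   ""
*/

-- Load the file straight into a character-typed staging table;
-- no splitter function and no PIVOT needed.
BULK INSERT dbo.RegTimesStage
FROM 'C:\data\regtimes.txt'           -- hypothetical path
WITH (FORMATFILE = 'C:\data\regtimes.fmt');
```

From the staging table, an ordinary INSERT...SELECT with conversions would move the rows into the final table. This only works because the OP confirmed the row format is stable, which is exactly why the format questions were asked.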
  • Luis Cazares (3/14/2016)


    I would upload the file using bulk insert or something similar to avoid the pivot and the additional split. But I was waiting for a reply from the OP.

There ya go reading my mind again. 😀 A nice little format file and this gets a whole lot simpler. That's why I asked the questions I did... we need a stable format to use a format file on this.

    --Jeff Moden



  • Jack Corbett (3/14/2016)


This "works". It uses Jeff's DelimitedSplit8K function for all the splits (found here). If the answer to Jeff's question about "sloshes" 😛 is no, then this won't be sufficient.

(quoted code snipped; see Jack's post above)

    Nicely done, Jack!

    --Jeff Moden



  • Posted at the end of the day yesterday...

    Thanks everyone for all the replies.

    @Jeff Moden

Yes, there will always be two names separated by sloshes, and the same goes for the serials.

About the position question: I didn't understand what you mean by "match position". In the file the values will always be in the same "column"; only the size may vary, from 3 to 10 characters for each of the values in the same row.

    The split function:

CREATE FUNCTION [dbo].[SplitString]
(
    @string    NVARCHAR(MAX),
    @delimiter CHAR(1)
)
RETURNS @output TABLE (id INT, splitdata NVARCHAR(MAX))
BEGIN
    DECLARE @count INT, @start INT, @end INT;

    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string);
    SET @count = 1;

    WHILE @start < LEN(@string) + 1
    BEGIN
        IF @end = 0
            SET @end = LEN(@string) + 1;

        INSERT INTO @output (id, splitdata)
        VALUES (@count, SUBSTRING(@string, @start, @end - @start));

        SET @count = @count + 1;
        SET @start = @end + 1;
        SET @end = CHARINDEX(@delimiter, @string, @start);
    END;

    RETURN;
END

The destination table is really a temporary table that will be used to update other tables...

CREATE TABLE [dbo].[RegTimesTemp]
(
    [Num]         [INT] NOT NULL,
    [Name1]       [NVARCHAR](50) NULL,
    [Name2]       [NVARCHAR](50) NULL,
    [Team]        [NVARCHAR](50) NULL,
    [TypeG]       [NVARCHAR](50) NULL,
    [OtherData]   [NVARCHAR](50) NULL,
    [Serial1]     [NVARCHAR](50) NULL,
    [Serial2]     [NVARCHAR](50) NULL,
    [TimeMS]      [BIGINT] NULL,
    [Location]    [NCHAR](2) NULL,
    [CurrentDate] [SMALLDATETIME] NULL
)

    @Jack Corbett

    Thanks for the code.

    I'll take a look later.

The files are small, 300/400 lines tops, and the database is only for this problem, so performance isn't "critical"...

I was thinking of doing all the data handling on the database side, but then I thought that maybe that isn't a good path... The files will be posted to a PHP script, and the script takes care of the process, so I'm weighing whether to do the parsing in the script and issue individual or bulk inserts against the database, or just read the file and call a stored procedure in the database that does the parse/insert job...

    Thanks
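Given the pieces posted above, a set-based load with the OP's own SplitString function could be sketched like this. The `@FileText` variable, the `fld` column numbering (taken from the sample data), and the omission of TimeMS (the raw `01:10:35:256` would still need converting to a BIGINT millisecond count) are assumptions for illustration; it also relies on the OP's confirmation that there are always exactly two names and two serials per row:

```sql
-- Sketch only: @FileText is assumed to hold the whole file's text,
-- with CHAR(10) line endings and exactly 8 tab-separated fields per line.
DECLARE @FileText NVARCHAR(MAX) = N'...file contents...';

INSERT INTO dbo.RegTimesTemp
    (Num, Name1, Name2, Team, TypeG, OtherData, Serial1, Serial2, Location, CurrentDate)
SELECT
    CAST(fld.[1] AS INT),
    LEFT(fld.[2], CHARINDEX('\', fld.[2]) - 1),       -- Name1: text before the "\"
    STUFF(fld.[2], 1, CHARINDEX('\', fld.[2]), ''),   -- Name2: text after the "\"
    fld.[3], fld.[4], fld.[5],
    LEFT(fld.[6], CHARINDEX('\', fld.[6]) - 1),       -- Serial1
    STUFF(fld.[6], 1, CHARINDEX('\', fld.[6]), ''),   -- Serial2
    fld.[8],
    GETDATE()
FROM dbo.SplitString(@FileText, CHAR(10)) AS ln
CROSS APPLY
(   -- Cross-tab one line's tab-separated items into named columns.
    SELECT MAX(CASE s.id WHEN 1 THEN s.splitdata END) AS [1],
           MAX(CASE s.id WHEN 2 THEN s.splitdata END) AS [2],
           MAX(CASE s.id WHEN 3 THEN s.splitdata END) AS [3],
           MAX(CASE s.id WHEN 4 THEN s.splitdata END) AS [4],
           MAX(CASE s.id WHEN 5 THEN s.splitdata END) AS [5],
           MAX(CASE s.id WHEN 6 THEN s.splitdata END) AS [6],
           MAX(CASE s.id WHEN 7 THEN s.splitdata END) AS [7],
           MAX(CASE s.id WHEN 8 THEN s.splitdata END) AS [8]
    FROM dbo.SplitString(ln.splitdata, CHAR(9)) AS s
) AS fld
WHERE ln.splitdata <> '';
```

This keeps the WHILE loop confined to the splitter itself; replacing SplitString with a tally-based splitter like DelimitedSplit8K would make the whole statement set-based, as discussed earlier in the thread.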

  • Jeff Moden (3/14/2016)


    Nicely done, Jack!

    Thanks Jeff, always nice to hear that from you.

I've never used bcp or BULK INSERT because I'm an SSIS guy, but if Luis and Jeff recommend that route, I'd tend to think they're correct. I could also recommend SSIS, but you'd have to use the Derived Column transform with SSIS expressions to split the two columns on the "\", and I'd consider that advanced SSIS, so I stayed away from it.
