Split Multiple Times

  • Hello

I have a simple problem that I can solve easily with cursors, but I want to know if anyone has a better way.

I have a large piece of text that comes from a text file. The file has several lines (newline separator); each line holds data separated by tabs, and within that data there are two "fields" that I need to split on the character "\".

I have the split function working and, as I said, I can use cursors to do the job, but I was wondering if I can achieve the same thing with a better approach.

Sample data, contents of the file (tab-separated; tabs shown here as spaces):

123 User3A\User3B TeamA Main OtherData1 SerialA\SerialB 01:10:35:256 P1

124 User4A\User4B TeamB Main OtherData2 SerialC\SerialD 01:10:36:256 P1

125 User5A\User5B TeamC MainF OtherData3 SerialE\SerialF 01:10:37:256 P1

126 User6A\User6B TeamD Elite OtherData4 SerialAAA\SerialBBBB 01:10:38:256 P1

The final result should be a table where the rows are the lines and the fields are the columns, and every line is duplicated (one row per value of the "\"-separated fields):

    123 User3A TeamA Main OtherData1 SerialA 01:10:35:256 P1

    123 User3B TeamA Main OtherData1 SerialB 01:10:35:256 P1

    ...

Thanks

  • rootfixxxer (3/14/2016)


    Hello

I have a simple problem that I can solve easily with cursors, but I want to know if anyone has a better way.

I have a large piece of text that comes from a text file. The file has several lines (newline separator); each line holds data separated by tabs, and within that data there are two "fields" that I need to split on the character "\".

I have the split function working and, as I said, I can use cursors to do the job, but I was wondering if I can achieve the same thing with a better approach.

Sample data, contents of the file (tab-separated; tabs shown here as spaces):

123 User3A\User3B TeamA Main OtherData1 SerialA\SerialB 01:10:35:256 P1

124 User4A\User4B TeamB Main OtherData2 SerialC\SerialD 01:10:36:256 P1

125 User5A\User5B TeamC MainF OtherData3 SerialE\SerialF 01:10:37:256 P1

126 User6A\User6B TeamD Elite OtherData4 SerialAAA\SerialBBBB 01:10:38:256 P1

The final result should be a table where the rows are the lines and the fields are the columns, and every line is duplicated (one row per value of the "\"-separated fields):

123 User3A TeamA Main OtherData1 SerialA 01:10:35:256 P1

123 User3B TeamA Main OtherData1 SerialB 01:10:35:256 P1

...

Thanks

For the user column, will there always be exactly two users separated by sloshes? Same question for the serial column, but I also need to know for sure that the position of the serials will always match the position of the users.

    Also, please post the code that you're using for your splitter. Most folks have one that doesn't perform so well and we can help there, as well.

    Last but not least, please post the CREATE TABLE code for your final destination table. It'll help.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

I would suggest investigating using a numbers or tally table. This search, https://www.bing.com/search?q=Number+or+Tally+table, returns a bunch of posts about what a numbers or tally table is and how to use it.
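For readers new to the idea, here is a minimal sketch of a tally-based split. It assumes a `dbo.Tally` table of sequential integers (1..N) already exists; the string and delimiter mirror the OP's "\"-separated field:

```sql
-- Minimal tally-based split sketch. Assumes dbo.Tally(N) holds
-- sequential integers 1..N (N at least LEN(@s) + 1).
DECLARE @s VARCHAR(100) = 'User3A\User3B';

SELECT
    -- Each qualifying N marks the start of an item; the item runs
    -- up to the next delimiter (a trailing "\" is appended so the
    -- last item is found too).
    SUBSTRING(@s, t.N, CHARINDEX('\', @s + '\', t.N) - t.N) AS Item
FROM dbo.Tally AS t
WHERE t.N <= LEN(@s)
  -- A position qualifies when the character *before* it is the
  -- delimiter (a leading "\" is prepended so position 1 qualifies).
  AND SUBSTRING('\' + @s, t.N, 1) = '\';
```

This is the same join-against-a-numbers-table pattern that DelimitedSplit8K refines; the set-based scan replaces the row-by-row WHILE loop entirely.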

This "works". It uses Jeff's DelimitedSplit8K function for all the splits (found here). If the answer to Jeff's question about "sloshes" 😛 is no, then this won't be sufficient.

DECLARE
    @String VARCHAR(1000) = '123' + CHAR(9) + 'User3A\User3B' + CHAR(9) + 'TeamA' + CHAR(9) + 'Main' + CHAR(9) + 'OtherData1' + CHAR(9) + 'SerialA\SerialB' + CHAR(9) + '01:10:35:256' + CHAR(9) + 'P1' + CHAR(10)
                          + '124' + CHAR(9) + 'User3A\User3B' + CHAR(9) + 'TeamA' + CHAR(9) + 'Main' + CHAR(9) + 'OtherData1' + CHAR(9) + 'SerialA\SerialB' + CHAR(9) + '01:10:35:256' + CHAR(9) + 'P1' + CHAR(10),
    @Delimiter  CHAR(1) = CHAR(10),
    @Delimiter2 CHAR(1) = CHAR(9),
    @Delimiter3 CHAR(1) = '\';

WITH unpivotedData AS
(
    SELECT
        DSK.ItemNumber,
        ROW_NUMBER() OVER (PARTITION BY DSK.ItemNumber
                           ORDER BY DSK.ItemNumber, DSK2.ItemNumber, DSK3.Item) AS ID,
        DSK3.Item
    FROM master.dbo.DelimitedSplit8K(@String, @Delimiter) AS DSK
    CROSS APPLY master.dbo.DelimitedSplit8K(DSK.Item, @Delimiter2) AS DSK2
    CROSS APPLY master.dbo.DelimitedSplit8K(DSK2.Item, @Delimiter3) AS DSK3
    WHERE DSK.Item <> ''
)
SELECT
    PivotTable.[1], PivotTable.[2], PivotTable.[3], PivotTable.[4], PivotTable.[5],
    PivotTable.[6], PivotTable.[7], PivotTable.[8], PivotTable.[9], PivotTable.[10]
FROM (SELECT * FROM unpivotedData) AS P
PIVOT (MAX(Item) FOR ID IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10])) AS PivotTable;

There probably is a better way, and I'd bet on Jeff to come back with it when you answer his question about "sloshes".

  • I would upload the file using bulk insert or something similar to avoid the pivot and the additional split. But I was waiting for a reply from the OP.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
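To make the BULK INSERT suggestion concrete, here is a hedged sketch of that route. The file paths, the staging table name `RegTimesStage`, the `TimeText` column (the raw time `01:10:35:256` won't convert to BIGINT directly), and the `12.0` format-file version header are all assumptions, not anything posted in the thread. The trick is a non-XML format file that uses "\" as the terminator for the Name1 and Serial1 fields, so the sloshes are split at load time:

```sql
/* regtimes.fmt -- hypothetical non-XML format file. Version header (12.0)
   should match your SQL Server version. Terminator escapes: "\t" = tab,
   "\\" = backslash, "\n" = newline. Columns per field: host field order,
   data type, prefix length, max length, terminator, table column order,
   table column name, collation.

12.0
10
1   SQLCHAR  0  12   "\t"   1   Num        ""
2   SQLCHAR  0  100  "\\"   2   Name1      ""
3   SQLCHAR  0  100  "\t"   3   Name2      ""
4   SQLCHAR  0  100  "\t"   4   Team       ""
5   SQLCHAR  0  100  "\t"   5   TypeG      ""
6   SQLCHAR  0  100  "\t"   6   OtherData  ""
7   SQLCHAR  0  100  "\\"   7   Serial1    ""
8   SQLCHAR  0  100  "\t"   8   Serial2    ""
9   SQLCHAR  0  30   "\t"   9   TimeText   ""
10  SQLCHAR  0  4    "\n"  10   Location   ""
*/

-- Load the file straight into a character-typed staging table;
-- no splitter function and no PIVOT needed.
BULK INSERT dbo.RegTimesStage
FROM 'C:\data\regtimes.txt'           -- hypothetical path
WITH (FORMATFILE = 'C:\data\regtimes.fmt');
```

From the staging table, an ordinary INSERT...SELECT with conversions would move the rows into the final table. This only works because the OP confirmed the row format is stable, which is exactly why the format questions were asked.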
  • Luis Cazares (3/14/2016)


    I would upload the file using bulk insert or something similar to avoid the pivot and the additional split. But I was waiting for a reply from the OP.

There ya go reading my mind again. 😀 A nice little format file and this gets a whole lot simpler. That's why I asked the questions I did... we need a stable format to use a format file on this.

    --Jeff Moden



  • Jack Corbett (3/14/2016)


This "works". It uses Jeff's DelimitedSplit8K function for all the splits (found here). If the answer to Jeff's question about "sloshes" 😛 is no, then this won't be sufficient.

(quoted code snipped; see Jack's post above)

    Nicely done, Jack!

    --Jeff Moden



  • Posted at the end of the day yesterday...

    Thanks everyone for all the replies.

    @Jeff Moden

Yes, there will always be two names separated by sloshes, and the same goes for the serials.

About the position question: I didn't understand what you mean by "match position". In the file the values will always be in the same "column"; only the size may vary, from 3 to 10 characters for each of the values in the same row.

    The split function:

CREATE FUNCTION [dbo].[SplitString]
(
    @string    NVARCHAR(MAX),
    @delimiter CHAR(1)
)
RETURNS @output TABLE (id INT, splitdata NVARCHAR(MAX))
BEGIN
    DECLARE @count INT, @start INT, @end INT;

    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string);
    SET @count = 1;

    WHILE @start < LEN(@string) + 1
    BEGIN
        IF @end = 0
            SET @end = LEN(@string) + 1;

        INSERT INTO @output (id, splitdata)
        VALUES (@count, SUBSTRING(@string, @start, @end - @start));

        SET @count = @count + 1;
        SET @start = @end + 1;
        SET @end = CHARINDEX(@delimiter, @string, @start);
    END;

    RETURN;
END

The destination table is really a temporary table that will be used to update other tables...

CREATE TABLE [dbo].[RegTimesTemp]
(
    [Num]         [INT] NOT NULL,
    [Name1]       [NVARCHAR](50) NULL,
    [Name2]       [NVARCHAR](50) NULL,
    [Team]        [NVARCHAR](50) NULL,
    [TypeG]       [NVARCHAR](50) NULL,
    [OtherData]   [NVARCHAR](50) NULL,
    [Serial1]     [NVARCHAR](50) NULL,
    [Serial2]     [NVARCHAR](50) NULL,
    [TimeMS]      [BIGINT] NULL,
    [Location]    [NCHAR](2) NULL,
    [CurrentDate] [SMALLDATETIME] NULL
)

    @Jack Corbett

    Thanks for the code.

    I'll take a look later.

The files are small, 300/400 lines tops, and the database is only for this problem, so performance isn't "critical"...

I was thinking of doing all the data handling on the database side, but then I thought that maybe that isn't a good path... The files will be posted to a PHP script, and the script takes care of the process, so I'm weighing whether to do the parsing in the script and issue individual or bulk inserts against the database, or just read the file and call a stored procedure in the database that does the parse/insert job...

    Thanks
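Given the pieces posted above, a set-based load with the OP's own SplitString function could be sketched like this. The `@FileText` variable, the `fld` column numbering (taken from the sample data), and the omission of TimeMS (the raw `01:10:35:256` would still need converting to a BIGINT millisecond count) are assumptions for illustration; it also relies on the OP's confirmation that there are always exactly two names and two serials per row:

```sql
-- Sketch only: @FileText is assumed to hold the whole file's text,
-- with CHAR(10) line endings and exactly 8 tab-separated fields per line.
DECLARE @FileText NVARCHAR(MAX) = N'...file contents...';

INSERT INTO dbo.RegTimesTemp
    (Num, Name1, Name2, Team, TypeG, OtherData, Serial1, Serial2, Location, CurrentDate)
SELECT
    CAST(fld.[1] AS INT),
    LEFT(fld.[2], CHARINDEX('\', fld.[2]) - 1),       -- Name1: text before the "\"
    STUFF(fld.[2], 1, CHARINDEX('\', fld.[2]), ''),   -- Name2: text after the "\"
    fld.[3], fld.[4], fld.[5],
    LEFT(fld.[6], CHARINDEX('\', fld.[6]) - 1),       -- Serial1
    STUFF(fld.[6], 1, CHARINDEX('\', fld.[6]), ''),   -- Serial2
    fld.[8],
    GETDATE()
FROM dbo.SplitString(@FileText, CHAR(10)) AS ln
CROSS APPLY
(   -- Cross-tab one line's tab-separated items into named columns.
    SELECT MAX(CASE s.id WHEN 1 THEN s.splitdata END) AS [1],
           MAX(CASE s.id WHEN 2 THEN s.splitdata END) AS [2],
           MAX(CASE s.id WHEN 3 THEN s.splitdata END) AS [3],
           MAX(CASE s.id WHEN 4 THEN s.splitdata END) AS [4],
           MAX(CASE s.id WHEN 5 THEN s.splitdata END) AS [5],
           MAX(CASE s.id WHEN 6 THEN s.splitdata END) AS [6],
           MAX(CASE s.id WHEN 7 THEN s.splitdata END) AS [7],
           MAX(CASE s.id WHEN 8 THEN s.splitdata END) AS [8]
    FROM dbo.SplitString(ln.splitdata, CHAR(9)) AS s
) AS fld
WHERE ln.splitdata <> '';
```

This keeps the WHILE loop confined to the splitter itself; replacing SplitString with a tally-based splitter like DelimitedSplit8K would make the whole statement set-based, as discussed earlier in the thread.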

  • Jeff Moden (3/14/2016)


    Nicely done, Jack!

    Thanks Jeff, always nice to hear that from you.

I've never used bcp or BULK INSERT because I'm an SSIS guy, but if Luis and Jeff recommend that route, I'd tend to think they're correct. I could also recommend SSIS, but you'd have to use the Derived Column transform with SSIS expressions to split the two columns on the "\", and I'd consider that advanced SSIS, so I stayed away from it.
