Parse Data in SQL

Question

Post reply

Parse Data in SQL

sks_989

SSC Eights!

Points: 843
More actions
June 9, 2017 at 9:31 am

#338161

I have data into field (SS)(DD)(GG) i want parse that field into 3 different column. Is Substring work or there is other way i can parse this field.
Three field should be like this
Field 1 Field 2 Field 3
SS DD GG

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

MadAdmin SSChampion Points: 11261 More actions · Answer 1

Will there always be 3 columns?

Catch-all queries done right [/url]
Gail Shaw's Performance Blog[/url]

sks_989 SSC Eights! Points: 843 More actions · Answer 2

4 column max. it can be like that way (DD)(FFF)(GG)(HH) Field can have any character in ()

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 3

It depends... If your data is always in that format, then yes, the SUBSTRING function is the way to go.
If the number of characters between parens can vary or the number of sets can vary, then you may need to parsing things with a string splitting function.
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL DROP TABLE #TestData;

CREATE TABLE #TestData ( String CHAR(12) NOT NULL ); INSERT #TestData (String) VALUES ('(SS)(DD)(GG)')

SELECT td.String, cs.CleanString, Col1 = SUBSTRING(cs.CleanString, 1, 2), Col2 = SUBSTRING(cs.CleanString, 3, 2), Col3 = SUBSTRING(cs.CleanString, 5, 2) FROM #TestData td CROSS APPLY ( VALUES (REPLACE(REPLACE(td.String, '(', ''), ')', '')) ) cs (CleanString)

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 4

sks_989 - Friday, June 9, 2017 9:47 AM
4 column max. it can be like that way (DD)(FFF)(GG)(HH) Field can have any character in ()

If 4 groups is the max, then you can use the PARSENAME function.

IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL DROP TABLE #TestData;

CREATE TABLE #TestData ( String VARCHAR(20) NOT NULL ); INSERT #TestData (String) VALUES ('(SS)(DD)(GG)'), ('(DD)(FFF)(GG)(HH)');

SELECT td.String, cs.CleanString, Col1 = PARSENAME(cs.CleanString, 1), Col2 = PARSENAME(cs.CleanString, 2), Col3 = PARSENAME(cs.CleanString, 3), Col4 = PARSENAME(cs.CleanString, 4) FROM #TestData td CROSS APPLY ( VALUES (REPLACE(REPLACE(REPLACE(td.String, ')(', '.'), '(', ''), ')', '')) ) cs (CleanString);

sks_989 SSC Eights! Points: 843 More actions · Answer 5

sks_989

SSC Eights!

Points: 843

June 9, 2017 at 10:11 am

#1946220

Thanks Jason It worked.

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 6

sks_989 - Friday, June 9, 2017 10:11 AM
Thanks Jason It worked.

Good deal. Glad to help. 🙂

sks_989 SSC Eights! Points: 843 More actions · Answer 7

Jason if data is like this way (AS)(RT) output is coming as RT first and AS second. I want AS as first column

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 8

sks_989 - Friday, June 9, 2017 11:31 AM
Jason if data is like this way (AS)(RT) output is coming as RT first and AS second. I want AS as first column

DOH!!! I completely forgot about the PARSENAME working from right to left. Sorry about that...

Try it like this...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL DROP TABLE #TestData;

CREATE TABLE #TestData ( String VARCHAR(20) NOT NULL ); INSERT #TestData (String) VALUES ('(SS)(DD)(GG)'), ('(DD)(FFF)(GG)(HH)'), ('(AS)(RT)');

SELECT td.String, cs.CleanString, cc.ColumnCount, Col1 = PARSENAME(cs.CleanString, cc.ColumnCount), Col2 = PARSENAME(cs.CleanString, cc.ColumnCount -1), Col3 = PARSENAME(cs.CleanString, cc.ColumnCount -2), Col4 = PARSENAME(cs.CleanString, cc.ColumnCount -3) FROM #TestData td CROSS APPLY ( VALUES (REPLACE(REPLACE(REPLACE(td.String, ')(', '.'), '(', ''), ')', '')) ) cs (CleanString) CROSS APPLY ( VALUES (LEN(cs.CleanString) - LEN(REPLACE(cs.CleanString, '.', '')) + 1) ) cc (ColumnCount);

sks_989 SSC Eights! Points: 843 More actions · Answer 9

Thanks Jason. It worked now the way I want. Much Appreciated.

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 10

sks_989 - Friday, June 9, 2017 12:26 PM
Thanks Jason. It worked now the way I want. Much Appreciated.

No problem. Glad it worked for you.

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 11

Also be seriously aware that the return datatype for PARSENAME is NCHAR. It you're saving the output in a table somewhere, it will save trailing spaces even if the target column is NVARCHAR or VARCHAR and can cause serious bloat of the table and any indexes that the columns may have on them.

{EDIT} Scratch that. MS lists the return type as NCHAR and, if that were true, my warning above would be correct. See the following post...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 12

Here's the link for the current definition of PARSENAME.
https://docs.microsoft.com/en-us/sql/t-sql/functions/parsename-transact-sql

But Books Online and the link above is incorrect. PARSENAME actually returns an NVARCHAR(128), which is the same as the data type alias of "SYSNAME" (which also supposedly defaults to NOT NULL in tables and has a few other goodies associated with it which I'm not going to prove/disprove just now). Here's the proof. Don't forget that you have to divide the max data length by 2 to get the number of characters that can be used for Unicode datatypes like NVARCHAR.

DECLARE @OtherLimit INT = 6 ; WITH ctePreAgg AS ( SELECT product_name = CASE WHEN DENSE_RANK() OVER (ORDER BY SUM(total_sale) DESC) <= @OtherLimit THEN product_name ELSE 'Others' END ,total_sale = SUM(total_sale) FROM #values GROUP BY product_name ) SELECT product_name ,sum_total_sale = SUM(total_sale) FROM ctePreAgg GROUP BY product_name ORDER BY sum_total_sale DESC ;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason A. Long SSC-Insane Points: 23790 More actions · Answer 13

It would appear that you are correct about it being an NVARCHAR(128).
DECLARE @String VARCHAR(50) = 'this.is.a.test';

IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL DROP TABLE #temp;

SELECT Col1 = PARSENAME(@String, 4), Col2 = PARSENAME(@String, 3), Col3 = PARSENAME(@String, 2), Col4 = PARSENAME(@String, 1) INTO #temp;

EXEC tempdb..sp_help #temp;
That said, I think the only time that actually needs to be of concern is if you're doing a "SELECT INTO" like above.
Otherwise, it's able to insert into a predefined table w/o any issues (at least none that I've noticed)...
DECLARE @String VARCHAR(50) = 'this.is.a.test';

IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL DROP TABLE #temp; GO CREATE TABLE #temp ( Col1 varchar(20), Col2 varchar(20), Col3 varchar(20), Col4 varchar(20) ); INSERT #temp (Col1,Col2,Col3,Col4) SELECT Col1 = PARSENAME(@String, 4), Col2 = PARSENAME(@String, 3), Col3 = PARSENAME(@String, 2), Col4 = PARSENAME(@String, 1);

SELECT Col1 = '''' + t.Col1 + '''', Col2 = '''' + t.Col2 + '''', Col3 = '''' + t.Col3 + '''', Col4 = '''' + t.Col4 + '''' FROM #temp t;

Since it's not actually a SYSNAME data type, it has no issues handling NULLs.
DECLARE @String VARCHAR(50) = 'This...Test';

SELECT Col1 = PARSENAME(@String, 4), Col2 = PARSENAME(@String, 3), Col3 = PARSENAME(@String, 2), Col4 = PARSENAME(@String, 1);

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 14

If it were an NCHAR, you would need to be seriously concerned no matter what. I ran into this problem before with some system tables (can't remember which ones just now). Someone was capturing some of the data and the table they were capturing the data in was growing explosively. Here's an example of why it was growing out of control and why I was so concerned with PARSENAME returning NCHAR (which we've discovered is not true).

Heh... perhaps this should be a "QOD". Before you run the following code, how many bytes will the single letter "A" consume in the table?
CREATE TABLE #MyHead ( SomeString NVARCHAR(300) ) ; DECLARE @SomeString NCHAR(128) = 'A'; INSERT INTO #MyHead (SomeString) SELECT SomeString = @SomeString ; SELECT SomeString ,LengthInBytes = DATALENGTH(SomeString) FROM #MyHead ;

p.s. Don't forget about the "overhead".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)