Pipe Delimmeted Data To Shred Into A Table

Question

Pipe Delimmeted Data To Shred Into A Table

xxxy2k

Valued Member

Points: 72
More actions
November 6, 2015 at 5:19 am

#309553

Hi,
Need help to shred pipe delimmited data into a table, cant seem to find anything that will help me do this.
for example the data looks roughly like this
234|2015-6-11|234aa|1
ID1|a|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1543211
ID2|B|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1234
I have approximatly 200 rows of these type of lines of data can anyone provide a potential awnser that can help me add each column into the same row of a temp table

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004329 More actions · Answer 1

xxxy2k (11/6/2015)
Hi,
Need help to shred pipe delimmited data into a table, cant seem to find anything that will help me do this.
for example the data looks roughly like this
234|2015-6-11|234aa|
ID1|a|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1543211
ID2|B|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1234
I have approximatly 200 rows of these type of lines of data can anyone provide a potential awnser that can help me add each column into the same row of a temp table

Considering that the first row only has 4 elements (one of them is a trailing empty element) and the rest of more than that, what do you want to do about such a disparity in the number of elements?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

xxxy2k Valued Member Points: 72 More actions · Answer 2

The sample of data im using is test data, i wouldnt get hung up about the quality of the data. I've amended the data you highlighted to explain the data further first line is a header row, 2nd row of data and onwards is the actual data.

lnardozi 61862 SSCommitted Points: 1931 More actions · Answer 3

There's two parts to this. First is getting it into SQL, which you can do like this: (props to whoever provided this sample. it didn't say)

DECLARE @FileContents VARCHAR(MAX)

SELECT @FileContents=BulkColumn

FROM OPENROWSET(BULK'PathToYourFile.sql',SINGLE_BLOB) x;

Second part is just use Moden's string splitter and insert the result into a table.

xxxy2k Valued Member Points: 72 More actions · Answer 4

Final peice will be something like this

select

dbo.GetDomain(t.test,'|',0) as col2,

dbo.GetDomain(t.test,'|',1) as col3,

dbo.GetDomain(t.test,'|',2) as col4

from #test t

go

Jeff Moden SSC Guru Points: 1004329 More actions · Answer 5

xxxy2k (11/13/2015)
Final peice will be something like this
select
dbo.GetDomain(t.test,'|',0) as col2,
dbo.GetDomain(t.test,'|',1) as col3,
dbo.GetDomain(t.test,'|',2) as col4
from #test t
go

Please post your "GetDomain" function so that I can show you why you shouldn't actually be doing it that way. 😉

Also, you said...

The sample of data im using is test data, i wouldn't get hung up about the quality of the data.

That's totally obvious. I just wanted to know what you wanted to do with that first row in the data because you never explicitly stated what you want to do with it. Do you just want to ignore it? This is a fairly easy problem. I just need to know the answer to that question.

And, again, the reason why I want to see your "GetDomain" function is to show you why you don't want to do such a thing in the future especially if you ever have a large amount of data to do this to.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

xxxy2k Valued Member Points: 72 More actions · Answer 6

If i could ignore it then great if not i would just filter it out in the where clause once in a table.

The getdomain is a function see below

create function dbo.GetDomain(

@source varchar(1024),

@delimiter varchar(10),

@domain int

) returns varchar(1024) as begin

declare @returnValue varchar(1024)

declare @workingOn int

declare @length int

set @workingOn=0

while @workingOn<@domain begin

set @source=substring(@source,charindex(@delimiter,@source)+1,1024)

set @workingOn+=1

end

set @length=charindex(@delimiter,@source)

set @returnValue=substring(@source,1,case when @length=0 then 1024 else @length-1 end)

return @returnValue

end

GO

Jeff Moden SSC Guru Points: 1004329 More actions · Answer 7

Ok... first things first. Part of the reason why you've gotten no actual code to help you for the split portion of this problem is because you haven't posted readily consumable test data. Folks like to test their stuff before posting it and I'm no exception. I've converted your test data to readily consumable data to demonstrate with because you're new. I strongly recommend that you read the article at the first link in my signature line below under "Helpful Links" for future posts. You get tested, coded replies that way.

Thank you for posting your GetDomain splitter. As I suspected, it contains a slow While Loop to do the splits, not to mention that it's a Scalar Function rather than a high performance iTVF (Inline Table Valued Function).

Without further ado, take a trip over to the following article and get yourself a copy of the "DelimitedSplit8K" function from the "Resources" section near the bottom of the article. Once you've setup that function, this problem becomes child's play and will only be beat for performance by a CLR in the pre-2012 world.

http://www.sqlservercentral.com/articles/Tally+Table/72993/

Here's the code for a solution. Change column names and the table names to suit you. The result is stored in a Temp Table that's created on-the-fly. It also assumes the "IDx" at the beginning of each row of data is going to be unique. If it's not, there's a fix we can apply for that.

And, yes, I included a test table like you should in the future. You'll get tested answers much more quickly that way.

--===== Create the test table.

-- This is NOT a part of the solution. We're just creating test data here.

IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL

DROP TABLE #TestTable

;

SELECT *

INTO #TestTable

FROM (

SELECT '234|2015-6-11|234aa|1' UNION ALL

SELECT 'ID1|a|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1543211' UNION ALL

SELECT 'ID2|B|A||234|ASDFASDF|2015-06-11|A|s||NULL|||||TEST|TEST|TEST|TEST|1234'

) d (PipeData)

;

--===== This does the split and repivot as well as creating and populating

-- the desired Temp Table on-the-fly.

SELECT C01 = SUBSTRING(tt.PipeData,1,CHARINDEX('|',tt.PipeData)-1)

,C02 = MAX(CASE WHEN split.ItemNumber = 2 THEN split.Item ELSE '' END)

,C03 = MAX(CASE WHEN split.ItemNumber = 3 THEN split.Item ELSE '' END)

,C04 = MAX(CASE WHEN split.ItemNumber = 4 THEN split.Item ELSE '' END)

,C05 = MAX(CASE WHEN split.ItemNumber = 5 THEN split.Item ELSE '' END)

,C06 = MAX(CASE WHEN split.ItemNumber = 6 THEN split.Item ELSE '' END)

,C07 = MAX(CASE WHEN split.ItemNumber = 7 THEN split.Item ELSE '' END)

,C08 = MAX(CASE WHEN split.ItemNumber = 8 THEN split.Item ELSE '' END)

,C09 = MAX(CASE WHEN split.ItemNumber = 9 THEN split.Item ELSE '' END)

,C10 = MAX(CASE WHEN split.ItemNumber = 10 THEN split.Item ELSE '' END)

,C11 = MAX(CASE WHEN split.ItemNumber = 11 THEN split.Item ELSE '' END)

,C12 = MAX(CASE WHEN split.ItemNumber = 12 THEN split.Item ELSE '' END)

,C13 = MAX(CASE WHEN split.ItemNumber = 13 THEN split.Item ELSE '' END)

,C14 = MAX(CASE WHEN split.ItemNumber = 14 THEN split.Item ELSE '' END)

,C15 = MAX(CASE WHEN split.ItemNumber = 15 THEN split.Item ELSE '' END)

,C16 = MAX(CASE WHEN split.ItemNumber = 16 THEN split.Item ELSE '' END)

,C17 = MAX(CASE WHEN split.ItemNumber = 17 THEN split.Item ELSE '' END)

,C18 = MAX(CASE WHEN split.ItemNumber = 18 THEN split.Item ELSE '' END)

,C19 = MAX(CASE WHEN split.ItemNumber = 19 THEN split.Item ELSE '' END)

,C20 = MAX(CASE WHEN split.ItemNumber = 20 THEN split.Item ELSE '' END)

INTO #SplitResult

FROM #TestTable tt

CROSS APPLY dbo.DelimitedSplit8k(tt.PipeData,'|') split

WHERE tt.PipeData LIKE 'ID%'

GROUP BY SUBSTRING(tt.PipeData,1,CHARINDEX('|',tt.PipeData)-1)

;

--===== Show that our data has been correctly split

SELECT * FROM #SplitResult

;

The "pivot" method that uses MAX(CASE)) that I used is an ancient but very fast method known as a CROSS TAB. You can read about that at the following link.

http://www.sqlservercentral.com/articles/T-SQL/63681/

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)