Parsing a cell into multiple rows

Question

Parsing a cell into multiple rows

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Anam Verma SSCertifiable Points: 5365 More actions · Answer 1

I agree with Dave's logic. You have to manually write a program to parse this file and then store the required text from file into a table.

i did parsing of complete HTML file in past where i have to extract information and then store it in table, do some database processing on info and then store the resultant into HTML page at its desired location. This is complete db programming. It took me a while to think and work around logic.

Best of luck mate.

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 2

wdillon (5/5/2008)
Hi Jeff
Hum, you are right, it's pretty messed up. I'll attach a zip file of the orginal and see if that works.
Thanks
Bill

Thanks... that's a lot better.

Suggestion on using b an /b are good but not 100%... the marks are inconsistent. Fortunately, they can be made consistent... and then we can hit this with a "simple" 3 dimensional split.

I'm working on it... just to confirm, you ARE using SQL Server 2005 for sure, right?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 3

Like I said, the data needs to be made more consistent... and the code below does that. About all you need to do is change the path to the file.

Fun exercise... thanks...

[font="Courier New"]

--===== Import the data into a temporary working table

IF OBJECT_ID('TempDB..#RawData','U') IS NOT NULL

DROP TABLE #RawData

CREATE TABLE #RawData (RawData VARCHAR(8000))

BULK INSERT #RawData

FROM 'C:\Temp\EastLake\EastLake.csv' --SQL Server needs to be able to "see" where the file is at

WITH (DATAFILETYPE = 'WIDECHAR', --It IS a unicode file

MAXERRORS = 1000000) --Forget errors... load everything

--===== Precondition the raw data to make it consistent

-- These are order sensitive

UPDATE #RawData SET RawData = REPLACE(RawData,' ',' ')

UPDATE #RawData SET RawData = REPLACE(RawData,' Check In:','Check In:')

UPDATE #RawData SET RawData = REPLACE(RawData,' <','<')

UPDATE #RawData SET RawData = REPLACE(RawData,': ',':')

UPDATE #RawData SET RawData = REPLACE(RawData,' Total Cost:','Total Cost:')

UPDATE #RawData SET RawData = REPLACE(RawData,'> ','>')

UPDATE #RawData SET RawData = REPLACE(RawData,' ',' ')

UPDATE #RawData SET RawData = REPLACE(RawData,' ','')

UPDATE #RawData SET RawData = REPLACE(RawData,' Tax:','Tax:')

UPDATE #RawData SET RawData = REPLACE(RawData,'Total Cost:','Total Price:')

UPDATE #RawData SET RawData = REPLACE(RawData,'','~')

UPDATE #RawData SET RawData = REPLACE(RawData,'','~')

UPDATE #RawData SET RawData = REPLACE(RawData,' ','|')

UPDATE #RawData SET RawData = REPLACE(RawData,'~~|~','~')

--===== Do the "impossible" split

;WITH

cteSplit1 AS

(--==== This splits reservations that have more than one Item per line

SELECT ReservationID,

ItemNumber = ROW_NUMBER() OVER (PARTITION BY ReservationID ORDER BY ReservationID),

Split1

FROM (

SELECT ReservationID = CAST(LEFT(RawData,CHARINDEX(',',rd.RawData)-1) AS INT),

FROM #RawData rd,

dbo.Tally t

WHERE t.N < LEN('|'+rd.RawData+'|')

AND t.N > CHARINDEX(',',rd.RawData)

AND SUBSTRING('|'+rd.RawData+'|',t.N,1) = '|'

)d

)

,

cteSplit2 AS

(--==== This splits the individual Items from each line with a couple of "helper" columns.

-- SubItem let's us join back to the cte to get matching pairs

SELECT ReservationID,

ItemNumber,

SubItem = ROW_NUMBER() OVER (PARTITION BY ReservationID,ItemNumber ORDER BY ReservationID,ItemNumber),

HalfItem = (ROW_NUMBER() OVER (PARTITION BY ReservationID,ItemNumber ORDER BY ReservationID,ItemNumber)-1)%2,

Split2 = SUBSTRING(s1.Split1+'~',t.N+1,CHARINDEX('~',s1.Split1+'~',t.N+1)-t.N-1)

FROM cteSplit1 s1,

dbo.Tally t

WHERE t.N < LEN(s1.Split1+'~')

AND SUBSTRING(s1.Split1+'~',t.N,1) = '~'

)

--===== This reassembles the data into what we'd expect as a table

SELECT c.ReservationID,

c.ItemNumber,

[Service Type] = MAX(CASE c.Split2

WHEN 'Cabin:' THEN 'Cabin'

WHEN 'Dock:' THEN 'Dock'

WHEN 'Pontoon:' THEN 'Pontoon'

WHEN 'RV Site:' THEN 'RV Site'

END),

[Service Name] = MAX(CASE c.Split2

WHEN 'Cabin:' THEN v.Split2

WHEN 'Dock:' THEN v.Split2

WHEN 'Pontoon:' THEN v.Split2

WHEN 'RV Site:' THEN v.Split2

END),

[Check In] = MAX(CASE

WHEN c.Split2 = 'Check In:'

AND ISDATE(v.Split2) = 1

THEN CAST(v.Split2 AS DATETIME)

END),

[Check OUT] = MAX(CASE

WHEN c.Split2 = 'Check Out:'

AND ISDATE(v.Split2) = 1

THEN CAST(v.Split2 AS DATETIME)

END),

Price = MAX(CASE

WHEN c.Split2 = 'Price:'

THEN CAST(v.Split2 AS MONEY)

END),

Tax = MAX(CASE

WHEN c.Split2 = 'Tax:'

THEN CAST(v.Split2 AS MONEY)

END),

Total = MAX(CASE

WHEN c.Split2 = 'Total:'

THEN CAST(v.Split2 AS MONEY)

END),

[Total Price] = MAX(CASE

WHEN c.Split2 = 'Total Price:'

THEN CAST(v.Split2 AS MONEY)

END)

FROM cteSplit2 c --This provides the column name part of paired values

INNER JOIN

cteSplit2 v --This provides the value part of paired values

ON c.ReservationID = v.ReservationID

AND c.ItemNumber = v.ItemNumber

AND c.SubItem+1 = v.SubItem --Value always follows the column name

WHERE c.HalfItem = 0 --Identifies Column name part of paired values

GROUP BY c.ReservationID,

c.ItemNumber

ORDER BY c.ReservationID,

c.ItemNumber

[/font]

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

wdillon SSChasing Mays Points: 642 More actions · Answer 4

Hi Jeff:

A lot of fun ideas there!!!! Thank you very very much. Yes SQL 2005 for the question. I'll be working on this in about 2 hours and see where it goes.

Bill

wdillon SSChasing Mays Points: 642 More actions · Answer 5

Hi Jeff:

So, I'm playing with this now and it complains about a "Tally" object. When searching to find out more about this "Tally" object, I'm finding "Karthik, if you want to know how it works, I have an article coming out on May 7th (just 4 more days... :hehe called "The "Numbers" or "Tally" Table: What it is and how it replaces a loop."

Smiles!

So is there anything I can do quick and dirty for this?

Thanks

Bill

GSquared SSC Guru Points: 260824 More actions · Answer 6

create table dbo.Tally (

Number int identity (0,1) primary key,

Junk bit)

go

insert into dbo.Tally(junk)

select top 10001 null

from sys.all_objects

cross join sys.all_objects

go

alter table dbo.Tally

drop column junk

That should do it for you.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

wdillon SSChasing Mays Points: 642 More actions · Answer 7

Hi Jeff, I got the Tally table thing worked out from finding on another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work

Bill

Noffer Ten Centuries Points: 1090 More actions · Answer 8

If I understand what the point of the original question was correctly, all you want to do is returen a table from a a column with values. I had to do comma delimited one before and created a user defined function for it as follows...

ALTER FUNCTION [dbo].[UDF_CharCommaSeparatedListToTable]

(

@CommaSeparatedList VARCHAR(8000)

)

RETURNS @ParsedTableValue TABLE (TableValue VARCHAR(1000))

AS

BEGIN

DECLARE @TableValue VARCHAR(1000)

DECLARE @Pos INT

SET @CommaSeparatedList = LTRIM(RTRIM(@CommaSeparatedList))+ ','

SET @Pos = CHARINDEX(',', @CommaSeparatedList, 1)

IF REPLACE(@CommaSeparatedList, ',', '') <> ''

BEGIN

WHILE @Pos > 0

BEGIN

SET @TableValue = LTRIM(RTRIM(LEFT(@CommaSeparatedList, @Pos - 1)))

IF @TableValue <> ''

BEGIN

INSERT INTO @ParsedTableValue (TableValue)

VALUES (RTRIM(@TableValue))

END

SET @CommaSeparatedList = RIGHT(@CommaSeparatedList,

LEN(@CommaSeparatedList) - @Pos)

SET @Pos = CHARINDEX(',', @CommaSeparatedList, 1)

END

RETURN

END

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 9

wdillon (5/6/2008)
Hi Jeff, I got the Tally table thing worked out from finding on another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work
Bill

Jeff rarely sleeps.

Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

GSquared SSC Guru Points: 260824 More actions · Answer 10

Matt Miller (5/6/2008)
wdillon (5/6/2008)
Hi Jeff, I got the Tally table thing worked out from finding on another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work
Bill
Jeff rarely sleeps.
Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)

Must ... resist ... temptation .... to ... just ... spam ... heck .... out ... of ... forums ... at ... 4 AM...

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 11

Matt Miller (5/6/2008)
Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)

So THAT's what it is! Here I was trying to cut back on coffee and all! 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 12

wdillon (5/6/2008)
Hi Jeff, I got the Tally table thing worked out from finding on another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work
Bill

Awesome! Man, thanks for the compliment. I'm pretty well embarrassed that I forgot to give you the link for the Tally table. Better late than never I suppose… here 'tis…

http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/

...and that's not the article that comes out tomorrow.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Anam Verma SSCertifiable Points: 5365 More actions · Answer 13

Hi Jeff

Thanks for the intelligent and neat solution of this big problem.

I was thinking of a scenario where the number of records in the input file is huge say a million, would this script be suitable in terms of performance.

I think it wouldn't be as we have to first increase the tally table to 1 million rows and then do a cartesan join with input file records which is again a million.

What could be possible ways in order to handle this kind of situation? I can think of writing my own program which would parse the sequential file and extract the required data.

Any advices to tackle this situation.

Thanks

Anam

Jeff Moden SSC Guru Points: 1004412 More actions · Answer 14

If it's truly a "file" and not just some passed parameter, I'd import it with Bulk Insert or BCP, instead. Dunno about others but I've been able to import 5.1 million 20 column rows in 60 seconds flat using a format file and Bulk Insert.

The big problem with this one is the record format is not consistent. Some rows have more than 1 "Unit" in it and the units are not all laid out the same way. If it were me and I was going to have to content with importing a million rows on a regular basis, I'd get with the folks providing the data and, armed with my favorite bucket of porkchops, beat them into submission of regular importable data. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason Miller-476791 SSCommitted Points: 1540 More actions · Answer 15

Along a similar vein I have an issue that I'm trying to figure out. I have a fairly simple table...

create table TEST (

a int identity ( 1,1 ),

s varchar(2000) )

insert TEST values ( 'ABC|DEF|GHI|JKL' )

insert TEST values ( 'MNO|PQR|STU|VWX' )

insert TEST values ( 'YZ|123|456|789' )

I took Jeff's tally table and parse function and added a second column to the returned data, just a sequential integer.

(The 2k in the name refers to the fact that the tally table has only 2 thousand entries. I also created a 10K and my unbounded 100M.)

create FUNCTION f_StringToTable_2K_With_Index

(@s-2varchar(max),

@deliminatorchar(1) )

RETURNS @t table ( c varchar(255),

idxsmallint )

AS

BEGIN

DECLARE @ismallint

SET @s-2 = @deliminator + @s-2 + @deliminator

INSERT INTO @t (c, idx)

SELECT SUBSTRING(@s, N+1, CHARINDEX(@deliminator, @s-2, N+1)-N-1) , ROW_NUMBER() OVER (ORDER BY N)

FROM dbo.Tally_2k

WHERE N < LEN( @s-2 ) AND SUBSTRING(@s,N,1) = @deliminator

RETURN

END

What I would like to figure out is how I can feed this table into a select statement and get output like:

[font="Courier New"]a c idx

--- --- ---

1 ABC 1

1 DEF 2

1 GHI 3

1 JKL 4

2 MNO 1

2 PQR 2

2 STU 3

2 VWX 4

3 YZ 1

3 123 2

3 456 3

3 789 4[/font]

I'm sure there's an elegant way to do this, I'm just not seeing it. Lack of sleep from a teething 1.5 year old, there isn't enough coffee around here.

Your time and consideration is greatly appreciated.

Honor Super Omnia-
Jason Miller