May 5, 2008 at 11:51 pm
I agree with Dave's logic. You have to write a program to parse this file and then store the required text from the file into a table.
I parsed a complete HTML file in the past, where I had to extract information, store it in a table, do some database processing on it, and then write the result back into the HTML page at the desired location. It's all database programming, and it took me a while to work out the logic.
Best of luck mate.
May 6, 2008 at 2:11 am
wdillon (5/5/2008)
Hi JeffHum, you are right, it's pretty messed up. I'll attach a zip file of the original and see if that works.
Thanks
Bill
Thanks... that's a lot better.
The suggestions about splitting on the <b> and </b> tags are good, but not 100%... the marks are inconsistent. Fortunately, they can be made consistent... and then we can hit this with a "simple" 3-dimensional split.
I'm working on it... just to confirm, you ARE using SQL Server 2005 for sure, right?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 6, 2008 at 5:47 am
Like I said, the data needs to be made more consistent... and the code below does that. About all you need to do is change the path to the file.
Fun exercise... thanks...
[font="Courier New"]
--===== Import the data into a temporary working table
IF OBJECT_ID('TempDB..#RawData','U') IS NOT NULL
DROP TABLE #RawData
CREATE TABLE #RawData (RawData VARCHAR(8000))
BULK INSERT #RawData
FROM 'C:\Temp\EastLake\EastLake.csv' --SQL Server needs to be able to "see" where the file is at
WITH (DATAFILETYPE = 'WIDECHAR', --It IS a unicode file
MAXERRORS = 1000000) --Forget errors... load everything
--===== Precondition the raw data to make it consistent
-- These are order sensitive
UPDATE #RawData SET RawData = REPLACE(RawData,' ',' ')
UPDATE #RawData SET RawData = REPLACE(RawData,' ',' ')
UPDATE #RawData SET RawData = REPLACE(RawData,' Check In:','<b>Check In:')
UPDATE #RawData SET RawData = REPLACE(RawData,' <','<')
UPDATE #RawData SET RawData = REPLACE(RawData,': ',':</b>')
UPDATE #RawData SET RawData = REPLACE(RawData,'<br><b>Total Cost:','<b>Total Cost:')
UPDATE #RawData SET RawData = REPLACE(RawData,'> ','>')
UPDATE #RawData SET RawData = REPLACE(RawData,'<br /><br /><b>','<br><b>')
UPDATE #RawData SET RawData = REPLACE(RawData,'<br />','<b>')
UPDATE #RawData SET RawData = REPLACE(RawData,' Tax:','<b>Tax:')
UPDATE #RawData SET RawData = REPLACE(RawData,'Total Cost:','Total Price:')
UPDATE #RawData SET RawData = REPLACE(RawData,'<b>','~')
UPDATE #RawData SET RawData = REPLACE(RawData,'</b>','~')
UPDATE #RawData SET RawData = REPLACE(RawData,'<br>','|')
UPDATE #RawData SET RawData = REPLACE(RawData,'~~|~','~')
--===== Do the "impossible" split
;WITH
cteSplit1 AS
(--==== This splits reservations that have more than one Item per line
SELECT ReservationID,
ItemNumber = ROW_NUMBER() OVER (PARTITION BY ReservationID ORDER BY ReservationID),
Split1
FROM (
SELECT ReservationID = CAST(LEFT(RawData,CHARINDEX(',',rd.RawData)-1) AS INT),
Split1 = SUBSTRING('|'+rd.RawData+'|',t.N+1,CHARINDEX('|','|'+rd.RawData+'|',t.N+1)-t.N-1)
FROM #RawData rd,
dbo.Tally t
WHERE t.N < LEN('|'+rd.RawData+'|')
AND t.N > CHARINDEX(',',rd.RawData)
AND SUBSTRING('|'+rd.RawData+'|',t.N,1) = '|'
)d
)
,
cteSplit2 AS
(--==== This splits the individual Items from each line with a couple of "helper" columns.
-- SubItem lets us join back to the cte to get matching pairs
SELECT ReservationID,
ItemNumber,
SubItem = ROW_NUMBER() OVER (PARTITION BY ReservationID,ItemNumber ORDER BY ReservationID,ItemNumber),
HalfItem = (ROW_NUMBER() OVER (PARTITION BY ReservationID,ItemNumber ORDER BY ReservationID,ItemNumber)-1)%2,
Split2 = SUBSTRING(s1.Split1+'~',t.N+1,CHARINDEX('~',s1.Split1+'~',t.N+1)-t.N-1)
FROM cteSplit1 s1,
dbo.Tally t
WHERE t.N < LEN(s1.Split1+'~')
AND SUBSTRING(s1.Split1+'~',t.N,1) = '~'
)
--===== This reassembles the data into what we'd expect as a table
SELECT c.ReservationID,
c.ItemNumber,
[Service Type] = MAX(CASE c.Split2
WHEN 'Cabin:' THEN 'Cabin'
WHEN 'Dock:' THEN 'Dock'
WHEN 'Pontoon:' THEN 'Pontoon'
WHEN 'RV Site:' THEN 'RV Site'
END),
[Service Name] = MAX(CASE c.Split2
WHEN 'Cabin:' THEN v.Split2
WHEN 'Dock:' THEN v.Split2
WHEN 'Pontoon:' THEN v.Split2
WHEN 'RV Site:' THEN v.Split2
END),
[Check In] = MAX(CASE
WHEN c.Split2 = 'Check In:'
AND ISDATE(v.Split2) = 1
THEN CAST(v.Split2 AS DATETIME)
END),
[Check Out] = MAX(CASE
WHEN c.Split2 = 'Check Out:'
AND ISDATE(v.Split2) = 1
THEN CAST(v.Split2 AS DATETIME)
END),
Price = MAX(CASE
WHEN c.Split2 = 'Price:'
THEN CAST(v.Split2 AS MONEY)
END),
Tax = MAX(CASE
WHEN c.Split2 = 'Tax:'
THEN CAST(v.Split2 AS MONEY)
END),
Total = MAX(CASE
WHEN c.Split2 = 'Total:'
THEN CAST(v.Split2 AS MONEY)
END),
[Total Price] = MAX(CASE
WHEN c.Split2 = 'Total Price:'
THEN CAST(v.Split2 AS MONEY)
END)
FROM cteSplit2 c --This provides the column name part of paired values
INNER JOIN
cteSplit2 v --This provides the value part of paired values
ON c.ReservationID = v.ReservationID
AND c.ItemNumber = v.ItemNumber
AND c.SubItem+1 = v.SubItem --Value always follows the column name
WHERE c.HalfItem = 0 --Identifies Column name part of paired values
GROUP BY c.ReservationID,
c.ItemNumber
ORDER BY c.ReservationID,
c.ItemNumber
[/font]
--Jeff Moden
Change is inevitable... Change for the better is not.
May 6, 2008 at 7:34 am
Hi Jeff:
A lot of fun ideas there!!!! Thank you very very much. Yes SQL 2005 for the question. I'll be working on this in about 2 hours and see where it goes.
Bill
May 6, 2008 at 11:37 am
Hi Jeff:
So, I'm playing with this now and it complains about a missing "Tally" object. When searching to find out more about this "Tally" object, all I'm finding is "Karthik, if you want to know how it works, I have an article coming out on May 7th (just 4 more days... :hehe:) called "The "Numbers" or "Tally" Table: What it is and how it replaces a loop.""
Smiles!
So is there anything I can do quick and dirty for this?
Thanks
Bill
May 6, 2008 at 12:10 pm
CREATE TABLE dbo.Tally (
N INT IDENTITY(0,1) PRIMARY KEY, --Jeff's script references the column as "N"
Junk BIT)
GO
INSERT INTO dbo.Tally (Junk)
SELECT TOP 10001 NULL
FROM sys.all_objects
CROSS JOIN sys.all_objects
GO
ALTER TABLE dbo.Tally
DROP COLUMN Junk
That should do it for you.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 6, 2008 at 12:17 pm
Hi Jeff, I got the Tally table thing worked out from another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work
Bill
May 6, 2008 at 12:46 pm
If I understand the point of the original question correctly, all you want to do is return a table from a column of delimited values. I had to do a comma-delimited one before and created a user-defined function for it as follows...
CREATE FUNCTION [dbo].[UDF_CharCommaSeparatedListToTable]
(
@CommaSeparatedList VARCHAR(8000)
)
RETURNS @ParsedTableValue TABLE (TableValue VARCHAR(1000))
AS
BEGIN
DECLARE @TableValue VARCHAR(1000)
DECLARE @Pos INT
SET @CommaSeparatedList = LTRIM(RTRIM(@CommaSeparatedList))+ ','
SET @Pos = CHARINDEX(',', @CommaSeparatedList, 1)
IF REPLACE(@CommaSeparatedList, ',', '') <> ''
BEGIN
WHILE @Pos > 0
BEGIN
SET @TableValue = LTRIM(RTRIM(LEFT(@CommaSeparatedList, @Pos - 1)))
IF @TableValue <> ''
BEGIN
INSERT INTO @ParsedTableValue (TableValue)
VALUES (RTRIM(@TableValue))
END
SET @CommaSeparatedList = RIGHT(@CommaSeparatedList,
LEN(@CommaSeparatedList) - @Pos)
SET @Pos = CHARINDEX(',', @CommaSeparatedList, 1)
END
END
RETURN
END
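For reference, a quick usage sketch of the function above (the sample list is just an illustration):

```sql
-- Split a comma-separated list into one row per value;
-- the function trims whitespace and skips empty entries.
SELECT TableValue
  FROM dbo.UDF_CharCommaSeparatedListToTable('red, green, blue')
-- Returns three rows: 'red', 'green', 'blue'
```

Note that, being a WHILE-loop function, it will be noticeably slower than the Tally-table splits elsewhere in this thread once the lists get long.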
May 6, 2008 at 12:49 pm
wdillon (5/6/2008)
Hi Jeff, I got the Tally table thing worked out from another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work
Bill
Jeff rarely sleeps.
Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 6, 2008 at 2:11 pm
Matt Miller (5/6/2008)
wdillon (5/6/2008)
Hi Jeff, I got the Tally table thing worked out from another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work
Bill
Jeff rarely sleeps.
Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)
Must ... resist ... temptation .... to ... just ... spam ... heck .... out ... of ... forums ... at ... 4 AM...
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 6, 2008 at 5:32 pm
Matt Miller (5/6/2008)
Steve's had a buzzer implanted on Jeff, that goes off each time a new post is added...:)
So THAT's what it is! Here I was trying to cut back on coffee and all! 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
May 6, 2008 at 5:41 pm
wdillon (5/6/2008)
Hi Jeff, I got the Tally table thing worked out from another post of yours. WOW!!!! The data looks great!! Do you ever get any sleep? Nice work
Bill
Awesome! Man, thanks for the compliment. I'm pretty well embarrassed that I forgot to give you the link for the Tally table. Better late than never I suppose… here 'tis…
http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/
...and that's not the article that comes out tomorrow.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 7, 2008 at 10:01 pm
Hi Jeff
Thanks for the intelligent and neat solution of this big problem.
I was thinking of a scenario where the number of records in the input file is huge, say a million. Would this script still be suitable in terms of performance?
I think it wouldn't be, as we would first have to grow the Tally table and then do a Cartesian join against a million input records.
What are possible ways to handle this kind of situation? I can think of writing my own program to parse the sequential file and extract the required data.
Any advice on tackling this would be appreciated.
Thanks
Anam
May 8, 2008 at 12:53 am
If it's truly a "file" and not just some passed parameter, I'd import it with Bulk Insert or BCP, instead. Dunno about others but I've been able to import 5.1 million 20 column rows in 60 seconds flat using a format file and Bulk Insert.
The big problem with this one is that the record format is not consistent. Some rows have more than 1 "Unit" in them and the units are not all laid out the same way. If it were me and I was going to have to contend with importing a million rows on a regular basis, I'd get with the folks providing the data and, armed with my favorite bucket of porkchops, beat them into submission until they provide regular, importable data. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
December 3, 2008 at 5:57 am
In a similar vein, I have an issue that I'm trying to figure out. I have a fairly simple table...
CREATE TABLE TEST (
a INT IDENTITY(1,1),
s VARCHAR(2000))
INSERT INTO TEST VALUES ('ABC|DEF|GHI|JKL')
INSERT INTO TEST VALUES ('MNO|PQR|STU|VWX')
INSERT INTO TEST VALUES ('YZ|123|456|789')
I took Jeff's tally table and parse function and added a second column to the returned data, just a sequential integer.
(The 2k in the name refers to the fact that the tally table has only 2 thousand entries. I also created a 10K and my unbounded 100M.)
CREATE FUNCTION f_StringToTable_2K_With_Index
(@s VARCHAR(MAX),
@deliminator CHAR(1))
RETURNS @t TABLE (c VARCHAR(255),
idx SMALLINT)
AS
BEGIN
SET @s = @deliminator + @s + @deliminator
INSERT INTO @t (c, idx)
SELECT SUBSTRING(@s, N+1, CHARINDEX(@deliminator, @s, N+1)-N-1),
ROW_NUMBER() OVER (ORDER BY N)
FROM dbo.Tally_2K
WHERE N < LEN(@s)
AND SUBSTRING(@s, N, 1) = @deliminator
RETURN
END
What I would like to figure out is how I can feed this table into a select statement and get output like:
[font="Courier New"]a c idx
--- --- ---
1 ABC 1
1 DEF 2
1 GHI 3
1 JKL 4
2 MNO 1
2 PQR 2
2 STU 3
2 VWX 4
3 YZ 1
3 123 2
3 456 3
3 789 4[/font]
I'm sure there's an elegant way to do this; I'm just not seeing it. Between the lack of sleep from a teething 1.5-year-old and there not being enough coffee around here, my brain isn't cooperating.
Your time and consideration is greatly appreciated.
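For what it's worth, one way to get exactly that output is CROSS APPLY (available in SQL Server 2005 and later), which runs the table-valued function once for each row of TEST and keeps that row's key alongside the split values. A minimal sketch, assuming the function above compiles as posted:

```sql
-- Each row of TEST feeds its string s into the splitter;
-- CROSS APPLY joins the returned rows back to the source row.
SELECT t.a, f.c, f.idx
  FROM TEST t
 CROSS APPLY dbo.f_StringToTable_2K_With_Index(t.s, '|') f
 ORDER BY t.a, f.idx
```

Rows of TEST whose string produces no elements simply drop out (use OUTER APPLY to keep them with NULLs instead).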
Honor Super Omnia-
Jason Miller