Tabular Format

Question

Tabular Format

raym

SSCrazy

Points: 2213
More actions
December 28, 2007 at 4:31 am

#177812

I have a file Data.txt . I would like to split the textfile into a tabular format as well as the its data.
Unfortunately the only tool available to me right now is sql server 2005. How can I achieve this ?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

FelixG SSCrazy Points: 2550 More actions · Answer 1

SQL server 2005 offers CLR integration... so I have performed this kind of imports coding a stored procedure using C# that reads the text file and inserts the data to the correspondind table in SQL database.

Regards,

Felix

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 2

That's nice... But the OP did say "Unfortunately the only tool available to me right now is sql server 2005. How can I achieve this ?" I'm pretty sure that means the C# based CLR's are excluded...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 3

Ray,

Just curious... obviously this information is produced by a database somewhere... is there no way to convince the data provider to provide the information either in a delimited format or an XML format?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 4

Actually... I've done it... I've got a few more pieces together, but it's all but done.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 5

Ok... sorry for the delay... couple of things came up and I couldn't get back to this... a shame, really, because this was kind'a fun.

In order to split the information like you wanted, we first have to reassemble the rows according to the numbering embedded in your document. Once we have all the related rows in a single row, then the split is fairly easy.

The splits could be controlled by a "configuration table" but I didn't take the time to do that.

In order to reassemble the rows, we must first import them in a given order. In order to do that, we need an IDENTITY column on the import table (Raw1). In order to import data to a table that has an IDENTITY column, we also need a BCP format file so we can "skip" the IDENTITY column. See the attachment called "Raw1Format.txt". That file must live in the same folder as the raw data file you want to split. For the casual observer, here's what that file looks like...

8.0

2

1 SQLCHAR 0 0 "" 0 RowNum ""

2 SQLCHAR 0 8000 "\r" 2 RawData SQL_Latin1_General_CP1_CI_AS

The rest is history... I'll leave it up to you to insert the output from the following code into a table...

--=====================================================================================================================

-- Setup

--=====================================================================================================================

--===== Create the Raw1 working table

-- Used to congeal the data and place process markers

IF OBJECT_ID('TempDB..Raw1','U') IS NOT NULL

DROP TABLE Raw1

CREATE TABLE Raw1

(

RowNum INT IDENTITY(1,1),

RawData VARCHAR(200), --The raw data being imported

SectionNumber CHAR(2), --The section number (single number on row)

ItemType CHAR(1), --A=Individual, B=Entities, C=List

ItemNumber CHAR(10), --The numeric item number under ItemType

LastUpdated DATETIME, --From the "Last Updated On" rows

ItemMark AS SectionNumber+ItemType+ItemNumber

)

ALTER TABLE Raw1

ADD CONSTRAINT PK_Raw1_RowNum

PRIMARY KEY CLUSTERED (RowNUM)

--===== Create the Raw2 working table

-- Used to split the congealed the data

IF OBJECT_ID('TempDB..Raw2','U') IS NOT NULL

DROP TABLE Raw2

CREATE TABLE Raw2

(

RowNum INT IDENTITY(1,1),

ItemMark CHAR(13),

RawData VARCHAR(MAX),

[*Good quality a.k.a.:]VARCHAR(255)

)

ALTER TABLE Raw2

ADD CONSTRAINT PK_Raw2_RowNum

PRIMARY KEY CLUSTERED (RowNUM)

GO

--=====================================================================================================================

-- Import the raw data file

--=====================================================================================================================

--===== Import the raw data. RowNum will keep the order of rows

BULK INSERT Raw1

FROM 'D:\SQL Examples\Worlds Nastiest Split\Data.txt' --TODO: CHANGE PATH AND FILE NAME

WITH (

DATAFILETYPE = 'CHAR',

FORMATFILE = 'D:\SQL Examples\Worlds Nastiest Split\Raw1Format.txt', --TODO: CHANGE PATH ONLY

TABLOCK

)

--=====================================================================================================================

-- Find and mark the various sections indicated in the file

--=====================================================================================================================

--===== Find and mark "SectionNumber" rows

UPDATE Raw1

SET SectionNumber = RawData

FROM Raw1

WHERE RawData NOT LIKE '%[^0-9]%'

--===== Find and mark "ItemType" rows

UPDATE Raw1

SET ItemType = SUBSTRING(RawData,1,CHARINDEX('.',RawData)-1)

FROM Raw1

WHERE RawData LIKE '[A-Z]. _%'

--===== Find and mark "ItemNumber" rows

UPDATE Raw1

SET ItemNumber = SUBSTRING(RawData,1,CHARINDEX('.',RawData)-1)

FROM Raw1

WHERE RawData LIKE '[0-9]. *Name:%'

OR RawData LIKE '[0-9][0-9]. *Name:%'

OR RawData LIKE '[0-9][0-9][0-9]. *Name:%'

OR RawData LIKE '[0-9][0-9][0-9][0-9]. *Name:%'

--===== Find and mark "LastUpdated" rows

UPDATE Raw1

SET LastUpdated = SUBSTRING(RawData,17,200)

FROM Raw1

WHERE RawData LIKE 'Last Updated On %'

--=====================================================================================================================

-- Smear the various section data down through the rows to identify them.

-- This whole section is just to make it so the ItemMark can be used as the key to reassembling the related rows

-- of raw data so that we can parse them all the same way.

--=====================================================================================================================

--===== Declare the variables that will remember the data to be "smeared" down

DECLARE @SectionNumber CHAR(2)

DECLARE @ItemType CHAR(1)

DECLARE @ItemNumber CHAR(10)

DECLARE @LastUpdated DATETIME

DECLARE @RowNum INT --Used as an "anchor"

--===== Preset the variables to a known condition

SET @SectionNumber = 0

SET @ItemType = NULL

SET @ItemNumber = NULL

SET @LastUpdated = '19000101'

SET @RowNum = 0

--===== Smear the known section and other data down through the other rows to identify them

UPDATE Raw1

SET @ItemNumber = ItemNumber = STR(

CASE WHEN ItemNumber IS NULL

AND ItemType IS NULL

THEN @ItemNumber

ELSE ItemNumber

END

,5),

@ItemType = ItemType = CASE WHEN ItemType IS NULL

AND SectionNumber IS NULL

THEN @ItemType

ELSE ItemType

END,

@SectionNumber = SectionNumber = STR(

CASE WHEN SectionNumber IS NULL

THEN @SectionNumber

ELSE SectionNumber

END

,2),

@LastUpdated = LastUpdated = CASE WHEN LastUpdated IS NULL

THEN @LastUpdated

ELSE LastUpdated

END,

@RowNum = RowNum --Just an "anchor"

FROM Raw1 WITH (INDEX(PK_Raw1_RowNum),TABLOCKX)

WHERE RawData > '' --Is not blank and is not null

--===== Delete all rows we no longer need

DELETE Raw1

WHERE ItemNumber IS NULL

OR RawData LIKE 'Last Updated On %'

--=====================================================================================================================

-- Reassemble the individual rows by SectionNumber, ItemType, and ItemNumber... (Derived table "s")

-- ... then split the data from the derived table into the elements we want

--=====================================================================================================================

SELECT SectionNumber, ItemType, ItemNumber,

Name1 =

CASE WHEN RawData LIKE '%*Name: 1:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*Name: 1:',RawData)+LEN('*Name: 1:'),

CHARINDEX('2: ',RawData)

-(CHARINDEX('*Name: 1:',RawData)+LEN('*Name: 1:')))))

WHEN RawData LIKE '%*Name: %*A.k.a.:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*Name:',RawData)+LEN('*Name:'),

CHARINDEX('*A.k.a.:',RawData)

-(CHARINDEX('*Name:',RawData)+LEN('*Name:')))))

ELSE 'na'

END,

Name2 =

CASE WHEN RawData LIKE '%*Name: 1:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('2:',RawData)+LEN('2:'),

CHARINDEX('3: ',RawData)

-(CHARINDEX('2:',RawData)+LEN('2:')))))

ELSE 'na'

END,

Name3 =

CASE WHEN RawData LIKE '%*Name: 1:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('3:',RawData)+LEN('3:'),

CHARINDEX('4: ',RawData)

-(CHARINDEX('3:',RawData)+LEN('3:')))))

ELSE 'na'

END,

Name4 =

CASE WHEN RawData LIKE '%*Name: 1:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('4:',RawData)+LEN('4:'),

CHARINDEX('Title',RawData)

-(CHARINDEX('4:',RawData)+LEN('4:')))))

ELSE 'na'

END,

Title1 =

CASE WHEN RawData LIKE '%Title 1:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('Title 1:',RawData)+LEN('Title 1:'),

CHARINDEX('Title 2:',RawData)

-(CHARINDEX('Title 1:',RawData)+LEN('Title 1:')))))

WHEN RawData LIKE '%*Name: %Title:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('Title:',RawData)+LEN('Title:'),

CHARINDEX('Designation:',RawData)

-(CHARINDEX('Title:',RawData)+LEN('Title:')))))

ELSE 'na'

END,

Title2 =

CASE WHEN RawData LIKE '%Title 2:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('Title 2:',RawData)+LEN('Title 2:'),

CHARINDEX('Designation:',RawData)

-(CHARINDEX('Title 2:',RawData)+LEN('Title 2:')))))

ELSE 'na'

END,

Designation =

CASE WHEN RawData LIKE '%Designation:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('Designation:',RawData)+LEN('Designation:'),

CHARINDEX('DOB:',RawData)

-(CHARINDEX('Designation:',RawData)+LEN('Designation:')))))

ELSE 'na'

END,

DOB =

CASE WHEN RawData LIKE '%DOB:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('DOB:',RawData)+LEN('DOB:'),

CHARINDEX('POB:',RawData)

-(CHARINDEX('DOB:',RawData)+LEN('DOB:')))))

ELSE 'na'

END,

POB =

CASE WHEN RawData LIKE '%POB:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('POB:',RawData)+LEN('POB:'),

CHARINDEX('*Good quality a.k.a.:',RawData)

-(CHARINDEX('POB:',RawData)+LEN('POB:')))))

ELSE 'na'

END,

[*Good quality a.k.a.:] =

CASE WHEN RawData LIKE '%*Good quality a.k.a.:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*Good quality a.k.a.:',RawData)+LEN('*Good quality a.k.a.:'),

CHARINDEX('Low quality a.k.a.:',RawData)

-(CHARINDEX('*Good quality a.k.a.:',RawData)+LEN('*Good quality a.k.a.:')))))

ELSE 'na'

END,

[Low quality a.k.a.:] =

CASE WHEN RawData LIKE '%Low quality a.k.a.:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('Low quality a.k.a.:',RawData)+LEN('Low quality a.k.a.:'),

CHARINDEX('*Nationality:',RawData)

-(CHARINDEX('Low quality a.k.a.:',RawData)+LEN('Low quality a.k.a.:')))))

ELSE 'na'

END,

[*A.k.a.:] =

CASE WHEN RawData LIKE '%*A.k.a.:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*A.k.a.:',RawData)+LEN('*A.k.a.:'),

CHARINDEX('F.k.a.:',RawData)

-(CHARINDEX('*A.k.a.:',RawData)+LEN('*A.k.a.:')))))

ELSE 'na'

END,

[F.k.a.:] =

CASE WHEN RawData LIKE '%F.k.a.:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('F.k.a.:',RawData)+LEN('F.k.a.:'),

CHARINDEX('*Address:',RawData)

-(CHARINDEX('F.k.a.:',RawData)+LEN('F.k.a.:')))))

ELSE 'na'

END,

[*Nationality:] =

CASE WHEN RawData LIKE '%*Nationality:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*Nationality:',RawData)+LEN('*Nationality:'),

CHARINDEX('Passport no.:',RawData)

-(CHARINDEX('*Nationality:',RawData)+LEN('*Nationality:')))))

ELSE 'na'

END,

[Passport no.:] =

CASE WHEN RawData LIKE '%Passport no.:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('Passport no.:',RawData)+LEN('Passport no.:'),

CHARINDEX('National identification no.:',RawData)

-(CHARINDEX('Passport no.:',RawData)+LEN('Passport no.:')))))

ELSE 'na'

END,

[National identification no.:] =

CASE WHEN RawData LIKE '%National identification no.:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('National identification no.:',RawData)+LEN('National identification no.:'),

CHARINDEX('Address:',RawData)

-(CHARINDEX('National identification no.:',RawData)+LEN('National identification no.:')))))

ELSE 'na'

END,

[Address:] =

CASE WHEN RawData LIKE '%*Address:%*Listed on:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*Address:',RawData)+LEN('*Address:'),

CHARINDEX('*Listed on:',RawData)

-(CHARINDEX('*Address:',RawData)+LEN('*Address:')))))

WHEN RawData LIKE '%*Address:%Listed on:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*Address:',RawData)+LEN('*Address:'),

CHARINDEX('Listed on:',RawData)

-(CHARINDEX('*Address:',RawData)+LEN('*Address:')))))

ELSE 'na'

END,

[*Listed on:] =

CASE WHEN RawData LIKE '%Listed on:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*Listed on:',RawData)+LEN('*Listed on:'),

CHARINDEX('*Other information:',RawData)

-(CHARINDEX('*Listed on:',RawData)+LEN('*Listed on:')))))

ELSE 'na'

END,

[*Other information:] =

CASE WHEN RawData LIKE '%*Other information:%'

THEN RTRIM(LTRIM(SUBSTRING(RawData,

CHARINDEX('*Other information:',RawData)+LEN('*Other information:'),

1000)))

ELSE 'na'

END

FROM

(--==== Derived table "s" reassembles the data from the "Raw1" table and makes some minor data corrections

SELECT t1.ItemMark, SectionNumber, ItemType, ItemNumber,

REPLACE(REPLACE(STUFF((SELECT ' ' + t2.RawData FROM Raw1 t2 WHERE t1.ItemMark = t2.ItemMark FOR XML PATH('')),1,1,''),' :',':'),' ',' ') AS RawData

FROM Raw1 t1

GROUP BY t1.ItemMark, SectionNumber, ItemType, ItemNumber

)s

For your convenience, I've attached that bit of T-SQL in the file called "Work 01.txt" (not allowed to attach .SQL extensions on this forum).

Before you try to run the code, find the two places that have the word "TODO"... you must make a couple of path and file name changes to match your environment. Normally, I'd have these in a configuration table, as well, but simply didn't do it for this "freebie".

If you have any questions, please read the comments in the code first, then study the code before you ask... The comments aren't the greatest I've ever written but, like I said, I'm a bit pressed for time this week...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 6

p.s. I just noticed that I left the code in that creates the "Raw2" table... that code is just not necessary and may be removed. Sorry for the non-lethal mistake... 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)