Tabular Format

  • I have a file Data.txt . I would like to split the textfile into a tabular format as well as the its data.

    Unfortunately the only tool available to me right now is sql server 2005. How can I achieve this ?

  • SQL server 2005 offers CLR integration... so I have performed this kind of imports coding a stored procedure using C# that reads the text file and inserts the data to the correspondind table in SQL database.

    Regards,

    Felix

  • That's nice... But the OP did say "Unfortunately the only tool available to me right now is sql server 2005. How can I achieve this ?" I'm pretty sure that means the C# based CLR's are excluded...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ray,

    Just curious... obviously this information is produced by a database somewhere... is there no way to convince the data provider to provide the information either in a delimited format or an XML format?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Actually... I've done it... I've got a few more pieces together, but it's all but done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... sorry for the delay... couple of things came up and I couldn't get back to this... a shame, really, because this was kind'a fun.

    In order to split the information like you wanted, we first have to reassemble the rows according to the numbering embedded in your document. Once we have all the related rows in a single row, then the split is fairly easy.

    The splits could be controlled by a "configuration table" but I didn't take the time to do that.

    In order to reassemble the rows, we must first import them in a given order. In order to do that, we need an IDENTITY column on the import table (Raw1). In order to import data to a table that has an IDENTITY column, we also need a BCP format file so we can "skip" the IDENTITY column. See the attachment called "Raw1Format.txt". That file must live in the same folder as the raw data file you want to split. For the casual observer, here's what that file looks like...

    8.0

    2

    1 SQLCHAR 0 0 "" 0 RowNum ""

    2 SQLCHAR 0 8000 "\r" 2 RawData SQL_Latin1_General_CP1_CI_AS

    The rest is history... I'll leave it up to you to insert the output from the following code into a table...

    --=====================================================================================================================

    -- Setup

    --=====================================================================================================================

    --===== Create the Raw1 working table

    -- Used to congeal the data and place process markers

    IF OBJECT_ID('TempDB..Raw1','U') IS NOT NULL

    DROP TABLE Raw1

    CREATE TABLE Raw1

    (

    RowNum INT IDENTITY(1,1),

    RawData VARCHAR(200), --The raw data being imported

    SectionNumber CHAR(2), --The section number (single number on row)

    ItemType CHAR(1), --A=Individual, B=Entities, C=List

    ItemNumber CHAR(10), --The numeric item number under ItemType

    LastUpdated DATETIME, --From the "Last Updated On" rows

    ItemMark AS SectionNumber+ItemType+ItemNumber

    )

    ALTER TABLE Raw1

    ADD CONSTRAINT PK_Raw1_RowNum

    PRIMARY KEY CLUSTERED (RowNUM)

    --===== Create the Raw2 working table

    -- Used to split the congealed the data

    IF OBJECT_ID('TempDB..Raw2','U') IS NOT NULL

    DROP TABLE Raw2

    CREATE TABLE Raw2

    (

    RowNum INT IDENTITY(1,1),

    ItemMark CHAR(13),

    RawData VARCHAR(MAX),

    [*Good quality a.k.a.:]VARCHAR(255)

    )

    ALTER TABLE Raw2

    ADD CONSTRAINT PK_Raw2_RowNum

    PRIMARY KEY CLUSTERED (RowNUM)

    GO

    --=====================================================================================================================

    -- Import the raw data file

    --=====================================================================================================================

    --===== Import the raw data. RowNum will keep the order of rows

    BULK INSERT Raw1

    FROM 'D:\SQL Examples\Worlds Nastiest Split\Data.txt' --TODO: CHANGE PATH AND FILE NAME

    WITH (

    DATAFILETYPE = 'CHAR',

    FORMATFILE = 'D:\SQL Examples\Worlds Nastiest Split\Raw1Format.txt', --TODO: CHANGE PATH ONLY

    TABLOCK

    )

    --=====================================================================================================================

    -- Find and mark the various sections indicated in the file

    --=====================================================================================================================

    --===== Find and mark "SectionNumber" rows

    UPDATE Raw1

    SET SectionNumber = RawData

    FROM Raw1

    WHERE RawData NOT LIKE '%[^0-9]%'

    --===== Find and mark "ItemType" rows

    UPDATE Raw1

    SET ItemType = SUBSTRING(RawData,1,CHARINDEX('.',RawData)-1)

    FROM Raw1

    WHERE RawData LIKE '[A-Z]. _%'

    --===== Find and mark "ItemNumber" rows

    UPDATE Raw1

    SET ItemNumber = SUBSTRING(RawData,1,CHARINDEX('.',RawData)-1)

    FROM Raw1

    WHERE RawData LIKE '[0-9]. *Name:%'

    OR RawData LIKE '[0-9][0-9]. *Name:%'

    OR RawData LIKE '[0-9][0-9][0-9]. *Name:%'

    OR RawData LIKE '[0-9][0-9][0-9][0-9]. *Name:%'

    --===== Find and mark "LastUpdated" rows

    UPDATE Raw1

    SET LastUpdated = SUBSTRING(RawData,17,200)

    FROM Raw1

    WHERE RawData LIKE 'Last Updated On %'

    --=====================================================================================================================

    -- Smear the various section data down through the rows to identify them.

    -- This whole section is just to make it so the ItemMark can be used as the key to reassembling the related rows

    -- of raw data so that we can parse them all the same way.

    --=====================================================================================================================

    --===== Declare the variables that will remember the data to be "smeared" down

    DECLARE @SectionNumber CHAR(2)

    DECLARE @ItemType CHAR(1)

    DECLARE @ItemNumber CHAR(10)

    DECLARE @LastUpdated DATETIME

    DECLARE @RowNum INT --Used as an "anchor"

    --===== Preset the variables to a known condition

    SET @SectionNumber = 0

    SET @ItemType = NULL

    SET @ItemNumber = NULL

    SET @LastUpdated = '19000101'

    SET @RowNum = 0

    --===== Smear the known section and other data down through the other rows to identify them

    UPDATE Raw1

    SET @ItemNumber = ItemNumber = STR(

    CASE WHEN ItemNumber IS NULL

    AND ItemType IS NULL

    THEN @ItemNumber

    ELSE ItemNumber

    END

    ,5),

    @ItemType = ItemType = CASE WHEN ItemType IS NULL

    AND SectionNumber IS NULL

    THEN @ItemType

    ELSE ItemType

    END,

    @SectionNumber = SectionNumber = STR(

    CASE WHEN SectionNumber IS NULL

    THEN @SectionNumber

    ELSE SectionNumber

    END

    ,2),

    @LastUpdated = LastUpdated = CASE WHEN LastUpdated IS NULL

    THEN @LastUpdated

    ELSE LastUpdated

    END,

    @RowNum = RowNum --Just an "anchor"

    FROM Raw1 WITH (INDEX(PK_Raw1_RowNum),TABLOCKX)

    WHERE RawData > '' --Is not blank and is not null

    --===== Delete all rows we no longer need

    DELETE Raw1

    WHERE ItemNumber IS NULL

    OR RawData LIKE 'Last Updated On %'

    --=====================================================================================================================

    -- Reassemble the individual rows by SectionNumber, ItemType, and ItemNumber... (Derived table "s")

    -- ... then split the data from the derived table into the elements we want

    --=====================================================================================================================

    SELECT SectionNumber, ItemType, ItemNumber,

    Name1 =

    CASE WHEN RawData LIKE '%*Name: 1:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*Name: 1:',RawData)+LEN('*Name: 1:'),

    CHARINDEX('2: ',RawData)

    -(CHARINDEX('*Name: 1:',RawData)+LEN('*Name: 1:')))))

    WHEN RawData LIKE '%*Name: %*A.k.a.:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*Name:',RawData)+LEN('*Name:'),

    CHARINDEX('*A.k.a.:',RawData)

    -(CHARINDEX('*Name:',RawData)+LEN('*Name:')))))

    ELSE 'na'

    END,

    Name2 =

    CASE WHEN RawData LIKE '%*Name: 1:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('2:',RawData)+LEN('2:'),

    CHARINDEX('3: ',RawData)

    -(CHARINDEX('2:',RawData)+LEN('2:')))))

    ELSE 'na'

    END,

    Name3 =

    CASE WHEN RawData LIKE '%*Name: 1:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('3:',RawData)+LEN('3:'),

    CHARINDEX('4: ',RawData)

    -(CHARINDEX('3:',RawData)+LEN('3:')))))

    ELSE 'na'

    END,

    Name4 =

    CASE WHEN RawData LIKE '%*Name: 1:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('4:',RawData)+LEN('4:'),

    CHARINDEX('Title',RawData)

    -(CHARINDEX('4:',RawData)+LEN('4:')))))

    ELSE 'na'

    END,

    Title1 =

    CASE WHEN RawData LIKE '%Title 1:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('Title 1:',RawData)+LEN('Title 1:'),

    CHARINDEX('Title 2:',RawData)

    -(CHARINDEX('Title 1:',RawData)+LEN('Title 1:')))))

    WHEN RawData LIKE '%*Name: %Title:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('Title:',RawData)+LEN('Title:'),

    CHARINDEX('Designation:',RawData)

    -(CHARINDEX('Title:',RawData)+LEN('Title:')))))

    ELSE 'na'

    END,

    Title2 =

    CASE WHEN RawData LIKE '%Title 2:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('Title 2:',RawData)+LEN('Title 2:'),

    CHARINDEX('Designation:',RawData)

    -(CHARINDEX('Title 2:',RawData)+LEN('Title 2:')))))

    ELSE 'na'

    END,

    Designation =

    CASE WHEN RawData LIKE '%Designation:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('Designation:',RawData)+LEN('Designation:'),

    CHARINDEX('DOB:',RawData)

    -(CHARINDEX('Designation:',RawData)+LEN('Designation:')))))

    ELSE 'na'

    END,

    DOB =

    CASE WHEN RawData LIKE '%DOB:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('DOB:',RawData)+LEN('DOB:'),

    CHARINDEX('POB:',RawData)

    -(CHARINDEX('DOB:',RawData)+LEN('DOB:')))))

    ELSE 'na'

    END,

    POB =

    CASE WHEN RawData LIKE '%POB:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('POB:',RawData)+LEN('POB:'),

    CHARINDEX('*Good quality a.k.a.:',RawData)

    -(CHARINDEX('POB:',RawData)+LEN('POB:')))))

    ELSE 'na'

    END,

    [*Good quality a.k.a.:] =

    CASE WHEN RawData LIKE '%*Good quality a.k.a.:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*Good quality a.k.a.:',RawData)+LEN('*Good quality a.k.a.:'),

    CHARINDEX('Low quality a.k.a.:',RawData)

    -(CHARINDEX('*Good quality a.k.a.:',RawData)+LEN('*Good quality a.k.a.:')))))

    ELSE 'na'

    END,

    [Low quality a.k.a.:] =

    CASE WHEN RawData LIKE '%Low quality a.k.a.:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('Low quality a.k.a.:',RawData)+LEN('Low quality a.k.a.:'),

    CHARINDEX('*Nationality:',RawData)

    -(CHARINDEX('Low quality a.k.a.:',RawData)+LEN('Low quality a.k.a.:')))))

    ELSE 'na'

    END,

    [*A.k.a.:] =

    CASE WHEN RawData LIKE '%*A.k.a.:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*A.k.a.:',RawData)+LEN('*A.k.a.:'),

    CHARINDEX('F.k.a.:',RawData)

    -(CHARINDEX('*A.k.a.:',RawData)+LEN('*A.k.a.:')))))

    ELSE 'na'

    END,

    [F.k.a.:] =

    CASE WHEN RawData LIKE '%F.k.a.:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('F.k.a.:',RawData)+LEN('F.k.a.:'),

    CHARINDEX('*Address:',RawData)

    -(CHARINDEX('F.k.a.:',RawData)+LEN('F.k.a.:')))))

    ELSE 'na'

    END,

    [*Nationality:] =

    CASE WHEN RawData LIKE '%*Nationality:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*Nationality:',RawData)+LEN('*Nationality:'),

    CHARINDEX('Passport no.:',RawData)

    -(CHARINDEX('*Nationality:',RawData)+LEN('*Nationality:')))))

    ELSE 'na'

    END,

    [Passport no.:] =

    CASE WHEN RawData LIKE '%Passport no.:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('Passport no.:',RawData)+LEN('Passport no.:'),

    CHARINDEX('National identification no.:',RawData)

    -(CHARINDEX('Passport no.:',RawData)+LEN('Passport no.:')))))

    ELSE 'na'

    END,

    [National identification no.:] =

    CASE WHEN RawData LIKE '%National identification no.:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('National identification no.:',RawData)+LEN('National identification no.:'),

    CHARINDEX('Address:',RawData)

    -(CHARINDEX('National identification no.:',RawData)+LEN('National identification no.:')))))

    ELSE 'na'

    END,

    [Address:] =

    CASE WHEN RawData LIKE '%*Address:%*Listed on:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*Address:',RawData)+LEN('*Address:'),

    CHARINDEX('*Listed on:',RawData)

    -(CHARINDEX('*Address:',RawData)+LEN('*Address:')))))

    WHEN RawData LIKE '%*Address:%Listed on:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*Address:',RawData)+LEN('*Address:'),

    CHARINDEX('Listed on:',RawData)

    -(CHARINDEX('*Address:',RawData)+LEN('*Address:')))))

    ELSE 'na'

    END,

    [*Listed on:] =

    CASE WHEN RawData LIKE '%Listed on:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*Listed on:',RawData)+LEN('*Listed on:'),

    CHARINDEX('*Other information:',RawData)

    -(CHARINDEX('*Listed on:',RawData)+LEN('*Listed on:')))))

    ELSE 'na'

    END,

    [*Other information:] =

    CASE WHEN RawData LIKE '%*Other information:%'

    THEN RTRIM(LTRIM(SUBSTRING(RawData,

    CHARINDEX('*Other information:',RawData)+LEN('*Other information:'),

    1000)))

    ELSE 'na'

    END

    FROM

    (--==== Derived table "s" reassembles the data from the "Raw1" table and makes some minor data corrections

    SELECT t1.ItemMark, SectionNumber, ItemType, ItemNumber,

    REPLACE(REPLACE(STUFF((SELECT ' ' + t2.RawData FROM Raw1 t2 WHERE t1.ItemMark = t2.ItemMark FOR XML PATH('')),1,1,''),' :',':'),' ',' ') AS RawData

    FROM Raw1 t1

    GROUP BY t1.ItemMark, SectionNumber, ItemType, ItemNumber

    )s

    For your convenience, I've attached that bit of T-SQL in the file called "Work 01.txt" (not allowed to attach .SQL extensions on this forum).

    Before you try to run the code, find the two places that have the word "TODO"... you must make a couple of path and file name changes to match your environment. Normally, I'd have these in a configuration table, as well, but simply didn't do it for this "freebie".

    If you have any questions, please read the comments in the code first, then study the code before you ask... The comments aren't the greatest I've ever written but, like I said, I'm a bit pressed for time this week...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s. I just noticed that I left the code in that creates the "Raw2" table... that code is just not necessary and may be removed. Sorry for the non-lethal mistake... 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply