SQL 2012: how to parse a text datatype column into many different columns

  • Hello team, your help please!

    I have a database which is designed/developed and owned by an application vendor with SQL server 2012 back end. I am now accessing the database to make analysis reports that I am in trouble with one of the column in demographic table.

    The column is named 'Data' with text datatype and many things in it; such as Date_Of_Birth, Gender, City, State, ZipCode, etc. below are sample rows.

    Row #1: Insurance Name: ALLIED HMO<BR/>SubscriberNo: xxx0000xx <BR/>GroupNo: xxx000yy<BR/>CoPay: 3.00<BR/>CoPayMethod: $<BR/>StartDate: 03/01/2015<BR/>EndDate: <BR/>GroupName: <BR/>Insurance Order: Primary<BR/>Insured Name: Mr. xyz <BR/>Insured Relationship: Self - patient is the insured

    Row#2: Patient Name : abc, xyz <BR/>Address Line 1: 001 E xyz ST<BR/>Address Line 2: <BR/>City: LOS ANGELES<BR/>State: CA<BR/>Zip: 90001-1678<BR/>Phone: 000-000-0000<BR/>Mobile: 000-000-0000<BR/>PreviousName: <BR/>Email: <BR/>DateOfBirth: 01/01/1900<BR/>SSN: 000-00-0000<BR/>Sex: Female

    Row#3: Guarantor Name: abc, xyz<BR/>Guarantor Relationship: Self - patient is the insured

    Row#4: Patient Name : abc,xyz<BR/>Address Line 1: 000 E xyz AVE<BR/>Address Line 2: APT 00<BR/>City: LOS ANGELES<BR/>State: CA<BR/>Zip: 90019-4373<BR/>Phone: 000-000-0000<BR/>Mobile: <BR/>PreviousName: <BR/>Email: <BR/>DateOfBirth: 01/01/1900<BR/>SSN: 000-00-0000<BR/>Sex: Female

    Row#5: Employer Name: <BR/>Employer Address1: <BR/>Employer Address2: <BR/>Employer City: <BR/>Employer State: <BR/>Employer Zip: <BR/>Employer Phone: <BR/>PCP: GILBERT R VARELA<BR/>Referring Provider: <BR/>Rendering Provider: xyz, abc <BR/>Pharmacy: Professional Pharmacy<BR/>Martial Status: Unknown<BR/>Notes: <BR/>Fee Schedule: Medicare Fee Schedule 2015

    Row#6: Patient Name : abc, xyz M<BR/>Address Line 1: 000 N abc BLVD<BR/>Address Line 2: <BR/>City: LONG BEACH<BR/>State: CA<BR/>Zip: 90805-3711<BR/>Phone: 000-000-0000<BR/>Mobile: <BR/>PreviousName: <BR/>Email: <BR/>DateOfBirth: 01/01/1900 <BR/>SSN: 000-000-000<BR/>Sex: Male

    Please your expertise is needed if you worked before similar to this situation, thanks in advance!

  • Here is a quick suggestion to a solution, uses the DelimitedSplit8K function, should be enough to get you over this hurdle.

    The function can be found in these two articles

    1) Tally OH! An Improved SQL 8K “CSV Splitter” Function[/url]

    2) Reaping the benefits of the Window functions in T-SQL[/url]

    😎

    Note that the forum has skewed the test data set!

    USE TEEST;

    GO

    SET NOCOUNT ON;

    ;WITH SAMPLE_DATA(SD_ID,SD_DATA) AS

    (

    SELECT 1,'Insurance Name: ALLIED HMO

    SubscriberNo: xxx0000xx

    GroupNo: xxx000yy

    CoPay: 3.00

    CoPayMethod: $

    StartDate: 03/01/2015

    EndDate:

    GroupName:

    Insurance Order: Primary

    Insured Name: Mr. xyz

    Insured Relationship: Self - patient is the insured'

    UNION ALL

    SELECT 2,'Patient Name : abc, xyz

    Address Line 1: 001 E xyz ST

    Address Line 2:

    City: LOS ANGELES

    State: CA

    Zip: 90001-1678

    Phone: 000-000-0000

    Mobile: 000-000-0000

    PreviousName:

    Email:

    DateOfBirth: 01/01/1900

    SSN: 000-00-0000

    Sex: Female'

    UNION ALL

    SELECT 3,'Guarantor Name: abc, xyz

    Guarantor Relationship: Self - patient is the insured'

    UNION ALL

    SELECT 4,'Patient Name : abc,xyz

    Address Line 1: 000 E xyz AVE

    Address Line 2: APT 00

    City: LOS ANGELES

    State: CA

    Zip: 90019-4373

    Phone: 000-000-0000

    Mobile:

    PreviousName:

    Email:

    DateOfBirth: 01/01/1900

    SSN: 000-00-0000

    Sex: Female'

    UNION ALL

    SELECT 5,'Employer Name:

    Employer Address1:

    Employer Address2:

    Employer City:

    Employer State:

    Employer Zip:

    Employer Phone:

    PCP: GILBERT R VARELA

    Referring Provider:

    Rendering Provider: xyz, abc

    Pharmacy: Professional Pharmacy

    Martial Status: Unknown

    Notes:

    Fee Schedule: Medicare Fee Schedule 2015'

    UNION ALL

    SELECT 6,'Patient Name : abc, xyz M

    Address Line 1: 000 N abc BLVD

    Address Line 2:

    City: LONG BEACH

    State: CA

    Zip: 90805-3711

    Phone: 000-000-0000

    Mobile:

    PreviousName:

    Email:

    DateOfBirth: 01/01/1900

    SSN: 000-000-000

    Sex: Male'

    )

    SELECT

    SD.SD_ID

    ,STRPAIR.ItemNumber

    ,MAX(CASE WHEN VALPAIR.ItemNumber = 1 THEN VALPAIR.Item END) AS ITEM_NAME

    ,MAX(CASE WHEN VALPAIR.ItemNumber = 2 THEN VALPAIR.Item END) AS ITEM_VALUE

    FROM SAMPLE_DATA SD

    CROSS APPLY dbo.DelimitedSplit8K(REPLACE(SD.SD_DATA,'

    ',CHAR(124)),CHAR(124)) STRPAIR

    CROSS APPLY dbo.DelimitedSplit8K(STRPAIR.Item,CHAR(58)) VALPAIR

    GROUP BY SD.SD_ID

    ,STRPAIR.ItemNumber;

    Output

    ItemNumber ITEM_NAME ITEM_VALUE

    ----------- -------------------------- ----------------------------------

    1 Insurance Name ALLIED HMO

    1 Patient Name abc, xyz

    1 Guarantor Name abc, xyz

    1 Patient Name abc,xyz

    1 Employer Name

    1 Patient Name abc, xyz M

    2 SubscriberNo xxx0000xx

    2 Address Line 1 001 E xyz ST

    2 Guarantor Relationship Self - patient is the insured

    2 Address Line 1 000 E xyz AVE

    2 Employer Address1

    2 Address Line 1 000 N abc BLVD

    3 GroupNo xxx000yy

    3 Address Line 2

    3 Address Line 2 APT 00

    3 Employer Address2

    3 Address Line 2

    4 CoPay 3.00

    4 City LOS ANGELES

    4 City LOS ANGELES

    4 Employer City

    4 City LONG BEACH

    5 CoPayMethod $

    5 State CA

    5 State CA

    5 Employer State

    5 State CA

    6 StartDate 03/01/2015

    6 Zip 90001-1678

    6 Zip 90019-4373

    6 Employer Zip

    6 Zip 90805-3711

    7 EndDate

    7 Phone 000-000-0000

    7 Phone 000-000-0000

    7 Employer Phone

    7 Phone 000-000-0000

    8 GroupName

    8 Mobile 000-000-0000

    8 Mobile

    8 PCP GILBERT R VARELA

    8 Mobile

    9 Insurance Order Primary

    9 PreviousName

    9 PreviousName

    9 Referring Provider

    9 PreviousName

    10 Insured Name Mr. xyz

    10 Email

    10 Email

    10 Rendering Provider xyz, abc

    10 Email

    11 Insured Relationship Self - patient is the insured

    11 DateOfBirth 01/01/1900

    11 DateOfBirth 01/01/1900

    11 Pharmacy Professional Pharmacy

    11 DateOfBirth 01/01/1900

    12 SSN 000-00-0000

    12 SSN 000-00-0000

    12 Martial Status Unknown

    12 SSN 000-000-000

    13 Sex Female

    13 Sex Female

    13 Notes

    13 Sex Male

    14 Fee Schedule Medicare Fee Schedule 2015

  • Thanks for the answer!

    I have a question, would that be possible to see the output in a table structured way like:

    IemNum/PIDPatientNameInsuranceNameGuarantorNameDateOfBirth AddressL1City...

    1abc,xyzAllied HMOabc, xyz01/01/1900Los Angeles

    1abc,xyzAllied HMOabc, xyz01/01/1900Long beach

    2

    To make clear, my data is in sql table (>1.5 M rows) of a column with text datatype along with other data fields such as PID,

    datemodified and modifiedBy. One PId could have different demograpic data values that are being modified on every patient's visit day.

    So, do you think the DelimitedSplit8K function can really solve my problem?

  • I know you didn't design this so I'm not mad at you personally but this is yet another bloody application with PII and SSN information in clear text. People that design applications and data like this need to be drummed out of the profession with great prejudice. I guess they don't know what common sense is never mind HIPAA rules and regulations.

    And if the buggers don't think there's anything wrong with it, tell them to give you all their personal information including SSN and see how the feel about it.

    Please tell me who the software vendor is so that I can report them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You spoke my heart out!

  • elulu (12/24/2016)


    You spoke my heart out!

    And yet, there's no action you're going to take against it, eh?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply