Parse SQL data

  • Below is the test data. What I am looking for is...

    I want to PARSE any value starting with PD- from the description column. Below is sample data. However, it could be possible the "Description" column could have n number PD id in the column, not restricted to any numbers.

    Any advice would be highly appreciated.

    Here is the result that I am looking.

    k_id, pd_id,description,value_of_week,

    1,,No PS update required. This issue, PS328498,Discussed with the respective Division Risk Manager and product/ client impact was identified.,2

    2,PD-0051,There were two large events in the month of July 2021 related to PD-0051 One event was a near miss of $250,000- A Transaction Banking client received three ACH transactions that were not authorized.,6

    3,PD-01484,IS116748 - User Access Recertification - Global Trust Services (GTS) - This is a High Operations and Technology PD-01484 Project Finance Services and PD-01485 Services which are currently under review and any relevant RCSAs.PD-00148: Global Custody International Custody Services PD-00155: ,8

    3,PD-01485,IS116748 - User Access Recertification - Global Trust Services (GTS) - This is a High Operations and Technology PD-01484 Project Finance Services and PD-01485 Services which are currently under review and any relevant RCSAs.PD-00148: Global Custody International Custody Services PD-00155: ,8

    3,PD-00148,IS116748 - User Access Recertification - Global Trust Services (GTS) - This is a High Operations and Technology PD-01484 Project Finance Services and PD-01485 Services which are currently under review and any relevant RCSAs.PD-00148: Global Custody International Custody Services PD-00155: ,8

    3,PD-00155,IS116748 - User Access Recertification - Global Trust Services (GTS) - This is a High Operations and Technology PD-01484 Project Finance Services and PD-01485 Services which are currently under review and any relevant RCSAs.PD-00148: Global Custody International Custody Services PD-00155: ,8

    4,PD-00713,PD-00713 (Preferred Securities & Partnership), PD-00714 (Financial Solutions Loans): OCC MRA 2021-32 noted that ,12

    4,PD-00714,PD-00713 (Preferred Securities & Partnership), PD-00714 (Financial Solutions Loans): OCC MRA 2021-32 noted that ,12


    DECLARE @Parse_Test TABLE(k_id INT,description VARCHAR(500) , value_of_week int)


    INSERT INTO @Parse_Test

    VALUES

    (1,'No PS update required. This issue, PS328498,Discussed with the respective Division Risk Manager and product/ client impact was identified.',2) ,

    (2,'There were two large events in the month of July 2021 related to PD-0051 One event was a near miss of $250,000- A Transaction Banking client received three ACH transactions that were not authorized.',6) ,

    (3,'IS116748 - User Access Recertification - Global Trust Services (GTS) - This is a High Operations and Technology PD-01484 Project Finance Services and PD-01485 Services which are currently under review and any relevant RCSAs.PD-00148: Global Custody International Custody Services PD-00155: ',8) ,

    (4,'PD-00713 (Preferred Securities & Partnership), PD-00714 (Financial Solutions Loans): OCC MRA 2021-32 noted that ',12)



    SELECT * FROM @Parse_Test
  • To split out the pd_id's you could try STRING_SPLIT to separate based on space delimiter.  Since some 'PD-[0-9]' values are encompassed by other extra characters a conditional substring is done based on PATINDEX location.  Also, I'm using Azure SQL level 150 (which is roughly equivalent to 2019) which has the extra 'enable_ordinal' parameter set to 1 which then outputs the 'ordinal' column

    select pt.k_id, 
    iif(ss.[value] like 'PD-[0-9]%', ss.[value],
    substring(ss.[value], ptndx.loc,
    len(ss.[value])-ptndx.loc)) pd_id,
    pt.[description], pt.value_of_week
    from @Parse_Test pt
    outer apply string_split(pt.description, ' ', 1) ss
    outer apply (values (patindex('%PD-[0-9]%', ss.[value]))) ptndx(loc)
    where ss.[value] like '%PD-[0-9]%'
    order by pt.k_id, ss.ordinal;

    If the order of the rows doesn't matter the ordinal column could be removed

    select pt.k_id, 
    iif(ss.[value] like 'PD-[0-9]%', ss.[value],
    substring(ss.[value], ptndx.loc,
    len(ss.[value])-ptndx.loc)) pd_id,
    pt.[description], pt.value_of_week
    from @Parse_Test pt
    outer apply string_split(pt.description, ' ') ss
    outer apply (values (patindex('%PD-[0-9]%', ss.[value]))) ptndx(loc)
    where ss.[value] like '%PD-[0-9]%'
    order by pt.k_id;

    • This reply was modified 2 years, 5 months ago by  Steve Collins.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    To split out the pd_id's you could try STRING_SPLIT to separate based on space delimiter.  Since some 'PD-[0-9]' values are encompassed by other extra characters a conditional substring is done based on PATINDEX location.  Also, I'm using Azure SQL level 150 (which is roughly equivalent to 2019) which has the extra 'enable_ordinal' parameter set to 1 which then outputs the 'ordinal' column

    If the order of the rows doesn't matter the ordinal column could be removed

    The OP has posted in the 2019 forum, which doesn't have the "Ordinal Position" operand.  Like it says in the MS documentation for STRING_SPLIT()...

    Note

    The enable_ordinal argument and ordinal output column are currently supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only). Beginning with SQL Server 2022 (16.x) Preview, the argument and output column are available in SQL Server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Steve,

    It works for the sample data. I am sure it will work on the original data as well. I marked this question as answered. I appreciate your help.

     

     

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply