Extracting parts of text from a varchar column

  • I have a seemingly very tricky problem here.

    I have a column in a table which has what looks like XML but it actually isn't. It is simply varchar column (string) with data looking like XML.

    What I have to do is extract various parts of this column.

    The parts I have to extract will be at different positions in the string.

    For instance if I want to extract Mr from between the <TITLE> tags then it could be at a different starting position each time - the length of the string is consistent.

    I have tried CHARINDEX and can find the starting point eg 127, 137, 200, 198 etc., though this is of no use as the start is different each time, I have tried a REVERSE to the end tag </TITLE> though this still fails. If I can get to the different starting point each time then I can SUBSTRING the amount I need, for instance Mr will be two characters etc.,

    The string stored in SQL is show below.

    Any help with floating position extract text would be brilliant.

    Kind regards

    Paul

    <Booking Reference="XXXXX"> <Purchases> <Train Reference="00000000"> <Travellers> <Adult> <Title>Mr.</Title> <FirstName>Test</FirstName> <LastName>Testington</LastName> </Adult> </Travellers> <Outbound> <RailLeg OriginStation="MANCHESTER OXFORD ROAD" DepartDateTime="2011-11-25T07:15:00" DestinationStation="LONDON CHARING CROSS" ArrivalDateTime="2011-11-25T10:25:00" Direction="Outbound" Currency="GBP" FareClass="Standard" FareCost="35.50" FareDescription="Standard Advance Single" FareRef="16" SequenceNumber="0" /> </Outbound> <Inbound> <RailLeg OriginStation="LONDON CHARING CROSS" DepartDateTime="2011-11-26T07:02:00" DestinationStation="MANCHESTER OXFORD ROAD" ArrivalDateTime="2011-11-26T09:56:00" Direction="Inbound" Currency="GBP" FareClass="Standard" FareCost="0.00" FareDescription="Standard Advance Single" FareRef="66" SequenceNumber="5" /> </Inbound> </Train> <Flight Reference="AVJAOS"> <Travellers> <Adult> <Title>Mr.</Title> <FirstName>Test</FirstName> <LastName>Testington</LastName> </Adult> </Travellers> <Outbound> <AirLeg OriginAirportName="Manchester, MAN" DepartDateTime="2011-11-18T08:10:00" DestinationAirportName="London Heathrow, LHR" ArrivalDateTime="2011-11-18T09:15:00" Direction="Outbound" FlightNumber="BD581" ServiceClass="Economy" BookingClass="Economy" Currency="GBP" FareCost="43.86" TaxCost="0.00" SequenceNumber="0" /> </Outbound> <Inbound> <AirLeg OriginAirportName="London Heathrow, LHR" DepartDateTime="2011-11-20T07:35:00" DestinationAirportName="Manchester, MAN" ArrivalDateTime="2011-11-20T08:40:00" Direction="Inbound" FlightNumber="AAAAA" ServiceClass="Economy" BookingClass="Economy" Currency="GBP" FareCost="0.00" TaxCost="0.00" SequenceNumber="0" /> </Inbound> </Flight> <Hotel Reference="18760971" Name="Hendon Hall"> <Address>Off Parson Street Hendon London NW4 1HF</Address> <Travellers> <Adult> <Title>Mr.</Title> <FirstName>Test</FirstName> <LastName>Testington</LastName> </Adult> </Travellers> <HotelRooms> <HotelRoom RateType="RACK" RateCost="132.00" /> </HotelRooms> </Hotel> </Purchases> <Booker SVCID="100" Title="Mr." Forename="Test" Surname="Testington" EmailAddress="cgtms@expotel.com" /> <Traveller SVCID="100" Title="Mr." Forename="Test" Surname="Testington" EmailAddress="cgtms@expotel.com" /> </Booking>

  • PATINDEX should do the trick:

    DECLARE @SampleData TABLE (

    data nvarchar(max)

    )

    INSERT INTO @SampleData VALUES (N'<Booking Reference="XXXXX"> <Purchases> <Train Reference="00000000"> <Travellers> <Adult> <Title>Mr.</Title> <FirstName>Test</FirstName> <LastName>Testington</LastName> </Adult> </Travellers> <Outbound> <RailLeg OriginStation="MANCHESTER OXFORD ROAD" DepartDateTime="2011-11-25T07:15:00" DestinationStation="LONDON CHARING CROSS" ArrivalDateTime="2011-11-25T10:25:00" Direction="Outbound" Currency="GBP" FareClass="Standard" FareCost="35.50" FareDescription="Standard Advance Single" FareRef="16" SequenceNumber="0" /> </Outbound> <Inbound> <RailLeg OriginStation="LONDON CHARING CROSS" DepartDateTime="2011-11-26T07:02:00" DestinationStation="MANCHESTER OXFORD ROAD" ArrivalDateTime="2011-11-26T09:56:00" Direction="Inbound" Currency="GBP" FareClass="Standard" FareCost="0.00" FareDescription="Standard Advance Single" FareRef="66" SequenceNumber="5" /> </Inbound> </Train> <Flight Reference="AVJAOS"> <Travellers> <Adult> <Title>Mr.</Title> <FirstName>Test</FirstName> <LastName>Testington</LastName> </Adult> </Travellers> <Outbound> <AirLeg OriginAirportName="Manchester, MAN" DepartDateTime="2011-11-18T08:10:00" DestinationAirportName="London Heathrow, LHR" ArrivalDateTime="2011-11-18T09:15:00" Direction="Outbound" FlightNumber="BD581" ServiceClass="Economy" BookingClass="Economy" Currency="GBP" FareCost="43.86" TaxCost="0.00" SequenceNumber="0" /> </Outbound> <Inbound> <AirLeg OriginAirportName="London Heathrow, LHR" DepartDateTime="2011-11-20T07:35:00" DestinationAirportName="Manchester, MAN" ArrivalDateTime="2011-11-20T08:40:00" Direction="Inbound" FlightNumber="AAAAA" ServiceClass="Economy" BookingClass="Economy" Currency="GBP" FareCost="0.00" TaxCost="0.00" SequenceNumber="0" /> </Inbound> </Flight> <Hotel Reference="18760971" Name="Hendon Hall"> <Address>Off Parson Street Hendon London NW4 1HF</Address> <Travellers> <Adult> <Title>Mr.</Title> <FirstName>Test</FirstName> <LastName>Testington</LastName> </Adult> </Travellers> <HotelRooms> <HotelRoom RateType="RACK" RateCost="132.00" /> </HotelRooms> </Hotel> </Purchases> <Booker SVCID="100" Title="Mr." Forename="Test" Surname="Testington" EmailAddress="cgtms@expotel.com" /> <Traveller SVCID="100" Title="Mr." Forename="Test" Surname="Testington" EmailAddress="cgtms@expotel.com" /> </Booking>')

    SELECT tagContent = SUBSTRING(data, titleTagOpen + tagLength, titleTagClose - titleTagOpen - tagLength)

    FROM (

    SELECT PATINDEX('%<TITLE>%', data) AS titleTagOpen,

    PATINDEX('%</TITLE>%', data) AS titleTagClose,

    LEN('<TITLE>') AS tagLength,

    data

    FROM @SampleData

    ) AS src

    -- Gianluca Sartori

  • If you have the same structure, but different length for each value, then you can convert it to XML and then use Xquery to get the data that you need. By the way if this is the case, then maybe you should also store it as XML instead of a string.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Wow thank you so much for the prompt and excellent reply. It works a treat. Now I can dismantle my horrible SSIS solution.

    Thanks once again

    Paul

  • Adi Cohn-120898 (1/16/2012)


    If you have the same structure, but different length for each value, then you can convert it to XML and then use Xquery to get the data that you need. By the way if this is the case, then maybe you should also store it as XML instead of a string.

    Adi

    Heh... my recommendation is to parse it all out and store it as normalized data unstead of the blob known as XML. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/16/2012)


    Adi Cohn-120898 (1/16/2012)


    If you have the same structure, but different length for each value, then you can convert it to XML and then use Xquery to get the data that you need. By the way if this is the case, then maybe you should also store it as XML instead of a string.

    Adi

    Heh... my recommendation is to parse it all out and store it as normalized data unstead of the blob known as XML. 😉

    +10000

    But I guess it's not an option any more since we got to this point.

    -- Gianluca Sartori

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply