Select a string when data in the string varies

  • I'm trying to query a substring from a long string. I figured out how to do it by using the SUBSTRING and CHARINDEX commands like this:

    (SELECT SUBSTRING(SUBSTRING(column_name,CHARINDEX('string_to_find',column_name,0),CHARINDEX('&DeviceType=',column_name) - CHARINDEX('string_to_find',column_name)),11,60)) AS MyResult

    After taking a closer look at the raw data, I noticed that the string I was looking for was located at the beginning of the string and also within it. Here is an example:

    Record 1: @1:string_to_find&2:12312312321312@3:3123123abc123

    Record 2: @1:23432423@2:48089@3:abc123@4:string_to_find@5:23434@6:343

    I thought I could use an IF statement, but I can't get it to work properly. Here is what I attempted:

    IF (SELECT CHARINDEX('Cmd=',column_name) FROM Raw_Logs WHERE (column_name1 LIKE 'some_matching_data')) = 1

    SELECT the characters after the colon following the string_to_find when the string_to_find is at the beginning.

    ELSE

    SELECT the characters after the colon following the string_to_find when the string_to_find is NOT at the beginning.

    The data to fetch varies in length, so I use subtract the CHARINDEX to get the length (i.e. @6: - @5:)

    Perhaps there is an easier way to do this altogether. I'm open to any suggestions. Thanks in advance for the help!

  • bill.brazell (4/12/2010)


    Perhaps there is an easier way to do this altogether. I'm open to any suggestions. Thanks in advance for the help!

    Yes there will be... but before that, how about you going through this following article and helping us help you?? 🙂

    CLICK HERE FOR FORUM POSTING ETIQUETTES - JEFF MODEN[/url]

    When u do so, i am sure a lot of us will help u instantly...

    So please post

    1. Table Structures - CREATE TABLE scripts

    2. Constraints or Indexes in the table, - CREATE SCRIPTS

    3. SAMPLE DATA - INSERT INTO TABLE scripts

    4. Desired output - some visual representation of this.

    I am pretty sure, once u give them, you ll for sure get back tested and optimized code...

    Cheers..

  • Ok,as i had some free time, i understood only something from your post and here is the code u are looking for (presumably).

    My assumptions from your post :

    1. U have a string column which has some commands

    2. u want to strip the string such that all the characters after a matching pattern should be retreived.

    Example : U have a STRING = 'CMD=DIR *.EXE' and u need DIR *.EXE from that STRING.

    If these assumptions are true, then here is the code for you

    IF OBJECT_ID('TEMPDB..#STRING_HOLDER') IS NOT NULL

    DROP TABLE #STRING_HOLDER

    CREATE TABLE #STRING_HOLDER

    (

    STRING_COLUMN VARCHAR(128)

    )

    INSERT INTO #STRING_HOLDER (STRING_COLUMN)

    SELECT '@1:string_to_find&2:12312312321312@3:3123123abc123'

    UNION ALL

    SELECT '@1:23432423@2:48089@3:abc123@4:string_to_find@5:23434@6:343'

    UNION ALL

    SELECT '@1:23432423@2:48089@3:abc123@4:@5:23434@6:343string_to_find'

    SELECT

    STRING_COLUMN

    FROM

    #STRING_HOLDER

    SELECT REVERSE(LEFT( REVERSE(STRING_COLUMN),

    (CHARINDEX(REVERSE('string_to_find'),REVERSE(STRING_COLUMN))-1))

    )

    FROM

    #STRING_HOLDER

    Please inform us if this code works for you..

    Cheers,

    C'est Pras!!

  • To be honest, I haven't checked the 2nd suggestion offered to see if it works. I did read the article about the best way to post code on the forum. Hopefully this will make it easier/better to come up with an exact solution.

    I'm importing IIS logs from an Exchange CAS into SQL and am interested in getting the data used for Exchange Active Sync. Most of the data is consistent in how it is logged, but I noticed one column (csuriquery) that lists the data in a different order. Here is the table I'm working with:

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL DROP TABLE #mytable

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[mytable](

    [date] [nvarchar](50) NULL,

    [time] [nvarchar](50) NULL,

    [csuriquery] [nvarchar](2000) NULL

    ) ON [PRIMARY]

    Here is some data to populate the table:

    INSERT INTO mytable

    (date, time, csuriquery)

    SELECT '2010-04-08','00:00:02','Cmd=Ping&User=User1_ID&DeviceId=PALM24854ace5ac4e080c918b7213ce4&DeviceType=Palm&Log=V121_LdapC1_LdapL0_RpcC29_RpcL36_Hb1740_Rto1_S1_' UNION ALL

    SELECT '2010-04-08','00:00:03','User=User2_ID&DeviceId=Appl84930CUZ3NP&DeviceType=iPhone&Cmd=Ping&Log=V121_Sst20_LdapC0_LdapL0_RpcC57_RpcL63_Hb975_Rto1_Erq1_Pk4026417530_S1_'

    Note how the same data is written in the csuriquery column, yet in different orders. The data I want to extract from the csuriquery column is the Cmd, DeviceId, and DeviceType (possibly the Log too). In the second record, I was able to select the DeviceId with this:

    SELECT SUBSTRING(SUBSTRING(csuriquery,CHARINDEX('&DeviceId',csuriquery,0),CHARINDEX('&DeviceType=',csuriquery) - CHARINDEX('&DeviceId=',csuriquery)),11,60) AS MyDeviceID

    FROM mytable

    This command fails for the first record since the CHARINDEX value returned is invalid. I was hoping to use an IF statement to conditionally execute separate SELECT statements based on the location of Cmd string.

    Hopefully I've given enough information to give a better idea what I'm looking for. In the meantime, I'll take a closer look at the example that was provided. Thanks!

  • Just tried the suggestion using the REVERSE commands and it returned too much data. I need it to stop when it gets to the next delimiter (which were @1: through @6:) I guess I didn't make that clear enough.

    My other posting from earlier today should have enough information to figure it out. I'm sure it's possible.

  • I am getting closer to what u need of that string.. but still bit unclear though.. u have around six parameters in your string that need to parsed and stripped into 6 different variables.. is that right??

    If that is right, then are there pre-defined patterns to match those 6 parameters?

    Again, if that is right, then we can not dump all 6 "stripped" values into local variables, but we can make use of some temp tables and put them in..

    Clarify on these doubts, then i will take up n your requirement buddy 🙂

    Cheers!!

  • Yes. The parameters to parse are:

    Cmd=

    User=

    DeviceId=

    DeviceType=

    Log=

    Ampersands are placed before each, except for the first one. Here is a condensed example:

    Cmd=Ping,&User=Me&DeviceId=MyDeviceID&DeviceType=MyDeviceType&Log=MyLogEntry

    Another from a different device may look like this:

    User=Me&DeviceId=MyDeviceID_with_more_characters&DeviceType=MyDeviceType&Cmd=Ping,&Log=MyLogEntry

    For some unknown reason, they are ordered differently from different devices. Also, the length of the strings vary.

    I really appreciate the help!

  • Bill,

    Thanks for the table DDL/DML; however you still haven't specified what the desired output should look like, so I'm making a WAG here. Let us know if this works for you. If not, please show us, based on the sample data supplied, what the output should look like.

    -- first, create a function to split the string apart.

    CREATE function [dbo].[DelimitedSplit] (

    @list varchar(max),

    @Delimiter char(1)

    )

    RETURNS TABLE

    AS

    RETURN

    -- first, need to break down into separate items.

    -- See Jeff Modem's article The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

    -- at http://www.sqlservercentral.com/articles/T-SQL/62867/ for how a tally table can split strings apart.

    WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ),

    Thousands (N) AS (SELECT 1 FROM Tens t1 CROSS JOIN Tens t2 CROSS JOIN Tens t3),

    Millions (N) AS (SELECT 1 FROM Thousands t1 CROSS JOIN Thousands t2),

    Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM Millions),

    ItemSplit (ItemOrder, Item) AS (

    SELECT N,

    RTRIM(LTRIM(SUBSTRING(@Delimiter + @list + @Delimiter,N+1,

    CHARINDEX(@Delimiter,@Delimiter + @list + @Delimiter,N+1)-N-1)))

    FROM Tally

    WHERE N < LEN(@Delimiter + @list + @Delimiter)

    AND SUBSTRING(@Delimiter + @list + @Delimiter,N,1) = @Delimiter

    )

    SELECT ItemID = ROW_NUMBER() OVER (ORDER BY ItemOrder),

    Item

    FROM ItemSplit

    GO

    -- make a temporary table, and populate it with the sample data

    if object_id('tempdb..#mytable') IS NOT NULL DROP TABLE #mytable

    CREATE TABLE #mytable(

    [date] [nvarchar](50) NULL,

    [time] [nvarchar](50) NULL,

    [csuriquery] [nvarchar](2000) NULL

    )

    declare @Delimiter char(1)

    set @Delimiter = '&'

    INSERT INTO #mytable (date, time, csuriquery)

    SELECT '2010-04-08','00:00:02','Cmd=Ping&User=User1_ID&DeviceId=PALM24854ace5ac4e080c918b7213ce4&DeviceType=Palm&Log=V121_LdapC1_LdapL0_RpcC29_RpcL36_Hb1740_Rto1_S1_' UNION ALL

    SELECT '2010-04-08','00:00:03','User=User2_ID&DeviceId=Appl84930CUZ3NP&DeviceType=iPhone&Cmd=Ping&Log=V121_Sst20_LdapC0_LdapL0_RpcC57_RpcL63_Hb975_Rto1_Erq1_Pk4026417530_S1_'

    select t2.*

    from #mytable t1

    CROSS APPLY DelimitedSplit(csuriquery, @Delimiter) t2

    ORDER BY t1.date, t1.time, t2.ItemID

    GO

    DROP FUNCTION DelimitedSplit

    GO

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Heh... ya did it to me again, Wayne. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/13/2010)


    Heh... ya did it to me again, Wayne. 😉

    Hey... ya snooze, ya looze. 😉 😀

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • {edit} Machine dependent code removed... sorry folks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/13/2010)


    ... With that in mind, I have a present for you...

    Awesome! thanks - it's been added to my code snippets templates.

    And as is typical for code coming from Jeff, there are more remarks than actual code.

    Edit: posted too soon, wasn't finished.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Heh... actually, I've added more comments in the form of a usage example and some extra credits... My intent is to do a long overdue update on the Tally table article and I've gotta give credit where credit is due.

    Thanks for the feedback, Wayne.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is looking very promising. Thank you.

    As for the output, I'd prefer that it not include the name of the delimiter, only the value. i.e. "Ping" instead of "Cmd=Ping" I'm looking at the code now trying to figure out how to strip that off. Hopefully it's not too difficult.

    I hope to test it across a larger data set and a full-sized table with all 16 columns. The only issue I see is that the code returns 10 records with 2 columns. I believe I'll need it to return 2 records with 7 columns. Something like this:

    date time Cmd User DeviceId DeviceType Log

    2010-04-08 00:00:02 Ping User1_ID PALM24854ace5ac4e080c918b7213ce4 Palm V121_LdapC1_LdapL0_RpcC29_RpcL36_Hb1740_Rto1_S1_

    2010-04-08 00:00:03 Ping User2_ID Appl84930CUZ3NP iPhone V121_Sst20_LdapC0_LdapL0_RpcC57_RpcL63_Hb975_Rto1_Erq1_Pk4026417530_S1_

    Is this doable? Thanks again!

  • bill.brazell (4/14/2010)


    This is looking very promising. Thank you.

    As for the output, I'd prefer that it not include the name of the delimiter, only the value. i.e. "Ping" instead of "Cmd=Ping" Is this doable? Thanks again!

    Yes it is. And this is a perfect reason why you should have included the expected results in your original request... it would have been done right the first time, instead of having to come back to it. Please read the first link in my signature for how to post data for faster, tested results that do what you want, and just be sure to include how you want the results.

    Anyway, change the last select to this:

    SELECT t1.date, t1.time,

    [Cmd] = MAX(CASE WHEN LEFT(Item,4) = 'Cmd=' THEN SUBSTRING(Item, 5, 50) ELSE NULL END),

    = MAX(CASE WHEN LEFT(Item,5) = 'User=' THEN SUBSTRING(Item, 6, 50) ElSE NULL END),

    [DeviceID] = MAX(CASE WHEN LEFT(Item,9) = 'DeviceID=' THEN SUBSTRING(Item, 10, 50) ELSE NULL END),

    [DeviceType] = MAX(CASE WHEN LEFT(Item, 11) = 'DeviceType=' THEN SUBSTRING(Item, 12, 50) ELSE NULL END),

    [LOG] = MAX(CASE WHEN LEFT(Item,4) = 'Log=' THEN SUBSTRING(Item, 5, 50) ELSE NULL END)

    FROM #mytable t1

    CROSS APPLY DelimitedSplit(csuriquery, @Delimiter) t2

    GROUP BY t1.date, t1.time

    ORDER BY t1.date, t1.time

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 15 posts - 1 through 15 (of 33 total)

You must be logged in to reply to this topic. Login to reply