Convert searched String to number

  • Hi

    I have a database that stores information for all printers connected to our Corporate Network.

    I need to extract the data that tells me how many copies have been printed and on what printer. The information is in a column called EXT_Tags.

    an example of an entry in this column is:

    %@01@60~01~%%@02@Microsoft Word - ashbytender.doc~02~%%@03@rtebbatt~03~%%@04@KONICA700E~04~%%@05@KONICA700E~05~%%@06@55149~06~%%@07@13~07~%

    This long string contains a lot of information about the job that has been printed. The part of it that states how many pages were printed is at the end. In the above example it is 13(highlighted in bold), and the name of the printer (also highlighted in bold) is KONICA700E in the above example.

    I need to display how many copies each printer has printed. How do I extract this information from EXT_Tags column. Can I perform other queries on the data such as add them all up, display the total copies etc.

    Is there a way of doing this in the sql query?

    It looks like the string contains 7 pieces of information that are numbered from @1 to @7

    Name of printer is @04 and the copy count is @7

    Is there anyone who can help me construct a sql query to get this information.

  • You've correctly identified the problem which is 90% of the solution... Take a peek at SUBSTRING and CHARINDEX in Books Online to get the rest. Heh... I'd show you but that would take all your fun away. 😉 It's simple... give it a try.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff

    I have been doing some reading up on Substring and charindex as you suggested.

    I came up with this query and it works great.

    DECLARE

    @CopyStart varchar(10),

    @CopyEnd varchar(10),

    @PrinterStart varchar(10),

    @PrinterEnd varchar(10)

    SET @CopyStart = '@07'

    SET @CopyEnd = '~07~'

    SET @PrinterStart = '@04'

    SET @PrinterEnd = '~04~'

    SELECT Username, Event_date, SUBSTRING(EXT_TAGS, NULLIF(charindex(@PrinterStart, EXT_TAGS),0) + len(@PrinterStart) +1, NULLIF(charindex(@PrinterEnd, EXT_Tags), 0) - (NULLIF(charindex(@PrinterStart, EXT_TAGS), 0) + len(@PrinterStart) + 1)) AS Printer,

    convert(int,(SUBSTRING(EXT_TAGS, NULLIF(charindex(@CopyStart, EXT_TAGS),0) + len(@CopyStart) +1, NULLIF(charindex(@CopyEnd, EXT_Tags), 0) - (NULLIF(charindex(@CopyStart, EXT_TAGS), 0) + len(@CopyStart) + 1)))) AS Copies

    FROM system_events

    It gives me the four columns I need, Username, Event_date, and the two new generated columns called Printer and Copies.

    I then wanted to get the total number of copies for each user so I changed the query to this:

    DECLARE

    @CopyStart varchar(10),

    @CopyEnd varchar(10),

    @PrinterStart varchar(10),

    @PrinterEnd varchar(10)

    SET @CopyStart = '@07'

    SET @CopyEnd = '~07~'

    SET @PrinterStart = '@04'

    SET @PrinterEnd = '~04~'

    SELECT Username,

    SUM(convert(int,(SUBSTRING(EXT_TAGS, NULLIF(charindex(@CopyStart, EXT_TAGS),0) + len(@CopyStart) +1, NULLIF(charindex(@CopyEnd, EXT_Tags), 0) - (NULLIF(charindex(@CopyStart, EXT_TAGS), 0) + len(@CopyStart) + 1))))) AS 'Total Copies'

    FROM system_events

    GROUP BY Username

    This worked great also. It gave me two columns, Username and total copies

    The last query I wanted was total number of copies per printer.

    This is where I am a bit lost. I tried this:

    DECLARE

    @CopyStart varchar(10),

    @CopyEnd varchar(10),

    @PrinterStart varchar(10),

    @PrinterEnd varchar(10)

    SET @CopyStart = '@07'

    SET @CopyEnd = '~07~'

    SET @PrinterStart = '@04'

    SET @PrinterEnd = '~04~'

    SELECT SUBSTRING(EXT_TAGS, NULLIF(charindex(@PrinterStart, EXT_TAGS),0) + len(@PrinterStart) +1, NULLIF(charindex(@PrinterEnd, EXT_Tags), 0) - (NULLIF(charindex(@PrinterStart, EXT_TAGS), 0) + len(@PrinterStart) + 1)) AS Printer,

    SUM(convert(int,(SUBSTRING(EXT_TAGS, NULLIF(charindex(@CopyStart, EXT_TAGS),0) + len(@CopyStart) +1, NULLIF(charindex(@CopyEnd, EXT_Tags), 0) - (NULLIF(charindex(@CopyStart, EXT_TAGS), 0) + len(@CopyStart) + 1))))) AS 'Total Copies'

    FROM system_events

    GROUP BY Printer

    But this gives me the error 'Invalid column name 'Printer''

    Please can you help me out.

    Is it possible to Group By an Alias column name?

    Andy

  • Please can you help me out.

    Is it possible to Group By an Alias column name?

    Simple answer: Yes and no. 😉

    Yes, if you would wrap your query into a subquery or CTE like

    ;WITH cte AS

    (SELECT SUBSTRING(EXT_TAGS, NULLIF(charindex(@PrinterStart, EXT_TAGS),0) + len(@PrinterStart) +1, NULLIF(charindex(@PrinterEnd, EXT_Tags), 0) - (NULLIF(charindex(@PrinterStart, EXT_TAGS), 0) + len(@PrinterStart) + 1)) AS Printer,

    SUM(convert(int,(SUBSTRING(EXT_TAGS, NULLIF(charindex(@CopyStart, EXT_TAGS),0) + len(@CopyStart) +1, NULLIF(charindex(@CopyEnd, EXT_Tags), 0) - (NULLIF(charindex(@CopyStart, EXT_TAGS), 0) + len(@CopyStart) + 1))))) AS 'Total Copies'

    FROM system_events )

    SELECT Printer, [Total Copies]

    FROM cte

    GROUP BY printer

    Otherwise: no.

    You wold have to write

    SELECT SUBSTRING(EXT_TAGS, NULLIF(charindex(@PrinterStart, EXT_TAGS),0) + len(@PrinterStart) +1, NULLIF(charindex(@PrinterEnd, EXT_Tags), 0) - (NULLIF(charindex(@PrinterStart, EXT_TAGS), 0) + len(@PrinterStart) + 1)) AS Printer,

    SUM(convert(int,(SUBSTRING(EXT_TAGS, NULLIF(charindex(@CopyStart, EXT_TAGS),0) + len(@CopyStart) +1, NULLIF(charindex(@CopyEnd, EXT_Tags), 0) - (NULLIF(charindex(@CopyStart, EXT_TAGS), 0) + len(@CopyStart) + 1))))) AS 'Total Copies'

    FROM system_events

    GROUP BY

    SUBSTRING(EXT_TAGS, NULLIF(charindex(@PrinterStart, EXT_TAGS),0) + len(@PrinterStart) +1, NULLIF(charindex(@PrinterEnd, EXT_Tags), 0) - (NULLIF(charindex(@PrinterStart, EXT_TAGS), 0) + len(@PrinterStart) + 1))



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks Lutz

    I used your second suggestion and it worked a treat.

    Thanks once again for the expert advice on this Forum

    Andy

  • Actually, thank YOU! It's really nice to see someone take a simple hint, do a bit of research, and then actually post a viable solution that's actually readable. Very well done, Andy... my hat is off to you. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Like I said... very cool that you busted a hump helping yourself, Andy.

    Here's a slightly (radically, actually) different approach that uses "Divide'n'Conquer" methods... the details are in the comments. These methods not only make things a bit easier to troubleshoot should the need arise, they also make the code a bit faster than traditional "complicated formula" methods. For those using SQL Server 2000, you can do the same thing using "derived tables" instead of CTE's...

    --=====================================================================================================================

    -- Create some test data. Note that this is NOT a part of the solution. It's just for demonstration.

    --=====================================================================================================================

    --===== Contiditionally drop the test table so we can easily rerun the code under test.

    IF OBJECT_ID('TempDB..#System_Events','U') IS NOT NULL

    DROP TABLE #System_Events

    ;

    GO

    --===== Use the "Pseudo Cursors" of a couple of Cross-Joins to generate lot's of test data

    SELECT TOP 1000

    '%@01@60~01~%%@02@Microsoft Word - ashbytender.doc~02~%%@03@rtebbatt~03~%%@04@KONICA700E~04~%%@05@KONICA700E~05~%%@06@55149~06~%%@07@13~07~%' AS Ext_Tags

    INTO #System_Events

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    UNION ALL

    SELECT TOP 10000

    '%@01@60~01~%%@02@Microsoft Word - ashbytender.doc~02~%%@03@rtebbatt~03~%%@04@HPLJ5~04~%%@05@KONICA700E~05~%%@06@55149~06~%%@07@10~07~%' AS Ext_Tags

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    ;

    --=====================================================================================================================

    -- Solve the problem...

    --=====================================================================================================================

    --===== Declare and preset delimiter variables

    DECLARE @CopyStart VARCHAR(10),

    @CopyEnd VARCHAR(10),

    @CopyStartLen INT,

    @PrinterStart VARCHAR(10),

    @PrinterEnd VARCHAR(10),

    @PrinterStartLen INT

    ;

    SELECT @CopyStart = '@07@',

    @CopyEnd = '~07~',

    @CopyStartLen = LEN(@CopyStart),

    @PrinterStart = '@04@',

    @PrinterEnd = '~04~',

    @PrinterStartLen = LEN(@PrinterStart)

    ;

    --===== Split the data out according to the delimiters, do any necessary datatype conversions,

    -- and produce a summary report for the total number of copies per printer.

    -- Notice that we use "Divide'n'Conquer" methods to first find the start of each string

    -- (and everything that follows) and then we chop off the rest of the string in the correct

    -- spot in a separate "step"... it makes troubleshooting real easy and it has another side

    -- benefit... it uses less CPU time and also has a shorter run duration than the "complicated

    -- formula" versions.

    WITH

    cteSplit1 AS

    ( --=== This finds the beginning of each desired string (and contains everything AFTER that).

    -- If a start tag isn't found, the derived column is assigned the NULL value.

    SELECT SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@PrinterStart,Ext_Tags),0) + @PrinterStartLen,8000) AS Printer,

    SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@CopyStart,Ext_Tags) ,0) + @CopyStartLen ,8000) AS Copies

    FROM #System_Events

    )

    ,

    cteRTrim AS

    ( --=== This finds the end of each string (and still returns NULL if it started that way)

    SELECT SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1) AS Printer,

    CAST(SUBSTRING(Copies,1,CHARINDEX(@CopyEnd,Copies)-1) AS INT) AS Copies

    FROM cteSplit1

    )

    --===== All set... now a simple SUM will do what we need.

    SELECT Printer, SUM(Copies) AS TotalCopies

    FROM cteRTrim

    GROUP BY Printer

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That makes a lot of sense Jeff thankyou.

    The database holds thousands and thousands of records so I will definately benefit from the speed aspect.

    Just one question.

    you have set the third expression of the substring to 8000.

    SELECT SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@PrinterStart,Ext_Tags),0) + @PrinterStartLen,8000) AS Printer,

    SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@CopyStart,Ext_Tags) ,0) + @CopyStartLen ,8000) AS Copies

    Is this just to compensate for really long strings?

  • andy.woodward (11/23/2009)


    That makes a lot of sense Jeff thankyou.

    The database holds thousands and thousands of records so I will definately benefit from the speed aspect.

    Just one question.

    you have set the third expression of the substring to 8000.

    SELECT SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@PrinterStart,Ext_Tags),0) + @PrinterStartLen,8000) AS Printer,

    SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@CopyStart,Ext_Tags) ,0) + @CopyStartLen ,8000) AS Copies

    Is this just to compensate for really long strings?

    Yes... if you don't know where the end may lie, then use the full width of the variables involved. I used 8k variables so I went to the end using 8k. If the max width of the column is only 500, you could use 500 instead of 8000 but I'm not sure that it'll buy you anything in area of speed because the end is the end. Haven't tested it though... might be worth a shot...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If I wanted to get this query to work on SQL Server 2000 what method would I need.

    I can't use views because you can't declare variables in views.

    I tried nested sub-queries such as this one.

    DECLARE @CopyStart VARCHAR(10),

    @CopyEnd VARCHAR(10),

    @CopyStartLen INT,

    @PrinterStart VARCHAR(10),

    @PrinterEnd VARCHAR(10),

    @PrinterStartLen INT

    ;

    SELECT @CopyStart = '@07@',

    @CopyEnd = '~07~',

    @CopyStartLen = LEN(@CopyStart),

    @PrinterStart = '@04@',

    @PrinterEnd = '~04~',

    @PrinterStartLen = LEN(@PrinterStart)

    SELECT SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1) AS Printer,

    CAST(SUBSTRING(Copies,1,CHARINDEX(@CopyEnd,Copies)-1) AS INT) AS Copies

    FROM

    (SELECT SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@PrinterStart,Ext_Tags),0) + @PrinterStartLen,8000) AS Printer,

    SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@CopyStart,Ext_Tags) ,0) + @CopyStartLen ,8000) AS Copies

    FROM System_Events) AS cteTrim

    This works ok but again I am struggling with group by. If I put SUM around the 'Copies' in the first select statement as below:

    SELECT SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1) AS Printer,

    SUM(CAST(SUBSTRING(Copies,1,CHARINDEX(@CopyEnd,Copies)-1) AS INT)) AS Copies

    FROM

    (SELECT SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@PrinterStart,Ext_Tags),0) + @PrinterStartLen,8000) AS Printer,

    SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@CopyStart,Ext_Tags) ,0) + @CopyStartLen ,8000) AS Copies

    FROM System_Events) AS cteTrim

    GROUP BY Printer

    I get the same results as I do without it.

    It looks like I just need to query the results of the first two Select statements. I tried this:

    DECLARE @CopyStart VARCHAR(10),

    @CopyEnd VARCHAR(10),

    @CopyStartLen INT,

    @PrinterStart VARCHAR(10),

    @PrinterEnd VARCHAR(10),

    @PrinterStartLen INT

    ;

    SELECT @CopyStart = '@07@',

    @CopyEnd = '~07~',

    @CopyStartLen = LEN(@CopyStart),

    @PrinterStart = '@04@',

    @PrinterEnd = '~04~',

    @PrinterStartLen = LEN(@PrinterStart)

    SELECT Printer, SUM(Copies) AS TotalCopies

    FROM

    (SELECT SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1) AS Printer,

    CAST(SUBSTRING(Copies,1,CHARINDEX(@CopyEnd,Copies)-1) AS INT) AS Copies

    FROM

    (SELECT SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@PrinterStart,Ext_Tags),0) + @PrinterStartLen,8000) AS Printer,

    SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@CopyStart,Ext_Tags) ,0) + @CopyStartLen ,8000) AS Copies

    FROM System_Events) AS cteTrim)

    GROUP BY Printer

    But it gives error:

    Msg 156, Level 15, State 1, Line 29

    Incorrect syntax near the keyword 'GROUP'.

    Am I a million miles away?

    The reason why I am asking is that we have test databases on sql 2000.

  • Its just clicked.

    I replaced GROUP BY Printer with

    GROUP BY SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1)

    Full query is:

    DECLARE @CopyStart VARCHAR(10),

    @CopyEnd VARCHAR(10),

    @CopyStartLen INT,

    @PrinterStart VARCHAR(10),

    @PrinterEnd VARCHAR(10),

    @PrinterStartLen INT

    ;

    SELECT @CopyStart = '@07@',

    @CopyEnd = '~07~',

    @CopyStartLen = LEN(@CopyStart),

    @PrinterStart = '@04@',

    @PrinterEnd = '~04~',

    @PrinterStartLen = LEN(@PrinterStart)

    SELECT SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1) AS Printer,

    SUM(CAST(SUBSTRING(Copies,1,CHARINDEX(@CopyEnd,Copies)-1) AS INT)) AS Copies

    FROM

    (SELECT SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@PrinterStart,Ext_Tags),0) + @PrinterStartLen,8000) AS Printer,

    SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@CopyStart,Ext_Tags) ,0) + @CopyStartLen ,8000) AS Copies

    FROM System_Events) AS cteTrim

    GROUP BY SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1)

    Don't you just love it when it actually works!

    Andy

  • Its just clicked.

    I replaced GROUP BY Printer with

    GROUP BY SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1)

    Full query is:

    DECLARE @CopyStart VARCHAR(10),

    @CopyEnd VARCHAR(10),

    @CopyStartLen INT,

    @PrinterStart VARCHAR(10),

    @PrinterEnd VARCHAR(10),

    @PrinterStartLen INT

    ;

    SELECT @CopyStart = '@07@',

    @CopyEnd = '~07~',

    @CopyStartLen = LEN(@CopyStart),

    @PrinterStart = '@04@',

    @PrinterEnd = '~04~',

    @PrinterStartLen = LEN(@PrinterStart)

    SELECT SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1) AS Printer,

    SUM(CAST(SUBSTRING(Copies,1,CHARINDEX(@CopyEnd,Copies)-1) AS INT)) AS Copies

    FROM

    (SELECT SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@PrinterStart,Ext_Tags),0) + @PrinterStartLen,8000) AS Printer,

    SUBSTRING(Ext_Tags,NULLIF(CHARINDEX(@CopyStart,Ext_Tags) ,0) + @CopyStartLen ,8000) AS Copies

    FROM System_Events) AS cteTrim

    GROUP BY SUBSTRING(Printer,1,CHARINDEX(@PrinterEnd,Printer)-1)

    Don't you just love it when it actually works!

    Andy

  • Heh... you're faster than I am..

    Yes, in 2k you just swing the CTE's down into the FROM clause as "derived tables" and you've basically got the same thing. Well done. You could have taken it a level deeper like I did with the CTE's to simplify the GROUP BY, but either way works.

    BTW... are you the "Andy Woodward" that I've seen so many posts from on "Planet Source Code?"

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply