Doing a count on characters within a string

  • Hello,

    say you have a string of codes. E.g. XX/\/\/\/\#### all within one field

    Then I want a column called X with a count of the number of X's in that string etc so you would have

    X  /  \  #  i

    2  4  4  4  0

    i doesnt exist in this field so it would come back with 0. Is there any way of doing this kind of query?

    Thanks

    Debbie

     

  • You could do it like this;

    create table #x ( weirdcodes varchar(20) not null )

    go

    insert #x select 'XX/\/\/\/\####'

    go

    select weirdcodes,

     (datalength(weirdcodes) - datalength(replace(weirdcodes, 'X', ''))) as 'X',

     (datalength(weirdcodes) - datalength(replace(weirdcodes, '/', ''))) as '/',

     (datalength(weirdcodes) - datalength(replace(weirdcodes, '\', ''))) as '\',

     (datalength(weirdcodes) - datalength(replace(weirdcodes, '#', ''))) as '#',

     (datalength(weirdcodes) - datalength(replace(weirdcodes, 'i', ''))) as 'i'

    from #x

    go

    weirdcodes           X           /           \           #           i          

    -------------------- ----------- ----------- ----------- ----------- -----------

    XX/\/\/\/\####       2           4           4           4           0

    (1 row(s) affected)

    /Kenneth

  • Hi,

    Thanks for that. I dont really want to go the temporary table route because I have got throusands of records that I need to do this too and then I would have to link the resulting columns back to the original columns.

    Is there any way of doing this without the table. Im going to see if I can use the

    select  

     (datalength(weirdcodes) - datalength(replace(weirdcodes, 'X', ''))) as 'X',

     (datalength(weirdcodes) - datalength(replace(weirdcodes, '/', ''))) as '/',

     (datalength(weirdcodes) - datalength(replace(weirdcodes, '\', ''))) as '\',

     (datalength(weirdcodes) - datalength(replace(weirdcodes, '#', ''))) as '#',

     (datalength(weirdcodes) - datalength(replace(weirdcodes, 'i', ''))) as 'i'

    from #x

    go

    against the table itsself but Im still not sure what Im doing

    Debbie

  • brilliant,

    It works, I used the following within my query...

    ,

    (datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, 'X', ''))) as 'X',

     (datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, '/', ''))) as '/',

     (datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, '\', ''))) as '\',

     (datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, '#', ''))) as '#',

     (datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, 'i', ''))) as 'i'

    Thanks for that!

    Debbie

     

  • Yes, that was the point

    The temptable was just for illustration.

    You were supposed to substitute it with your actual table.

    /Kenneth

  • Datalength was a great way of solving this, however I have a scenario where this won't work.  I had to query Exchange, and find the # of email addresses that are copied on emails.  These are seperated with a ';'  The problem is that sometimes a space follows the semicolon, and sometimes it doesn't.  When the space follows the semicolon, it's counted using the solution above.  So I wrote the function below, which essentially does the same thing:

    --takes a string, a character

    --returns the # of times the character appears in the string

    create function [dbo].[fnCountCharInString] (@string varchar(500), @char varchar(1))

    returns int

    as

    begin

     declare @stringpos int

     declare @charcount int

     declare @ascii int

     set @stringpos = 1

     set @charcount = 0

     set @ascii = ascii(@char)

     while @stringpos <= Len(@string)

      begin

       if ascii(substring(@string,@stringpos,1)) = @ascii

        begin

         set @charcount = @charcount + 1

        end

       set @stringpos = @stringpos + 1

     

      end

      

    return @charcount

    end

  • I believe what you're looking for is a kind of "histogram".  The other solutions are great but if there are any unexpected characters, the cross-tabs will miss them and, if I can, I avoid the RBAR associated with a function.

    The first thing you need to do is make an nice little multi-use numbers table.  It consists of nothing more than a well indexed column of sequential numbers and has a great number of uses.  I call mine a "Tally" table because I use it to count and it sounds cooler than "Numbers".  Here's how to make one (recommend you make it a permanent part of your programming arsenal)...

    --===== Create and populate the Tally table on the fly

     SELECT TOP 11000 --equates to more than 30 years of dates

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

      ALTER TABLE dbo.Tally

        ADD CONSTRAINT PK_Tally_N

            PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

      GRANT SELECT ON dbo.Tally TO PUBLIC

    Heh... notice that I hate loops and cursors so much that I even avoided them to make the Tally table

    Ok... you say you have thousands of rows that you need to do a histogram for... let's make thousands of rows of test data... note that this is NOT part of the solution... this table is just to give everyone some test data so they can test my solution, if they want.

    --===== Create a test table with a bunch of "wierd" codes in the SomeString column

         -- Takes about 10 seconds to execute.

     SELECT TOP 100000 --<<Create 100 THOUSAND rows of data

            SomePK     = IDENTITY(INT,1,1),

            SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+65))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+65))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*3+91))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+65))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+65))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+65))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*3+91))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+97))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+97))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+97))

                       + SPACE(5) --Adds some trailing spaces just to show they aren't counted

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (SomePK)

    Like I said, the code above is just for creating a test table and is NOT part of the solution.

    We're all set... the code to make a "Normalized Histogram" becomes super easy after that... and, it's pretty fast... does 100,000 rows of data (the whole table as a "set" all at once) in about 21 seconds and doesn't count trailing spaces.  Here's the code... might be worth storing the results in a temp table so you don't need to keep recalculating...

    --===== Create the NORMALIZED "histogram" using a join to the Tally table

         -- Takes about 21 seconds

     SELECT SomePK,

            Character  = SUBSTRING(source.SomeString,t.N,1),

            [Count]    = COUNT(*)

       FROM dbo.JBMTest source,

            dbo.Tally t

      WHERE t.N <= LEN(source.SomeString)

      GROUP BY source.SomePK,SUBSTRING(source.SomeString,t.N,1)

      ORDER BY source.SomePK,Character

    Neat, huh?  It works by using a highly constrained triangular join on the Tally table to find each character as a single character substring... like I said, it's fast, too. 

    Obviously, you would change the table name from "JBMTest" to the actual name of your table and you'd have to change the column names to match your table, as well.

    Please feel free to post back if you have any questions on this.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for that,

    I can think of a few other uses for this one so I will definitely give this a go!

    Thanks again

     

  • quote...notice that I hate loops and cursors so much that I even avoided them...

    But still using non ANSI-92 joins eh

     

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Heh... just seeing if you're awake, David

    Ah... old habits are hard to break

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply