Doing a count on characters within a string

Question

Post reply

Doing a count on characters within a string

Debbie Edwards

SSCrazy Eights

Points: 9217
More actions
April 3, 2007 at 4:18 am

#177138

Hello,
say you have a string of codes. E.g. XX/\/\/\/\#### all within one field
Then I want a column called X with a count of the number of X's in that string etc so you would have
X / \ # i
2 4 4 4 0
i doesnt exist in this field so it would come back with 0. Is there any way of doing this kind of query?
Thanks
Debbie

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 1

You could do it like this;

create table #x ( weirdcodes varchar(20) not null )

go

insert #x select 'XX/\/\/\/\####'

go

select weirdcodes,

(datalength(weirdcodes) - datalength(replace(weirdcodes, 'X', ''))) as 'X',

(datalength(weirdcodes) - datalength(replace(weirdcodes, '/', ''))) as '/',

(datalength(weirdcodes) - datalength(replace(weirdcodes, '\', ''))) as '\',

(datalength(weirdcodes) - datalength(replace(weirdcodes, '#', ''))) as '#',

(datalength(weirdcodes) - datalength(replace(weirdcodes, 'i', ''))) as 'i'

from #x

go

weirdcodes X / \ # i

-------------------- ----------- ----------- ----------- ----------- -----------

XX/\/\/\/\#### 2 4 4 4 0

(1 row(s) affected)

/Kenneth

Debbie Edwards SSCrazy Eights Points: 9217 More actions · Answer 2

Hi,

Thanks for that. I dont really want to go the temporary table route because I have got throusands of records that I need to do this too and then I would have to link the resulting columns back to the original columns.

Is there any way of doing this without the table. Im going to see if I can use the

select

(datalength(weirdcodes) - datalength(replace(weirdcodes, 'X', ''))) as 'X',

(datalength(weirdcodes) - datalength(replace(weirdcodes, '/', ''))) as '/',

(datalength(weirdcodes) - datalength(replace(weirdcodes, '\', ''))) as '\',

(datalength(weirdcodes) - datalength(replace(weirdcodes, '#', ''))) as '#',

(datalength(weirdcodes) - datalength(replace(weirdcodes, 'i', ''))) as 'i'

from #x

go

against the table itsself but Im still not sure what Im doing

Debbie

Debbie Edwards SSCrazy Eights Points: 9217 More actions · Answer 3

brilliant,

It works, I used the following within my query...

,

(datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, 'X', ''))) as 'X',

(datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, '/', ''))) as '/',

(datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, '\', ''))) as '\',

(datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, '#', ''))) as '#',

(datalength(wk.SESS_PER_WEEK) - datalength(replace(wk.SESS_PER_WEEK, 'i', ''))) as 'i'

Thanks for that!

Debbie

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 4

Yes, that was the point

The temptable was just for illustration.

You were supposed to substitute it with your actual table.

/Kenneth

Christopher1 SSC Journeyman Points: 93 More actions · Answer 5

Datalength was a great way of solving this, however I have a scenario where this won't work. I had to query Exchange, and find the # of email addresses that are copied on emails. These are seperated with a ';' The problem is that sometimes a space follows the semicolon, and sometimes it doesn't. When the space follows the semicolon, it's counted using the solution above. So I wrote the function below, which essentially does the same thing:

--takes a string, a character

--returns the # of times the character appears in the string

create function [dbo].[fnCountCharInString] (@string varchar(500), @char varchar(1))

returns int

as

begin

declare @stringpos int

declare @charcount int

declare @ascii int

set @stringpos = 1

set @charcount = 0

set @ascii = ascii(@char)

while @stringpos <= Len(@string)

begin

if ascii(substring(@string,@stringpos,1)) = @ascii

begin

set @charcount = @charcount + 1

end

set @stringpos = @stringpos + 1

end

return @charcount

end

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 6

I believe what you're looking for is a kind of "histogram". The other solutions are great but if there are any unexpected characters, the cross-tabs will miss them and, if I can, I avoid the RBAR associated with a function.

The first thing you need to do is make an nice little multi-use numbers table. It consists of nothing more than a well indexed column of sequential numbers and has a great number of uses. I call mine a "Tally" table because I use it to count and it sounds cooler than "Numbers". Here's how to make one (recommend you make it a permanent part of your programming arsenal)...

--===== Create and populate the Tally table on the fly

SELECT TOP 11000 --equates to more than 30 years of dates

IDENTITY(INT,1,1) AS N

INTO dbo.Tally

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance

ALTER TABLE dbo.Tally

ADD CONSTRAINT PK_Tally_N

PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

--===== Allow the general public to use it

GRANT SELECT ON dbo.Tally TO PUBLIC

Heh... notice that I hate loops and cursors so much that I even avoided them to make the Tally table

Ok... you say you have thousands of rows that you need to do a histogram for... let's make thousands of rows of test data... note that this is NOT part of the solution... this table is just to give everyone some test data so they can test my solution, if they want.

--===== Create a test table with a bunch of "wierd" codes in the SomeString column

-- Takes about 10 seconds to execute.

SELECT TOP 100000 --<<Create 100 THOUSAND rows of data

SomePK = IDENTITY(INT,1,1),

SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+65))

+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+65))

+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*3+91))

+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+65))

+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*3+91))

+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*26+97))

+ SPACE(5) --Adds some trailing spaces just to show they aren't counted

INTO dbo.JBMTest

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

--===== A table is not properly formed unless a Primary Key has been assigned

ALTER TABLE dbo.JBMTest

ADD PRIMARY KEY CLUSTERED (SomePK)

Like I said, the code above is just for creating a test table and is NOT part of the solution.

We're all set... the code to make a "Normalized Histogram" becomes super easy after that... and, it's pretty fast... does 100,000 rows of data (the whole table as a "set" all at once) in about 21 seconds and doesn't count trailing spaces. Here's the code... might be worth storing the results in a temp table so you don't need to keep recalculating...

--===== Create the NORMALIZED "histogram" using a join to the Tally table

-- Takes about 21 seconds

SELECT SomePK,

Character = SUBSTRING(source.SomeString,t.N,1),

[Count] = COUNT(*)

FROM dbo.JBMTest source,

dbo.Tally t

WHERE t.N <= LEN(source.SomeString)

GROUP BY source.SomePK,SUBSTRING(source.SomeString,t.N,1)

ORDER BY source.SomePK,Character

Neat, huh? It works by using a highly constrained triangular join on the Tally table to find each character as a single character substring... like I said, it's fast, too.

Obviously, you would change the table name from "JBMTest" to the actual name of your table and you'd have to change the column names to match your table, as well.

Please feel free to post back if you have any questions on this.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Debbie Edwards SSCrazy Eights Points: 9217 More actions · Answer 7

Thanks for that,

I can think of a few other uses for this one so I will definitely give this a go!

Thanks again

David Burrows SSC Guru Points: 65142 More actions · Answer 8

...notice that I hate loops and cursors so much that I even avoided them...

But still using non ANSI-92 joins eh

Far away is close at hand in the images of elsewhere.
Anon.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 9

Heh... just seeing if you're awake, David

Ah... old habits are hard to break

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)