Removal of vulgarities

  • Does anybody have a stored procedure, an algorithm or know of any articles of how one would remove rows based on a table of stored words?  So if there is a bad word or phrase in the table then the row is eliminated or marked as vulgar.  The project is to remove vulgarities from a list of Email addresses.

     

    Thanks for the info,

    Steve

  • Is there one specific column in the table that would potentially have the vulgar word, or is it spread out among multiple columns?  It seems like a pretty straight-forward task either way - but the solution is going to be different.

    You could use a while loop to scan through the rows and check for any that match the condition is LIKE '%' + TableWithVulgarWords.VulgarWordColumn + '%'.  When the condition is met, you could move the record(s) over to the quarantine area for further review.

    Cris

  • Just one column and your suggestion was what I was thinking of doing.  I just didn't want reinvent the wheel if there is some useful advise.

     

    Thanks

    Steve

  • CREATE TABLE BadWords(Word varchar(4) PRIMARY KEY)

    INSERT BadWords

    SELECT 'bad'

    UNION ALL SELECT 'word'

    UNION ALL SELECT '#$!*'

    CREATE TABLE Contacts(Id smallint IDENTITY PRIMARY KEY, Email varchar(80))

    INSERT Contacts(Email)

    SELECT 'sbad@opus.com'

    UNION ALL SELECT 'good@egad.com'

    UNION ALL SELECT '#$!*@cupro.net'

    UNION ALL SELECT 'hiram@jpvh.net'

    DELETE c

    FROM Contacts c CROSS JOIN BadWords w

    WHERE c.Email LIKE '%' + w.Word + '%'



    --Jonathan

  • Wow, so simple.......!!!!!!!!

  • My solution so far is to create a table like this

    CREATE TABLE [T_BAD_WORD] (

     [BAD_ID] [int] NOT NULL ,

     [BAD_WORD] [varchar] (255) NOT NULL

    ) ON [PRIMARY]

    GO

    Then Create a series of select and UNION them like this

    Select 'SELECT [pers_id], , [email_invalid] FROM <my target table> Where like ''%' + [BAD_WORD] + '%'' UNION' FROM [dbo].[T_BAD_WORD]

     

    Run that big UNION and get the list.  Then they can be made invalid.

    Now, the next question I'm being asked, "Is there a standard vulgarities list?".  I am using the list we use to screen web sites and spam. 

    What a way to start the morning!

  • I don't have nor know where to get that list, but I think that if you convert the cross join from a delete to a select, that you will get your results much faster.

  • We do something very similar for the comments for our surveys. We do a cross join with our table of bad words. Also we have a field called ReplaceWith where we put in the characters or notation that we want to replace the bad word with. (i.e. Smack get replaced with [inappropriate] and other words will get replaced with [profanity], [insult], [sexual/hot comment] or whatever else the client wants us to replace it with.

    To the best of my knowledge, there isn't a standard profanity list. We have built ours over time. I can send you our list if you are interested. But you probably already have a good start with the Anti-Spam list.

    We are in the process of modifying our so that we can check for profane words in other languages too.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply