December 28, 2010 at 5:51 pm
I need to query the table #URLTbl created by my code below
looking at the field "URL" to spot any occurrences of words
contained in the second table created here #Badwords.
I've succeeded by using the select statement
select * from dbo.#URLTbl
where CHARINDEX('Porn',URL) > 0 or CHARINDEX('naked',URL)
> 1
at finding the words hidden in string in the URL field but cannot figure out how to make this statement use the contents of the #badwords table to return rows containing matches from the #UFLTbl
Thanks for any thoughts!
--To create #URLTbl and #BadWords temp tables run this entire section
IF OBJECT_ID('TempDB..#URLTbl','U') IS NOT NULL DROP TABLE #URLTbl
CREATE TABLE #URLTbl
(MsgDate Varchar(1024),
Username Varchar (1024),
URL Varchar (1024),
Rating Varchar (1024))
--===== Setup any special required conditions especially where dates are concerned
SET DATEFORMAT YMD
INSERT INTO #URLTbl(MsgDate,Username,URL,Rating)
SELECT '2010-11-10','webuser\patr','http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavaty3398n','http://content.murder.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\matth','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patr','http://view.atdmt.com/action/kill/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mmhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hmlavaty3398n','http://content.porn.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mamtth','http://assets3b.msnbc.msn.com/fight/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patmr','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavamty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-10','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-12','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-12','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-12','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-15','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-15','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-15','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-15','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-17','webuser\patr','http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-17','webuser\hlavaty3398n','http://content.murder.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\matth','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-17','webuser\patr','http://view.atdmt.com/action/kill/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mmhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-21','webuser\hmlavaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-21','webuser\mamtth','http://assets3b.msnbc.msn.com/fight/rendering/img/bing.png','news' union all
SELECT '2010-11-21','webuser\patmr','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavamty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-21','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-21','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-10','webuser\patr','http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hlavaty3398n','http://content.murder.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\matth','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patr','http://view.atdmt.com/action/kill/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mmhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-10','webuser\hmlavaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mamtth','http://assets3b.msnbc.msn.com/fight/rendering/img/bing.png','news' union all
SELECT '2010-11-10','webuser\patmr','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-10','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-25','webuser\hlavamty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-25','webuser\mattmh','http://assets3bomb.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-25','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-25','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-25','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-25','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-25','webuser\patrm','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-25','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-25','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-29','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-29','webuser\patr','http://view.atdmt.com/action/porn/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-29','webuser\mhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-29','webuser\hlavaty3398n','http://content.murder.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-29','webuser\matth','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-29','webuser\patr','http://view.atdmt.com/action/kill/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-29','webuser\mmhoag','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-29','webuser\hmlavaty3398n','http://content.yieldmanagergun.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-29','webuser\mamtth','http://assets3b.msnbc.msn.com/fight/gun/rendering/img/bing.png','news' union all
SELECT '2010-11-07','webuser\patmr','http://view.atdmt.com/action/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-07','webuser\mhoamg','http://cdn.unicast.msn.com/porn/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-07','webuser\hlavamty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-07','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news'union all
SELECT '2010-11-07','webuser\patrm','http://view.atdmt.com/action/MSN_Homebombpage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-07','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C181757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-07','webuser\hlavmaty3398n','http://content.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-07','webuser\mattmh','http://assets3b.msnbc.msn.com/rendering/img/bing.png','news' union all
SELECT '2010-11-07','webuser\patrm','http://view.atdmt.com/action/nude/MSN_Homepage_Remessaging_111808/nc?a=1','web banners' UNION ALL
SELECT '2010-11-07','webuser\mhoamg','http://cdn.unicast.msn.com/assets/A1/N6/M8379/C18naked1757/large.jpg?0.5824313560398167','web search' UNION ALL
SELECT '2010-11-07','webuser\hlavmaty3398n','http://content.bomb.yieldmanager.com/ak/q.gif','internet services' UNION ALL
SELECT '2010-11-10','webuser\mattmh','http://assets3b.msnbc.msn.com/fight/rendering/img/bing.png','news'
--Create Badwords table
IF OBJECT_ID('TempDB..#badwords','U') IS NOT NULL DROP TABLE #badwords
CREATE TABLE #Badwords
(Words nchar(25))
--===== Setup any special required conditions especially where dates are concerned
SET DATEFORMAT YMD
--===== Insert the test data into the test table
INSERT INTO #badwords
(words)
SELECT 'porn ' UNION ALL
SELECT 'nude ' UNION ALL
SELECT 'naked ' UNION ALL
SELECT 'gun ' UNION ALL
SELECT 'bomb ' UNION ALL
SELECT 'kill ' UNION ALL
SELECT 'murder '
--===== Set the identity insert back to normal
-- SET IDENTITY_INSERT #badwords ON
select * from dbo.#badwords
select * from dbo.#URLTbl
December 29, 2010 at 3:32 pm
Okay, it took me a few minutes, but here's code that works with your sample:
SELECT
CHARINDEX(RTRIM(B.Words), U.url),
*
FROM
#urltbl U
CROSS JOIN #Badwords AS B
WHERE
CHARINDEX(RTRIM(B.Words), U.url) <> 0
ORDER BY
username,
url
Key points are that I'm using a CROSS JOIN so that each URL is checked against each bad word. This means that if a URL has more than 1 bad word it will be returned once for each bad word it contains. The other is that in your example you are using nchar(25) for the bad words which means that each entry is being padded out to 25 characters (50 bytes) AND you padded the bad words with spaces in the example, so I had to do an RTRIM to remove the padding in order to find matches in the URL since the bad words in the URL are not likely to be followed by spaces.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 31, 2010 at 1:05 pm
Wow... so simple when you know how!
Thanks! .. and.. happy new year!
January 3, 2011 at 2:59 pm
If I want to add a time parameter restriction criteria to this, being that the timedate fields are all Varchar types, how would I do that? For example... I wanted all records between 12/10/2010 and 1/1/2011 in the field 'MSGDATE' from he URL table ?
Jack Corbett (12/29/2010)
Okay, it took me a few minutes, but here's code that works with your sample:
SELECT
CHARINDEX(RTRIM(B.Words), U.url),
*
FROM
#urltbl U
CROSS JOIN #Badwords AS B
WHERE
CHARINDEX(RTRIM(B.Words), U.url) <> 0
ORDER BY
username,
url
Key points are that I'm using a CROSS JOIN so that each URL is checked against each bad word. This means that if a URL has more than 1 bad word it will be returned once for each bad word it contains. The other is that in your example you are using nchar(25) for the bad words which means that each entry is being padded out to 25 characters (50 bytes) AND you padded the bad words with spaces in the example, so I had to do an RTRIM to remove the padding in order to find matches in the URL since the bad words in the URL are not likely to be followed by spaces.
January 3, 2011 at 3:17 pm
Just add that to the Where Clause.
SELECT
CHARINDEX(RTRIM(B.Words), U.url),
*
FROM
#urltbl U
CROSS JOIN #Badwords AS B
WHERE
MSGDATE >= '12/10/2010' AND
/* using the next day because if you want all of 1/1 and there are any times
included you need to do less than the next day */
MSGDATE < '1/2/2011' AND
CHARINDEX(RTRIM(B.Words), U.url) <> 0
ORDER BY
username,
url
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 4, 2011 at 11:45 am
Yes.... I had tried that, but I was searching on a field containing data/time data, but the datatype it was set to was Varchar (30) .. which caused the search to bomb.
This brings up a datatype question... I'm searching here on field "URL" and field "time" and want to create an index (clustered) for faster searching, but I think I need to minimize the size of the URL field to enable more than one index if possible. What type and sized datatype would be best to contain URL data and allow efficient searching, while perhaps leaving enough space to create an index on the date field "TIME"? I'm thinking Varchar (1024) is bigger than it needs to be, but also wonder if "TEXT" would allow for any decent searching... ?
My goal is to be able to search on the time field (datatype "datetime") and the URL field (current datatype Varchar 1024).
Jack Corbett (1/3/2011)
Just add that to the Where Clause.
SELECT
CHARINDEX(RTRIM(B.Words), U.url),
*
FROM
#urltbl U
CROSS JOIN #Badwords AS B
WHERE
MSGDATE >= '12/10/2010' AND
/* using the next day because if you want all of 1/1 and there are any times
included you need to do less than the next day */
MSGDATE < '1/2/2011' AND
CHARINDEX(RTRIM(B.Words), U.url) <> 0
ORDER BY
username,
url
January 4, 2011 at 12:35 pm
I didn't spend anytime looking at your data types, but now that I have I do have some suggestions.
1. If the column contains dates then it should be one of the date data types. If you keep it a character data type you are only asking for trouble. If you are forced to use varchar() for it then you should be doing date validation before you insert/update that column AND I suggest a computed column that IS a date data type or a view that converts that column to a date data type. Ideally you'll put the correct data type on the column.
2. You definitely do NOT want to use TEXT data type because it is a BLOP data type, not character, AND it has been deprecated by Microsoft.
3. An index on msgdate while it is a character column won't give you the results you think as it will be sorting in alphbetical order NOT date order. So 1/1/01 will come AFTER 01/2/02.
4. An index on the url column will not help this query. Because you have to use a function, CHARINDEX, on the column SQL Server MUST do a scan of every row. You might be better off looking into Full Text indexing if you need to speed up this query. I am NOT a full text expert, so I can't really help or tell you if it is the right technology for this problem.
I hope this has helped, a little.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply