December 28, 2011 at 3:08 pm
I wrote the following query to extract unique IDs based on a key word search of the text column from what is basically a running log table that could contain 100+ entries for every ID. The query works and gets me what i need but i would like to know if there is a better way to do this besides running 3 sub-selects one for each keyword?
SELECT Notes.ID,
Notes.DisplayTime,
Notes.NEText,
Case2.Other_ID
FROM Case2 Join Notes on Case2.ID = Notes.ID
Where DisplayTime between '2004-12-21 00:00:00' and '2011-12-21 23:59:59'
and NeText Like '%ethanol%'
and DisplayTime in
(Select
MAX(DisplayTime)as time
FROM Case2 Join Notes on Case2.ID = Notes.ID
Where DisplayTime between '2004-12-21 00:00:00' and '2011-12-21 23:59:59' and NeText Like '%ethanol%'
Group BY Notes.ID)
or DisplayTime between '2004-12-21 00:00:00' and '2011-12-21 23:59:59'
and NeText Like '%etoh%'
and DisplayTime in
(Select
MAX(DisplayTime)as time
FROM Case2 Join Notes on Case2.ID = Notes.ID
Where DisplayTime between '2004-12-21 00:00:00' and '2011-12-21 23:59:59' and NeText Like '%etoh%'
Group BY Notes.ID)
or DisplayTime between '2004-12-21 00:00:00' and '2011-12-21 23:59:59'
and NeText Like '%Alcohol%'
and DisplayTime in
(Select
MAX(DisplayTime)as time
FROM Case2 Join Notes on Case2.ID = Notes.ID
Where DisplayTime between '2004-12-21 00:00:00' and '2011-12-21 23:59:59' and NeText Like '%Alcohol%'
Group BY Notes.ID)
Thanks
E
December 29, 2011 at 1:45 am
It highly depends on your expected output, but this should do the trick.
DECLARE @Case2 TABLE (
ID int PRIMARY KEY CLUSTERED,
Other_ID int NOT NULL
)
DECLARE @Notes TABLE (
ID int PRIMARY KEY CLUSTERED,
DisplayTime datetime NOT NULL,
NEText nvarchar(500) NOT NULL
)
INSERT INTO @Case2 VALUES (1, 1)
INSERT INTO @Notes VALUES (1, '20111001', 'This row contains ethanol and must be returned')
INSERT INTO @Case2 VALUES (2, 2)
INSERT INTO @Notes VALUES (2, '20111001', 'This row contains etoh and must be returned')
INSERT INTO @Case2 VALUES (3, 3)
INSERT INTO @Notes VALUES (3, '20111001', 'This row contains Alcohol and must be returned')
INSERT INTO @Case2 VALUES (4, 4)
INSERT INTO @Notes VALUES (4, '20120101', 'This row is out of the date range')
INSERT INTO @Case2 VALUES (5, 5)
INSERT INTO @Notes VALUES (5, '20111001', 'This row contains Alcohol and must be returned')
INSERT INTO @Case2 VALUES (6, 6)
INSERT INTO @Notes VALUES (6, '20111001', 'This row contains no valid pattern')
SELECT Notes.ID,
Notes.DisplayTime,
Notes.NEText,
Case2.Other_ID
FROM @Case2 AS Case2
JOIN @Notes AS Notes
on Case2.ID = Notes.ID
WHERE DisplayTime between '2004-12-21 00:00:00' and '2011-12-21 23:59:59'
AND EXISTS (
SELECT 1
FROM (
SELECT 'ethanol'
UNION ALL
SELECT 'etoh'
UNION ALL
SELECT 'Alcohol'
) AS src (pattern)
WHERE NeText LIKE '%' + pattern + '%'
)
and DisplayTime in (
Select MAX(DisplayTime)as time
FROM @Case2 AS InnerCase2
Join @Notes AS InnerNotes
on InnerCase2.ID = InnerNotes.ID
Where DisplayTime between '2004-12-21 00:00:00' and '2011-12-21 23:59:59'
and NeText = Notes.NeText -- Outer reference
)
-- Gianluca Sartori
December 29, 2011 at 8:23 am
If I understood what you're trying to do, this is a more efficient approach.
SELECT n.[ID], n.DisplayTime, n.NEText, c.Other_ID
FROM @Case2 AS c
CROSS APPLY (
SELECT TOP (1) n.ID, n.DisplayTime, n.NEText
FROM @Notes AS n
INNER JOIN (
SELECT 'Ethanol'
UNION ALL
SELECT 'Etoh'
UNION ALL
SELECT 'Alcohol'
) AS p(Pattern)
ON NEText LIKE '%' + pattern + '%'
WHERE c.[ID] = n.[ID]
AND n.DisplayTime BETWEEN '2004-12-21 00:00:00' AND '2011-12-21 23:59:59'
ORDER BY n.DisplayTime DESC
) AS n
Here are the stats:
CROSS APPLY
Table '#2145C81B'. Scan count 0, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#1F5D7FA9'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
MAX
Table '#2145C81B'. Scan count 0, logical reads 60, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#1F5D7FA9'. Scan count 5, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Drew
Edited: Added ORDER BY to produce the MAX for the criteria.
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 29, 2011 at 9:02 am
This is the kind of situation where full text indexing and a custom thesaurus file will come in very, very handy.
I've seen this kind of query many, many times, and on large tables, full text indexing will usually return results in milliseconds where Like can take minutes or even hours on the same data.
One I recently implemented needs to compare input values to street addresses. It uses a full text index and thesaurus to compare things like "123 First Street" to "123 1st St", in a 5-million row table. LIKE and a table of synonyms ("1st" for "First") would usually return results in about a minute (nowhere near good enough for a public-facing web page), where the full-text version returns matches in under a millisecond.
Definitely look into it for this.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
December 29, 2011 at 2:20 pm
Thanks for the replies and a few things I have not seen before that I will need to read up on. i will see if I can get the code to work in my DB and do a time test.
Thanks
E
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply