More efficient way to perform a lookup?

Question

More efficient way to perform a lookup?

ravi_k

Valued Member

Points: 69
More actions
July 15, 2013 at 8:17 pm

#395145

I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?
UPDATE [Table1]
SET [Field2] = 'Yes'
FROM [Table1] A
JOIN [Lookup] as B
ON (A.Field1 like '% ' + B.LookupValue + ' %')
Thanks!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

r_noob (7/15/2013)
I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?
UPDATE [Table1]
SET [Field2] = 'Yes'
FROM [Table1] A
JOIN [Lookup] as B
ON (A.Field1 like '% ' + B.LookupValue + ' %')
Thanks!

I think this is going to be a stunningly bad performer regardless. But an index might help. What might be even faster, believe it or not, is to do an EXISTS query instead of a JOIN. Hell, even a cursor approach could be faster than the join, which could hit millions of times on a 45000X800000 join of that type.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004312 More actions · Answer 2

I agree with Kevin. A looped search would probably be faster than a Join because you could then add a WHERE Field2 <> 'Yes" to the code which would speed things up quite a bit especially if there were some intelligent indexing around that column (at least on a temporary basis).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 3

You may want to try full text search here. Together with using WHERE EXISTS instead of JOIN, should be considerably faster than JOIN with non sargable LIKE. For some reason I think it will win over loop-based aproch too.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 4

I am not sure full text indexing will help with the '%' + column + '%' join ...

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 5

r_noob (7/15/2013)
I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?
UPDATE [Table1]
SET [Field2] = 'Yes'
FROM [Table1] A
JOIN [Lookup] as B
ON (A.Field1 like '% ' + B.LookupValue + ' %')
Thanks!

Try this:

IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL

DROP TABLE #TempTable

CREATE TABLE #TempTable (

[ID] INT IDENTITY(1,1) NOT NULL,

[Field1] INT NULL,

[Field2] NVARCHAR(3) NULL,

PRIMARY KEY (ID))

INSERT INTO #TempTable

SELECT 22,'No' UNION ALL

SELECT 31,'No' UNION ALL

SELECT 42,'No' UNION ALL

SELECT 55,'No'

IF OBJECT_ID('tempdb..#LookupTable') IS NOT NULL

DROP TABLE #LookupTable

CREATE TABLE #LookupTable (

[ID] INT IDENTITY(1,1) NOT NULL,

[LookupValue] INT NULL,

[LookDesc] NVARCHAR(30) NULL,

PRIMARY KEY (ID))

INSERT INTO #LookupTable

SELECT 22,'Apple' UNION ALL

SELECT 33,'Orange' UNION ALL

SELECT 44,'Banana' UNION ALL

SELECT 55,'Pear'

UPDATE #TempTable

SET Field2 = 'Yes'

FROM

#TempTable AS A

INNER JOIN

#LookupTable AS B

ON A.ID = B.ID

WHERE

A.Field1 = B.LookupValue

SELECT * FROM #TempTable AS tt

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 6

r_noob (7/15/2013)
... ON (A.Field1 like '% ' + B.LookupValue + ' %')

That looks like words in a sentence. Try Jeff's splitter: split the sentence on space, remove any leading blanks and seek into your lookup table.

Like this:

SELECT *

FROM #Table1 a

CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split

WHERE EXISTS (SELECT 1 FROM #Lookup b

WHERE b.LookDesc = split.Item)

SELECT *

FROM #Table1 a

CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split

INNER LOOP JOIN #Lookup b

ON b.LookDesc = split.Item

Prep:

IF OBJECT_ID('tempdb..#Table1') IS NOT NULL

DROP TABLE #Table1

CREATE TABLE #Table1 (

[ID] INT IDENTITY(1,1) NOT NULL,

[Field1] VARCHAR(100),

[Field2] VARCHAR(3) NULL)

INSERT INTO #Table1

SELECT 'The quick brown fox','No' UNION ALL

SELECT 'jumped over','No' UNION ALL

SELECT 'the','No' UNION ALL

SELECT 'lazy dog','No' UNION ALL

SELECT 'One day in the life of Ivan Denisovitch', 'No'

INSERT INTO #Table1 select a.Field1, 'No' AS Field2 from #Table1 a, sys.columns b

CREATE UNIQUE CLUSTERED INDEX ucx_ID ON #Table1 (ID)

IF OBJECT_ID('tempdb..#Lookup') IS NOT NULL

DROP TABLE #Lookup

CREATE TABLE #Lookup ([LookDesc] VARCHAR(30) NOT NULL)

INSERT INTO #Lookup

SELECT 'dog' UNION ALL

SELECT 'fox' UNION ALL

SELECT TOP 45000 x = RIGHT(NEWID(),5)+RIGHT(NEWID(),5) FROM SYSCOLUMNS A, SYSCOLUMNS B

CREATE UNIQUE CLUSTERED INDEX ucx_LookDesc ON #Lookup (LookDesc)

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Stefan_G SSCertifiable Points: 6609 More actions · Answer 7

The performance of Chris' solution is fantastic.

If we use his setup script, we can use the following code to perform the update:

IF OBJECT_ID('tempdb..#t2') IS NOT NULL drop table #t2

SELECT DISTINCT ID

INTO #t2

FROM #Table1 a

CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split

WHERE EXISTS (SELECT 1 FROM #Lookup b WHERE b.LookDesc = split.Item)

UPDATE #Table1

SET Field2 = 'Yes'

WHERE ID in (

select ID from #t2

)

This code executes in less than 100ms on my machine. I expect similar results for the real data. Not bad to go from 45 minutes to sub-second performance!

Well done!

/SG