Find records in table that match patterns in different table

Question

Post reply

Find records in table that match patterns in different table

roryp 96873

SSCertifiable

Points: 6649
More actions
April 19, 2012 at 11:11 am

#259633

I had a co-worker ask me to help him solve a problem and this was the best solution I could come up with. It worked, but I figured with a large amount of records, using a cross join would start to hurt the performance. I just wanted to know if anyone had ideas on a perhaps better performing solution.
The problem was he had a table of customers with descriptions on them and needed to find the ones whose description matched a few strings he was looking for anywhere in the description. It's not a big deal if the customer returns more than once if the description matches multiple patterns.
Here is some data with my solution:
create table #myData
(
userid int,
descr varchar(6)
)
go
insert into #myData
select 1, 'abc123'
union all select 2, 'abcdef'
union all select 3, 'lmnop'
union all select 4, '123456'
union all select 5, 'a1b2c3'
union all select 6, 'qwerty'
go
create table #patterns
(
pat varchar(5)
)
go
insert into #patterns
select '%abc%'
union all select '%123%'
union all select '%xyz%'
union all select '%ert%'
select *
from #myData myData
cross join #patterns patterns
where PATINDEX(patterns.pat, myData.descr) > 0

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 1

ColdCoffee

SSC-Dedicated

Points: 39972

April 19, 2012 at 11:19 am

#1476158

Rory, how big are the 2 tables?

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 2

There was only about 6,000 in the customer table and 7 patterns we were looking for so it really wasn't an issue. This just got my curiosity up and I couldn't think of another solution and was just wondering if anyone else would have approached this in a different manner.

CapnHector SSCoach Points: 16143 More actions · Answer 3

roryp 96873 (4/19/2012)
There was only about 6,000 in the customer table and 7 patterns we were looking for so it really wasn't an issue. This just got my curiosity up and I couldn't think of another solution and was just wondering if anyone else would have approached this in a different manner.

if you want to use some dynamic sql its nice and fast.

DECLARE @SQL VARCHAR(MAX)

DECLARE @Pat VARCHAR (MAX)

SET @SQL = 'SELECT * FROM RandomGuid R WHERE RANDOM LIKE '

SET @Pat = STUFF((Select 'OR RANDOM LIKE ' + '''' + pat + '''' + ' ' FROM #patterns FOR XML PATH('')),1,15,'')

SET @SQL = @SQL + @Pat

--SELECT @SQL

EXEC (@SQL)

its faster by quite a bit on my million row giud table since im running your patterns all at once instead of one at a time and as the number of patterns or customers increases the time savings will increase (relative to cross join). you would need to replace the table (RandomGuid) and column (RANDOM) to meet your needs.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 4

Thanks Hector, I like that approach. I've used dynamic SQL so little that I forget about it sometimes.

ScottPletcher SSC Guru Points: 101238 More actions · Answer 5

Did you adjust the original query so you're doing an apples-to-apples comparison?

SELECT *

FROM #myData md

CROSS JOIN #patterns p

WHERE

md.descr LIKE p.pat

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

CapnHector SSCoach Points: 16143 More actions · Answer 6

ScottPletcher (4/19/2012)
Did you adjust the original query so you're doing an apples-to-apples comparison?
SELECT *
FROM #myData md
CROSS JOIN #patterns p
WHERE
md.descr LIKE p.pat

still comparing the entire table 4 times, a little faster than the patindex but not much. even with out millisecond accurecy (i can post it if you want) cross joining (or cross applying) the #patterns takes almost 10 seconds while the dynamic build of the where clause to directly compare the filters is comming in at about 3 seconds on a million rows.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

ScottPletcher SSC Guru Points: 101238 More actions · Answer 7

I expected that, just wanted to be fair about it 🙂 .

The optimizer still needs improvement.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

CapnHector SSCoach Points: 16143 More actions · Answer 8

========== ORIGINAL METHOD ======================================================
SQL Server Execution Times:
CPU time = 9251 ms, elapsed time = 9267 ms.
========== With out PATINDEX =======================================================
SQL Server Execution Times:
CPU time = 9360 ms, elapsed time = 9500 ms.
========== Dynamic Sql ===============================================================
SQL Server Execution Times:
CPU time = 2761 ms, elapsed time = 2866 ms.

for giggles and to run the stats, using pat index is faster, not by much but faster.

here is what i ran

--CREATE the data

CREATE TABLE Random (col VARCHAR(36))

INSERT INTO Random

SELECT TOP 1000000 CONVERT(VARCHAR(36),NEWID()) AS name

FROM sys.all_columns sc1,sys.all_columns sc2

--Create our pattern table

create table #patterns

(

pat varchar(5)

)

insert into #patterns

select '%abc%'

union all select '%123%'

union all select '%xyz%'

union all select '%ert%'

SET NOCOUNT ON --make every thing quiet, does not matter very much in this case.

DECLARE @BitBucket VARCHAR(50) -- kill off any output

PRINT '========== ORIGINAL METHOD ======================================================'

SET STATISTICS TIME ON;

select @BitBucket = col

from Random r

cross join #patterns patterns

where PATINDEX(patterns.pat, r.col) > 0

SET STATISTICS TIME OFF;

PRINT '========== With out PATINDEX ======================================================='

SET STATISTICS TIME ON;

SELECT @BitBucket = col

FROM Random r

CROSS JOIN #patterns p

WHERE r.col LIKE p.pat

SET STATISTICS TIME OFF;

PRINT '========== Dynamic Sql ==============================================================='

SET STATISTICS TIME ON;

DECLARE @SQL VARCHAR(MAX)

DECLARE @Pat VARCHAR (MAX)

SET @SQL = 'DECLARE @BitBucket VARCHAR(50); SELECT @BitBucket = col FROM Random R WHERE col LIKE ' + STUFF((Select ' OR col LIKE ' + '''' + pat + '''' FROM #patterns FOR XML PATH('')),1,13,'')

EXEC (@SQL)

SET STATISTICS TIME OFF;

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]