April 19, 2012 at 11:11 am
I had a co-worker ask me to help him solve a problem and this was the best solution I could come up with. It worked, but I figured with a large amount of records, using a cross join would start to hurt the performance. I just wanted to know if anyone had ideas on a perhaps better performing solution.
The problem was he had a table of customers with descriptions on them and needed to find the ones whose description matched a few strings he was looking for anywhere in the description. It's not a big deal if the customer returns more than once if the description matches multiple patterns.
Here is some data with my solution:
create table #myData
(
userid int,
descr varchar(6)
)
go
insert into #myData
select 1, 'abc123'
union all select 2, 'abcdef'
union all select 3, 'lmnop'
union all select 4, '123456'
union all select 5, 'a1b2c3'
union all select 6, 'qwerty'
go
create table #patterns
(
pat varchar(5)
)
go
insert into #patterns
select '%abc%'
union all select '%123%'
union all select '%xyz%'
union all select '%ert%'
select *
from #myData myData
cross join #patterns patterns
where PATINDEX(patterns.pat, myData.descr) > 0
April 19, 2012 at 11:19 am
Rory, how big are the 2 tables?
April 19, 2012 at 11:22 am
There was only about 6,000 in the customer table and 7 patterns we were looking for so it really wasn't an issue. This just got my curiosity up and I couldn't think of another solution and was just wondering if anyone else would have approached this in a different manner.
April 19, 2012 at 12:21 pm
roryp 96873 (4/19/2012)
There was only about 6,000 in the customer table and 7 patterns we were looking for so it really wasn't an issue. This just got my curiosity up and I couldn't think of another solution and was just wondering if anyone else would have approached this in a different manner.
if you want to use some dynamic sql its nice and fast.
DECLARE @sql VARCHAR(MAX)
DECLARE @pat VARCHAR (MAX)
SET @sql = 'SELECT * FROM RandomGuid R WHERE RANDOM LIKE '
SET @pat = STUFF((Select 'OR RANDOM LIKE ' + '''' + pat + '''' + ' ' FROM #patterns FOR XML PATH('')),1,15,'')
--SELECT @sql
EXEC (@SQL)
its faster by quite a bit on my million row giud table since im running your patterns all at once instead of one at a time and as the number of patterns or customers increases the time savings will increase (relative to cross join). you would need to replace the table (RandomGuid) and column (RANDOM) to meet your needs.
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
April 19, 2012 at 1:15 pm
Thanks Hector, I like that approach. I've used dynamic SQL so little that I forget about it sometimes.
April 19, 2012 at 1:22 pm
Did you adjust the original query so you're doing an apples-to-apples comparison?
SELECT *
FROM #myData md
CROSS JOIN #patterns p
WHERE
md.descr LIKE p.pat
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 19, 2012 at 2:12 pm
ScottPletcher (4/19/2012)
Did you adjust the original query so you're doing an apples-to-apples comparison?
SELECT *
FROM #myData md
CROSS JOIN #patterns p
WHERE
md.descr LIKE p.pat
still comparing the entire table 4 times, a little faster than the patindex but not much. even with out millisecond accurecy (i can post it if you want) cross joining (or cross applying) the #patterns takes almost 10 seconds while the dynamic build of the where clause to directly compare the filters is comming in at about 3 seconds on a million rows.
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
April 19, 2012 at 3:56 pm
I expected that, just wanted to be fair about it 🙂 .
The optimizer still needs improvement.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 19, 2012 at 4:44 pm
========== ORIGINAL METHOD ======================================================
SQL Server Execution Times:
CPU time = 9251 ms, elapsed time = 9267 ms.
========== With out PATINDEX =======================================================
SQL Server Execution Times:
CPU time = 9360 ms, elapsed time = 9500 ms.
========== Dynamic Sql ===============================================================
SQL Server Execution Times:
CPU time = 2761 ms, elapsed time = 2866 ms.
for giggles and to run the stats, using pat index is faster, not by much but faster.
here is what i ran
--CREATE the data
CREATE TABLE Random (col VARCHAR(36))
INSERT INTO Random
SELECT TOP 1000000 CONVERT(VARCHAR(36),NEWID()) AS name
FROM sys.all_columns sc1,sys.all_columns sc2
--Create our pattern table
create table #patterns
(
pat varchar(5)
)
insert into #patterns
select '%abc%'
union all select '%123%'
union all select '%xyz%'
union all select '%ert%'
SET NOCOUNT ON --make every thing quiet, does not matter very much in this case.
DECLARE @Bitbucket VARCHAR(50) -- kill off any output
PRINT '========== ORIGINAL METHOD ======================================================'
SET STATISTICS TIME ON;
select @Bitbucket = col
from Random r
cross join #patterns patterns
where PATINDEX(patterns.pat, r.col) > 0
SET STATISTICS TIME OFF;
PRINT '========== With out PATINDEX ======================================================='
SET STATISTICS TIME ON;
SELECT @Bitbucket = col
FROM Random r
CROSS JOIN #patterns p
WHERE r.col LIKE p.pat
SET STATISTICS TIME OFF;
PRINT '========== Dynamic Sql ==============================================================='
SET STATISTICS TIME ON;
DECLARE @sql VARCHAR(MAX)
DECLARE @pat VARCHAR (MAX)
SET @sql = 'DECLARE @Bitbucket VARCHAR(50); SELECT @Bitbucket = col FROM Random R WHERE col LIKE ' + STUFF((Select ' OR col LIKE ' + '''' + pat + '''' FROM #patterns FOR XML PATH('')),1,13,'')
EXEC (@SQL)
SET STATISTICS TIME OFF;
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply