February 8, 2011 at 12:18 pm
Does anyone know of a quick and dirty way to do this any faster? I'm trying to find the Number of occurences in a string of a specific word. It's a long story but I'm doing script evaluations for a third product that stores the scripts in my SQL db.
IF OBJECT_ID ( 'tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp
IF OBJECT_ID( 'tempdb..#Tally') IS NOT NULL
DROP TABLE #Tally
-- Slightly modified from tally article
SELECT TOP 8000 --Largest Script Field
IDENTITY(INT,1,1) AS N
INTO #Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
ALTER TABLE #Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
CREATE TABLE #tmp (RowID INT IDENTITY( 1,1), SearchForString VARCHAR(20), SearchInString VARCHAR(2000))
INSERT INTO #tmp ( SearchForString, SearchInString) VALUES ( 'quick', 'The quick brown dirty fox jumped quickly over the quick dog.')
INSERT INTO #tmp ( SearchForString, SearchInString) VALUES ( 'duck', 'Duck Soup is better if you duck the ducks.')
INSERT INTO #tmp ( SearchForString, SearchInString) VALUES ( 'department', 'I am tired of the department of redundancy department.')
--SELECT * FROM #tmp
SELECT
RowID,
COUNT(*) AS NumOccurence
FROM
(SELECT
RowID
FROM
#tmp AS t
CROSS JOIN
#Tally AS tn
WHERE
CHARINDEX( SearchForString, SearchInString, tn.N) = tn.N
AND tn.N <= LEN( SearchInString)
) AS drv
GROUP BY
RowID
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 8, 2011 at 12:31 pm
Try this as well:
IF OBJECT_ID(N'tempdb..#tmp') IS NULL
BEGIN
CREATE TABLE #tmp (
RowID INT IDENTITY(1, 1),
SearchForString VARCHAR(20),
SearchInString VARCHAR(2000))
INSERT INTO #tmp
(SearchForString,
SearchInString)
VALUES ('quick',
'The quick brown dirty fox jumped quickly over the quick dog.')
INSERT INTO #tmp
(SearchForString,
SearchInString)
VALUES ('duck',
'Duck Soup is better if you duck the ducks.')
INSERT INTO #tmp
(SearchForString,
SearchInString)
VALUES ('department',
'I am tired of the department of redundancy department.')
END ;
SELECT *
FROM #tmp ;
SELECT RowID,
SearchForString,
COUNT(*) AS QtyOccurances
FROM #tmp
CROSS APPLY (SELECT Number
FROM dbo.Numbers
WHERE Number <= LEN(SearchInString)) AS Numbers
WHERE SearchForString = SUBSTRING(SearchInString, Number,
LEN(SearchForString))
GROUP BY RowID,
SearchForString
ORDER BY RowID ;
I have a Numbers table, but you can plug in your #Tally there just as easily.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 8, 2011 at 12:44 pm
Hm, nice alternative Gus, thanks. Never thought about using Cross Apply like that against Tally.
I used the #Tally to make the code directly consumable. 🙂
I'd originally avoided the substring figuring that the additional len calculation per Tally row would cause more heartache. Certainly can't hurt to take a swing with that though, after a second (and third) thought.
I'll get back to you on that, appreciate the time.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 8, 2011 at 1:00 pm
(450 row(s) affected)
Table '#Tally______________________________________________________________________________________________________________000000000193'. Scan count 450, logical reads 1350, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp________________________________________________________________________________________________________________000000000194'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 32 ms, elapsed time = 31 ms.
(450 row(s) affected)
Table '#Tally______________________________________________________________________________________________________________000000000193'. Scan count 450, logical reads 1350, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp________________________________________________________________________________________________________________000000000194'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 19 ms.
Thanks Gus, your way was much faster, almost 2x faster in this case when I expanded the data set.
Anyone who'd like to do their own benchmarking, find the code below.
IF OBJECT_ID ( 'tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp
IF OBJECT_ID( 'tempdb..#Tally') IS NOT NULL
DROP TABLE #Tally
-- Slightly modified from tally article
SELECT TOP 8000 -- Largest string in question, usually build out to 100,000.
IDENTITY(INT,1,1) AS N
INTO #Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
ALTER TABLE #Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
CREATE TABLE #tmp (RowID INT IDENTITY( 1,1), SearchForString VARCHAR(20), SearchInString VARCHAR(2000))
INSERT INTO #tmp ( SearchForString, SearchInString)
SELECt
drv.*
FROM
(SELECT'quick' AS sfs, 'The quick brown dirty fox jumped quickly over the quick dog.' AS sis UNION ALL SELECT
'duck', 'Duck Soup is better if you duck the ducks.' UNION ALL SELECT
'department', 'I am tired of the department of redundancy department.'
)AS drv CROSS JOIN
#tally AS tn
WHERe
tn.n <=150
select * from #tmp
--SELECT * FROM #tmp
SET STATISTICS IO, TIME ON
SELECT
RowID,
COUNT(*) AS NumOccurence
FROM
(SELECT
RowID
FROM
#tmp AS t
CROSS JOIN
#Tally AS tn
WHERE
CHARINDEX( SearchForString, SearchInString, tn.N) = tn.N
AND tn.N <= LEN( SearchInString)
) AS drv
GROUP BY
RowID
SELECT RowID,
SearchForString,
COUNT(*) AS QtyOccurances
FROM #tmp
CROSS APPLY (SELECT N
FROM #Tally
WHERE N <= LEN(SearchInString)) AS Numbers
WHERE SearchForString = SUBSTRING(SearchInString, N,
LEN(SearchForString))
GROUP BY RowID,
SearchForString
--ORDER BY RowID
;
SET STATISTICS IO, TIME OFF
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 8, 2011 at 1:01 pm
Here's a performance test I ran:
SET NOCOUNT ON;
IF OBJECT_ID(N'tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp ;
IF OBJECT_ID(N'tempdb..#T1') IS NOT NULL
DROP TABLE #T1 ;
IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL
DROP TABLE #T2 ;
CREATE TABLE #tmp (
RowID INT IDENTITY(1, 1),
SearchForString VARCHAR(20),
SearchInString VARCHAR(2000)) ;
GO
INSERT INTO #tmp
(SearchForString, SearchInString)
VALUES ('quick',
REPLICATE('The quick brown dirty fox jumped quickly over the quick dog.',
10))
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('duck', 'Duck Soup is better if you duck the ducks.')
, ('department',
'I am tired of the department of redundancy department.') ;
go 10000
SET STATISTICS IO ON ;
SET STATISTICS TIME ON ;
SELECT RowID,
COUNT(*) AS NumOccurence
INTO #T1
FROM (SELECT RowID
FROM #tmp AS t
CROSS JOIN dbo.Numbers AS tn
WHERE CHARINDEX(SearchForString, SearchInString, tn.Number) = tn.Number
AND tn.Number <= LEN(SearchInString)) AS drv
GROUP BY RowID ;
SELECT RowID,
COUNT(*) AS NumOccurence
INTO #T2
FROM #tmp
CROSS APPLY (SELECT Number
FROM dbo.Numbers
WHERE Number <= LEN(SearchInString)) AS Numbers
WHERE SearchForString = SUBSTRING(SearchInString, Number,
LEN(SearchForString))
GROUP BY RowID ;
SET STATISTICS TIME OFF ;
SET STATISTICS IO OFF ;
Results:
Beginning execution loop
Batch execution completed 10000 times.
Table 'Numbers'. Scan count 550000, logical reads 1650000, physical reads 2, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp'. Scan count 1, logical reads 5162, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 22828 ms, elapsed time = 23968 ms.
Table 'Numbers'. Scan count 550000, logical reads 1650000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp'. Scan count 1, logical reads 5162, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15063 ms, elapsed time = 16046 ms.
The I/O stats are the same for both, but the Cross Apply version was consistently faster in the CPU. I tried a variety of sizes on the temp table, from 3 rows on up to 550,000.
(The insert statement is one of the SQL 2008 new features. Won't matter for the tests.)
I had both dump their data into temp tables, to avoid adding the overhead to the queries of returning a dataset to the client machine.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 8, 2011 at 1:49 pm
Out of sheer curiousity I mucked with the build a bit to see if the substring or the crossapply was the determinator.
The substring is the faster function. I didn't see any significant difference between the crossapply and the cross join activities.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 8, 2011 at 2:09 pm
ok how about this: it's simply doing a REPLACE on the original phrase, and using datalength to get the differences:
No Tally, so i think it'll be faster than anything so far.
SELECT RowID,
--REPLACE(SearchInString,SearchForString,''),
--SearchForString,
--SearchInString,
--datalength(SearchInString),
--datalength(REPLACE(SearchInString,SearchForString,'')),
(datalength(SearchInString) - datalength(REPLACE(SearchInString,SearchForString,''))) / datalength(SearchForString)
FROM #tmp
Lowell
February 8, 2011 at 2:19 pm
:Wow::blink:
Nice idea Lowell... faster by a magnitude of nearly 10. (Edit: Heh, whoops, sorry. Faster 10x at the CPU level, 50% of time at the final timing. I didn't run this in discard results settings so my guess is a lot of that is the return to screen of the data.)
CHARINDEX Method
(360000 row(s) affected)
Table '#Tally______________________________________________________________________________________________________________0000000001AA'. Scan count 360000, logical reads 1080000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp________________________________________________________________________________________________________________0000000001AB'. Scan count 5, logical reads 3455, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15625 ms, elapsed time = 11448 ms.
SUBSTRING Method
(360000 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#Tally______________________________________________________________________________________________________________0000000001AA'. Scan count 360000, logical reads 1080000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tmp________________________________________________________________________________________________________________0000000001AB'. Scan count 5, logical reads 3455, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 12203 ms, elapsed time = 11158 ms.
DATALENGTH method
(360000 row(s) affected)
Table '#tmp________________________________________________________________________________________________________________0000000001AB'. Scan count 1, logical reads 3455, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1735 ms, elapsed time = 7074 ms.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 8, 2011 at 2:38 pm
Awesome idea Lowell. One quick note, (and this is probably completely unnecessary in this particular discussion, but to anyone that wants to implement this method elsewhere), is to watch the datatypes used with datalength.
DECLARE @a varchar(20)
DECLARE @b-2 nvarchar(20)
SET @a = 'aaaaa'
SET @b-2 = 'aaaaa'
SELECT DATALENGTH(@a), DATALENGTH(@b)
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply