November 11, 2011 at 1:40 pm
drew.allen (11/11/2011)
Even so, the performance may still be better using the string splitter than without. The only way to know for sure is thorough testing, which is why I initially said that a string splitter was at least worth considering.
Drew
I already did, using Jeff Moden's string splitting function (see code below). CHARINDEX is slightly faster in terms of CPU than the LIKE operator. String splitting takes about 5 times more CPU and slightly more logical I/O, but due to a parallel query plan the elapsed time is only about 50% slower. And an index obviously doesn't help.
Test rig:
IF OBJECT_ID('dbo.Metatags', 'U') IS NOT NULL
DROP TABLE dbo.Metatags
CREATE TABLE dbo.Metatags
(
MetatagID INT IDENTITY PRIMARY KEY,
MetatagDesc VARCHAR(900) NOT NULL
)
INSERT INTO
dbo.Metatags(MetatagDesc)
SELECT TOP 1000000
CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2))
FROM
master.sys.all_columns C1
CROSS JOIN
master.sys.all_columns C2
--CREATE INDEX IX_Metatags_MetatagDesc on Metatags(MetatagDesc)
DECLARE @LookupID INT
SET @LookupID = 7
SET STATISTICS TIME ON
SET STATISTICS IO ON
PRINT '=== LIKE ==='
SELECT
*
FROM
dbo.Metatags
WHERE
'|' + MetatagDesc + '|' LIKE '%|' + CAST(@LookupID AS VARCHAR(2)) + '|%'
PRINT '=== CHARINDEX ==='
SELECT
*
FROM
dbo.Metatags
WHERE
CHARINDEX('|' + CAST(@LookupID AS VARCHAR(2)) +'|', '|' + MetatagDesc + '|') > 0
PRINT '=== String Splitter ==='
SELECT
*
FROM
dbo.Metatags
WHERE
@LookupID IN (SELECT Item FROM dbo.DelimitedSplit8K(MetatagDesc, '|'))
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
DROP TABLE dbo.Metatags
String splitter:
-- Taken from: http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATE FUNCTION [dbo].[DelimitedSplit8K](@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0
UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString, 1)))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
E4
),
cteStart(N1) AS
(
--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT
t.N+1
FROM
cteTally t
WHERE
SUBSTRING(@pString, t.N, 1) = @pDelimiter OR t.N = 0
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT
ItemNumber = ROW_NUMBER() OVER (ORDER BY s.N1),
Item = SUBSTRING(@pString, s.N1, ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000))
FROM
cteStart s
;
Result:
=== LIKE ===
(185752 row(s) affected)
Table 'Metatags'. Scan count 1, logical reads 4998, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 2918 ms, elapsed time = 3099 ms.
=== CHARINDEX ===
(185752 row(s) affected)
Table 'Metatags'. Scan count 1, logical reads 4998, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 2839 ms, elapsed time = 3026 ms.
=== String Splitter ===
(185752 row(s) affected)
Table 'Metatags'. Scan count 5, logical reads 5068, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 16645 ms, elapsed time = 6044 ms.
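(As a quick stand-alone sanity check of the splitter itself; the sample string below is purely illustrative:)
-- Returns one ItemNumber/Item pair per element: (1, '2'), (2, '4'), (3, '6')
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K('2|4|6', '|')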
Or did you have another solution in mind with string splitting?
November 12, 2011 at 9:23 am
Hi Peter,
To speed up the LIKE and CHARINDEX solutions further, consider performing the comparison using a binary collation:
PRINT '=== LIKE ==='
SELECT
*
FROM
dbo.Metatags
WHERE
'|' + MetatagDesc + '|' LIKE '%|' + CAST(@LookupID AS VARCHAR(2)) + '|%' COLLATE Latin1_General_BIN2
PRINT '=== CHARINDEX ==='
SELECT
*
FROM
dbo.Metatags
WHERE
CHARINDEX('|' + CAST(@LookupID AS VARCHAR(2)) +'|' COLLATE Latin1_General_BIN2, '|' + MetatagDesc + '|') > 0
On my machine, this change reduced the CPU from 2400ms or so down to 700ms for LIKE and 500ms for CHARINDEX.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 12, 2011 at 10:11 am
Now that's a nice little trick to remember, Paul. Very impressive improvement. As a side note, I just ran the script on both SQL 2005 and 2008 and noticed that the table is much smaller on SQL 2008, leading to far fewer logical reads. Output from sp_spaceused:
2005
name       rows     reserved  data      index_size  unused
---------  -------  --------  --------  ----------  ------
Metatags   1000000  40008 KB  39824 KB  160 KB      24 KB

2008
name       rows     reserved  data      index_size  unused
---------  -------  --------  --------  ----------  ------
Metatags   1000000  26056 KB  25936 KB  104 KB      16 KB
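(Those numbers come straight from sp_spaceused against the test table:)
EXEC sys.sp_spaceused @objname = N'dbo.Metatags'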
Have internal structures been improved that much in 2008? Still working mostly with 2005.
Peter
November 12, 2011 at 10:41 am
Peter Brinkhaus (11/12/2011)
Now that's a nice little trick to remember, Paul. Very impressive improvement.
Yes, string comparisons using all the complex rules of a 'regular' collation can be very expensive. If the data compares properly using the semantics of a binary collation, it's a good trick. Of course, if the data has binary comparison semantics, we could/should have defined that on the source column...!
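For illustration only, a hypothetical version of the test table with the binary collation declared on the column itself might look like this:
CREATE TABLE dbo.Metatags
(
MetatagID INT IDENTITY PRIMARY KEY,
MetatagDesc VARCHAR(900) COLLATE Latin1_General_BIN2 NOT NULL
)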
As a side note, I just ran the script on both SQL 2005 and 2008 and noticed that the table is much smaller on SQL 2008, leading to far fewer logical reads [...] Have internal structures been improved that much in 2008? Still working mostly with 2005.
No (at least not if you don't count compression!)
[edit: previous explanation was incorrect]
Try ALTER INDEX ALL ON dbo.Metatags REBUILD on the 2005 instance. This will remove the 14 bytes of row-versioning information stored with each row. I assume you created the table in 2005 in a database with SI or RCSI enabled.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 12, 2011 at 11:02 am
SQL Kiwi (11/12/2011)
SQL Server 2008 does a better job of avoiding page splits during the INSERT to the clustered index. Issue an ALTER INDEX ALL ON dbo.Metatags REBUILD on 2005 and you should see the space usage fall back to 2008 levels.
I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does reduce the size of the table considerably.
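(For anyone wanting to check this themselves, fragmentation like that can be read from sys.dm_db_index_physical_stats; a minimal sketch, with names taken from the test rig:)
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Metatags'), NULL, NULL, 'LIMITED')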
The 2005 plan does contain an explicit Sort on the generated identity value in an attempt to reduce page splitting, but 2008 has a much better trick: notice the DMLRequestSort = true on the Clustered Index Insert, which is an optimization to promote minimally-logged INSERTs to b-trees, a new feature in 2008 (assuming the database is in a recovery model that allows minimally-logged operations).
Guess I have to do a lot more reading. Anyway, thanks for the explanation.
Peter
November 12, 2011 at 11:05 am
Peter Brinkhaus (11/12/2011)
I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does reduce the size of the table considerably.
Yes, I realised my mistake too late - you quoted me already! I have put the correct explanation in my previous post. The minimal logging thing still applies, it's just not the reason for the extra space usage.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 12, 2011 at 11:37 am
SQL Kiwi (11/12/2011)
Peter Brinkhaus (11/12/2011)
I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does reduce the size of the table considerably.
Yes, I realised my mistake too late - you quoted me already! I have put the correct explanation in my previous post. The minimal logging thing still applies, it's just not the reason for the extra space usage.
No problem, you forced me to investigate my sandbox DB a little further. For reasons I don't know, sys.databases.is_read_committed_snapshot_on = 1. Turned it off with ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF. After rebuilding, the table is approximately the same size as in SQL 2008.
Peter
November 12, 2011 at 11:46 am
Peter Brinkhaus (11/12/2011)
No problem, you forced me to investigate my sandbox DB a little further. For reasons I don't know, sys.databases.is_read_committed_snapshot_on = 1. Turned it off with ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF. After rebuilding, the table is approximately the same size as in SQL 2008.
No, I do apologise for leading you in a false direction - serves me right for assuming things from the execution plan and not running the test for myself first.
By the way, turning off is_read_committed_snapshot_on = 1 would require ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF. The sys.databases column for snapshot isolation (SI) would be snapshot_isolation_state_desc.
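Putting both together, a minimal sketch (your database name assumed to be Test, as in your post):
-- Check which of the two settings is actually on
SELECT name, is_read_committed_snapshot_on, snapshot_isolation_state_desc
FROM sys.databases
WHERE name = N'Test'

-- Turn off read committed snapshot isolation (RCSI)
ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF

-- Turn off snapshot isolation (SI), if that is the one enabled
ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF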
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 12, 2011 at 11:54 am
SQL Kiwi (11/12/2011)
By the way, turning off is_read_committed_snapshot_on = 1 would require ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF. The sys.databases column for snapshot isolation (SI) would be snapshot_isolation_state_desc.
Sorry for the misinfo; I had already cleared the SSMS window when I posted, so I copied the wrong statement from BOL.
Peter
November 12, 2011 at 7:44 pm
{Edit} Never mind... I see that other folks have already brought up the problem of false positives when using LIKE without concatenation of the delimiters.
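For anyone skimming, a minimal illustration of that false-positive problem (sample value is hypothetical):
DECLARE @List VARCHAR(20)
SET @List = '3|24|4'

-- Without the wrapping delimiters, searching for 2 wrongly matches the 2 inside 24
SELECT CASE WHEN @List LIKE '%2%' THEN 'match' ELSE 'no match' END AS NakedLike

-- With the wrapping delimiters, 2 is correctly rejected
SELECT CASE WHEN '|' + @List + '|' LIKE '%|2|%' THEN 'match' ELSE 'no match' END AS DelimitedLike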
--Jeff Moden
Change is inevitable... Change for the better is not.
November 12, 2011 at 7:58 pm
Dev (11/11/2011)
mvijaykumar.mca (11/11/2011)
Thanks Peter..
SELECT *
FROM Metatags
WHERE '|' + MetatagDesc + '|' LIKE '%|2|%'
If the field contains 2|23|4 and I am searching for 23, it should match only 23, not 2. That's the scenario I was looking for, and it worked...
Thanks to everyone for the quick response... 🙂
I urge you not to go with this solution. If it were any column other than a key there would be no issue, but you are playing with IDs. In future there will be many issues just because of one wrong decision (or quick fix).
Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2011 at 1:43 am
Jeff Moden (11/12/2011)
Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.
I think he was encouraging the OP not to use the delimited id in the first place.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 14, 2011 at 1:49 am
SQL Kiwi (11/13/2011)
Jeff Moden (11/12/2011)
Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.
I think he was encouraging the OP not to use the delimited id in the first place.
Yup. That's the only reason.
November 14, 2011 at 2:04 am
Dev (11/11/2011)
Peter Brinkhaus (11/11/2011)
Agreed with Dev, but you can do it with a LIKE:
SELECT
*
FROM
(
SELECT '2|4|6' -- '3|24|4'
) SampleTable(ID)
WHERE
'|' + ID + '|' LIKE '%|2|%'
Can we try the same code with ID '3|24|4'?
Yes of course we can and yes (of course) it works exactly as required.
Were you suggesting it wouldn't work? If so, why? Adding separators to act as leading and trailing delimiters is an absolutely standard technique for discovering if something is in a separated list (and of course it is the first step in most splitter designs, too).
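For the record, a minimal sketch of the same pattern run against '3|24|4' (lookup values chosen just for illustration):
SELECT
*
FROM
(
SELECT '3|24|4'
) SampleTable(ID)
WHERE
'|' + ID + '|' LIKE '%|2|%' -- no rows: 2 is not an element of the list

SELECT
*
FROM
(
SELECT '3|24|4'
) SampleTable(ID)
WHERE
'|' + ID + '|' LIKE '%|24|%' -- one row: 24 is an element of the list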
Tom
November 14, 2011 at 2:08 am
My ONLY argument is this.
Dev (11/11/2011)
ID
---
2|4|6
It's bad design. If you have control over the database schema, redesign it.
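For what it's worth, a minimal sketch of the normalised alternative (table and column names are hypothetical):
-- One row per (metatag, ID) pair instead of a delimited list in one column
CREATE TABLE dbo.MetatagValues
(
MetatagID INT NOT NULL,
LookupID INT NOT NULL,
CONSTRAINT PK_MetatagValues PRIMARY KEY (MetatagID, LookupID)
)

-- The search then becomes a plain, indexable predicate
SELECT MetatagID
FROM dbo.MetatagValues
WHERE LookupID = 7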