How to check only the numeric values while ignoring the symbols or characters using SQL queries

  • drew.allen (11/11/2011)


    Even so, the performance may still be better using the string splitter than without. The only way to know for sure is thorough testing, which is why I initially said that a string splitter was at least worth considering.

    Drew

    I already did, using Jeff Moden's string splitting function (see code below). CHARINDEX is slightly faster in terms of CPU than the LIKE operator. String splitting takes about 5 times more CPU and slightly more logical I/O, but due to a parallel query plan the elapsed time is only about 50% slower. And an index obviously doesn't help.

    Test rig:

    IF OBJECT_ID('dbo.Metatags', 'U') IS NOT NULL
        DROP TABLE dbo.Metatags

    CREATE TABLE dbo.Metatags
    (
        MetatagID INT IDENTITY PRIMARY KEY,
        MetatagDesc VARCHAR(900) NOT NULL
    )

    INSERT INTO dbo.Metatags (MetatagDesc)
    SELECT TOP 1000000
        CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
        CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
        CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
        CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2))
    FROM
        master.sys.all_columns C1
    CROSS JOIN
        master.sys.all_columns C2

    --CREATE INDEX IX_Metatags_MetatagDesc ON Metatags (MetatagDesc)

    DECLARE @LookupID INT
    SET @LookupID = 7

    SET STATISTICS TIME ON
    SET STATISTICS IO ON

    PRINT '=== LIKE ==='
    SELECT *
    FROM dbo.Metatags
    WHERE '|' + MetatagDesc + '|' LIKE '%|' + CAST(@LookupID AS VARCHAR(2)) + '|%'

    PRINT '=== CHARINDEX ==='
    SELECT *
    FROM dbo.Metatags
    WHERE CHARINDEX('|' + CAST(@LookupID AS VARCHAR(2)) + '|', '|' + MetatagDesc + '|') > 0

    PRINT '=== String Splitter ==='
    SELECT *
    FROM dbo.Metatags
    WHERE @LookupID IN (SELECT Item FROM dbo.DelimitedSplit8K(MetatagDesc, '|'))

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF

    DROP TABLE dbo.Metatags

    String splitter:

    -- Taken from: http://www.sqlservercentral.com/articles/Tally+Table/72993/
    CREATE FUNCTION [dbo].[DelimitedSplit8K](@pString VARCHAR(8000), @pDelimiter CHAR(1))
    RETURNS TABLE
    WITH SCHEMABINDING AS
    RETURN
    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
    --      enough to cover VARCHAR(8000)
    WITH E1(N) AS
    (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),                                   --10E+1 or 10 rows
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
    cteTally(N) AS
    (
        --==== This provides the "zero base" and limits the number of rows right up front
        --     for both a performance gain and prevention of accidental "overruns"
        SELECT 0
        UNION ALL
        SELECT TOP (DATALENGTH(ISNULL(@pString, 1)))
            ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM E4
    ),
    cteStart(N1) AS
    (
        --==== This returns N+1 (starting position of each "element" just once for each delimiter)
        SELECT t.N + 1
        FROM cteTally t
        WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter OR t.N = 0
    )
    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the
    --      final element when no delimiter is found.
    SELECT
        ItemNumber = ROW_NUMBER() OVER (ORDER BY s.N1),
        Item       = SUBSTRING(@pString, s.N1, ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000))
    FROM cteStart s
    ;

    Result:

    === LIKE ===
    (185752 row(s) affected)
    Table 'Metatags'. Scan count 1, logical reads 4998, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    (1 row(s) affected)
    SQL Server Execution Times:
       CPU time = 2918 ms, elapsed time = 3099 ms.

    === CHARINDEX ===
    (185752 row(s) affected)
    Table 'Metatags'. Scan count 1, logical reads 4998, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    (1 row(s) affected)
    SQL Server Execution Times:
       CPU time = 2839 ms, elapsed time = 3026 ms.

    === String Splitter ===
    (185752 row(s) affected)
    Table 'Metatags'. Scan count 5, logical reads 5068, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    (1 row(s) affected)
    SQL Server Execution Times:
       CPU time = 16645 ms, elapsed time = 6044 ms.

    Or did you have another solution in mind with string splitting?

  • Hi Peter,

    To speed the LIKE and CHARINDEX solutions further, consider comparing using a binary collation:

    PRINT '=== LIKE ==='
    SELECT *
    FROM dbo.Metatags
    WHERE '|' + MetatagDesc + '|' LIKE '%|' + CAST(@LookupID AS VARCHAR(2)) + '|%' COLLATE Latin1_General_BIN2

    PRINT '=== CHARINDEX ==='
    SELECT *
    FROM dbo.Metatags
    WHERE CHARINDEX('|' + CAST(@LookupID AS VARCHAR(2)) + '|' COLLATE Latin1_General_BIN2, '|' + MetatagDesc + '|') > 0

    On my machine, this change reduced the CPU from 2400ms or so down to 700ms for LIKE and 500ms for CHARINDEX.

    Now that's a nice little trick to remember, Paul. Very impressive improvement. As a side note, I just ran the script on both SQL 2005 and 2008 and noticed that the table is much smaller on SQL 2008, leading to far fewer logical reads. Output from sp_spaceused:

    2005:

    name      rows     reserved  data      index_size  unused
    --------- -------- --------- --------- ----------- ------
    Metatags  1000000  40008 KB  39824 KB  160 KB      24 KB

    2008:

    name      rows     reserved  data      index_size  unused
    --------- -------- --------- --------- ----------- ------
    Metatags  1000000  26056 KB  25936 KB  104 KB      16 KB

    Have internal structures been improved that much in 2008? Still working mostly with 2005.

    Peter

  • Peter Brinkhaus (11/12/2011)


    Now that's a nice little trick to remember, Paul. Very impressive improvement.

    Yes, string comparisons using all the complex rules of a 'regular' collation can be very expensive. If the data compares properly using the semantics of a binary collation, it's a good trick. Of course, if the data has binary comparison semantics, we could/should have defined that on the source column...!
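
    For illustration only (this is not code from the thread), declaring that on the test rig's column would look something like this:

    -- Hypothetical variant of the test table: binary comparison semantics
    -- declared on the column itself, instead of being applied in every query.
    CREATE TABLE dbo.Metatags
    (
        MetatagID INT IDENTITY PRIMARY KEY,
        MetatagDesc VARCHAR(900) COLLATE Latin1_General_BIN2 NOT NULL
    )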

    As a side note, I just ran the script on both SQL 2005 and 2008 and noticed that the table is much smaller on SQL 2008, leading to far fewer logical reads [...] Have internal structures been improved that much in 2008? Still working mostly with 2005.

    No (at least not if you don't count compression!)

    [edit: previous explanation was incorrect]

    Try: ALTER INDEX ALL ON dbo.Metatags REBUILD on the 2005 instance. This will remove the 14-byte per row row-versioning information. I assume you created the table in 2005 in a database with SI or RCSI enabled.
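
    A quick sketch of how to check and fix that (the columns are standard sys.databases columns; the Test database name is taken from later posts):

    -- Check whether RCSI or snapshot isolation is enabled for the database
    SELECT name, is_read_committed_snapshot_on, snapshot_isolation_state_desc
    FROM sys.databases
    WHERE name = 'Test'

    -- Rebuild to remove the per-row versioning overhead, then re-check the size
    ALTER INDEX ALL ON dbo.Metatags REBUILD
    EXEC sp_spaceused 'dbo.Metatags'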

  • SQL Kiwi (11/12/2011)


    SQL Server 2008 does a better job of avoiding page splits during the INSERT to the clustered index. Issue an ALTER INDEX ALL ON dbo.Metatags REBUILD on 2005 and you should see the space usage fall back to 2008 levels.

    I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does diminish the size of the table considerably.

    The 2005 plan does contain an explicit Sort on the generated identity value in an attempt to reduce page-splitting, but 2008 has a much better trick: notice the DMLRequestSort = true on the Clustered Index Insert, which is an optimization to promote minimally-logged INSERTs to b-trees, a new feature on 2008 (assuming the database is in a recovery model state that allows minimally-logged operations).
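
    For reference, the fragmentation and page counts being compared here can be read from sys.dm_db_index_physical_stats; a sketch (the exact query used isn't shown in the thread):

    -- Fragmentation and page count for the indexes on dbo.Metatags
    SELECT index_id, avg_fragmentation_in_percent, page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Metatags'), NULL, NULL, 'LIMITED')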

    Guess I have to do a lot more reading. Anyway, thanks for the explanation.

    Peter

  • Peter Brinkhaus (11/12/2011)


    I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does diminish the size of the table considerably.

    Yes, I realised my mistake too late - you quoted me already! I have put the correct explanation in my previous post. The minimal logging thing still applies, it's just not the reason for the extra space usage.

  • SQL Kiwi (11/12/2011)


    Peter Brinkhaus (11/12/2011)


    I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does diminish the size of the table considerably.

    Yes, I realised my mistake too late - you quoted me already! I have put the correct explanation in my previous post. The minimal logging thing still applies, it's just not the reason for the extra space usage.

    No problem, you forced me to investigate my sandbox DB a little further. For reasons I don't know, sys.databases.is_read_committed_snapshot_on = 1. I turned it off with ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF. After rebuilding the table, it has approximately the same size as on SQL 2008.

    Peter

  • Peter Brinkhaus (11/12/2011)


    No problem, you forced me to investigate my sandbox DB a little further. For reasons I don't know, sys.databases.is_read_committed_snapshot_on = 1. I turned it off with ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF. After rebuilding the table, it has approximately the same size as on SQL 2008.

    No, I do apologise for leading you in a false direction - serves me right for assuming things from the execution plan and not running the test for myself first.

    By the way, turning off is_read_committed_snapshot_on = 1 would require ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF. The sys.databases column for snapshot isolation (SI) would be snapshot_isolation_state_desc.
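
    In other words, spelling out both settings (a small sketch putting the two statements side by side):

    -- RCSI is a separate setting from snapshot isolation
    ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF   -- clears is_read_committed_snapshot_on
    ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF  -- sets snapshot_isolation_state_desc to OFF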

  • SQL Kiwi (11/12/2011)


    By the way, turning off is_read_committed_snapshot_on = 1 would require ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF. The sys.databases column for snapshot isolation (SI) would be snapshot_isolation_state_desc.

    Sorry for the misinfo. I had already cleared the SSMS window when I posted, so I copied the wrong statement from BOL.

    Peter

  • {Edit} Never mind... I see that other folks have already brought up the problem of false positives when using LIKE without concatenation of the delimiters.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dev (11/11/2011)


    mvijaykumar.mca (11/11/2011)


    Thanks Peter..

    SELECT *
    FROM Metatags
    WHERE '|' + MetatagDesc + '|' LIKE '%|2|%'

    If the field contains 2|23|4 and I am searching for 23, it should match only 23, not 2. That's the scenario I was looking for, and it worked....

    Thanks to everyone for the quick response... 🙂

    I urge you not to go with this solution. If this were any column but a key there would be no issue, but you are playing with IDs. In the future there could be many issues just because of one wrong decision (or quick fix).

    Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/12/2011)


    Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.

    I think he was encouraging the OP not to use the delimited id in the first place.

  • SQL Kiwi (11/13/2011)


    Jeff Moden (11/12/2011)


    Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.

    I think he was encouraging the OP not to use the delimited id in the first place.

    Yup. That's the only reason.

  • Dev (11/11/2011)


    Peter Brinkhaus (11/11/2011)


    Agreed with Dev, but you can do it with a LIKE:

    SELECT
        *
    FROM
        (
        SELECT '2|4|6' -- '3|24|4'
        ) SampleTable(ID)
    WHERE
        '|' + ID + '|' LIKE '%|2|%'

    Can we try the same code with ID '3|24|4'?

    Yes of course we can and yes (of course) it works exactly as required.

    Were you suggesting it wouldn't work? If so, why? Adding separators to act as leading and trailing delimiters is an absolutely standard technique for discovering if something is in a separated list (and of course it is the first step in most splitter designs, too).
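
    To make that concrete with the '3|24|4' value Dev asked about (a quick check, not code from the original posts):

    -- With leading and trailing delimiters added, '2' no longer false-matches
    -- inside '24', while '24' itself is still found.
    SELECT * FROM (SELECT '3|24|4') SampleTable(ID)
    WHERE '|' + ID + '|' LIKE '%|2|%'   -- returns no rows

    SELECT * FROM (SELECT '3|24|4') SampleTable(ID)
    WHERE '|' + ID + '|' LIKE '%|24|%'  -- returns the row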

    Tom

  • My ONLY argument is this.

    Dev (11/11/2011)


    ID
    -----
    2|4|6

    It's bad design. If you have control over the database schema, re-design it.
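
    For what it's worth, a sketch of what such a re-design might look like (the table and column names here are made up for illustration):

    -- Instead of packing IDs into one delimited VARCHAR column, store one ID per
    -- row in a child table, so a plain equality predicate (and an index) can be used.
    CREATE TABLE dbo.MetatagItems
    (
        MetatagID INT NOT NULL,  -- parent row, e.g. dbo.Metatags.MetatagID
        ItemID    INT NOT NULL,  -- one of the values previously stored as '2|4|6'
        PRIMARY KEY (MetatagID, ItemID)
    )

    -- The lookup then becomes a simple, index-friendly query:
    -- SELECT MetatagID FROM dbo.MetatagItems WHERE ItemID = @LookupID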
