November 11, 2011 at 1:40 pm
drew.allen (11/11/2011)
Even so, the performance may still be better using the string splitter than without. The only way to know for sure is thorough testing, which is why I initially said that a string splitter was at least worth considering.
Drew
I already did, using Jeff Moden's string splitting function (see code below). CHARINDEX is slightly faster in terms of CPU than the LIKE operator. String splitting takes about 5 times more CPU and slightly more logical I/O, but due to a parallel query plan the elapsed time is only about 50% slower. And an index obviously doesn't help.
Test rig:
IF OBJECT_ID('dbo.Metatags', 'U') IS NOT NULL
DROP TABLE dbo.Metatags
CREATE TABLE dbo.Metatags
(
MetatagID INT IDENTITY PRIMARY KEY,
MetatagDesc VARCHAR(900) NOT NULL
)
INSERT INTO
dbo.Metatags(MetatagDesc)
SELECT TOP 1000000
CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2)) + '|' +
CAST(ABS(CHECKSUM(NEWID())) % 20 + 1 AS VARCHAR(2))
FROM
master.sys.all_columns C1
CROSS JOIN
master.sys.all_columns C2
--CREATE INDEX IX_Metatags_MetatagDesc on Metatags(MetatagDesc)
DECLARE @LookupID INT
SET @LookupID = 7
SET STATISTICS TIME ON
SET STATISTICS IO ON
PRINT '=== LIKE ==='
SELECT
*
FROM
dbo.Metatags
WHERE
'|' + MetatagDesc + '|' LIKE '%|' + CAST(@LookupID AS VARCHAR(2)) + '|%'
PRINT '=== CHARINDEX ==='
SELECT
*
FROM
dbo.Metatags
WHERE
CHARINDEX('|' + CAST(@LookupID AS VARCHAR(2)) +'|', '|' + MetatagDesc + '|') > 0
PRINT '=== String Splitter ==='
SELECT
*
FROM
dbo.Metatags
WHERE
@LookupID IN (SELECT Item FROM dbo.DelimitedSplit8K(MetatagDesc, '|'))
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
DROP TABLE dbo.Metatags
String splitter:
-- Taken from: http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATE FUNCTION [dbo].[DelimitedSplit8K](@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0
UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString, 1)))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
E4
),
cteStart(N1) AS
(
--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT
t.N+1
FROM
cteTally t
WHERE
SUBSTRING(@pString, t.N, 1) = @pDelimiter OR t.N = 0
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT
ItemNumber = ROW_NUMBER() OVER (ORDER BY s.N1),
Item = SUBSTRING(@pString, s.N1, ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000))
FROM
cteStart s
;
Result:
=== LIKE ===
(185752 row(s) affected)
Table 'Metatags'. Scan count 1, logical reads 4998, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 2918 ms, elapsed time = 3099 ms.
=== CHARINDEX ===
(185752 row(s) affected)
Table 'Metatags'. Scan count 1, logical reads 4998, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 2839 ms, elapsed time = 3026 ms.
=== String Splitter ===
(185752 row(s) affected)
Table 'Metatags'. Scan count 5, logical reads 5068, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 16645 ms, elapsed time = 6044 ms.
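(As a quick stand-alone sanity check of the splitter itself; the sample string below is purely illustrative:)
-- Returns one ItemNumber/Item pair per element: (1, '2'), (2, '4'), (3, '6')
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K('2|4|6', '|')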
Or did you have another solution in mind with string splitting?
November 12, 2011 at 9:23 am
Hi Peter,
To speed up the LIKE and CHARINDEX solutions further, consider performing the comparison using a binary collation:
PRINT '=== LIKE ==='
SELECT
*
FROM
dbo.Metatags
WHERE
'|' + MetatagDesc + '|' LIKE '%|' + CAST(@LookupID AS VARCHAR(2)) + '|%' COLLATE Latin1_General_BIN2
PRINT '=== CHARINDEX ==='
SELECT
*
FROM
dbo.Metatags
WHERE
CHARINDEX('|' + CAST(@LookupID AS VARCHAR(2)) +'|' COLLATE Latin1_General_BIN2, '|' + MetatagDesc + '|') > 0
On my machine, this change reduced the CPU from 2400ms or so down to 700ms for LIKE and 500ms for CHARINDEX.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 12, 2011 at 10:11 am
Now that's a nice little trick to remember, Paul. Very impressive improvement. As a side note, I just ran the script on both SQL 2005 and 2008 and noticed that the table is much smaller on SQL 2008, leading to far fewer logical reads. Output from sp_spaceused:
2005
name       rows     reserved  data      index_size  unused
---------  -------  --------  --------  ----------  ------
Metatags   1000000  40008 KB  39824 KB  160 KB      24 KB

2008
name       rows     reserved  data      index_size  unused
---------  -------  --------  --------  ----------  ------
Metatags   1000000  26056 KB  25936 KB  104 KB      16 KB
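(Those numbers come straight from sp_spaceused against the test table:)
EXEC sys.sp_spaceused @objname = N'dbo.Metatags'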
Have internal structures been improved that much in 2008? Still working mostly with 2005.
Peter
November 12, 2011 at 10:41 am
Peter Brinkhaus (11/12/2011)
Now that's a nice little trick to remember, Paul. Very impressive improvement.
Yes, string comparisons using all the complex rules of a 'regular' collation can be very expensive. If the data compares properly using the semantics of a binary collation, it's a good trick. Of course, if the data has binary comparison semantics, we could/should have defined that on the source column...!
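For illustration only, a hypothetical version of the test table with the binary collation declared on the column itself might look like this:
CREATE TABLE dbo.Metatags
(
MetatagID INT IDENTITY PRIMARY KEY,
MetatagDesc VARCHAR(900) COLLATE Latin1_General_BIN2 NOT NULL
)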
As a side note, I just ran the script on both SQL 2005 and 2008 and noticed that the table is much smaller on SQL 2008, leading to far fewer logical reads [...] Have internal structures been improved that much in 2008? Still working mostly with 2005.
No (at least not if you don't count compression!)
[edit: previous explanation was incorrect]
Try ALTER INDEX ALL ON dbo.Metatags REBUILD on the 2005 instance. This will remove the 14 bytes of row-versioning information stored with each row. I assume you created the table in 2005 in a database with SI or RCSI enabled.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 12, 2011 at 11:02 am
SQL Kiwi (11/12/2011)
SQL Server 2008 does a better job of avoiding page splits during the INSERT to the clustered index. Issue an ALTER INDEX ALL ON dbo.Metatags REBUILD on 2005 and you should see the space usage fall back to 2008 levels.
I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does reduce the size of the table considerably.
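(For anyone wanting to check this themselves, fragmentation like that can be read from sys.dm_db_index_physical_stats; a minimal sketch, with names taken from the test rig:)
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Metatags'), NULL, NULL, 'LIMITED')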
The 2005 plan does contain an explicit Sort on the generated identity value in an attempt to reduce page splitting, but 2008 has a much better trick: notice the DMLRequestSort = true on the Clustered Index Insert, which is an optimization to promote minimally-logged INSERTs to b-trees, a new feature in 2008 (assuming the database is in a recovery model that allows minimally-logged operations).
Guess I have to do a lot more reading. Anyway, thanks for the explanation.
Peter
November 12, 2011 at 11:05 am
Peter Brinkhaus (11/12/2011)
I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does reduce the size of the table considerably.
Yes, I realised my mistake too late - you quoted me already! I have put the correct explanation in my previous post. The minimal logging thing still applies, it's just not the reason for the extra space usage.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 12, 2011 at 11:37 am
SQL Kiwi (11/12/2011)
Peter Brinkhaus (11/12/2011)
I always thought "page splits = fragmentation", but on SQL 2005 the fragmentation is less than on SQL 2008, albeit only 0.38% vs. 0.43%. But you're right, the ALTER INDEX does reduce the size of the table considerably.
Yes, I realised my mistake too late - you quoted me already! I have put the correct explanation in my previous post. The minimal logging thing still applies, it's just not the reason for the extra space usage.
No problem, you forced me to investigate my sandbox DB a little further. For reasons I don't know, sys.databases.is_read_committed_snapshot_on = 1. Turned it off with ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF. After rebuilding, the table is approximately the same size as in SQL 2008.
Peter
November 12, 2011 at 11:46 am
Peter Brinkhaus (11/12/2011)
No problem, you forced me to investigate my sandbox DB a little further. For reasons I don't know, sys.databases.is_read_committed_snapshot_on = 1. Turned it off with ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF. After rebuilding, the table is approximately the same size as in SQL 2008.
No, I do apologise for leading you in a false direction - serves me right for assuming things from the execution plan and not running the test for myself first.
By the way, turning off is_read_committed_snapshot_on = 1 would require ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF. The sys.databases column for snapshot isolation (SI) would be snapshot_isolation_state_desc.
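Putting both together, a minimal sketch (your database name assumed to be Test, as in your post):
-- Check which of the two settings is actually on
SELECT name, is_read_committed_snapshot_on, snapshot_isolation_state_desc
FROM sys.databases
WHERE name = N'Test'

-- Turn off read committed snapshot isolation (RCSI)
ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF

-- Turn off snapshot isolation (SI), if that is the one enabled
ALTER DATABASE Test SET ALLOW_SNAPSHOT_ISOLATION OFF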
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 12, 2011 at 11:54 am
SQL Kiwi (11/12/2011)
By the way, turning off is_read_committed_snapshot_on = 1 would require ALTER DATABASE Test SET READ_COMMITTED_SNAPSHOT OFF. The sys.databases column for snapshot isolation (SI) would be snapshot_isolation_state_desc.
Sorry for the misinfo; I had already cleared the SSMS window when I posted, so I copied the wrong statement from BOL.
Peter
November 12, 2011 at 7:44 pm
{Edit} Never mind... I see that other folks have already brought up the problem of false positives when using LIKE without concatenation of the delimiters.
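For anyone skimming, a minimal illustration of that false-positive problem (sample value is hypothetical):
DECLARE @List VARCHAR(20)
SET @List = '3|24|4'

-- Without the wrapping delimiters, searching for 2 wrongly matches the 2 inside 24
SELECT CASE WHEN @List LIKE '%2%' THEN 'match' ELSE 'no match' END AS NakedLike

-- With the wrapping delimiters, 2 is correctly rejected
SELECT CASE WHEN '|' + @List + '|' LIKE '%|2|%' THEN 'match' ELSE 'no match' END AS DelimitedLike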
--Jeff Moden
Change is inevitable... Change for the better is not.
November 12, 2011 at 7:58 pm
Dev (11/11/2011)
mvijaykumar.mca (11/11/2011)
Thanks Peter..
SELECT *
FROM Metatags
WHERE '|' + MetatagDesc + '|' LIKE '%|2|%'
If the field contains 2|23|4 and I am searching for 23, it should match only 23, not 2. That's the scenario I was looking for, and it worked...
Thanks to everyone for the quick response... 🙂
I urge you not to go with this solution. If it were any column other than a key there would be no issue, but you are playing with IDs. In future there will be many issues just because of one wrong decision (or quick fix).
Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2011 at 1:43 am
Jeff Moden (11/12/2011)
Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.
I think he was encouraging the OP not to use the delimited id in the first place.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 14, 2011 at 1:49 am
SQL Kiwi (11/13/2011)
Jeff Moden (11/12/2011)
Why, Dev? What's wrong with this solution? It works perfectly for what it was designed for.
I think he was encouraging the OP not to use the delimited id in the first place.
Yup. That's the only reason.
November 14, 2011 at 2:04 am
Dev (11/11/2011)
Peter Brinkhaus (11/11/2011)
Agreed with Dev, but you can do it with a LIKE:
SELECT
*
FROM
(
SELECT '2|4|6' -- '3|24|4'
) SampleTable(ID)
WHERE
'|' + ID + '|' LIKE '%|2|%'
Can we try the same code with ID '3|24|4'?
Yes of course we can and yes (of course) it works exactly as required.
Were you suggesting it wouldn't work? If so, why? Adding separators to act as leading and trailing delimiters is an absolutely standard technique for discovering if something is in a separated list (and of course it is the first step in most splitter designs, too).
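For the record, a minimal sketch of the same pattern run against '3|24|4' (lookup values chosen just for illustration):
SELECT
*
FROM
(
SELECT '3|24|4'
) SampleTable(ID)
WHERE
'|' + ID + '|' LIKE '%|2|%' -- no rows: 2 is not an element of the list

SELECT
*
FROM
(
SELECT '3|24|4'
) SampleTable(ID)
WHERE
'|' + ID + '|' LIKE '%|24|%' -- one row: 24 is an element of the list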
Tom
November 14, 2011 at 2:08 am
My ONLY argument is this.
Dev (11/11/2011)
ID
---
2|4|6
It's bad design. If you have control over the database schema, redesign it.
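For what it's worth, a minimal sketch of the normalised alternative (table and column names are hypothetical):
-- One row per (metatag, ID) pair instead of a delimited list in one column
CREATE TABLE dbo.MetatagValues
(
MetatagID INT NOT NULL,
LookupID INT NOT NULL,
CONSTRAINT PK_MetatagValues PRIMARY KEY (MetatagID, LookupID)
)

-- The search then becomes a plain, indexable predicate
SELECT MetatagID
FROM dbo.MetatagValues
WHERE LookupID = 7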