April 4, 2012 at 12:08 pm
SQL is pretty bad at doing any character manipulation and comparison. I always use LIKE, CHARINDEX etc. as a last resort. I agree, your solution performs much better , but still is very processor intensive.
I have modified your query a bit to get rid of the CHARINDEX(using an ASCII comparison), and had an almost 8 X improvement on a very large table. It also got rid of the hard coded alphabet, and can be easily modified to include numbers and other collations when used as a function.
SET STATISTICS TIME OFF;
PRINT '========== Stan''s Method ======================================================='
SET STATISTICS TIME ON;
SELECT count(yt.Name)
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(SUBSTRING(yt.Name,t.N,1)) = ASCII(SUBSTRING(yt.Name,t.N+1,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N+1,1)) = ASCII(SUBSTRING(yt.Name,t.N+2,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 122
)
;
SET STATISTICS TIME OFF;
SET STATISTICS TIME ON
April 4, 2012 at 1:19 pm
sharky (4/4/2012)
SQL is pretty bad at doing any character manipulation and comparison. I always use LIKE, CHARINDEX etc. as a last resort. I agree, your solution performs much better , but still is very processor intensive.I have modified your query a bit to get rid of the CHARINDEX(using an ASCII comparison), and had an almost 8 X improvement on a very large table. It also got rid of the hard coded alphabet, and can be easily modified to include numbers and other collations when used as a function.
SET STATISTICS TIME OFF;
PRINT '========== Stan''s Method ======================================================='
SET STATISTICS TIME ON;
SELECT count(yt.Name)
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(SUBSTRING(yt.Name,t.N,1)) = ASCII(SUBSTRING(yt.Name,t.N+1,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N+1,1)) = ASCII(SUBSTRING(yt.Name,t.N+2,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 122
)
;
SET STATISTICS TIME OFF;
SET STATISTICS TIME ON
for my own curiosity the "SELECT TOP 1 1 ..." in the not exists clause, im guessing since its not exist if there is any thing returned that would match the where the actual columns are not important? i have always seen not exist with a SELECT yt.Name ... may have something new in my arsenal.
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
April 5, 2012 at 2:10 pm
No. The correlated subquery just check for any occurance of a 3 consecutive alphabet string, and the TOP 1 1 returns just the first record(This is Jeffs code), and knows it should ignore it. I am not a huge fan of a Tally table as the code can break when you port it, but I have kept it the same for comparrison.
The main difference in my code is the ASCII comparison instead of a string comparison(SQL sucks at it!!) Although more efficient, Jeffs code did not make a huge performance improvement on Craigs because it still had to rely on that ugly CHARINDEX()
The ASCII result is a 10 X increase in performance, with the same 1000000 table used as in Craig and Jeffs code. Also, the performance will deteriorate even more if we include numbers or other characters, while the ascii method will stay the same.
--===== Create a variable to dump to to take the display time out of the picture.
--DECLARE @Bitbucket VARCHAR(50);
--===== Now, run the tests
PRINT '========== Craig''s Method ======================================================'
SET STATISTICS TIME ON;
SELECT COUNT(*)
FROM dbo.yourTable a
CROSS APPLY (SELECT TOP 1 CASE WHEN a.name LIKE strings THEN 1 ELSE 0 END
FROM (SELECT TOP 24
'%' + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ',tbl.N,3) + '%' AS strings
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),
(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),
(23),(24),(25),(26)) tbl(N)) innerQuery
ORDER BY CASE WHEN a.name LIKE strings THEN 1 ELSE 0 END DESC) b(filter)
WHERE filter = 0
;
SET STATISTICS TIME OFF;
PRINT '========== Jeff''s Method ======================================================='
SET STATISTICS TIME ON;
SELECT COUNT(*)
FROM dbo.YourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND CHARINDEX(SUBSTRING(yt.Name,t.N,3),'ABCDEFGHIJKLMNOPQRSTUVWXYZ') > 0
)
;
SET STATISTICS TIME OFF;
PRINT '========== Stan''s Method ======================================================='
SET STATISTICS TIME ON;
SELECT COUNT(*)
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(SUBSTRING(yt.Name,t.N,1)) = ASCII(SUBSTRING(yt.Name,t.N+1,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N+1,1)) = ASCII(SUBSTRING(yt.Name,t.N+2,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 120
)
;
SET STATISTICS TIME OFF;
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 130 ms.
========== Craig's Method ======================================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 193908 ms, elapsed time = 68966 ms.
========== Jeff's Method =======================================================
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 172601 ms, elapsed time = 55602 ms.
========== Stan's Method =======================================================
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 17425 ms, elapsed time = 5899 ms.
April 5, 2012 at 2:17 pm
sharky (4/5/2012)
the TOP 1 1 returns just the first record
No, SELECT TOP 1 1 will return 1 if there is at least 1 record. I think that is what Hector was asking; why do SELECT TOP 1 * or SELECT TOP 1 column in a NOT EXISTS when SELECTING TOP 1 1 is suffice. Is there really an improvement on anything by doing this? For example...
SELECT COUNT(*)
FROM table
vs
SELECT COUNT(1)
FROM table
Same result... Same performance (I think)... Different coding pattern.
Jared
CE - Microsoft
April 5, 2012 at 2:57 pm
My question would be why do TOP <x> in an EXISTS when the point of an EXISTS (or NOT EXISTS) is to terminate as soon as it finds a row.
http://sqlinthewild.co.za/index.php/2011/04/05/to-top-or-not-to-top-an-exists/
Count 1 and count * are synonymous. So are EXISTS (select 1 ...) EXISTS (select * ...) and EXISTS (select <column>...)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 5, 2012 at 3:13 pm
GilaMonster (4/5/2012)
My question would be why do TOP <x> in an EXISTS when the point of an EXISTS (or NOT EXISTS) is to terminate as soon as it finds a row.http://sqlinthewild.co.za/index.php/2011/04/05/to-top-or-not-to-top-an-exists/
Count 1 and count * are synonymous. So are EXISTS (select 1 ...) EXISTS (select * ...) and EXISTS (select <column>...)
So adding the TOP is useless in the EXISTS (or NOT EXISTS). So really, its just extra typing and script that can become confusing? 🙂
Jared
CE - Microsoft
April 5, 2012 at 3:55 pm
I Agree with Gale. It will terminate anyway when it finds the first record. 1 is just the first record column placeholder which will imply that there is at least 1 record. I agree *, 1 is the same
April 5, 2012 at 4:25 pm
sharky (4/4/2012)
SQL is pretty bad at doing any character manipulation and comparison. I always use LIKE, CHARINDEX etc. as a last resort. I agree, your solution performs much better , but still is very processor intensive.I have modified your query a bit to get rid of the CHARINDEX(using an ASCII comparison), and had an almost 8 X improvement on a very large table. It also got rid of the hard coded alphabet, and can be easily modified to include numbers and other collations when used as a function.
SET STATISTICS TIME OFF;
PRINT '========== Stan''s Method ======================================================='
SET STATISTICS TIME ON;
SELECT count(yt.Name)
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(SUBSTRING(yt.Name,t.N,1)) = ASCII(SUBSTRING(yt.Name,t.N+1,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N+1,1)) = ASCII(SUBSTRING(yt.Name,t.N+2,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 122
)
;
SET STATISTICS TIME OFF;
SET STATISTICS TIME ON
I applaud the innovation! The reason why I didn't use such a method is because it doesn't actually work correctly because the use of the ASCII function makes it case sensitive.
CREATE TABLE yourTable (name VARCHAR(50))
INSERT INTO yourTable
SELECT name
FROM (
SELECT 'Good' UNION ALL --Keep
SELECT 'Goabc' UNION ALL --Filter Out
SELECT 'Goghi' UNION ALL --Filter Out
SELECT 'Goghp' UNION ALL --Keep
SELECT 'Stuart' --Filter Out
) a(name);
PRINT '========== Stan''s Method ======================================================='
SELECT yt.Name
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(SUBSTRING(yt.Name,t.N,1)) = ASCII(SUBSTRING(yt.Name,t.N+1,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N+1,1)) = ASCII(SUBSTRING(yt.Name,t.N+2,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 120
)
;
Results: Notice that "Stuart" should not be included because of the "Stu".
========== Stan's Method =======================================================
Name
--------------------------------------------------
Good
Goghp
Stuart
(3 row(s) affected)
However, now that I've see Stan's code, there may be a way to do this using the ASCII function and with a filter that will make a modification to Stan's code even faster. Good idea, Stan.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2012 at 4:39 pm
sharky (4/5/2012)
I am not a huge fan of a Tally table as the code can break when you port it,
So can any other code that you try to port. Even some forms of basic ANSI code will break between things like Oracle and SQL server. True portability is a myth.
Also, what's so not portable about a simple table of numbers and queries against it??? :blink:
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2012 at 4:43 pm
Jeff Moden (4/5/2012)
sharky (4/4/2012)
SQL is pretty bad at doing any character manipulation and comparison. I always use LIKE, CHARINDEX etc. as a last resort. I agree, your solution performs much better , but still is very processor intensive.I have modified your query a bit to get rid of the CHARINDEX(using an ASCII comparison), and had an almost 8 X improvement on a very large table. It also got rid of the hard coded alphabet, and can be easily modified to include numbers and other collations when used as a function.
SET STATISTICS TIME OFF;
PRINT '========== Stan''s Method ======================================================='
SET STATISTICS TIME ON;
SELECT count(yt.Name)
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(SUBSTRING(yt.Name,t.N,1)) = ASCII(SUBSTRING(yt.Name,t.N+1,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N+1,1)) = ASCII(SUBSTRING(yt.Name,t.N+2,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 122
)
;
SET STATISTICS TIME OFF;
SET STATISTICS TIME ON
I applaud the innovation! The reason why I didn't use such a method is because it doesn't actually work correctly because the use of the ASCII function makes it case sensitive.
CREATE TABLE yourTable (name VARCHAR(50))
INSERT INTO yourTable
SELECT name
FROM (
SELECT 'Good' UNION ALL --Keep
SELECT 'Goabc' UNION ALL --Filter Out
SELECT 'Goghi' UNION ALL --Filter Out
SELECT 'Goghp' UNION ALL --Keep
SELECT 'Stuart' --Filter Out
) a(name);
PRINT '========== Stan''s Method ======================================================='
SELECT yt.Name
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(SUBSTRING(yt.Name,t.N,1)) = ASCII(SUBSTRING(yt.Name,t.N+1,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N+1,1)) = ASCII(SUBSTRING(yt.Name,t.N+2,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 120
)
;
Results: Notice that "Stuart" should not be included because of the "Stu".
========== Stan's Method =======================================================
Name
--------------------------------------------------
Good
Goghp
Stuart
(3 row(s) affected)
However, now that I've see Stan's code, there may be a way to do this using the ASCII function and with a filter that will make a modification to Stan's code even faster. Good idea, Stan.
would you be able to slap an UPPER() or LOWER() around yt.Name in ASCII(SUBSTRING(yt.Name,t.N,1)) turning it into ASCII(SUBSTRING(UPPER(yt.Name),t.N,1)) to get around the Stu problem in stan's code?
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
April 5, 2012 at 4:44 pm
SQLKnowItAll (4/5/2012)
GilaMonster (4/5/2012)
My question would be why do TOP <x> in an EXISTS when the point of an EXISTS (or NOT EXISTS) is to terminate as soon as it finds a row.http://sqlinthewild.co.za/index.php/2011/04/05/to-top-or-not-to-top-an-exists/
Count 1 and count * are synonymous. So are EXISTS (select 1 ...) EXISTS (select * ...) and EXISTS (select <column>...)
So adding the TOP is useless in the EXISTS (or NOT EXISTS). So really, its just extra typing and script that can become confusing? 🙂
Yes... that's what happens when I post at 1:17AM. 😛 I was trying two different methods and forgot to remove the TOP. :blush:
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2012 at 4:48 pm
capn.hector (4/5/2012)
would you be able to slap an UPPER() or LOWER() around yt.Name in ASCII(SUBSTRING(yt.Name,t.N,1)) turning it into ASCII(SUBSTRING(UPPER(yt.Name),t.N,1)) to get around the Stu problem in stan's code?
Yes but you wouldn't want to slap it around yt.Name because then it will take a bit (admittedly, miniscule, but avoidable) more time and cpu to convert the whole string. It would be better to slap UPPER or LOWER around the SUBSTRING result so it only has to make the conversion to a single character.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2012 at 5:08 pm
Here's my whole test harness including the UPPER modification to make Stan's fine idea work correctly.
CREATE TABLE yourTable (name VARCHAR(50))
INSERT INTO yourTable
SELECT name
FROM (
SELECT 'Good' UNION ALL --Keep
SELECT 'Goabc' UNION ALL --Filter Out
SELECT 'Goghi' UNION ALL --Filter Out
SELECT 'Goghp' UNION ALL --Keep
SELECT 'Stuart' --Filter Out
) a(name);
INSERT INTO dbo.yourTable
(name)
SELECT TOP 1000000
CAST(NEWID() AS VARCHAR(50))
FROM sys.all_columns ac1,
sys.all_columns ac2
;
--===== Create a variable to dump to to take the display time out of the picture.
DECLARE @Bitbucket VARCHAR(50);
--===== Now, run the tests
PRINT '========== Craig''s Method ======================================================'
SET STATISTICS TIME ON;
SELECT @Bitbucket = name
FROM yourTable a
CROSS APPLY (SELECT TOP 1 CASE WHEN a.name LIKE strings THEN 1 ELSE 0 END
FROM (SELECT TOP 24
'%' + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ',tbl.N,3) + '%' AS strings
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),
(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),
(23),(24),(25),(26)) tbl(N)) innerQuery
ORDER BY CASE WHEN a.name LIKE strings THEN 1 ELSE 0 END DESC) b(filter)
WHERE filter = 0
;
SET STATISTICS TIME OFF;
PRINT '========== Jeff''s Method ======================================================='
SET STATISTICS TIME ON;
SELECT @Bitbucket = yt.Name
FROM dbo.YourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND CHARINDEX(SUBSTRING(yt.Name,t.N,3),'ABCDEFGHIJKLMNOPQRSTUVWXYZ') > 0
)
;
SET STATISTICS TIME OFF;
PRINT '========== Stan''s Method ======================================================='
SET STATISTICS TIME ON;
SELECT @Bitbucket = yt.Name
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(SUBSTRING(yt.Name,t.N,1)) = ASCII(SUBSTRING(yt.Name,t.N+1,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N+1,1)) = ASCII(SUBSTRING(yt.Name,t.N+2,1)) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 120
)
;
SET STATISTICS TIME OFF;
PRINT '========== Stan''s Method modified by UPPER ======================================================='
SET STATISTICS TIME ON;
SELECT @Bitbucket = yt.Name
FROM dbo.yourTable yt
WHERE NOT EXISTS
(
SELECT TOP 1 1
FROM dbo.Tally t
WHERE t.N BETWEEN 1 AND LEN(yt.Name)-2
AND ASCII(UPPER(SUBSTRING(yt.Name,t.N,1))) = ASCII(UPPER(SUBSTRING(yt.Name,t.N+1,1))) - 1
AND ASCII(UPPER(SUBSTRING(yt.Name,t.N+1,1))) = ASCII(UPPER(SUBSTRING(yt.Name,t.N+2,1))) - 1
AND ASCII(SUBSTRING(yt.Name,t.N,1)) BETWEEN 65 and 120
)
;
SET STATISTICS TIME OFF;
Here are the results from my machine.
========== Craig's Method ======================================================
SQL Server Execution Times:
CPU time = 59716 ms, elapsed time = 34084 ms.
========== Jeff's Method =======================================================
SQL Server Execution Times:
CPU time = 34212 ms, elapsed time = 17813 ms.
========== Stan's Method =======================================================
SQL Server Execution Times:
CPU time = 23711 ms, elapsed time = 12370 ms.
========== Stan's Method modified by UPPER =======================================================
SQL Server Execution Times:
CPU time = 26786 ms, elapsed time = 13881 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2012 at 5:11 pm
sharky (4/5/2012)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 130 ms.
========== Craig's Method ======================================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 193908 ms, elapsed time = 68966 ms.
========== Jeff's Method =======================================================
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 172601 ms, elapsed time = 55602 ms.
========== Stan's Method =======================================================
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 17425 ms, elapsed time = 5899 ms.
I'm thinking that there's either something wrong with either your machine or your Tally table for the Tally table to do that badly on your box. What version/edition/sp/cu of SQL Server are you running and would you mind posting the code you used to create the Tally table? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2012 at 5:18 pm
Jeff Moden (4/5/2012)
SQLKnowItAll (4/5/2012)
GilaMonster (4/5/2012)
My question would be why do TOP <x> in an EXISTS when the point of an EXISTS (or NOT EXISTS) is to terminate as soon as it finds a row.http://sqlinthewild.co.za/index.php/2011/04/05/to-top-or-not-to-top-an-exists/
Count 1 and count * are synonymous. So are EXISTS (select 1 ...) EXISTS (select * ...) and EXISTS (select <column>...)
So adding the TOP is useless in the EXISTS (or NOT EXISTS). So really, its just extra typing and script that can become confusing? 🙂
Yes... that's what happens when I post at 1:17AM. 😛 I was trying two different methods and forgot to remove the TOP. :blush:
Of course I meant no disrespect 🙂 Especially to my mentor!
Jared
CE - Microsoft
Viewing 15 posts - 16 through 30 (of 37 total)
You must be logged in to reply to this topic. Login to reply