October 24, 2012 at 2:56 am
Hi geniuses!
Hi have a field (VALUE), which contains data like:
MNN.helloworld
GFF.goodbyeworld
SSW.seeyoulaterworld.oops
And I want to retrieve data when there's only one '.' in the data.
Is this doable?
Thanks
Regards
October 24, 2012 at 3:02 am
--========================--
--== CREATE SAMPLE DATA ==--
--========================--
SELECT yourData
INTO #yourTable
FROM (VALUES('MNN.helloworld'),('GFF.goodbyeworld'),('SSW.seeyoulaterworld.oops')
)a(yourData);
--========================--
--== SOLUTION ==--
--========================--
SELECT *
FROM #yourTable
WHERE LEN(yourData)-1 = LEN(REPLACE(yourData COLLATE Latin1_General_BIN2,'.',''));
October 24, 2012 at 3:08 am
Thanks
It gives me an error: Argument data type ntext is invalid for argument 1 of len function
Solutions?
Regards
October 24, 2012 at 3:20 am
davdam8 (10/24/2012)
ThanksIt gives me an error: Argument data type ntext is invalid for argument 1 of len function
Solutions?
Regards
That would be because you didn't supply DDL or readily consumable sample data, which caused me to assume that you were not using a deprecated type.
You'll have to cast it to NVARCHAR or VARCHAR to use, e.g.
SELECT *
FROM #yourTable
WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));
October 24, 2012 at 3:26 am
WOW!
Works great!
Thanks body!
Regards
October 24, 2012 at 8:08 am
davdam8 (10/24/2012)
WOW!Works great!
Thanks body!
Regards
And if at all possible change your ntext to nvarchar(max). ntext is deprecated and complete PITA to work with.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
October 25, 2012 at 4:07 am
second option for the same requiment
declare @temp table(t varchar(50))
insert into @temp
select * from
(
values
('a.p'),
('y.a.p'),
('k.p')
)a (name)
select *
from @temp
where CHARINDEX('.',t,CHARINDEX('.',t)+1)=0
October 25, 2012 at 6:47 am
Two solutions. . . this means performance check!! 🙂
Here's 1 million rows of sample data: -
SET NOCOUNT ON;
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment;
END;
--1,000,000 Random rows of data
SELECT IDENTITY(INT,1,1) AS ID, CAST(LEFT(yourData, LEN(yourData) - 1) AS NTEXT) AS yourData
INTO #testEnvironment
FROM (SELECT TOP 1000000
REPLICATE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(NEWID(),'-','')
,'0',''),'9',''),'8',''),'7',''),'6',''),'5',''),'4',''),'3',''),'2',''),'1','') + '.',
(ABS(CHECKSUM(NEWID())) % 2) + 2)
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3
)a(yourData);
DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;
SELECT @StartTime = SYSDATETIME();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for LEN = %s',0,1,@Duration) WITH NOWAIT;
Duration for CHARINDEX = 00:00:10:660
Duration for LEN = 00:00:06:327
Second run (I recreated the test environment between each run):-
DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;
SELECT @StartTime = SYSDATETIME();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for LEN = %s',0,1,@Duration) WITH NOWAIT;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;
Duration for LEN = 00:00:10:927
Duration for CHARINDEX = 00:00:05:900
So the CHARINDEX function looks better, but both are pretty slow.
Anyone got a better way?
October 25, 2012 at 6:56 am
A mix of the two seems to help
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE CHARINDEX('.', yourData COLLATE Latin1_General_BIN2 , CHARINDEX('.', yourData COLLATE Latin1_General_BIN2) + 1) = 0;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537October 25, 2012 at 7:21 am
Good idea Mark. I also added an ugly XML split to count the number of "items", then modified the test script to clear the cache between each piece of code to attempt to keep it fairer.
SET NOCOUNT ON;
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment;
END;
--1,000,000 Random rows of data
SELECT IDENTITY(INT,1,1) AS ID, CAST(LEFT(yourData, LEN(yourData) - 1) AS NTEXT) AS yourData
INTO #testEnvironment
FROM (SELECT TOP 1000000
REPLICATE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(NEWID(),'-','')
,'0',''),'9',''),'8',''),'7',''),'6',''),'5',''),'4',''),'3',''),'2',''),'1','') + '.',
(ABS(CHECKSUM(NEWID())) % 2) + 2)
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3
)a(yourData);
DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM (SELECT ID, yourData, MAX(rn)
FROM (SELECT ID, yourData, split.Part.value('text()[1]', 'VARCHAR(MAX)'), ROW_NUMBER() OVER(PARTITION BY yourData ORDER BY (SELECT NULL))
FROM (SELECT CAST('<p>' + REPLACE(CAST(yourData AS VARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.','</p><p>') + '</p>' AS XML),
CAST(yourData AS VARCHAR(MAX)), ID
FROM #testEnvironment) innerQ(xmlField, yourData, ID)
CROSS APPLY innerQ.xmlField.nodes('p') split(Part)
) a(ID, yourData,splitData,rn)
GROUP BY ID, yourData
) a(ID, yourData, rn)
WHERE rn = 2;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for ugly xml split = %s',0,1,@Duration) WITH NOWAIT;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for LEN = %s',0,1,@Duration) WITH NOWAIT;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE CHARINDEX('.', yourData COLLATE Latin1_General_BIN2 , CHARINDEX('.', yourData COLLATE Latin1_General_BIN2) + 1) = 0;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for CHARINDEX with COLLATE = %s',0,1,@Duration) WITH NOWAIT;
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Duration for ugly xml split = 00:00:48:980
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Duration for CHARINDEX = 00:00:08:640
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Duration for LEN = 00:00:09:087
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Duration for CHARINDEX with COLLATE = 00:00:05:323
So the CHARINDEX with COLLATE that Mark suggested is the fastest so far - whilst my ugly XML splitter proves its worth as considerably slower 😀
October 25, 2012 at 8:15 am
How about this?
SELECT *
FROM #testenvironment
WHERE PARSENAME(CAST(yourdata AS NVARCHAR(MAX)),3) IS NULL
October 25, 2012 at 8:20 am
I was so close, thanks to the QotD that made me remember the PARSENAME function.
My code
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE PARSENAME( CAST(yourData AS NVARCHAR(MAX)),3) IS NULL
AND PARSENAME(CAST(yourData AS NVARCHAR(MAX)), 2) IS NOT NULL
Results using Cadavre's last test environment (without the XML). I deleted the "DBCC execution completed" comments.
Duration for CHARINDEX = 00:00:18:363
Duration for LEN = 00:00:09:347
Duration for PARSENAME = 00:00:06:870
Duration for COLLATE = 00:00:06:667
October 25, 2012 at 8:24 am
Davin21 (10/25/2012)
How about this?
SELECT *
FROM #testenvironment
WHERE PARSENAME(CAST(yourdata AS NVARCHAR(MAX)),3) IS NULL
I had the same idea, but I added an extra condition because your query will return results with no period (or dot) and with more than 3.
October 25, 2012 at 8:29 am
Luis Cazares (10/25/2012)
I was so close, thanks to the QotD that made me remember the PARSENAME function.My code
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE PARSENAME( CAST(yourData AS NVARCHAR(MAX)),3) IS NULL
AND PARSENAME(CAST(yourData AS NVARCHAR(MAX)), 2) IS NOT NULL
Results using Cadavre's last test environment (without the XML). I deleted the "DBCC execution completed" comments.
Duration for CHARINDEX = 00:00:18:363
Duration for LEN = 00:00:09:347
Duration for PARSENAME = 00:00:06:870
Duration for COLLATE = 00:00:06:667
Very good, that was sort of the idea I was trying to implement with the XML splitter. Didn't think of PARSENAME (foolishly!).
Also, we can get rid of the "DBCC execution completed" comments by adding "WITH NO_INFOMSGS" to the script, like so: -
SET NOCOUNT ON;
IF object_id('tempdb..#testEnvironment') IS NOT NULL
BEGIN
DROP TABLE #testEnvironment;
END;
--1,000,000 Random rows of data
SELECT IDENTITY(INT,1,1) AS ID, CAST(LEFT(yourData, LEN(yourData) - 1) AS NTEXT) AS yourData
INTO #testEnvironment
FROM (SELECT TOP 1000000
REPLICATE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(NEWID(),'-','')
,'0',''),'9',''),'8',''),'7',''),'6',''),'5',''),'4',''),'3',''),'2',''),'1','') + '.',
(ABS(CHECKSUM(NEWID())) % 2) + 2)
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3
)a(yourData);
DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM (SELECT ID, yourData, MAX(rn)
FROM (SELECT ID, yourData, split.Part.value('text()[1]', 'VARCHAR(MAX)'), ROW_NUMBER() OVER(PARTITION BY yourData ORDER BY (SELECT NULL))
FROM (SELECT CAST('<p>' + REPLACE(CAST(yourData AS VARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.','</p><p>') + '</p>' AS XML),
CAST(yourData AS VARCHAR(MAX)), ID
FROM #testEnvironment) innerQ(xmlField, yourData, ID)
CROSS APPLY innerQ.xmlField.nodes('p') split(Part)
) a(ID, yourData,splitData,rn)
GROUP BY ID, yourData
) a(ID, yourData, rn)
WHERE rn = 2;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for ugly xml split = %s',0,1,@Duration) WITH NOWAIT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for LEN = %s',0,1,@Duration) WITH NOWAIT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE CHARINDEX('.', yourData COLLATE Latin1_General_BIN2, CHARINDEX('.', yourData COLLATE Latin1_General_BIN2) + 1) = 0;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for CHARINDEX with COLLATE = %s',0,1,@Duration) WITH NOWAIT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE PARSENAME(CAST(yourData AS NVARCHAR(MAX)),3) IS NULL
AND PARSENAME(CAST(yourData AS NVARCHAR(MAX)), 2) IS NOT NULL;
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for PARSENAME = %s',0,1,@Duration) WITH NOWAIT;
Duration for ugly xml split = 00:00:47:197
Duration for CHARINDEX = 00:00:07:593
Duration for LEN = 00:00:08:163
Duration for CHARINDEX with COLLATE = 00:00:04:787
Duration for PARSENAME = 00:00:05:193
Luis Cazares (10/25/2012)
Davin21 (10/25/2012)
How about this?
SELECT *
FROM #testenvironment
WHERE PARSENAME(CAST(yourdata AS NVARCHAR(MAX)),3) IS NULL
I had the same idea, but I added an extra condition because your query will return results with no period (or dot) and with more than 3.
I've got the agree with Luis, it's a good idea but it could return incorrect results.
October 25, 2012 at 1:07 pm
Hi
I thought I would add in a LIKE query to see how that compared. It wasn't as bad as I thought it would be.
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE yourData like '%.%' and yourData not like '%.%.%';
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for LIKE = %s',0,1,@Duration) WITH NOWAIT;
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
SELECT @StartTime = GETDATE();
SELECT @HOLDER = ID
FROM #testEnvironment
WHERE yourData COLLATE Latin1_General_BIN2 like '%.%' and yourData COLLATE Latin1_General_BIN2 not like '%.%.%';
SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);
RAISERROR('Duration for LIKE = %s',0,1,@Duration) WITH NOWAIT;
And got the following
Duration for CHARINDEX = 00:00:09:103
Duration for LEN = 00:00:10:790
Duration for CHARINDEX with COLLATE = 00:00:06:507
Duration for PARSENAME = 00:00:07:717
Duration for LIKE = 00:00:10:703
Duration for LIKE with COLLATE = 00:00:06:517
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply