July 1, 2013 at 8:58 am
I have a field with data like below I need to join on
patid
123453
124344ME
323390
3233MS
I need to remove the last 2 charcters if they are not numeric. So my results for the join should be
patid
123453
124344
323390
3233
July 1, 2013 at 9:05 am
timscronin (7/1/2013)
I have a field with data like below I need to join onpatid
123453
124344ME
323390
3233MS
I need to remove the last 2 charcters if they are not numeric. So my results for the join should be
patid
123453
124344
323390
3233
What would you do with "1234C5" and "12345C" ??
July 1, 2013 at 9:11 am
wont happen, the records with the 2 characters at the end are state abbreviations added on
July 1, 2013 at 9:13 am
A possible solution would be like this. I'm worried about performance though. Couldn't you correct the data in the first place?
CREATE TABLE #Table1( patid varchar(10))
CREATE TABLE #Table2( patid varchar(10))
INSERT INTO #Table1 VALUES('123453'),('124344ME'),('323390'),('3233MS')
INSERT INTO #Table2 VALUES('123453'),('124344'),('323390'),('3233')
SELECT *
FROM #Table1 a
JOIN #Table2 b ON CASE WHEN a.patid NOT LIKE '%[^0-9]%' THEN a.patid ELSE LEFT(a.patid, LEN(a.patid) - 2) END = b.patid
DROP TABLE #Table1
DROP TABLE #Table2
July 1, 2013 at 9:15 am
Here is one more way:
create table Demo (Col1 varchar(10))
go
insert into Demo(Col1) values ('123453'),('124344ME'),('123453'),('124344ME'),('323390'),('3233MS')
select Col1, substring(Col1,1, CASE WHEN Col1 like '%[aA-zZ]' THEN len(Col1)-2 else len(Col1) end)
from Demo
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
July 1, 2013 at 9:16 am
problem is with data like this (my bad should have included it)
1233IL
22M33
Would exclude 2nd record
July 1, 2013 at 11:05 am
Create some sample data:
IF OBJECT_ID('tempdb..#SampleData') IS NOT NULL
DROP TABLE #SampleData
CREATE TABLE #SampleData (
[ID] INT IDENTITY(1,1) NOT NULL,
[strVal] NVARCHAR(50) NULL,
PRIMARY KEY (ID))
INSERT INTO #SampleData
VALUES
('123453ME'),('124344'),('323390'),('32339826455MS'),
('345678'),('39WA'),('986545'),('1234AK'),
('345XX8'),('398644GA'),('F986545'),('98234AZ')
Now query the data and strip off the unwanted suffixes
as well as eliminating any entries with an embedded alpha char.
SELECT
r.strVal
FROM
(
SELECT
(CASE
WHEN PATINDEX('%[A-Za-z]%',RIGHT(strVal,2)) > 0
THEN REPLACE(strVal,RIGHT(strVal,2),'')
ELSE strVal
END) AS strVal
FROM
#SampleData AS sd
) r
WHERE
PATINDEX('%[A-Za-z]%',strVal) = 0
July 1, 2013 at 2:25 pm
SELECT
mt.##col1##, ..., jt.##co11##, ...
FROM dbo.maintable mt
LEFT OUTER JOIN jointable jt ON
--ignore last two bytes of patid if they are alpha, because it's a state abbrev, not part of key value
jt.key = LEFT(mt.patid, LEN(mt.patid) - CASE WHEN RIGHT(mt.patid, 2) LIKE '[a-z][a-z]' THEN 2 ELSE 0 END)
Edit: Added sql code tags to colorize code.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 1, 2013 at 5:08 pm
timscronin (7/1/2013)
wont happen, the records with the 2 characters at the end are state abbreviations added on
Then it's easy... Build a "State" table (you should have one anyway) and the your WHERE clause would look like the following...
WHERE RIGHT(PatID,2) IN (SELECT StateAbbv FROM dbo.State)
Be advised that no matter what you do, performance is going to stink because of the PatID column being wrapped in a function. What I'd recommend doing if you need to do this a lot is to make the RIGHT(Pat,2) a PERSISTED calculated column in the table and index it.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 2, 2013 at 9:22 am
I use the following SVF to extract integers only from phone number columns. So, you would run the following and it would return only the numbers from your patid column.
select dbo.ExtractInteger(patid) as [patid]
from table
USE [Sample]
GO
/****** Object: UserDefinedFunction [dbo].[ExtractInteger] Script Date: 07/02/2013 11:17:35 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[ExtractInteger](@String VARCHAR(2000))
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Count INT
DECLARE @IntNumbers VARCHAR(1000)
SET @Count = 0
SET @IntNumbers = ''
WHILE @Count <= LEN(@String)
BEGIN
IF SUBSTRING(@String,@Count,1) >= '0'
AND SUBSTRING(@String,@Count,1) <= '9'
BEGIN
SET @IntNumbers = @IntNumbers + SUBSTRING(@String,@Count,1)
END
SET @Count = @Count + 1
END
RETURN @IntNumbers
END
July 2, 2013 at 6:39 pm
erikd (7/2/2013)
I use the following SVF to extract integers only from phone number columns. So, you would run the following and it would return only the numbers from your patid column.
select dbo.ExtractInteger(patid) as [patid]
from table
USE [Sample]
GO
/****** Object: UserDefinedFunction [dbo].[ExtractInteger] Script Date: 07/02/2013 11:17:35 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[ExtractInteger](@String VARCHAR(2000))
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Count INT
DECLARE @IntNumbers VARCHAR(1000)
SET @Count = 0
SET @IntNumbers = ''
WHILE @Count <= LEN(@String)
BEGIN
IF SUBSTRING(@String,@Count,1) >= '0'
AND SUBSTRING(@String,@Count,1) <= '9'
BEGIN
SET @IntNumbers = @IntNumbers + SUBSTRING(@String,@Count,1)
END
SET @Count = @Count + 1
END
RETURN @IntNumbers
END
SVFs are notoriouly poor for performance, even when measured correctly.
Initially I'd do it like this. Until that is I had the time to go back and determine the fastest method.
WITH Phones (phone) AS (
SELECT '66814382803' UNION ALL SELECT '123A123456789'
UNION ALL SELECT '0814382803'
)
SELECT Item
FROM Phones
CROSS APPLY dbo.PatternSplitCM(phone, '[0-9]')
WHERE [Matched] = 1 AND LEN(Item) BETWEEN 7 AND 11
I am curious what your function would return for that second phone number in my list.
PatternSplitCM is a iTVF that can be found in the 4th article in my signature links. Since it is a general purpose function, much faster ways could probably be devised that would get you to the same result.
Note that PatternSplitCM would also work for the OP's case.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 2, 2013 at 6:51 pm
dwain.c (7/2/2013)
erikd (7/2/2013)
I use the following SVF to extract integers only from phone number columns. So, you would run the following and it would return only the numbers from your patid column.
select dbo.ExtractInteger(patid) as [patid]
from table
USE [Sample]
GO
/****** Object: UserDefinedFunction [dbo].[ExtractInteger] Script Date: 07/02/2013 11:17:35 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[ExtractInteger](@String VARCHAR(2000))
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Count INT
DECLARE @IntNumbers VARCHAR(1000)
SET @Count = 0
SET @IntNumbers = ''
WHILE @Count <= LEN(@String)
BEGIN
IF SUBSTRING(@String,@Count,1) >= '0'
AND SUBSTRING(@String,@Count,1) <= '9'
BEGIN
SET @IntNumbers = @IntNumbers + SUBSTRING(@String,@Count,1)
END
SET @Count = @Count + 1
END
RETURN @IntNumbers
END
SVFs are notoriouly poor for performance, even when measured correctly.
Initially I'd do it like this. Until that is I had the time to go back and determine the fastest method.
WITH Phones (phone) AS (
SELECT '66814382803' UNION ALL SELECT '123A123456789'
UNION ALL SELECT '0814382803'
)
SELECT Item
FROM Phones
CROSS APPLY dbo.PatternSplitCM(phone, '[0-9]')
WHERE [Matched] = 1 AND LEN(Item) BETWEEN 7 AND 11
I am curious what your function would return for that second phone number in my list.
PatternSplitCM is a iTVF that can be found in the 4th article in my signature links. Since it is a general purpose function, much faster ways could probably be devised that would get you to the same result.
Note that PatternSplitCM would also work for the OP's case.
Hi Dwain,
I work with a lot of tables, but none of them are very large. The largest one I currently deal with is for a monthly tracker that is recently up to around 300k rows, so I don't notice a lot (if any) performance increases that I see people talk about. So, for my purposes, the SVF works well.
Here's what it returns from the second number you posted:
select dbo.extractinteger('123A123456789') as [For Dwain]
For Dwain
123123456789
I have your string splitter link open in a new tab. Will read and get back to you when I've tried it out a bit.
July 2, 2013 at 7:18 pm
erikd (7/2/2013)
I work with a lot of tables, but none of them are very large.
Please don't get me wrong when I say this. That is today. Once your application is successful beyond your wildest dreams and your tables start having many millions of rows, you'll probably be forced to rethink that statement. Performance should rarely be measured as "good enough," mainly because by the time it becomes "not good enough" it will be a serious challenge to make it good enough once again.
erikd (7/2/2013)
Here's what it returns from the second number you posted:
select dbo.extractinteger('123A123456789') as [For Dwain]
For Dwain
123123456789
I have your string splitter link open in a new tab. Will read and get back to you when I've tried it out a bit.
I knew the answer but what I was wondering is does that seem like the right thing to do? For example suppose the string was 123456789x1231 where x denotes an extension.
I hope you find value in the article. Like I said, I consider PatternSplitCM to be sort of a stopgap even though for the task it does it performs reasonably well. Most of the time, you'll find you can construct a more direct approach to a particular problem that will run faster.
PatternSplitCM will get you to working code faster but with a little more effort you can get your code working faster.
Did I just say that? 😉
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 2, 2013 at 7:48 pm
dwain.c (7/2/2013)
erikd (7/2/2013)
I work with a lot of tables, but none of them are very large.Please don't get me wrong when I say this. That is today. Once your application is successful beyond your wildest dreams and your tables start having many millions of rows, you'll probably be forced to rethink that statement. Performance should rarely be measured as "good enough," mainly because by the time it becomes "not good enough" it will be a serious challenge to make it good enough once again.
erikd (7/2/2013)
Here's what it returns from the second number you posted:
select dbo.extractinteger('123A123456789') as [For Dwain]
For Dwain
123123456789
I have your string splitter link open in a new tab. Will read and get back to you when I've tried it out a bit.
I knew the answer but what I was wondering is does that seem like the right thing to do? For example suppose the string was 123456789x1231 where x denotes an extension.
I hope you find value in the article. Like I said, I consider PatternSplitCM to be sort of a stopgap even though for the task it does it performs reasonably well. Most of the time, you'll find you can construct a more direct approach to a particular problem that will run faster.
PatternSplitCM will get you to working code faster but with a little more effort you can get your code working faster.
Did I just say that? 😉
No, you're right. Good enough isn't good enough. What I was getting at is that I don't have a table large enough to test good enough against better or best. I don't even have a particularly good testing environment (a SQL 2005 instance on a rather ancient and rickety server that half a dozen people RDP into at a time). But, a guy can dream, when he's done dreaming about Johnnie Walker Blue and rare steaks.
Phone numbers with extensions are screened out, because they can't be auto dialed, so it's GENERALLY not an issue, though it's caused a few headaches when clients send over their contact lists with everything plopped into one column.
I am going to try to thumb wrestle your pattern splitter to do what mine does and just return the cleaned result.
July 2, 2013 at 8:07 pm
Erik - Just because your Prod system doesn't have a million rows to test against, doesn't mean you can't test against a million rows!
The test harness in the article demonstrates the generic approach for this (which I do not take credit for by the way).
This would directly compare PatternSplitCM against your SVF.
CREATE TABLE #Phones (phone VARCHAR(12));
WITH Tally (n) AS (
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #Phones
SELECT 1000000+ABS(CHECKSUM(NEWID()))%1000000
FROM Tally;
DECLARE @Phone VARCHAR(12), @StartDT DATETIME;
PRINT 'PatternSplitCM';
SELECT @StartDT = GETDATE();
SET STATISTICS TIME ON;
SELECT @Phone=Item
FROM #Phones
CROSS APPLY dbo.PatternSplitCM(phone, '[0-9]')
WHERE [Matched]=1;
SET STATISTICS TIME OFF;
SELECT ElapsedMS=DATEDIFF(millisecond, @StartDT, GETDATE());
PRINT 'extractinteger';
SELECT @StartDT = GETDATE();
SET STATISTICS TIME ON;
SELECT @Phone=dbo.extractinteger(phone)
FROM #Phones;
SET STATISTICS TIME OFF;
SELECT ElapsedMS=DATEDIFF(millisecond, @StartDT, GETDATE());
GO
DROP TABLE #Phones;
Edit: I modified it to SELECT the ElapsedMS of each run, which is needed due to a bias that is introduced by applying STATISTICS TIME to an SVF as demonstrated here: http://www.sqlservercentral.com/articles/T-SQL/91724/
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply