September 14, 2016 at 12:43 pm
I have a string splitting function I am using but now I am running into an issue where people put commas into numbers in a varchar field, and the string is being split on the commas in the number field.
The current function is as follows .
[Code]
GO
/****** Object: UserDefinedFunction [dbo].[Split] Script Date: 09/14/16 2:39:48 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[Split] (
@InputString VARCHAR(8000),
@Delimiter VARCHAR(50)
)
RETURNS @Items TABLE (
Item VARCHAR(8000)
)
AS
BEGIN
IF @Delimiter = ' '
BEGIN
SET @Delimiter = ','
SET @InputString = REPLACE(@InputString, ' ', @Delimiter)
END
IF (@Delimiter IS NULL OR @Delimiter = '')
SET @Delimiter = ','
--INSERT INTO @Items VALUES (@Delimiter) -- Diagnostic
--INSERT INTO @Items VALUES (@InputString) -- Diagnostic
DECLARE @Item VARCHAR(8000)
DECLARE @ItemList VARCHAR(8000)
DECLARE @DelimIndex INT
SET @ItemList = @InputString
SET @DelimIndex = CHARINDEX(@Delimiter, @ItemList, 0)
WHILE (@DelimIndex != 0)
BEGIN
SET @Item = SUBSTRING(@ItemList, 0, @DelimIndex)
INSERT INTO @Items VALUES (@Item)
-- Set @ItemList = @ItemList minus one less item
SET @ItemList = SUBSTRING(@ItemList, @DelimIndex+1, LEN(@ItemList)-@DelimIndex)
SET @DelimIndex = CHARINDEX(@Delimiter, @ItemList, 0)
END -- End WHILE
IF @Item IS NOT NULL -- At least one delimiter was encountered in @InputString
BEGIN
SET @Item = @ItemList
INSERT INTO @Items VALUES (@Item)
END
-- No delimiters were encountered in @InputString, so just return @InputString
ELSE INSERT INTO @Items VALUES (@InputString)
RETURN
END -- End Function
[/code]
I would like to add something into this function to ignore the delimiter if it is in a situation where there is a number on either side . I know for a LIKE statement, I could do something along the lines of LIKE '%[0-9],[0-9]%'. However, I am not really sure how I could add a check of that nature to this function.
September 14, 2016 at 1:12 pm
Clean up your data before submitting it to your function.
DECLARE @input VARCHAR(250) = 'abc,def,ghi,123,jkl,345,678,mno,901,234,567,pqr'
WHILE PATINDEX('%[0-9],[0-9]%', @input) > 0
SET @input = STUFF(@input, PATINDEX('%[0-9],[0-9]%', @input) + 1, 1, '')
You should also change your function to something that performs much better. Search this site for DelimitedSplit8K.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 14, 2016 at 1:19 pm
September 14, 2016 at 1:26 pm
drew.allen (9/14/2016)
Clean up your data before submitting it to your function.
DECLARE @input VARCHAR(250) = 'abc,def,ghi,123,jkl,345,678,mno,901,234,567,pqr'
WHILE PATINDEX('%[0-9],[0-9]%', @input) > 0
SET @input = STUFF(@input, PATINDEX('%[0-9],[0-9]%', @input) + 1, 1, '')
You should also change your function to something that performs much better. Search this site for DelimitedSplit8K.
Drew
I don't really understand how this works. The only part I really get is that the PATINDEX looks for that pattern, but I am not really sure how the STUFF portion of this works . I'd like to have a better understanding of what I am doing here.
September 14, 2016 at 1:42 pm
Steven.Grzybowski (9/14/2016)
drew.allen (9/14/2016)
Clean up your data before submitting it to your function.
DECLARE @input VARCHAR(250) = 'abc,def,ghi,123,jkl,345,678,mno,901,234,567,pqr'
WHILE PATINDEX('%[0-9],[0-9]%', @input) > 0
SET @input = STUFF(@input, PATINDEX('%[0-9],[0-9]%', @input) + 1, 1, '')
You should also change your function to something that performs much better. Search this site for DelimitedSplit8K.
Drew
I don't really understand how this works. The only part I really get is that the PATINDEX looks for that pattern, but I am not really sure how the STUFF portion of this works . I'd like to have a better understanding of what I am doing here.
The code searches for the pattern {digit}{comma}{digit} and returns the position in the string where the pattern is found. The STUFF function replaces the character at the found position + 1 with an empty string (so it removes the comma). The WHILE bit makes sure that all of the commas which match the search pattern are removed.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
September 14, 2016 at 3:53 pm
Alternatively, you could create a specialized version of DelimitedSplit8K_LEAD[/url] that handles this (I'll let you come up with a better name;-)). My modification is bold/underlined.
CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD_XX]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Tableβ produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
AND (SUBSTRING(@pString,t.N-1,3) NOT LIKE CONCAT('[0-9]',@pDelimiter,'[0-9]'))
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s;
Example:
DECLARE @input VARCHAR(250) = 'abc,def,ghi,123,jkl,345,678,mno,901,234,567,pqr';
SELECT * FROM dbo.DelimitedSplit8K_LEAD_XX(@input,',');
Results:
ItemNumber Item
-------------------- -------------
1 abc
2 def
3 ghi
4 123
5 jkl
6 345,678
7 mno
8 901,234,567
9 pqr
-- Itzik Ben-Gan 2001
September 14, 2016 at 4:33 pm
Best way to fix this is to have a pork chop dinner with the people that are providing the data. π
--Jeff Moden
Change is inevitable... Change for the better is not.
September 14, 2016 at 6:43 pm
Some questions, since you submitted no sample data.
The problem involves two or more numbers being strung together like this:
23,346
246,802
7,890,123
23,346,246,802,7,890,123
Questions:
First, does this varchar column contain ONLY numbers?
Second, do all the numbers contain decimals as well as commas?
Third, can you go back to the source of this data and tell them they have to use another delimiter instead of commas? "|" might be a good one.
Edited to add: The best answer is #3. If they say no to that then pork chops should definitely be on the menu. The data entry is responsible for making the data digestible. Garbage in, garbage out.
Edited to add question #4: Are all the numbers the same number of digits?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 15, 2016 at 6:55 am
Jeff Moden (9/14/2016)
Best way to fix this is to have a pork chop dinner with the people that are providing the data. π
I wish I could. The problem is that the comma delimited data are a result of a CRM, and I am pretty sure I would have a revolt by the account executives if I told them to not put commas in their numbers.
September 15, 2016 at 7:00 am
Steven.Grzybowski (9/15/2016)
Jeff Moden (9/14/2016)
Best way to fix this is to have a pork chop dinner with the people that are providing the data. πI wish I could. The problem is that the comma delimited data are a result of a CRM, and I am pretty sure I would have a revolt by the account executives if I told them to not put commas in their numbers.
So change the column delimiter to something which does not otherwise appear. | is often a good candidate.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
September 15, 2016 at 7:21 am
Phil Parkin (9/15/2016)
Steven.Grzybowski (9/15/2016)
Jeff Moden (9/14/2016)
Best way to fix this is to have a pork chop dinner with the people that are providing the data. πI wish I could. The problem is that the comma delimited data are a result of a CRM, and I am pretty sure I would have a revolt by the account executives if I told them to not put commas in their numbers.
So change the column delimiter to something which does not otherwise appear. | is often a good candidate.
+1
-- Itzik Ben-Gan 2001
September 15, 2016 at 10:15 am
Hey, I want a +1 too !!
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 20, 2016 at 9:49 am
So change the column delimiter to something which does not otherwise appear. | is often a good candidate.
or better yet fix the input routines to strip out the superfluous commas before they even get to the database. If they need to display them use an appropriate format.
Store numeric data in numeric columns not character ones.
I suppose you should be grateful you don't need to handle values formatted for France as well. i.e. 1.000.000,000.000 :w00t:
September 20, 2016 at 10:00 am
crmitchell (9/20/2016)
I suppose you should be grateful you don't need to handle values formatted for France as well. i.e. 1.000.000,000.000 :w00t:
I never understand numbers that have multiple periods in the middle of them...
I mean, i gave you the number "123.456.789" what number is that?
1. One hundred and twenty three million, four hundred and fifty six thousand, seven hundred and eighty nine
2. One hundred and twenty three thousand, four hundred and fifty six, point seven eight nine
3. One hundred and twenty three, point four five six seven eight nine
4. It's not a number, of some kind of GPS co-ordinate
:pinch:
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
September 20, 2016 at 1:03 pm
Just for the fun of it... Performance is about the same as what Alan posted previously...
DECLARE
@String VARCHAR(250) = '123,346,abc,def,ghi,123,jkl,345,678,mno,901,234,567,pqr',
@Delimiter CHAR(1) = ',';
WITH
n (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_Tally (n) AS (
SELECT TOP (DATALENGTH(@String))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM n n1, n n2, n n3, n n4
),
cte_Split AS (
SELECT
t.N,
Val = SUBSTRING(@String, t.N, 1),
LagLead = CONCAT(SUBSTRING(@String, t.N - 1, 1), SUBSTRING(@String, t.N + 1, 1))
FROM
cte_Tally t
),
cte_Starts AS (
SELECT N = 0
UNION ALL
SELECT
s.N
FROM
cte_Split s
WHERE
s.Val = @Delimiter
AND TRY_CAST(s.LagLead AS INT) IS NULL
)
SELECT
ItemNumber = ROW_NUMBER() OVER (ORDER BY s.N),
Item = SUBSTRING(@String, s.n +1, ISNULL(LEAD(s.n, 1) OVER (ORDER BY s.N), 8000) - s.n -1)
FROM
cte_Starts s;
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply