December 2, 2010 at 7:18 am
I wonder if anyone could help me with a problem I'm having.
I have some data in a table that contains many joined words e.g. ThisIsASample
What I would like to to is change it to read This Is A Sample
I have about 5,000 records and don't want to do it manually but cannot figure out how to do it ๐
December 2, 2010 at 7:23 am
Tim-24860 (12/2/2010)
I wonder if anyone could help me with a problem I'm having.I have some data in a table that contains many joined words e.g. ThisIsASample
What I would like to to is change it to read This Is A Sample
I have about 5,000 records and don't want to do it manually but cannot figure out how to do it ๐
Tim, this could be anywhere between trivial and impossible, depending upon the distribution of the words. Can you provide more info? A sample would be awesome.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 2, 2010 at 7:27 am
Chris, thanks for the reply. Here's a small sample
QualificationsLength
DepthMustBeNumeric
HeightMustBeNumeric
WidthMustBeNumeric
DepthNotValid
HeightNotValid
QuantityIsMandatory
QuantityNotValid
WidthNotValid
FactorNotValid
DimensionNotNumeric
DimensionNotValid
StartDateEarlierThanCurrentDate
CostCodeIsMandatory
QuantityMustBeNumeric
InvalidRatioNumeric
MissingRatio
December 2, 2010 at 8:07 am
Try this, Tim:
DROP TABLE #Sentences
CREATE TABLE #Sentences (Sentence VARCHAR(100))
INSERT INTO #Sentences (Sentence)
SELECT 'QualificationsLength' UNION ALL
SELECT 'DepthMustBeNumeric' UNION ALL
SELECT 'HeightMustBeNumeric' UNION ALL
SELECT 'WidthMustBeNumeric' UNION ALL
SELECT 'DepthNotValid' UNION ALL
SELECT 'HeightNotValid' UNION ALL
SELECT 'QuantityIsMandatory' UNION ALL
SELECT 'QuantityNotValid' UNION ALL
SELECT 'WidthNotValid' UNION ALL
SELECT 'FactorNotValid' UNION ALL
SELECT 'DimensionNotNumeric' UNION ALL
SELECT 'DimensionNotValid' UNION ALL
SELECT 'StartDateEarlierThanCurrentDate' UNION ALL
SELECT 'CostCodeIsMandatory' UNION ALL
SELECT 'QuantityMustBeNumeric' UNION ALL
SELECT 'InvalidRatioNumeric' UNION ALL
SELECT 'MissingRatio'
SELECT s.Sentence,
NewSentence = LTRIM(STUFF(STUFF(STUFF(STUFF(STUFF(s.Sentence, S1, 0, ' '), S2, 0, ' '), S3, 0, ' '), S4, 0, ' '), S5, 0, ' '))
FROM #Sentences s
CROSS APPLY (
SELECT
s1 = ISNULL(MAX(CASE WHEN Seq = 1 THEN d.n END), 1),
s2 = ISNULL(MAX(CASE WHEN Seq = 2 THEN d.n END), 1),
s3 = ISNULL(MAX(CASE WHEN Seq = 3 THEN d.n END), 1),
s4 = ISNULL(MAX(CASE WHEN Seq = 4 THEN d.n END), 1),
s5 = ISNULL(MAX(CASE WHEN Seq = 5 THEN d.n END), 1)
FROM (
SELECT n.n, Seq = ROW_NUMBER() OVER (ORDER BY n DESC)
FROM (SELECT TOP 100 n = ROW_NUMBER() OVER (ORDER BY Name) FROM master.dbo.syscolumns) n
WHERE ASCII(SUBSTRING(s.Sentence, n.n, 1)) BETWEEN 65 AND 90) d
) Splitter
Edit:
-- Rough Stats:
-- 1,419,874 rows of test data fixed to max six words
-- all results returned to client in 27 seconds
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 2, 2010 at 8:10 am
Hello you an try the following split script
create table Exceptions (ShortText nvarchar(100))
insert into Exceptions values (
'QualificationsLength'),(
'DepthMustBeNumeric'),(
'HeightMustBeNumeric'),(
'WidthMustBeNumeric'),(
'DepthNotValid'),(
'HeightNotValid'),(
'QuantityIsMandatory'),(
'QuantityNotValid'),(
'WidthNotValid'),(
'FactorNotValid'),(
'DimensionNotNumeric'),(
'DimensionNotValid'),(
'StartDateEarlierThanCurrentDate'),(
'CostCodeIsMandatory'),(
'QuantityMustBeNumeric'),(
'InvalidRatioNumeric'),(
'MissingRatio')
select * from Exceptions
cross apply dbo.CaseSensitiveSQLSplitFunction(Exceptions.ShortText)
You can find the sql source of case sensitive sql split string function here
December 2, 2010 at 9:14 am
Thank you for your replies.
Eralper - I would like the function to return one record not a record per split word.
December 2, 2010 at 11:32 pm
Hello Tim,
Sorry for misunderstanding
Here is an alternative script
CREATE FUNCTION udfSample
(
@STR nvarchar(400)
)
returns nvarchar(400)
begin
declare @i int, @j-2 int, @w nvarchar(400)
select @i = 1, @j-2 = len(@str)
while @i <= @j-2
begin
if substring(@str,@i,1) = UPPER(substring(@str,@i,1)) collate Latin1_General_CS_AS
begin
set @STR = STUFF(@str,@i,0,' ')
select @i = @i + 1, @j-2 = len(@str)
end
set @i = @i + 1
end
return ltrim(@str)
end
GO
select dbo.udfSample('StartDateEarlierThanCurrentDate')
You could also use a sql string concatenation function to bring pieces back as a sentence too.
December 3, 2010 at 3:52 am
My attempt at this one.
declare @v-2 varchar(50)='StartDateEarlierThanCurrentDate'
declare @splitvalues varchar(50)=''
;with cte
as
(
select row_number()over(order by (select 1))id,SUBSTRING(@v,number,1)num,number from master..spt_values
where type='p' and number between 1 and LEN(@v)and SUBSTRING(@v,number,1)=UPPER(SUBSTRING(@v,number,1))
collate Latin1_General_CS_AS_WS
)
select @splitvalues=@splitvalues + ' ' +
substring(@v,c2.number,isnull(c1.number-c2.number,LEN(@v))) from cte c1 right join cte c2 on c1.id=c2.id+1
select @splitvalues
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
December 3, 2010 at 6:42 am
Here's another solution using a slightly modified version of Chis' test table data.
Solution:
;WITH TrimmedSentences(SentenceID, Sentence) AS
( -- Trim preceding white space of all sentences
SELECT SentenceID, LTRIM(Sentence) FROM #Sentences
),
Tally(N) AS
( -- Generate a numbers table to split the sentences (max. 2048) into characters
SELECT number FROM master.dbo.spt_values WHERE type = 'P'
),
SplittedCharacters(SentenceID, CharNo, C) AS
( -- Generate a row for each character in a sentence including its position
SELECT
S.SentenceID, T.N, SUBSTRING(S.Sentence, T.N, 1)
FROM
TrimmedSentences S
CROSS JOIN
Tally T
WHERE
T.N BETWEEN 1 AND LEN(S.Sentence)
)
SELECT
L.Sentence,
-- Use a FOR XML PATH subquery to concatenate characters back into a single sentence
(
SELECT
CASE
-- Prefix uppercase characters with a space (except the first one)
WHEN C.CharNo > 1 AND ASCII(C) BETWEEN 65 AND 90 THEN
' ' + C
ELSE
C
END
FROM
SplittedCharacters C
WHERE
C.SentenceID = L.SentenceID
ORDER BY
CharNo
FOR XML PATH(''), TYPE
).value('./text()[1]', 'varchar(100)')
FROM
TrimmedSentences L
Test data:
DROP TABLE #Sentences
CREATE TABLE #Sentences (SentenceID INT IDENTITY PRIMARY KEY, Sentence VARCHAR(100))
INSERT INTO #Sentences (Sentence)
SELECT 'Qualifications&Length' UNION ALL
SELECT ' DepthMustBeNumeric' UNION ALL
SELECT 'HeightMustBeNumericAndNotAlphanumericOrDateTime' UNION ALL
SELECT 'WidthMustBeNumeric' UNION ALL
SELECT 'DepthNotValid' UNION ALL
SELECT 'HeightNotValid' UNION ALL
SELECT 'QuantityIsMandatory' UNION ALL
SELECT 'QuantityNotValid' UNION ALL
SELECT 'WidthNotValid' UNION ALL
SELECT 'FactorNotValid' UNION ALL
SELECT 'DimensionNotNumeric' UNION ALL
SELECT 'DimensionNotValid' UNION ALL
SELECT 'StartDateEarlierThanCurrentDate' UNION ALL
SELECT 'CostCodeIsMandatory' UNION ALL
SELECT 'QuantityMustBeNumeric' UNION ALL
SELECT 'InvalidRatioNumeric' UNION ALL
SELECT 'MissingRatio'
December 3, 2010 at 7:11 am
Here's a reasonably performant version. Assumes you have a Numbers table. If not, search for Numbers table and Tally table, and you'll find how to create one, and a ton of different uses for them.
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T (
ID INT IDENTITY PRIMARY KEY,
String VARCHAR(100));
INSERT INTO #T (String)
SELECT 'QualificationsLength' FROM dbo.Numbers UNION ALL
SELECT 'DepthMustBeNumeric' FROM dbo.Numbers UNION ALL
SELECT 'HeightMustBeNumeric' FROM dbo.Numbers UNION ALL
SELECT 'WidthMustBeNumeric' FROM dbo.Numbers UNION ALL
SELECT 'DepthNotValid' FROM dbo.Numbers UNION ALL
SELECT 'HeightNotValid' FROM dbo.Numbers UNION ALL
SELECT 'QuantityIsMandatory' FROM dbo.Numbers UNION ALL
SELECT 'QuantityNotValid' FROM dbo.Numbers UNION ALL
SELECT 'WidthNotValid' FROM dbo.Numbers UNION ALL
SELECT 'FactorNotValid' FROM dbo.Numbers UNION ALL
SELECT 'DimensionNotNumeric' FROM dbo.Numbers UNION ALL
SELECT 'DimensionNotValid' FROM dbo.Numbers UNION ALL
SELECT 'StartDateEarlierThanCurrentDate' FROM dbo.Numbers UNION ALL
SELECT 'CostCodeIsMandatory' FROM dbo.Numbers UNION ALL
SELECT 'QuantityMustBeNumeric' FROM dbo.Numbers UNION ALL
SELECT 'InvalidRatioNumeric' FROM dbo.Numbers UNION ALL
SELECT 'MissingRatio ' FROM dbo.Numbers;
SELECT
(SELECT
CASE WHEN ASCII(SUBSTRING(String, Number, 1)) BETWEEN 65 AND 90 AND Number > 1
THEN ' ' + SUBSTRING(String, Number, 1)
ELSE SUBSTRING(String, Number, 1)
END
FROM DBA.dbo.Numbers
WHERE Number <= LEN(String)
ORDER BY Number
FOR XML PATH(''), TYPE).value('(./text())[1]','varchar(200)')
FROM #T;
Took an average of about 2 seconds to split a 170,000 row table on my desktop computer at work.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 3, 2011 at 7:47 am
Everybody - thank you very much for the great suggestions.
Eralper - I went with your one because it suited my requirements the closest.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply