March 1, 2012 at 8:41 am
Hello,
I need some help , if possible, with a query that will return all the separators from a text column.
Basically I have a text column with different row separators and words separators and I need a way to identify all of them.
Any help will be much apreciated,
Thank you!
March 1, 2012 at 8:52 am
What would define as "word separator"?
March 1, 2012 at 8:55 am
Basically I need to find all the characters that are between one word and another.
So every character that is different from [a-z].
For example in a text like:
"4 filets de rascasses,<br />\r4 petits rougets de roche,<br />\r4 morceaux de raie,<br />\r4 morceaux de bar"
The output should be:
' '
,
<br />\r4
But my problem is the text is extremely wide and I cannot identify all the separators, which can be everything different from a consequently appeareance of [a-z]
March 1, 2012 at 9:03 am
here's just one way to do it;
this strips out every char that is a-z,A-Z and 0-9, leaving, i assume, what would be the word separators, like space, dash, comma, etc.
/*--Results
- - , ,
-
- ()
*/
SELECT dbo.StripAlphaNumeric(YOURCOLUMNS) FROM YOURTABLE
the scalar function:
CREATE FUNCTION StripAlphaNumeric(@OriginalText VARCHAR(8000))
RETURNS VARCHAR(8000)
BEGIN
DECLARE @CleanedText VARCHAR(8000)
;WITH tally (N) as
(SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id)
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2)
SELECT @CleanedText = ISNULL(@CleanedText,'') +
CASE
--ascii numbers are 48(for '0') thru 57 (for '9')
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57
THEN ''
--ascii upper case letters A-Z is 65 thru 90
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 65 AND 90
THEN ''
--ascii lower case letters a-z is 97 thru 122
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 97 AND 122
THEN''
ELSE SUBSTRING(@OriginalText,Tally.N,1) END
FROM tally WHERE Tally.N <= LEN(@OriginalText)
RETURN @CleanedText
END
Lowell
March 1, 2012 at 9:07 am
There is no way to automaticaly identify such separtors.
You can find all non-letter characters, but SQL will not know for granted that <BR > is a separator, as B & R are letters. Also, you may find all other html tags in it.
Google for "remove html tags SQL" you should find some info on it...
March 1, 2012 at 9:10 am
Lowell (3/1/2012)
here's just one way to do it;this strips out every char that is a-z,A-Z and 0-9, leaving, i assume, what would be the word separators, like space, dash, comma, etc.
/*--Results
- - , ,
-
- ()
*/
SELECT dbo.StripAlphaNumeric(YOURCOLUMNS) FROM YOURTABLE
the scalar function:
CREATE FUNCTION StripAlphaNumeric(@OriginalText VARCHAR(8000))
RETURNS VARCHAR(8000)
BEGIN
DECLARE @CleanedText VARCHAR(8000)
;WITH tally (N) as
(SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id)
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2)
SELECT @CleanedText = ISNULL(@CleanedText,'') +
CASE
--ascii numbers are 48(for '0') thru 57 (for '9')
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57
THEN ''
--ascii upper case letters A-Z is 65 thru 90
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 65 AND 90
THEN ''
--ascii lower case letters a-z is 97 thru 122
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 97 AND 122
THEN''
ELSE SUBSTRING(@OriginalText,Tally.N,1) END
FROM tally WHERE Tally.N <= LEN(@OriginalText)
RETURN @CleanedText
END
Your function will remove letters from html tags. OP wants to remove tags completely.
It could be anything: html tags, rtf tags and so on...
March 1, 2012 at 9:41 am
ralu_k_17 (3/1/2012)
Basically I need to find all the characters that are between one word and another.So every character that is different from [a-z].
For example in a text like:
"4 filets de rascasses,<br />\r4 petits rougets de roche,<br />\r4 morceaux de raie,<br />\r4 morceaux de bar"
The output should be:
' '
,
<br />\r4
But my problem is the text is extremely wide and I cannot identify all the separators, which can be everything different from a consequently appeareance of [a-z]
What about "4? Shouldn't that be on your list? It certainly does not follow the rule of [a-z]. How did you want to handle numbers [0-9]?? Seems that you need a lot more clarification on your requirements.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 1, 2012 at 11:49 pm
I apologize for confusions.
I need the letters to.
What I do have is a text column with ingredients from certain recipes. And all the ingredients should be separated only by ','.
At this point, I have all kinds of separators (ex. '-', ';', '<br \>', '\r').
I cannont identify all the separtors just by looking at the column, because a have a lot of records.
So, I kind of need a way to identify everything that could be a separator and replace it with ',' (one separator for all rows).
As an example:
- 4 oranges,<br />\r- 4 feuilles de menthe,<br />\r- 2 cuillères à soupe de sucre,<br />\r- 1/2 cuillère à café de cannelle,<br />\r- 1 cuillère à soupe d'eau de fleur d'oranger,<br />\r- 100 grammes de noix concassées.
- 4 darnes de cabillaud de 250 à 300 g\r- 20 cl de fumet de poisson\r- 2 échalotes grises\r- 100 g de mie de pain fraîche\r- 100 g de beurre\r- 1 citron\r- 1 cuillère à soupe de fines herbes fraîches hachées \r- 1 brin de persil\r- 3 brins de cerfeuil \r- 3 brins de ciboulette\r- Sel fin de cuisine\r- Poivre noir en grains au moulin
(2 list of ingredients)
March 2, 2012 at 7:22 am
ralu_k_17 (3/1/2012)
So, I kind of need a way to identify everything that could be a separator and replace it with ',' (one separator for all rows).
You simply are going to have to come up with something as a way knowing what is a separator and what isn't. This is sort of like defining a process with no rules.
You have already identified 4 so at least you have a start. What about possible characters that are not visible like {tab} or carriage returns and line feeds?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 2, 2012 at 7:32 am
I modified the function dbo.StripAlphaNumeric to also eliminate characters with ASCII code 192-> 252 because I have some of those there that I do not consider separators.
Then used the function to identify some of the separators and then I did a list of updates where I replaced every found separator with ',' and then eliminate multiple spaces and multiple commas.
After those updates there were some visible anomalies that I also fixed with replace.
Because it has no rules, that was all I could think of, taken it step by step.
From random checks it seems to be ok.
Thank you very much for all your help!
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply