Function to return all separators from text column

  • Hello,

    I need some help , if possible, with a query that will return all the separators from a text column.

    Basically I have a text column with different row separators and words separators and I need a way to identify all of them.

    Any help will be much apreciated,

    Thank you!

  • What would define as "word separator"?

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Basically I need to find all the characters that are between one word and another.

    So every character that is different from [a-z].

    For example in a text like:

    "4 filets de rascasses,<br />\r4 petits rougets de roche,<br />\r4 morceaux de raie,<br />\r4 morceaux de bar"

    The output should be:

    ' '

    ,

    <br />\r4

    But my problem is the text is extremely wide and I cannot identify all the separators, which can be everything different from a consequently appeareance of [a-z]

  • here's just one way to do it;

    this strips out every char that is a-z,A-Z and 0-9, leaving, i assume, what would be the word separators, like space, dash, comma, etc.

    /*--Results

    - - , ,

    -

    - ()

    */

    SELECT dbo.StripAlphaNumeric(YOURCOLUMNS) FROM YOURTABLE

    the scalar function:

    CREATE FUNCTION StripAlphaNumeric(@OriginalText VARCHAR(8000))

    RETURNS VARCHAR(8000)

    BEGIN

    DECLARE @CleanedText VARCHAR(8000)

    ;WITH tally (N) as

    (SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id)

    FROM Master.dbo.SysColumns sc1

    CROSS JOIN Master.dbo.SysColumns sc2)

    SELECT @CleanedText = ISNULL(@CleanedText,'') +

    CASE

    --ascii numbers are 48(for '0') thru 57 (for '9')

    WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57

    THEN ''

    --ascii upper case letters A-Z is 65 thru 90

    WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 65 AND 90

    THEN ''

    --ascii lower case letters a-z is 97 thru 122

    WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 97 AND 122

    THEN''

    ELSE SUBSTRING(@OriginalText,Tally.N,1) END

    FROM tally WHERE Tally.N <= LEN(@OriginalText)

    RETURN @CleanedText

    END

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • There is no way to automaticaly identify such separtors.

    You can find all non-letter characters, but SQL will not know for granted that <BR > is a separator, as B & R are letters. Also, you may find all other html tags in it.

    Google for "remove html tags SQL" you should find some info on it...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Lowell (3/1/2012)


    here's just one way to do it;

    this strips out every char that is a-z,A-Z and 0-9, leaving, i assume, what would be the word separators, like space, dash, comma, etc.

    /*--Results

    - - , ,

    -

    - ()

    */

    SELECT dbo.StripAlphaNumeric(YOURCOLUMNS) FROM YOURTABLE

    the scalar function:

    CREATE FUNCTION StripAlphaNumeric(@OriginalText VARCHAR(8000))

    RETURNS VARCHAR(8000)

    BEGIN

    DECLARE @CleanedText VARCHAR(8000)

    ;WITH tally (N) as

    (SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id)

    FROM Master.dbo.SysColumns sc1

    CROSS JOIN Master.dbo.SysColumns sc2)

    SELECT @CleanedText = ISNULL(@CleanedText,'') +

    CASE

    --ascii numbers are 48(for '0') thru 57 (for '9')

    WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57

    THEN ''

    --ascii upper case letters A-Z is 65 thru 90

    WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 65 AND 90

    THEN ''

    --ascii lower case letters a-z is 97 thru 122

    WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 97 AND 122

    THEN''

    ELSE SUBSTRING(@OriginalText,Tally.N,1) END

    FROM tally WHERE Tally.N <= LEN(@OriginalText)

    RETURN @CleanedText

    END

    Your function will remove letters from html tags. OP wants to remove tags completely.

    It could be anything: html tags, rtf tags and so on...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • ralu_k_17 (3/1/2012)


    Basically I need to find all the characters that are between one word and another.

    So every character that is different from [a-z].

    For example in a text like:

    "4 filets de rascasses,<br />\r4 petits rougets de roche,<br />\r4 morceaux de raie,<br />\r4 morceaux de bar"

    The output should be:

    ' '

    ,

    <br />\r4

    But my problem is the text is extremely wide and I cannot identify all the separators, which can be everything different from a consequently appeareance of [a-z]

    What about "4? Shouldn't that be on your list? It certainly does not follow the rule of [a-z]. How did you want to handle numbers [0-9]?? Seems that you need a lot more clarification on your requirements.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I apologize for confusions.

    I need the letters to.

    What I do have is a text column with ingredients from certain recipes. And all the ingredients should be separated only by ','.

    At this point, I have all kinds of separators (ex. '-', ';', '<br \>', '\r').

    I cannont identify all the separtors just by looking at the column, because a have a lot of records.

    So, I kind of need a way to identify everything that could be a separator and replace it with ',' (one separator for all rows).

    As an example:

    - 4 oranges,<br />\r- 4 feuilles de menthe,<br />\r- 2 cuillères à soupe de sucre,<br />\r- 1/2 cuillère à café de cannelle,<br />\r- 1 cuillère à soupe d'eau de fleur d'oranger,<br />\r- 100 grammes de noix concassées.

    - 4 darnes de cabillaud de 250 à 300 g\r- 20 cl de fumet de poisson\r- 2 échalotes grises\r- 100 g de mie de pain fraîche\r- 100 g de beurre\r- 1 citron\r- 1 cuillère à soupe de fines herbes fraîches hachées \r- 1 brin de persil\r- 3 brins de cerfeuil \r- 3 brins de ciboulette\r- Sel fin de cuisine\r- Poivre noir en grains au moulin

    (2 list of ingredients)

  • ralu_k_17 (3/1/2012)


    So, I kind of need a way to identify everything that could be a separator and replace it with ',' (one separator for all rows).

    You simply are going to have to come up with something as a way knowing what is a separator and what isn't. This is sort of like defining a process with no rules.

    You have already identified 4 so at least you have a start. What about possible characters that are not visible like {tab} or carriage returns and line feeds?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I modified the function dbo.StripAlphaNumeric to also eliminate characters with ASCII code 192-> 252 because I have some of those there that I do not consider separators.

    Then used the function to identify some of the separators and then I did a list of updates where I replaced every found separator with ',' and then eliminate multiple spaces and multiple commas.

    After those updates there were some visible anomalies that I also fixed with replace.

    Because it has no rules, that was all I could think of, taken it step by step.

    From random checks it seems to be ok.

    Thank you very much for all your help!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply