Remove Carriage Returns / Line Breaks / Any other reason the "text" moves down

  • I know this has been covered ad nauseum all over the internet for years. But I have literally tried a dozen different possible queries to remove these and the scripts either run and do nothing, or remove WAAAY too much information.

    I am working with a company to extract data from a program that was written in 2006-2007. The data in the column was entered as client notes. So, each time it looks like they would hit Enter a couple of times, enter a new date and then type the new note:

    3/15/07 Bill says he want the widget

    5/2/12 Bill needs to have the widget refurbished

    And so on.

    The column was originally programmed by the software developer as a Text column. (Nothing I can do about that...) I was able to successfully convert the TEXT column into a varchar(max) column. So far so good.  The Table = ClientDetails, the column = ClientNotes". For whatever reason, it doesn't matter what syntax I use I cannot remove what shows up as a Carriage Return or Line Feed in the Flat File Export. As examples, I have tried:

    REPLACE(REPLACE(REPLACE(REPLACE(ClientNotes, CHAR(9), ' '), CHAR(10), ' '), CHAR(13), ' '), CHAR(13) + CHAR(10), ' ') FROM ClientDetail

    SELECT REPLACE(REPLACE(REPLACE(ClientNotes, CHAR(9), ''), CHAR(10), ''), CHAR(13), '') as 'ClientNotes' FROM ClientDetails

    and others, so many others.

    I find it difficult to understand how something that I would think would be so simple is turning out to be so difficult. It there weren't 400K records, I'd tell them to go through it in the software and remove them manually.

    Any help as to how to do this would be greatly appreciated.

    Thank you

  • Yeah, that code should do it.

    But if you're using the column name "ClientNotes" directly in a SELECT you will get the original column value, not the one after the REPLACEs.  I' suspect that's the issue.  Instead use a CROSS APPLY and assign a separate column name to be sure you get the new value:

    SELECT ca1.ClientNotes
    FROM ClientDetails
    CROSS APPLY (
    SELECT REPLACE(REPLACE(REPLACE(ClientNotes, CHAR(9), ''), CHAR(10), ''), CHAR(13), '') as 'ClientNotes'
    ) AS ca1

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thank you for the reply. I tried that and the values in the original column are still the same.  Should that have output to a new column?

    I created a new varchar(max) column in the table called ClientNotes2. I couldn't figure out how to get the output to be put there.

  • why do you need to remove them to extract them?

    surely you are extracting into a properly defined CSV file and if so that does not matter - unless  you are loading onto a system that prefers to have them removed on the load for unknown reasons.

    if just to be able to view it on notepad without having those breaks then I would not waste to much time with it.

    in any case can you show us exactly what you are doing (sql executed) and the process you use to generate the extract file.

     

  • DCL wrote:

    Thank you for the reply. I tried that and the values in the original column are still the same.  Should that have output to a new column? I created a new varchar(max) column in the table called ClientNotes2. I couldn't figure out how to get the output to be put there.

    If you want to change the data in the table itself, you have to change Michael's code to an UPDATE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The export is too large to open in Notepad, but opening the file in Word shows the breaks are there after export.

    I am not sure what else you are looking for. I have a Table = ClientDetails, where one of many columns = ClientNotes".

    I need to retain all the data in all of the columns, (FirstName, MiddleName, LastName, Address1 and so on) ClientNotes is not the only column.

    I tried the following two examples along with many more:

    REPLACE(REPLACE(REPLACE(REPLACE(ClientNotes, CHAR(9), ' '), CHAR(10), ' '), CHAR(13), ' '), CHAR(13) + CHAR(10), ' ') FROM ClientDetail

    AND

    SELECT REPLACE(REPLACE(REPLACE(ClientNotes, CHAR(9), ''), CHAR(10), ''), CHAR(13), '') as 'ClientNotes' FROM ClientDetails

    I also tried the query in the first reply, and it failed to update the data in the original column and did not create a new one.

    When the export happens, be it CSV or Excel, the Carriage Returns wreak havoc with the formatting. The page breaks cause a new row throwing the data out of sync with the column headers. After three days of research it seems to be an issue with older SQL databases (Like 2008) where a column is created as TEXT.

    To extract the data:

    Destination: Flat File Destination

    ClientData.txt

    Local: English US

    Code page: 1252 (ANSI - Latin I)

    Format: Delimited

    Text qualifier: <none>

    Checked - Column names in the first data row

    ___________

    Copy data from one or more tables of views

    ___________

    Source table or view: [dbo].[ClientDetails]

    Row delimiter: {CR}{LF}

    Column delimiter: Vertical Bar {|} *** I need to use this because there are commas, semicolons, colons, and tabs in the valid data***

     

    Attached is a snip from the output. In it you can see record with index number 65 begin because of the Carriage Return at the end of the previous Row. You can see that there are two additional Carriage Returns in the ClientNotes column which are treated as a call for a new Row. It is a serious problem when it happens in 90% of over 400K rows.

    Again, help in finally clearing these carriage returns will be greatly appreciated.

    Thank you

    Attachments:
    You must be logged in to view attached files.
  • Jeff,

    Like this?

    SELECT ca1.ClientNotes
    FROM ClientDetails
    UPDATE (
    SELECT REPLACE(REPLACE(REPLACE(ClientNotes, CHAR(9), ''), CHAR(10), ''), CHAR(13), '') as 'ClientNotes'
    ) AS ca1
  • no update would not be like that - but as said you do not need to do the update.

    update ca1
    set ClientNotes = replace....
    from ClientDetails as ca1

    but that changes the data and is not necessarily the correct thing to do.

     

    you are using SSIs to extract the data - and your issue is that you did not specify a text delimiter.

    Text qualifier: <none>

    should be

    Text qualifier: " (e.g. double quote)

    the above should solve the issue.

     

    regardless of it you are also selecting from the table directly - for you case and in addition to the text qualifier above, you should also change your source to be your sql where you select all your required fields and include the replace block instead of the original field

    Source table or view: [dbo].[ClientDetails] - change to be "select xxx, replace... from dbo.clientdetails

     

    EDIT: Finally - what is this extract for? is it to load to another system or just for people to check the contents?

    if to load to another system as long as you create it as a proper CSV (and a proper CSV has Double Quotes around all non numeric fields (and internal quotes from the source text are escaped with another double quote) then any system able to handle CSV files (Excel inclusive) will be able to process the file correctly even with the CR+LF on it.

  • What happened is that Company A was purchased by Company B. Neither company is in the tech industry.

    I have been trying to help my friend, an employee of Company A, to extract the data from their system and provide the files to Company B which asked for the data in a CSV format. The original software at Company A is no longer being used, so if UPDATE modified the data in the original column, that is completely ok.

    When you said, "in addition to the text qualifier above, all your required fields and include the replace block instead of the original field", are you referring to a new column? I have not been able to create a column with the properly formatted data, so I don't know how to execute your recommendation: "you should also change your source to be your sql where you select all your required fields and include the replace block instead of the original fieldSource table or view: [dbo].[ClientDetails] - change to be "select xxx, replace... from dbo.clientdetails"

    Again, the extract is being performed on a stagnant database. If I have to modify the data in the existing column, that is perfectly acceptable.

    Thank you again.

  • DCL wrote:

    and others, so many others.

    I would write a FUNCTION and use that to "wrap" each of the columns, then if you find "something else" that you need to be handled you can centralise that logic in the function.

    SELECT[MyID] =dbo.FN_INT2Param(MyID, 1)
    , [MyGUID] =dbo.FN_GUID2Param(MyGUID, 1)
    , [MyTitle] =dbo.FN_String2Param(MyTitle, 1)
    , [MyDate] =dbo.FN_Date2Param(MyDate, 1)
    , [MyNotes] =dbo.FN_Notes2Param(MyNotes, 1)
    FROMMyTable
    ORDER BY MyPKey

    where the 2nd parameter can be used for any optional processing - whether the data is to be surrounded by quotes, whether GUID should have the "-" removed, if you want a leading-comma prefixing, and so on.

    In these types of scenario I have found other, rogue, characters that muck up the export.  I suggest doing a query to find "any other rogue characters" which are used, and decide which of those also need removal. Something like this to find them

    SELECTMyID, MyNotes
    FROMdbo.MyTable
    WHEREMyNotes LIKE
    '%['
    + CHAR(1) + '-' + CHAR(8)
    --+ CHAR(9)-- TAB
    --+ CHAR(10)-- LineFeed
    + CHAR(11) + '-' + CHAR(12)
    --+ CHAR(13)-- Return
    + CHAR(14) + '-' + CHAR(31)
    + CHAR(128) + '-' + CHAR(255)
    + ']%' COLLATE Latin1_General_BIN2

    I'm surprised that you want to convert Date/NoteText/LineBreak to be all on one line as it will be very hard to read. Personally I would introduce a "mark character" in place of Line Break so they could be split in future, or better still split the Notes into individual KeyID/LineNo/Date/NoteText records on export and store them in that format. Depends a bit how the target system handles these sorts of Notes, but single-text-blob which is editable by the user (rather than only ever appended to) is open to rewriting history! by deleting/changing older material

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply