Making text UTF-8 compliant

  • Hi guys,

    I have a procedure which converts a rather big table (10,000-15,000 rows) into an XML feed for Google Base. It works pretty well, but a couple of the columns are large blocks of text, and the feed needs to be UTF-8 compliant. I've created a function that strips out unwanted/erroneous characters on the fly, but, as you can imagine, it can be very slow when you're using it on several fields in a large table.

    Is there anything out there that could sort this? Would a collation help?
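
    A minimal sketch of the "strip unwanted characters" step, assuming the unwanted characters are the ones XML 1.0 does not allow (control characters other than tab, newline and carriage return, unpaired surrogates, U+FFFE/U+FFFF). The original function isn't shown in the thread, so this is not that function; it is written in Python only for illustration, and the names `clean_for_xml` and `to_utf8_feed_value` are hypothetical. The same idea (one precompiled pattern, one pass per value) carries over to a .NET routine and tends to be much cheaper than testing characters one at a time.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_for_xml(text: str, replacement: str = "") -> str:
    """Drop (or replace) every character that is illegal in XML 1.0."""
    return _INVALID_XML_CHARS.sub(replacement, text)


def to_utf8_feed_value(text: str) -> bytes:
    """Clean one column value, then encode it as UTF-8 for the feed.

    Cleaning first also removes unpaired surrogates, which would otherwise
    make str.encode("utf-8") raise UnicodeEncodeError.
    """
    return clean_for_xml(text).encode("utf-8")


if __name__ == "__main__":
    # NUL and vertical tab are removed; the accented character is kept.
    print(to_utf8_feed_value("caf\u00e9\x00 \x0bdescription"))
    # b'caf\xc3\xa9 description'
```

    Passing a `replacement` such as "?" instead of deleting keeps the text length stable, which can make it easier to spot where bad data came from.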

  • Now that is something that you might want to consider a CLR assembly for.

    -- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • OK, that's a bit new to me, but would it mean I could add my assembly to the database and call it from my T-SQL to deal with the unwanted characters?

  • Well, if you've already got a UTF-16 to UTF-8 conversion routine in .NET, you could just make a SQLCLR assembly from that.

    Thinking about it, there are also ways to do it in pure SQL that might perform reasonably, but I would need a good set of specs for it.
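
    For the UTF-16 to UTF-8 route, here is a minimal Python sketch of the conversion itself, assuming the input is the raw bytes of an nvarchar value, which SQL Server stores as UTF-16 little-endian. `utf16le_to_utf8` is a hypothetical name; in a SQLCLR assembly the equivalent work would be done in .NET. Malformed input such as an unpaired surrogate is replaced with U+FFFD rather than aborting the whole feed.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16LE bytes (e.g. the raw bytes of an nvarchar value) as UTF-8.

    errors="replace" turns malformed sequences, such as an unpaired surrogate,
    into U+FFFD instead of raising UnicodeDecodeError.
    """
    text = raw.decode("utf-16-le", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    # The emoji is stored as a surrogate pair in UTF-16.
    sample = "caf\u00e9 \U0001F600".encode("utf-16-le")
    print(utf16le_to_utf8(sample))
    # b'caf\xc3\xa9 \xf0\x9f\x98\x80'
```

    Note that this only changes the encoding: control characters that XML forbids survive the conversion unchanged and still need filtering before they go into the feed.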

  • Not sure if this would work, but could you define a temporary table whose text columns are varchar rather than nvarchar? Or CAST/CONVERT the columns on the fly?

    Andy
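
    One caveat on the varchar idea: in the SQL Server versions discussed here, varchar holds text in the code page of the column's collation (Windows-1252 for the common Latin1_General collations), not in UTF-8, so a CAST/CONVERT narrows the character set but does not by itself produce UTF-8. The Python sketch below is only an analogy, assuming a Windows-1252 collation; SQL Server may apply best-fit mappings to some characters where this sketch simply writes '?'. `to_code_page` and `is_valid_utf8` are hypothetical names.

```python
def to_code_page(text: str, code_page: str = "cp1252") -> bytes:
    """Rough analogy for CAST(nvarchar AS varchar) under a Windows-1252 collation.

    Characters the code page cannot represent become '?', so information is lost.
    """
    return text.encode(code_page, errors="replace")


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    narrowed = to_code_page("caf\u00e9 \u03a9")  # 'café Ω'
    print(narrowed)                 # b'caf\xe9 ?'
    print(is_valid_utf8(narrowed))  # False: 0xE9 starts an invalid UTF-8 sequence here
```

    So the narrowed bytes would still have to be re-encoded as UTF-8 when the feed is written, and anything already turned into '?' cannot be recovered.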
