Has SQL Server Introduced a way for Converting HTML to non-HTML stored as VARCHA

  • If a table stores in a column of type VARCHAR the textual equivalent of an html page, does SQL Server have some but in function or process for converting that into non-html text, just regular text with carriage returns and the like? I've seen a few custom solutions for this but noting native to SQL Server.

    Kindest Regards,

    Just say No to Facebook!
  • no

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • OK thanks

    Kindest Regards,

    Just say No to Facebook!
  • You might be able to use Convert_HtmlToXml in the following to get XML which you might then be able to process as you want.

    https://sqlsharp.com/features/

    I suspect it would be better to do this type of processing on an application server.

  • This was removed by the editor as SPAM

  • Unfortunately that's not an option in this case. I am limited to writing valid T-SQL statements so if there's a way in T-SQL to do this great otherwise no.

     

    Thanks

    Kindest Regards,

    Just say No to Facebook!
  • I think the only way to do this entirely in SQL Server would be with "REPLACE" statements (ie replace <br> with a newline character) and after replacing everything you want to replace then doing a string split on the < character followed by a string split on the > character and then grabbing all of the even results as the even results will be what was between the < and >. May need to do some testing and tweaking as I'm sure that this isn't a 100% safe way to do it, but should be decent. For example, if the HTML contained "

    Hello World

    ", the first split would give you:

    h1>Hello World
    /h1>

    as 2 rows. Then you split that on > and you'd get:

    h1
    Hello World
    /h1

    so rows 1 and 3 are garbage and can be thrown out and row 2 is what you want.

    Problems with the above though are that things like links will have the link text dropped. Plus I imagine there are some use cases where this would fail like if there are some HTML tags you want to keep, but overall, it should capture what you are asking for and is using entirely "native" functionality - STRING_SPLIT() and REPLACE() are both native functions, just needing to build up your query around those. BUT if you are looking for something like a "FROM_HTML()" function built into SQL, I am not aware of any.

    My opinion though is that if you need to parse the HTML to something else, I would do that at the application layer that is either INSERTing the data into the database or SELECTing the data out. In general, I try to use my SQL Databases kind of like a bucket with the data being water. If I want to make tea, that water comes out and goes into into a teapot to make the tea. I wouldn't try to make the tea in the bucket. I put my data into SQL and once it is in, any manipulation I do on it is in an external tool (SSMS, custom applications, 3rd party applications, SSRS, SSIS, etc). You could use SSIS to parse the HTML as it supports scripting... so that could be another option.

     

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Yeah this is kind of what I figured would be the case. The handful of SP's/UDF's I've come across that do something like this are effectively replacing known HTML codes/characters with a non-html equivalent.

     

    Thanks

    • This reply was modified 1 year, 1 month ago by  YSLGuru.

    Kindest Regards,

    Just say No to Facebook!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply