November 3, 2020 at 8:57 am
It actually surprises me this should even be a request. Post data, however, appears to be stored as a varchar
, not an nvarchar
which is causing malformed post, including today's QOTD. Obviously this data has not been lost, but this should be fixed, as it can't be assumed that post will only contain characters from the code page.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
November 3, 2020 at 2:58 pm
Thom, the issue you are talking about appears to be what I reported 19 months ago:
My guess is that the data is using NVARCHAR
as the datatype (based on looking at the datamodel provided by the WordPress via .NET project that RedGate used for this site), but that there's either a VARCHAR
input parameter / variable along the way, or that the input data is being wrapped in single-quotes without the "N
" prefix. Either way, very frustrating indeed.
You can still get Unicode characters to show up in posts, questions, etc, but you need to HTML encode them first so that the data sent into the server is all standard ASCII characters. However, if a post / question / article / etc containing HTML encoded entities is ever edited, such entities will be rendered in the editor as the actual Unicode character, and thus saving the update will then convert the character into one or two "?
" depending on the character being BMP or Supplementary. For example:
Passing in: Ă
should store correctly as that encoded string and appear as follows when displayed later: Ă
For more details, please see my investigation here:
Take care,
Solomon..
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
November 3, 2020 at 4:28 pm
Thanks Solomon. It's bitterly disappointing that this was raised back in April of last year and not fixed yet; I didn't even think to look for a report and I (foolishly) assumed it would likely something that was caused in a recent update.
Looks like it's upstream though, as I tested on a Project Nami install I have and experienced the same problem. I'll see if i can find the problem myself in my own site and pass the feedback on.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
November 3, 2020 at 4:56 pm
Looks like it's upstream though, as I tested on a Project Nami install I have and experienced the same problem. I'll see if i can find the problem myself in my own site and pass the feedback on.
Sorry, I didn't mean to imply that the source of the problem is in RedGate's implementation of the code, just that I believe the problem is in code, not in the datatype used to store the data.
I haven't thought about this in 19 months, but I did some investigation at the time (going through the Project Nami GitHub repo) and believe that I had found the problem, but it was buried pretty deep and I was about to post an issue or even PR but they had updated the master branch with a partial update of the code (or something like that) and it was unclear if what I had found would be changing so I didn't want to do a bunch of work just to have it invalidated by changes about to merge in. I believe that I took some notes at the time and will see if I can find them. Part of the issue is that I don't have an install for this and so can't really test it (nor any time to set up such an environment). If I can find what I saw at the time and pass along some suggested changes, could you help me test it? And if successful, I can post the PR?
Please let me know. Take care, Solomon..
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
November 3, 2020 at 5:14 pm
November 6, 2020 at 3:20 pm
I haven't found the specific cause yet, but I am getting closer. I did just find, and submit a fix / PR for, an occasionally serious bug:
Improve RegEx for "IF in SELECT statement" in translations.php
Will keep looking for the cause of this one, though I did come across an open issue that appears to suggest that a config setting might also get this working:
Special characters not showing correct #303
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
November 6, 2020 at 9:11 pm
Test post:
trying to see if the following character is converted to a best-fit mapping: 〛
]
If it comes back as a regular ]
then it is being converted via best-fit, which points more directly at the problem being in SQL Server than in the PHP code.
And the following should come back as ??
: 𒍅
??
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
December 10, 2020 at 6:08 am
Hey Thom. I think I found a potential fix for this issue. I actually have two solutions, but one requires setting up the Project Nami DB to use a _UTF8
collation during the install, so probably not an option for your existing system. That also then requires SQL Server 2019 (or newer, whenever that happens). That option is the easier of the two, but as I said, kinda needs to happen from the beginning (technically you can change the collation of the DB and all of the objects in one shot, but if you had any characters with values in the 128 - 255 range, then they would get corrupted).
The second option is to add a single upper-case N
to a single line of the PHP code, in the prepare()
function that all of the SQL statements (I believe) get run through. I have this written up in the following Issue for that project:
Unicode (NVARCHAR) data converted to 8-bit code page (VARCHAR) as it is stored #417
Currently, the line to edit is #1351 of wp-db.php, but that same line of code might be a different line # if you have a different version than what's checked into the master branch. The line in question is currently:
$query = preg_replace( '/(?<!%)%s/', "'%s'", $query ); // Quote the strings, avoiding escaped strings like %%s.
and it should probably be:
$query = preg_replace( '/(?<!%)%s/', "N'%s'", $query ); // Quote the strings, avoiding escaped strings like %%s.
I have no way to test this, though I have tested the theory with PHP in general and it does fix the problem. As I mention in that GitHub issue, I'm a little concerned about non-string values that still get quoted, such as dates, times, and datetimes, and UNIQUEIDENTIFIERs / GUIDs. Those types of values will work in the current scheme of using single-quotes without an upper-case N
but won't work if the N
is there.
Please note that you need to have your connection using the 'CharacterSet'=> 'UTF-8'
option in the array that gets passed into sqlsrv_connect()
. Most likely you already have this set (if you save this character, 👾 , and it comes back as ??
instead of 👾
then you have it set).
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
January 11, 2021 at 12:32 pm
Hi Solomon,
Sorry not replied to this, December wasn't a great month for me, so didn't have my head in the game. I'll have a look at this later this week and report back. Thanks for your effort on this!
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
January 13, 2021 at 5:49 am
Yer welcome. And, no worries. 2020 wasn't a great month for me (and it was terribly long) ;-). Just let me know what you observe. If it works I will submit a PR for it.
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
January 18, 2021 at 10:29 am
Hi Solomon,
Yep, adding the notation character to that one line appears to do this job. I tested Comments, Posts, and Pages, and all 3 now correctly stored the characters.Well done the work here!
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
January 19, 2021 at 5:22 am
Great! Glad to hear that the testing so far is looking positive. Do you mind a few more tests, just to help make sure that other scenarios work, or at least aren't broken? One area that requires single quotes but cannot be prefixed with an upper-case "N" is dates. There aren't too many places that would have a date, but perhaps change the post date from "immediate" to either scheduled in the future or post-dated? And creating a post / page is the main thing, but what about round-trip issues when updating? So here is the list of tests that I think will mostly confirm this as a real fix:
Thanks again for taking the time to test this. If all of these scenarios pass, then I will submit the PR.
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
January 25, 2021 at 12:23 pm
All looks good; you can see snippet of the silliness below. 1, 2 and 6 I'd already tested. I didn't, however, test changing the scheduled date to a historical one as that would publish the post and I didn't want to do that. I did use a future date, and then changed it to another in the future and it was fine.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
January 27, 2021 at 11:30 pm
Hi Thom. Thanks again for doing this testing. I had forgotten about tags and categories, but I do see that you were able to add "Alien Monster" to them. Cool. And yes, that future date change should be fine.
Just out of curiosity, since I only mentioned updating the post content before, can you please confirm that you were able to successfully update the following without losing the supplementary character:
Thanks. And in the mean time, I will go ahead and submit that PR given that it's highly unlikely these remaining tests will fail given all that have succeeded so far.
Take care,
Solomon..
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
January 28, 2021 at 11:40 am
Hi Solomon,
Yep, I amended everything I could noticed no loss of characters, so it appears that everything does go through that one line that you highlighted for change.
If there is something missed, hopefully whomever reviews your PR addresses it in the feedback or even adds to it.
Thanks again!
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply