May 6, 2016 at 5:17 am
We have a system with proprietary algorithm that converts the data entered in an user interface into a concatenated string. The database is SQL Server 2008R2 (Sp3) but it is not conventional relational database by design.
The system is divided into multiple sections. For example, General section data has Company name and address. Once the data is saved from the user interface it gets saved in the database in the following manner. The system is written by our sister office in another country and we do not have the source code.
(ref - attachment -1)
In the past few days we are experiencing a problem wherein we are getting the corrupt data after it is saved in the database
(ref - attachment -2)
As suggested by the core developer we checked hexadecimal version of the text which is saved in the database using following query
Select CONVERT(varbinary(max),
Coded_Data),*
from
Table_name
where
Primary_key = <numeric value>
and section_name = 'General'
and we get the hex representation like below
0x02072616468696B612E74686F7261744072696C2E636F6D496D706F72746572206F662054656C65636F6D6D756E69636174696F6E202620456C656374726F6E69632050726F647563742C2052656C69616E63656469676974616C2073746F72657320746F2052656C69616E6365A
In the above representation when we remove leading zero after 0x and trailing “A” we get the correct english representation.
But this pattern of data corruption / invalid character is not the same for all records. Many records have hex representation as given below where we are not sure how to conver that back to correct version of english data
0x6167656D656E7420636F6E73756C74616E74456C65637472696320706F776572206C696E6520636F6E737472756374696F6E496E647573747269616C20706C616E7420636F6E737472756374696F6E4D616E75666163747572657320646573616C696E6174696F6E2065717569706D656E744F696C20616E642067617320706970656C696E6520636F6E737472756374696F6E5175616C6974
79206173737572616E636520636F6E73756C74616E74456E676167656420696E20636F6E737472756374696F6E202620636976696C20656E67696E656572696E679
The system have fields with different data types like date, text, varchar, numbers etc. Text fields may have data copied from any source like a web page an email etc.
The column that stores entire data can hold one records of 255 characters data type – varchar(255)
Collation - SQL_1xCompat_CP850_CI_AS
Server Collation - SQL_Latin1_General_CP1_CI_AS
This database server is recently upgraded from SQL Server 2000 to SQL Server 2008R2. We have started observing this issue immediately after migrating to SQL Server 2008R2
Can we identify which are the digits cuasing data corruption from hex version and remove them at once?
Thanks
Rohit Chitre
May 6, 2016 at 6:56 am
wow that's an interesting puzzle.
so is the varbinary string stored as a string datatype, or is it a varbinary(max)?
for a solution , we need the datatype as it is stored right now.
i have an idea, where if the converted string is not like %[a-Z,0-9]%, you would STUFF the two characters you identified with empty strings ,and see if THAT converts nicely.
that depends on whether it's a varbinary that needs to be converted to a string, tweak it , and cvonvert it back to varbinary, or if it is already a string representation of a hex value
Lowell
May 6, 2016 at 7:48 am
Thanks for the reply. The datatype of the column is varchar(255) where junk characters are seen.
if we convert that to varbinary(max) in select statement we get the pattern
May 6, 2016 at 8:01 am
If I try to remove leading zero from 0x0231 with
STUFF(column_name,1,1,'') then it removes 02 as that is the first character in binary and makes the binary pattern as 0x31 instead of 0x231
May 6, 2016 at 8:10 am
SQLRO (5/6/2016)
If I try to remove leading zero from 0x0231 withSTUFF(column_name,1,1,'') then it removes 02 as that is the first character in binary and makes the binary pattern as 0x31 instead of 0x231
i think you wanted STUFF(val,3,1,''), right?
/*
--Results
0x231
*/
With MyCTE
AS
(
SELECT '0x0231' As Val
)
SELECT STUFF(val,3,1,'') FROM MyCTE
Lowell
May 6, 2016 at 8:26 am
The val is actually a binary value. So I tried doing the below as per your suggestion.
declare @varBin as varbinary(max)
set @varBin = 0x06E3231322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A
SELECT convert(varbinary(max),STUFF(@varBin,3,1,''))
--Result
0x06E31322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A
May 6, 2016 at 8:50 am
ok, here's how i would tackle it;
assuming your analysis on the two chars is correct, this produces both "versions"
of the data;
now i would simply test for high ascii, and take only the one that does not contain high ascii in a case statement
let me see if i can slap together a function to test for that, but this gets you really close already, so i wanted to post it:
;WITH MyCTE(val)
AS
(
--bad
SELECT 0x02072616468696B612E74686F7261744072696C2E636F6D496D706F72746572206F662054656C65636F6D6D756E69636174696F6E202620456C656374726F6E69632050726F647563742C2052656C69616E63656469676974616C2073746F72657320746F2052656C69616E6365A
UNION ALL
--bad
SELECT 0x06C6F77656C6C4073746F726D726167652E636F6D20646F206E6F74207370616D206D6520696620796F75206465636F64652074686973B
UNION ALL
--good
SELECT 0x2053514C205365727665722032303038204461746120436F7272757074696F6E205353324B3820205353324B382052322020436F7272757074205265636F7264202D204E6F6E2D72656C6174696F6E616C20646174616261736520
),
ConvertedToString
AS
(
SELECT val As OriginalBinary,
CONVERT(varchar(max),val) As Converted,
master.sys.fn_varbintohexstr(val) As OriginalBinaryToString,
LEFT(STUFF(master.sys.fn_varbintohexstr(val),3,1,''),LEN(STUFF(master.sys.fn_varbintohexstr(val),3,1,'')) -1) As TrimmedAndStuffed,
CONVERT(varbinary(max),LEFT(STUFF(master.sys.fn_varbintohexstr(val),3,1,''),LEN(STUFF(master.sys.fn_varbintohexstr(val),3,1,'')) -1),1) As BackToBinary,
CONVERT(varchar(max),CONVERT(varbinary(max),LEFT(STUFF(master.sys.fn_varbintohexstr(val),3,1,''),LEN(STUFF(master.sys.fn_varbintohexstr(val),3,1,'')) -1),1)) As ConvertedTrimmed
FROM myCTE
)
SELECT * FROM ConvertedToString
Lowell
May 6, 2016 at 8:58 am
SQLRO (5/6/2016)
The val is actually a binary value. So I tried doing the below as per your suggestion.declare @varBin as varbinary(max)
set @varBin = 0x06E3231322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A
SELECT convert(varbinary(max),STUFF(@varBin,3,1,''))
--Result
0x06E31322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A
In addition to Lowell's suggestion, you could also just use CONVERT with a style to turn the varbinary to a string:
DECLARE @varBin as varbinary(max);
SET @varBin = 0x06E3231322020202020203737363533313231322020202020203737323137364D616E756661637475726573206F66206361726275726574746F727320616E6420706172747320666F722074776F2026207468726565A;
SELECT @varBin, CONVERT(varchar(max),@varbin); --Unaltered binary and converted
DECLARE @binary_as_string varchar(max);
SET @binary_as_string=CONVERT(varchar(max),@varbin,1); --Use style 1 to preserve the binary as a literal string
SET @binary_as_string=STUFF(@binary_as_string,3,1,''); --Remove the leading 0
SET @binary_as_string=LEFT(@binary_as_string,LEN(@binary_as_string)-1);--Remove the trailing A
SET @varBin=CONVERT(varbinary(max),@binary_as_string,1); --Use style 1 again to preserve the literal string
SELECT @varBin, CONVERT(varchar(max),@varbin); --Altered binary and converted
Cheers!
May 6, 2016 at 9:10 am
Thanks. Both the methods working fine. I am updating as many as I get in this range. The others would be still a challenge
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply