December 22, 2009 at 3:34 am
Hello all,
I have implemented a solution based on an SSIS message queue task that loads an XML document (with charset ISO-8859-1) and sends it to a remote private queue. The message is delivered to the client successfully, but somewhere along the way the original charset is replaced by UTF-8, so strings that were originally ISO-8859-1 now contain unreadable characters. I can't see why this happens. I'm sure the XML file is loaded with the correct charset. Somewhere between loading the document into a queue message and sending it to the remote private queue, the charset is converted to UTF-8.
Any ideas?
I'm using BIDS 2008; the MSMQ host that sends the messages is W2k8 Standard 64.
Thanks in advance.
Here's my code (VB.NET, in an SSIS Script Task):
Public Sub Main()
    Dim loRemotePrivateQueue As MessageQueue
    Dim loMessage As New Message
    Dim loXmlMessage As New Xml.XmlDocument
    loRemotePrivateQueue = DirectCast(Dts.Connections("plada message queue connection manager").AcquireConnection(Dts.Transaction), MessageQueue)
    loRemotePrivateQueue.Formatter = New XmlMessageFormatter(New Type() {GetType(XmlDocument)})
    If Dts.Variables("XMLPath").Value.ToString.Trim.Length = 0 AndAlso Dts.Variables("XMLString").Value.ToString.Trim.Length = 0 Then
        Dts.TaskResult = ScriptResults.Failure
        Exit Sub
    End If
    If Dts.Variables("XMLPath").Value.ToString.Trim.Length > 0 Then
        If My.Computer.FileSystem.FileExists(DirectCast(Dts.Variables("XMLPath").Value, String)) Then
            'Dim lsXML As String = String.Empty
            'Using loSR As IO.StreamReader = New IO.StreamReader(DirectCast(Dts.Variables("XMLPath").Value, String), Encoding.GetEncoding("ISO-8859-1"))
            '    lsXML = loSR.ReadToEnd()
            'End Using
            'loXmlMessage.LoadXml(lsXML)
            loXmlMessage.Load(DirectCast(Dts.Variables("XMLPath").Value, String))
        End If
    End If
    With loMessage
        .Formatter = New XmlMessageFormatter(New Type() {GetType(XmlDocument)})
        .Label = IIf(Dts.Variables("XMLCode").Value.ToString.Trim.Length > 0, Dts.Variables("XMLCode").Value.ToString, String.Empty).ToString
        .CorrelationId = loMessage.Id
        .Body = loXmlMessage
        .AcknowledgeType = AcknowledgeTypes.FullReachQueue Or _
            AcknowledgeTypes.FullReceive
        .AdministrationQueue = DirectCast(Dts.Connections("ack message queue connection manager").AcquireConnection(Dts.Transaction), MessageQueue)
        .Recoverable = True
        .UseDeadLetterQueue = True
    End With
    loRemotePrivateQueue.Send(loMessage)
    Dts.TaskResult = ScriptResults.Success
End Sub
January 5, 2010 at 9:13 am
So, no thoughts on this one?
January 5, 2010 at 10:07 pm
It looks like XmlDocument uses UTF-8 by default unless your XML declaration has an encoding attribute. You haven't posted a sample of the XML, so I can't say for sure whether yours has one, but making sure it's there would be the easiest way to get the document loaded with the correct encoding.
Alternatively, like the commented-out section of your code, you could create a StreamReader, specifying the encoding in its constructor. You could then use either an XmlReader (or possibly the overloaded Load() method) to load the document.
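As a rough sketch of that second option (not tested against MSMQ), the file can be decoded with an explicit encoding before it reaches XmlDocument. The class and method names here are made up for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlLoadSketch
{
    // Hypothetical helper: decode the file with the named encoding, then parse.
    // The parser receives already-decoded characters, so the encoding named in
    // the XML declaration is not what decodes the bytes here.
    public static XmlDocument LoadWithEncoding(string path, string encodingName)
    {
        Encoding encoding = Encoding.GetEncoding(encodingName);
        var doc = new XmlDocument();
        using (var reader = new StreamReader(path, encoding))
        {
            doc.Load(reader); // XmlDocument.Load(TextReader) overload
        }
        return doc;
    }
}
```

A call such as XmlLoadSketch.LoadWithEncoding(path, "ISO-8859-1") would stand in for the plain Load(path) call. Note that this only controls how the file is read, not how the message body is later written.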
HTH,
Steve.
January 5, 2010 at 10:09 pm
The encoding would be specified like this:
<?xml version="1.0" encoding="ISO-8859-1" ?>
HTH,
Steve.
January 11, 2010 at 4:28 am
Thanks for the help, stevefromOZ.
Anyway, as I said in my previous post, I'm sure I can load the XmlDocument in the intended encoding, "ISO-8859-1". My XML messages are generated with the attribute you mentioned. I even tried reading the XML with the encoding specified explicitly (that's why you see some commented-out code in there), but as far as I could verify, that isn't needed if the encoding is declared at the beginning of the file.
I also tried saving the XML document back to a file after reading it, to check the encoding, and yes, the generated XML file was written with the "ISO-8859-1" attribute.
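That round trip fits how XmlDocument.Save behaves: when saving to a file or stream, it takes the output encoding from the XML declaration. A successful save therefore says little about what a message formatter will write later. A small sketch, with a hypothetical helper name:

```csharp
using System.IO;
using System.Xml;

public static class SaveEncodingSketch
{
    // Hypothetical helper: save the document to memory and return the raw bytes.
    // XmlDocument.Save(Stream) takes its encoding from the XML declaration and
    // falls back to UTF-8 when the declaration names none.
    public static byte[] SaveToBytes(XmlDocument doc)
    {
        using (var buffer = new MemoryStream())
        {
            doc.Save(buffer);
            return buffer.ToArray();
        }
    }
}
```

For a document declared as ISO-8859-1, an é in the saved bytes should appear as the single byte 0xE9 rather than the UTF-8 pair 0xC3 0xA9.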
My problem appears when I send the message using the SSIS message queue task. My private destination queue receives the message in UTF-8 encoding... Very odd.
Any more ideas?
January 11, 2010 at 7:46 am
Have you tested sending the message from a script (see here)? If that works, it would point to a connection manager issue (or bug). If it doesn't, there may be other properties on the MessageQueue object you could modify to make it work. Or, worst case, it proves that sending anything other than UTF-8 is not possible.
HTH,
Steve.
January 12, 2010 at 2:36 am
Thanks stevefromOZ, but if you look again at the code I posted, it belongs to a script task that I'm using to send messages to a remote private queue... As far as I remember, I followed that MSDN link when I implemented the solution.
At the time, I checked the properties of the Message Queue connection and didn't find anything related to message encoding.
I'm starting to agree with you. This is some kind of limitation of the Message Queue connection...
To work around this, the T-SQL that generates my XML messages now replaces all characters that could not be represented in UTF-8. I don't like this approach, but it seems to be the only way to make it work. Maybe in the future I should consider other techniques for sending messages to private queues.
Has anyone else experienced this? Could it be a reported SSIS bug?
Thanks.
February 11, 2010 at 8:15 am
Seems like we've reached a dead end concerning this topic...
February 11, 2010 at 11:43 am
For what it's worth, it *appears* that the following code results in the message retaining the ISO-8859-1 encoding. I will note, though, that I was mucking around with this, so there may be an extra step in there that you can remove without going back to receiving UTF-8 messages.
public void Main()
{
    Msg.MessageQueue loRemotePrivateQueue;
    Msg.Message loMessage = new Msg.Message();
    Xml.XmlDocument loXmlMessage = new Xml.XmlDocument();
    loRemotePrivateQueue = (Msg.MessageQueue)Dts.Connections["MQCM"].AcquireConnection(Dts.Transaction);
    Type[] xt;
    xt = new Type[1];
    xt[0] = loXmlMessage.GetType();
    loRemotePrivateQueue.Formatter = new Msg.XmlMessageFormatter(xt);
    // Read the file as ISO-8859-1 text, then turn it back into ISO-8859-1 bytes
    System.IO.StreamReader sr = new System.IO.StreamReader(@"c:\temp\xml_test_out.xml", System.Text.Encoding.GetEncoding("ISO-8859-1"));
    byte[] bytearr = System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(sr.ReadToEnd());
    System.IO.MemoryStream str = new System.IO.MemoryStream(bytearr);
    loMessage.Formatter = new Msg.XmlMessageFormatter(xt);
    // Set the body from the byte stream (BodyStream) instead of assigning the XmlDocument to Body
    loMessage.BodyStream = str;
    loMessage.Label = "test2";
    loMessage.CorrelationId = loMessage.Id;
    loMessage.AcknowledgeType = Msg.AcknowledgeTypes.FullReachQueue | Msg.AcknowledgeTypes.FullReceive;
    loMessage.Recoverable = true;
    loMessage.UseDeadLetterQueue = true;
    loRemotePrivateQueue.Send(loMessage);
    Dts.TaskResult = (int)ScriptResults.Success;
}
The other thing to note, and this was my basis for concluding that the encoding had 'stuck': when pushing the message using your original code, any encoding set in the XML declaration was removed (implicit UTF-8). When reviewing the message contents after using the code above, the encoding attribute remained intact.
The basic approach was to ensure that the object/stream pushed to the message body was, in fact, ISO-8859-1 and not the UTF-8 that I think XmlDocument defaults to.
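One step that may be removable: reading the file as ISO-8859-1 and immediately re-encoding it as ISO-8859-1 should give back the file's original bytes, since that encoding maps each byte to exactly one character. Assuming the file on disk is already ISO-8859-1 and has no byte order mark, the stream could be built straight from the file. This is a sketch that has not been tried against MSMQ, and the helper name is hypothetical.

```csharp
using System.IO;

public static class BodyStreamSketch
{
    // Hypothetical helper: wrap the file's raw bytes in a stream, unchanged.
    // Assumes the file is already in the encoding the receiver expects.
    public static MemoryStream OpenRawBody(string path)
    {
        return new MemoryStream(File.ReadAllBytes(path));
    }
}
```

In the code above, this would replace the StreamReader and byte array lines, with the result assigned to loMessage.BodyStream as before.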
HTH,
Steve.
February 11, 2010 at 11:46 am
Forgot to mention: when reviewing the messages in MSMQ, the byte count did rise by 23 for those messages where it *appears* that the encoding has stuck.
Steve.
February 12, 2010 at 7:47 am
Thanks a lot Steve.
I'll check your solution out. Anyway, it's rather odd, because I remember loading the XmlDocument and, before pushing it into the MSMQ message, saving it to an XML file again, and the encoding didn't change...
The main difference between your approach and mine is that you resorted to an array of bytes to load the XML file and pushed that array through the BodyStream property of the message, instead of using an XmlDocument and the Body property.
My guess is that the BodyStream property ensures the correct encoding.
Thanks once again.
PM
February 12, 2010 at 8:23 am
Hope it works for you. I didn't include (didn't take the time to include) any characters that would have proven the encoding on the received message, so I'm assuming you have data that can physically confirm it works. I was going by the encoding declaration being stripped (therefore default UTF-8) with your code, whereas with BodyStream the declaration was definitely kept.
I'm not sure whether it's BodyStream itself or the fact that the XmlDocument, depending on how it's loaded, defaults to UTF-8. One thing that does seem clear is that the declaration is just that, a declaration: it doesn't determine the encoding of the data itself. So between setting the declaration and actually pushing data in a specific encoding, the latter is definitely what dictates the encoding of the document.
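To illustrate that last point, here is a minimal sketch of what happens when bytes are written in one encoding and read in another, whatever the declaration says. The class and method names are invented for the example.

```csharp
using System.Text;

public static class EncodingMismatchSketch
{
    // Encode text with one encoding and decode the bytes with another,
    // which is what happens when sender and receiver disagree.
    public static string Reinterpret(string text, Encoding writtenAs, Encoding readAs)
    {
        byte[] bytes = writtenAs.GetBytes(text);
        return readAs.GetString(bytes);
    }
}
```

For instance, Reinterpret("café", Encoding.UTF8, Encoding.GetEncoding("ISO-8859-1")) returns "cafÃ©", the kind of unreadable text a receiver expecting ISO-8859-1 would see in a UTF-8 message.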
Good luck, and let us all know how it goes.
Steve.
February 12, 2010 at 11:53 am
Worked like a charm. Bodystream seems to be the answer...
Thanks once more.