publishing binary content

Nov 13, 2010 at 5:45 AM
Edited Nov 13, 2010 at 2:52 PM

Hi,

Is it possible to transfer binary (byte[]) content over Laharsub?

It seems that all messages are encoded using UTF8, and sometimes the bytes are decoded incorrectly, which changes the byte[] values.

I am using the Silverlight client, and I have a feeling that the client is causing the problem.

BTW - I think you are doing a great job

Thanks!

Nov 13, 2010 at 2:54 PM

I managed to solve it.

It's a bug in the Silverlight client code.

The message is encoded in UTF8, so some of the bytes get corrupted when converting from byte[] to string and back.

I'll commit my fix to the Silverlight client source code if you'd like.

Rubinsh.

Coordinator
Nov 14, 2010 at 8:28 PM

Rubinsh, could you describe the fix you have made? Which method did you have to modify and how? 

Nov 15, 2010 at 7:56 PM

Hi,

I modified methods in the HttpLongPollManager class in the Silverlight client.

The methods I modified were:

OnPollResponse, ParseMultipartMime and ParseMimePart.

The modifications I made are the following:

Instead of reading the message stream using UTF8 encoding and then disposing of the stream, I read the stream into a string but also kept the original stream.

I passed the original stream as an additional parameter to the ParseMultipartMime and ParseMimePart methods.

When creating the PubsubMessage object in the ParseMimePart method, I assigned to the Body property a new memory stream that contains only a section of the original stream (the part between the boundaries).

This way, I did not lose any binary information when converting a stream to UTF8.
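The idea described above, taking the body bytes straight from the raw response instead of from a decoded string, can be sketched in isolation. This is a minimal sketch and not the Laharsub code: the names BinaryMimeSketch, IndexOf and ExtractBody are hypothetical, and it assumes the whole response has already been buffered into a byte array.

```csharp
using System;
using System.Text;

public static class BinaryMimeSketch
{
    // Returns the index of the first occurrence of pattern in data
    // at or after start, or -1 if there is none.
    public static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j])
            {
                j++;
            }
            if (j == pattern.Length)
            {
                return i;
            }
        }
        return -1;
    }

    // Copies the bytes that follow the blank line (CRLF CRLF) ending the
    // part headers. partStart and partEnd are byte offsets of one MIME
    // part within the raw response (hypothetical parameters).
    public static byte[] ExtractBody(byte[] raw, int partStart, int partEnd)
    {
        byte[] headerEnd = { 13, 10, 13, 10 };
        int i = IndexOf(raw, headerEnd, partStart);
        if (i < 0 || i + headerEnd.Length > partEnd)
        {
            throw new InvalidOperationException("Cannot find the end of the MIME part headers.");
        }
        int bodyStart = i + headerEnd.Length;
        byte[] body = new byte[partEnd - bodyStart];
        Array.Copy(raw, bodyStart, body, 0, body.Length);
        return body;
    }

    public static void Main()
    {
        // One part: ASCII headers followed by a single binary byte.
        byte[] headers = Encoding.UTF8.GetBytes("Content-Type: application/octet-stream\r\n\r\n");
        byte[] raw = new byte[headers.Length + 1];
        Array.Copy(headers, 0, raw, 0, headers.Length);
        raw[headers.Length] = 0xDA;

        byte[] body = ExtractBody(raw, 0, raw.Length);
        Console.WriteLine(BitConverter.ToString(body)); // DA
    }
}
```

Searching the bytes rather than the decoded string avoids one pitfall: a character index in a UTF8-decoded string equals a byte offset only as long as everything before it is single-byte.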

 

Here is the relevant code:

(The bold parts are the parts that were changed.)

 

        void OnPollResponse(IAsyncResult result)
        {            
            lock (this.syncRoot) // prevent race condition with OnAbort
            {
                if (this.poll != null && ! this.pollAborted)
                {
                    bool success = false;
                    HttpWebResponse response = null;
                    try
                    {
                        response = (HttpWebResponse)this.poll.EndGetResponse(result);
                        Stream stream = response.GetResponseStream();
                        // TODO: implement asynchronous multipart/mixed parsing without response buffering
                        string body = this.ReadEntireStream(stream);
                        if (!string.IsNullOrEmpty(body))
                        {
                            this.ParseMultipartMime(body,stream);
                        }
                        success = true;
                    }
                    catch (Exception e)
                    {
                        this.FaultAllSubscriptions(e);
                    }
                    finally
                    {
                        this.poll = null;
                        if (response != null)
                        {
                            response.Close();
                        }
                    }
                    if (success && this.subscriptions.Count > 0)
                    {
                        this.StartPoll();
                    }
                }
            }
        }

        void ParseMultipartMime(string result, Stream stream)
        {
            int i = result.IndexOf("\r\n");
            if (i < 3)
            {
                throw new InvalidOperationException("Cannot determine multipart/mixed boundary. Malformed HTTP long poll response.");
            }
            string boundary = result.Substring(0, i);

            int mimePartIndex = 0;
            while (result.Length > mimePartIndex)
            {
                int boundaryIndex = result.IndexOf(boundary, mimePartIndex);
                if (boundaryIndex == -1)
                {
                    break;
                }
                if (mimePartIndex > 0)
                {
                    string mimePart = result.Substring(mimePartIndex, boundaryIndex - mimePartIndex - 2);
                    this.ParseMimePart(mimePart,mimePartIndex,boundary.Length+4, stream);
                }
                mimePartIndex = boundaryIndex + boundary.Length + 2;
            }
        }

        void ParseMimePart(string part, int index, int endBoundryLength, Stream stream)
        {
            PubsubMessage message = new PubsubMessage();

            int i = part.IndexOf("\r\n");
            if (i <= 0)
            {
                throw new InvalidOperationException("Cannot determine the content type of the MIME part.");
            }
            Match m = contentTypeRegex.Match(part, 0, i);
            if (!m.Success)
            {
                throw new InvalidOperationException("Cannot determine the content type of the MIME part.");
            }
            message.ContentType = m.Groups[1].Value;

            m = contentDescriptionRegex.Match(part, i);
            if (!m.Success)
            {
                throw new InvalidOperationException("Cannot determine topicId and messageId of the MIME part.");
            }
            message.TopicId = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            message.MessageId = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);

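            // Previous approach, kept for reference: re-encode the decoded string.
            //Encoding.UTF8.GetBytes(part.Substring(part.IndexOf("\r\n\r\n") + "\r\n\r\n".Length))
            // Skip the part headers, then copy the body bytes directly from the original stream.
            // The length is measured to the end of the stream, minus the closing boundary.
            index += part.IndexOf("\r\n\r\n") + "\r\n\r\n".Length;
            var buffer = new byte[stream.Length - index - endBoundryLength];
            stream.Seek(index, SeekOrigin.Begin);
            stream.Read(buffer, 0, buffer.Length);
            message.Body = new MemoryStream(buffer);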
            this.DispatchMessage(message);
        }

 

and that's it.

This allows us to receive also binary information.

For example, try converting the following byte array to a string and back to byte[] using UTF8 encoding:

var bitsArr = new byte[1] { 0xda };
var str = Encoding.UTF8.GetString(bitsArr, 0, bitsArr.Length);
byte[] newBytes = Encoding.UTF8.GetBytes(str);
The newBytes array now has 3 items instead of one, because 0xda on its own is not a valid UTF8 sequence: the decoder substitutes a replacement character, which encodes to three bytes.
My fix solves the problem listed above.
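The round trip above can be reproduced as a standalone program. This is a minimal sketch: the class and method names (Utf8RoundTripDemo, RoundTrip) are made up for illustration, and it relies on the default behavior of Encoding.UTF8, which replaces invalid input with U+FFFD rather than throwing.

```csharp
using System;
using System.Text;

public static class Utf8RoundTripDemo
{
    // Decodes the bytes as UTF8 text, then encodes the text back to bytes.
    public static byte[] RoundTrip(byte[] input)
    {
        string text = Encoding.UTF8.GetString(input, 0, input.Length);
        return Encoding.UTF8.GetBytes(text);
    }

    public static void Main()
    {
        byte[] original = { 0xDA };
        byte[] roundTripped = RoundTrip(original);

        Console.WriteLine(BitConverter.ToString(original));     // DA
        Console.WriteLine(BitConverter.ToString(roundTripped)); // EF-BF-BD
    }
}
```

EF BF BD is the UTF8 encoding of U+FFFD, the replacement character, so the original byte value cannot be recovered from the string.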

Nov 16, 2010 at 7:46 PM
Edited Nov 17, 2010 at 5:48 AM
Tomasz,
This did not solve my problem.
The problem is with specific binary content, like the one I sent you.
Also, looking at the Silverlight client source, I did not see any use of the ContentType attribute that affects the way the message is read; the message is always parsed using UTF8 encoding.
Your unit test should test the following array:
var bitsArr = new byte[1] { 218 };
This array fails conversion with UTF8.
Thanks,
Rubinsh.

2010/11/16 tjanczuk <notifications@codeplex.com>

From: tjanczuk

Rubinsh,

Laharsub allows exchange of binary messages without the changes you have shown above. One way to publish a binary message is to use the following code:

byte[] body = new byte[] { 0, 1, 2, 3 };
PubsubMessage m1 = new PubsubMessage{
TopicId = 1,
ContentType = "application/laharsub",
Body = new MemoryStream(body)
};

 

I have added test cases that show a publish/subscribe lifecycle for binary messages with changeset http://laharsub.codeplex.com/SourceControl/changeset/changes/0260da6fe600 (note this is the dev branch as opposed to the default branch).

I think the problem you may be running into with the code above is related to the BOM character in the context of UTF-8 encoding. You can read more about it at http://en.wikipedia.org/wiki/Byte_order_mark.

Thanks,
Tomasz

Coordinator
Nov 17, 2010 at 3:31 AM

Rubinsh, I realized the approach above does not work for all binary messages only after posting my response you quote above, which is why I pulled it off from this thread. I have filed http://laharsub.codeplex.com/workitem/8 to address this issue in a first class way. In the meantime, one (suboptimal) way to work around the problem is to Base64-encode the binary data and send the resulting string instead. 
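The Base64 workaround mentioned above could look roughly like this. It is a minimal sketch using only Convert.ToBase64String and Convert.FromBase64String; the class and method names (Base64Workaround, Encode, Decode) are hypothetical, and publishing the resulting string through Laharsub is left out.

```csharp
using System;

public static class Base64Workaround
{
    // Publisher side: turn the binary payload into plain ASCII text.
    public static string Encode(byte[] payload)
    {
        return Convert.ToBase64String(payload);
    }

    // Subscriber side: recover the original bytes from the received text.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }

    public static void Main()
    {
        byte[] payload = { 218 };
        string wire = Encode(payload);
        byte[] received = Decode(wire);

        Console.WriteLine(wire);        // 2g==
        Console.WriteLine(received[0]); // 218
    }
}
```

Base64 text survives UTF8 encoding unchanged because it uses only ASCII characters. The cost is size: every 3 bytes of payload become 4 characters.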

I would like to fix this issue along with refactoring the code to allow for notification streaming.

Nov 17, 2010 at 5:53 AM

Thanks,

I would be more than happy to contribute to this issue if you'd like, and to Laharsub in general.

In the meantime, I'll just continue using my modified Silverlight client (posted above), which is working well for me.

BTW - I'm currently testing Laharsub for a multi-user online drawing application in Silverlight, and Laharsub looks promising.

Coordinator
Nov 17, 2010 at 7:01 AM

Rubinsh, I appreciate your offer. Unfortunately I am legally constrained from accepting direct contributions to this project. You are more than welcome to create your own fork and make any changes you deem necessary there. On my part I will work on fixing the issue in the main branch. I would love to know how your application progresses and whether you have any comments or suggestions for Laharsub going forward. 

Thanks,
Tomasz 

Nov 17, 2010 at 4:46 PM

I'm using both SL and console clients to communicate and have not run into this issue.
The main difference in my code mods is that I don't muck around with the response stream (seeking and such).
I map the response stream to a MemoryStream and get rid of the response stream ASAP:

 // Extension method on Stream
 public static byte[] MapToByteArray(this Stream stream)
 {
     var ms = stream.MapToMemoryStream();
     return ms.ToArray();
 }

 // Extension method on Stream
 public static MemoryStream MapToMemoryStream(this Stream stream)
 {
     byte[] buffer = new byte[1024];
     int bytesRead = 0;
     MemoryStream ms = new MemoryStream();
     // Map the stream to an in-memory stream and dispose of it as quickly as we can,
     // since we don't need to keep the source stream open any longer than necessary.
     using (stream)
     {
         while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
         {
             ms.Write(buffer, 0, bytesRead);
         }
     }
     ms.Seek(0, SeekOrigin.Begin);
     return ms;
 }


 private IEnumerable<PubSubMessage> ParseResponse(HttpWebResponse response)
 {
     if (response.StatusCode != HttpStatusCode.OK)
     {
         // We have a problem!
         throw new InvalidOperationException(
             string.Format(CultureInfo.InvariantCulture,
             "Error while processing server response. Server returned status code {0}:{1}.",
             response.StatusCode,
             response.StatusDescription));
     }
     using (MemoryStream ms = response.GetResponseStream().MapToMemoryStream())
     {
         // do whatever to parse the response stream
     }
 }

 

Also, when constructing the PubSubMessage on the client in ParseMimePart(string part), I create a new MemoryStream to hold the body
and create a byte[] out of that:


PubSubMessage ParseMimePart(string part)
{
    .....
    .....
    var body = Encoding.UTF8.GetBytes(part.Substring(part.IndexOf("\r\n\r\n") + "\r\n\r\n".Length));
    var ms = new MemoryStream(body);
    var bodyBytes = ms.MapToByteArray();
    PubSubMessage message = CreateGenericPubSubMessage(
        new PubSubMessage
        {
            TopicId = topicId,
            Body = bodyBytes,
            MessageId = messageId,
            FullTypeNameOfBody = fullTypeNameOfBody
        }, serializer);
    return message;
}

 

Wonder if the above makes any difference....

Nov 18, 2010 at 8:08 AM

I don't think that it does.

The problem occurs when converting specific binary content, like the array I posted before.

Whenever you use UTF8 encoding to convert an arbitrary byte[] to a string and back, you can run into this problem.

I think you don't see it only because your specific binary content happens to consist of bytes that UTF8 decodes to valid characters.

Nov 19, 2010 at 5:30 AM

I see. So it looks like for now we can get away with modifying GetBodyAsString() in PubSubMessage from this:

 public string GetBodyAsString()
 {
     if (this.Body == null)
     {
         return null;
     }
     else
     {
         return Encoding.UTF8.GetString(this.Body, 0, this.Body.Length);
     }
 }

to this:

 public string GetBodyAsString()
 {
     if (this.Body == null)
     {
         return null;
     }
     else
     {
         return Convert.ToBase64String(this.Body);
     }
 }

Thoughts....?

Coordinator
Dec 5, 2010 at 12:06 AM
Edited Dec 5, 2010 at 12:08 AM

I have investigated this issue and it is now fixed in the dev branch. The fix will be included in the next release of Laharsub. 

Changeset http://laharsub.codeplex.com/SourceControl/changeset/changes/5afbb112a0ac contains the client code fix, and changeset http://laharsub.codeplex.com/SourceControl/changeset/changes/9b6aeca7d918 contains Silverlight and .NET unit tests for publishing and subscribing to a binary message. The latter includes, among others, a regression test for the case of publishing a single byte with value 218 that rubinsh brought up above. 

The code below shows how to create a binary message to be published:

 

byte[] body = new byte[] { 0, 1, 2, 3 }; // arbitrary binary array
PubsubMessage m1 = new PubsubMessage {
    TopicId = 1,
    ContentType = "application/foobar", // content type is application-specific, it is not prescribed by Laharsub
    Body = new MemoryStream(body)
};

 

When a binary message is received through a subscription, the binary content should be consumed directly from the PubsubMessage.Body stream as opposed to calling PubsubMessage.GetBodyAsString(). GetBodyAsString is a helper method that can be used to convert the binary message body to a string if the application expects a UTF-8 encoded string.
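Consuming the body directly from the stream might look like the sketch below. It assumes only that Body is a readable Stream, as in the publishing snippet above; a plain MemoryStream stands in for PubsubMessage.Body, and the names BodyReader and ReadAllBytes are hypothetical.

```csharp
using System;
using System.IO;

public static class BodyReader
{
    // Copies everything that remains in the stream into a byte array,
    // without any text decoding.
    public static byte[] ReadAllBytes(Stream body)
    {
        using (MemoryStream copy = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = body.Read(buffer, 0, buffer.Length)) > 0)
            {
                copy.Write(buffer, 0, read);
            }
            return copy.ToArray();
        }
    }

    public static void Main()
    {
        // Stand-in for the Body stream of a received message.
        Stream body = new MemoryStream(new byte[] { 218, 0, 1 });
        byte[] bytes = ReadAllBytes(body);

        Console.WriteLine(BitConverter.ToString(bytes)); // DA-00-01
    }
}
```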

Thanks,
Tomasz