Problems with deserialization

Hi Axon devs

I’m working on a project where we use Axon and a JPA repository for events. Until recently I was using a 2.0 RC build of Axon from December 2012 and everything was fine; events could be serialized and then deserialized later without problems.

But when I switched to the new 2.0 version, I suddenly started getting errors for new events, i.e. events serialized after the version bump.

Events are written without issue, but later when they are reloaded I get a StreamException from XStream when accessing the payload of a serialized domain event for the first time. The specific error is “only whitespace content allowed before start tag and not \ufffd (position: START_DOCUMENT seen \ufffd… @1:1)”, and like I mentioned, it only applies to events serialized after I switched to the newest version. I use the default XStream serialization.

The Axon dependencies haven’t changed since my RC build, AFAICT, but using “git bisect” from my working build of Axon (GIT: “e1ce97c6e30affece161a341113d0e1999a79a4a”), I can see that the problem seems to have been introduced in a commit annotated “Big refactoring of upcasting process” (GIT: “6549a136c813aa6dbaee444c606141a0bc0cfae4”).

Any ideas as to what happened?

/Mads

PS: I should note that I’m using Hibernate 3.5.6-Final, since that caused some problems earlier that were swiftly resolved by the Axon team :slight_smile:

Hi Mads,

That’s weird. The commit you’re pointing to changes nothing about the way objects are serialized. I’m trying to reproduce the problem using Hibernate 3.5.6, but I can’t get it to fail.

A little research shows that the \ufffd character is the Unicode replacement character. It indicates a byte sequence that could not be decoded as valid UTF-8. That would mean your serialized event starts with a “weird” character. What encoding parameter do you have on your database connection? What encoding are the database tables?

It is possible that the error is caused by XStream writing an unmappable character (\u0000 maybe?). Could you set a breakpoint on JpaEventStore line 152 and check the values of the first bytes? What happens when you create a String from that byte[] using UTF-8 encoding? You can also set a breakpoint in XStreamSerializer line 140 to see what bytes are actually available when reading. I’m curious about what the first 2 bytes are.
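For anyone following along, the check described above can be sketched in plain Java. This is only an illustration: FirstBytesCheck and the sample bytes are made up for the example and are not part of Axon or XStream. It prints the first bytes in hex and shows what a UTF-8 decode turns them into.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Axon or XStream.
class FirstBytesCheck {

    // Hex dump of the first `count` bytes, separated by spaces.
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xFF));
        }
        return sb.toString();
    }

    // new String(byte[], Charset) replaces malformed input with U+FFFD
    // instead of throwing an exception.
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample data: well-formed XML versus XML with a stray leading byte.
        byte[] xml = "<e/>".getBytes(StandardCharsets.UTF_8);
        byte[] suspect = { (byte) 0xFF, '<', 'e', '/', '>' };

        System.out.println(hexPrefix(xml, 2));      // 3c 65
        System.out.println(hexPrefix(suspect, 2));  // ff 3c

        // The first decoded character is what an XML parser sees at position 1:1.
        System.out.println(Integer.toHexString(decodeUtf8(xml).charAt(0)));      // 3c
        System.out.println(Integer.toHexString(decodeUtf8(suspect).charAt(0)));  // fffd
    }
}
```

If the first decoded character is fffd rather than 3c (the “<” character), the stored bytes are not UTF-8 encoded XML.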

Cheers,

Allard

Hi Allard

I don’t know if it’s relevant, but the payload is a Java serialized Protocol Buffer object which, while it implements Serializable, is obviously not intended for serialization this way. The current default XStream serialization is only temporary, as we intend to use PB’s own serialisation in a custom event serializer once we get a little further with the project. The primary reason I brought this issue to the forum is that I think there might be an unexpected “feature” in the current Axon event serialization :wink:

Anyway, using your breakpoints, I can confirm that the data that is written is the same as the data that is read. However, the data is apparently invalid when interpreted as UTF-8.

The following test compares the initial block of bytes with their UTF-8 string interpretation. The example bytes were taken from the real payload, but truncated for clarity.

The test fails.

import java.nio.charset.Charset;
import java.util.Arrays;

import org.junit.Assert;
import org.junit.Test;

public class TestBytes {
    // First bytes of the real payload, truncated for clarity.
    byte[] data = { -84, -19, 0, 5, 115, 114, 0, 55, 99, 111, 109, 46,
            103, 111, 111, 103, 108, 101, 46, 112, 114, 111, 116, 111,
            98, 117, 102, 46, 71, 101, 110, 101, 114, 97, 116, 101, 100,
            77, 101, 115, 115, 97, 103, 101, 76, 105, 116, 101, 36, 83 };

    @Test
    public void test() {
        // Decode as UTF-8, then re-encode with the platform default charset.
        Assert.assertArrayEquals(data, new String(data,
                Charset.forName("UTF-8")).getBytes());
    }
}

After being interpreted as UTF-8, the first two bytes become “{ 60, 60 }” instead of “{ -84, -19 }”. The rest of the bytes remain the same.

Maybe the first two bytes are some kind of byte-order marker?

The db table for events uses UTF-8 encoding (in MySQL).

CREATE TABLE domain_event_entry (
    aggregate_identifier varchar(255) COLLATE utf8_bin NOT NULL,
    sequence_number bigint(20) NOT NULL,
    type varchar(255) COLLATE utf8_bin NOT NULL,
    event_identifier varchar(255) COLLATE utf8_bin NOT NULL,
    meta_data longblob,
    payload longblob NOT NULL,
    payload_revision varchar(255) COLLATE utf8_bin DEFAULT NULL,
    payload_type varchar(255) COLLATE utf8_bin NOT NULL,
    time_stamp varchar(255) COLLATE utf8_bin NOT NULL,
    PRIMARY KEY (aggregate_identifier, sequence_number, type)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

/Mads

Oops

After being interpreted as UTF-8, the first two bytes become “{ 60, 60 }” instead of “{ -84, -19 }”. The rest of the bytes remain the same.

The initial bytes are “{ 63, 63 }” :slight_smile:
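On the byte-order-marker question: the bytes -84, -19, 0, 5 are 0xAC 0xED 0x00 0x05, which match the header of a Java Object Serialization stream (the JDK exposes these as ObjectStreamConstants.STREAM_MAGIC and STREAM_VERSION), so they are not a BOM. The sketch below checks for that header and reproduces the 63s. Two assumptions: SerializationHeaderCheck is a made-up name, and ISO-8859-1 stands in for whatever platform default charset the original test ran under. Any charset that cannot encode U+FFFD substitutes “?” (63) in the same way.

```java
import java.io.ObjectStreamConstants;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Axon.
class SerializationHeaderCheck {

    // True if the data starts with the Java serialization stream header.
    static boolean startsWithJavaSerializationHeader(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        short magic = (short) (((data[0] & 0xFF) << 8) | (data[1] & 0xFF));
        short version = (short) (((data[2] & 0xFF) << 8) | (data[3] & 0xFF));
        return magic == ObjectStreamConstants.STREAM_MAGIC
                && version == ObjectStreamConstants.STREAM_VERSION;
    }

    // Decode as UTF-8, then re-encode with a charset that has no U+FFFD.
    // ISO-8859-1 is an assumption standing in for the platform default.
    static byte[] roundTrip(byte[] data) {
        String decoded = new String(data, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // First six bytes of the payload quoted in the test above.
        byte[] data = { -84, -19, 0, 5, 115, 114 };

        System.out.println(startsWithJavaSerializationHeader(data)); // true
        System.out.println(Arrays.toString(roundTrip(data)));        // [63, 63, 0, 5, 115, 114]
    }
}
```

Neither 0xAC nor 0xED followed by 0x00 is a valid UTF-8 sequence, so each becomes U+FFFD on decoding, and each U+FFFD becomes “?” on re-encoding. The remaining bytes are plain ASCII and survive unchanged.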

/Mads

Hi Mads,

it looks like you’re writing binary data, instead of a String with XML inside. Your byte[] outputs data that clearly doesn’t look like XML.
Perhaps you should try using the JavaSerializer. The XStream serializer expects the payload field to be a textual value (UTF-8 encoded, by default).
I think you’ve gone beyond the limits of what XStream can do…

Cheers,

Allard

Hi Mads,

it looks like you’re writing binary data, instead of a String with XML inside. Your byte[] outputs data that clearly doesn’t look like XML.

I agree.

Perhaps you should try using the JavaSerializer. The XStream serializer expects the payload field to be a textual value (UTF-8 encoded, by default).

The thing is, I haven’t changed anything in my code. I can switch to an older version of Axon and it works again. Furthermore, events serialised using old versions of Axon still work with the new Axon?!

Since I’m still in early development, I can scrap my old events and write a new Serializer that uses the built-in serialisation of my Protocol Buffer objects. I’m just thinking other people might not be so lucky :wink:

/Mads

Just for reference, the old events look like this when their bytes are interpreted as UTF-8:

<com.dibspayment.satellite.domain.payment.event.EventProtocolBuffers_-PaymentStartedEvent resolves-to=“com.google.protobuf.GeneratedMessageLite$SerializedForm”>com.dibspayment.satellite.domain.payment.event.EventProtocolBuffers$PaymentStartedEventEhAKFnAJExlDDaBRgQ/OewWqggGxBhobCLTD+CoSFERJQlMgQXJjaGl0cmFkZSBUZXN0IhQI1/K5
AhIBLRoKZGEgICAgICAgICgBigIPMDowOjA6MDowOjA6MDoxkgIMCgNES0sSAzIwOBgCqAIBsgJA
… SNIP …
cDovL2xvY2FsaG9zdDo5MDkwL2RpYnNwYXltZW50d2luZG93L0R1bW15UGFnZT9wYWdlPWFjY2Vw
dFJldHVyblVybLICGAoHb3JkZXJJZBINMTM1OTI4ODU4OTgwOMAC9wI=</com.dibspayment.satellite.domain.payment.event.EventProtocolBuffers_-PaymentStartedEvent>

That looks like XML with a nested chunk of encoded binary data.
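The nested chunk appears to be Base64, which would explain why the old events survive a UTF-8 round trip: Base64 output is pure ASCII. A minimal sketch of the idea, assuming Java 8+ for java.util.Base64. Base64InXml and the payload element name are made up for the example and are not what XStream actually emits.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper for illustration; the <payload> element is invented.
class Base64InXml {

    private static final String OPEN = "<payload>";
    private static final String CLOSE = "</payload>";

    // Wrap raw binary data as Base64 text inside an XML element.
    static String wrap(byte[] raw) {
        return OPEN + Base64.getEncoder().encodeToString(raw) + CLOSE;
    }

    // Extract the Base64 text again and decode it back to the raw bytes.
    static byte[] unwrap(String xml) {
        String body = xml.substring(OPEN.length(), xml.length() - CLOSE.length());
        return Base64.getDecoder().decode(body);
    }

    public static void main(String[] args) {
        // Bytes that are NOT valid UTF-8 on their own.
        byte[] raw = { -84, -19, 0, 5 };

        String xml = wrap(raw);
        System.out.println(xml); // <payload>rO0ABQ==</payload>

        // Storing the XML as UTF-8 bytes and reading it back is lossless.
        byte[] stored = xml.getBytes(StandardCharsets.UTF_8);
        String reloaded = new String(stored, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(raw, unwrap(reloaded))); // true
    }
}
```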

/Mads

Hmmm… could you send an example of what the new events look like?