In our Axon 2.4 application we use sagas extensively in a multi-host environment. We use the DistributedCommandBus to spread work among hosts, and each host runs saga code locally as it generates events. Row-level locking is the key: you don’t want two hosts to load and update the same saga at the same time, or one host’s changes can silently overwrite the other’s.
To avoid deadlocks, we changed the saga manager code to impose a consistent update order on sagas; those changes were merged into Axon’s 2.4.x branch (see PRs #411 and #427). After that, it’s a matter of locking each saga row as it’s read from the repository, which we do with a custom saga schema class:
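The PRs aren’t reproduced here, but the deadlock argument behind an update order is the classic one. If host A locks saga X and then Y while host B locks Y and then X, each waits on the other forever. If every host acquires its locks in one global order, that cycle can’t form. Below is a hypothetical in-process illustration only: the class `OrderedSagaLocks` and the choice of sorting by saga id are assumptions for the example, not Axon API. In the database, the same idea means loading, and therefore row-locking, the affected sagas in a consistent order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical illustration (not Axon code): per-saga locks that are always
 * acquired in ascending saga-id order, so two workers touching overlapping
 * sets of sagas cannot end up waiting on each other in a cycle.
 */
final class OrderedSagaLocks {

    // One lock per saga id; entries are never evicted, which is fine for a sketch.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Locks every id in ascending order and returns the ids in the order they were locked. */
    List<String> lockAll(Collection<String> sagaIds) {
        // TreeSet both sorts and removes duplicates, giving one global lock order.
        List<String> ordered = new ArrayList<>(new TreeSet<>(sagaIds));
        for (String id : ordered) {
            locks.computeIfAbsent(id, k -> new ReentrantLock()).lock();
        }
        return ordered;
    }

    /** Releases the locks returned by lockAll, in reverse acquisition order. */
    void unlockAll(List<String> lockedIds) {
        for (int i = lockedIds.size() - 1; i >= 0; i--) {
            locks.get(lockedIds.get(i)).unlock();
        }
    }
}
```

Callers would pair the two calls with try/finally, e.g. `List<String> held = locks.lockAll(ids); try { /* handle event */ } finally { locks.unlockAll(held); }`.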
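import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.axonframework.saga.repository.jdbc.PostgresSagaSqlSchema;
import org.axonframework.saga.repository.jdbc.SagaSchema;

/**
 * Postgres saga schema that loads each saga row with SELECT ... FOR UPDATE, so
 * another host loading the same saga blocks until this transaction commits or rolls back.
 */
public class LockingSagaSqlSchema extends PostgresSagaSqlSchema {

    // Global switch: when false, fall back to the stock (non-locking) load query.
    public static boolean shouldLock = true;

    private final SagaSchema sagaSchema;

    public LockingSagaSqlSchema(SagaSchema sagaSchema) {
        super(sagaSchema);
        this.sagaSchema = sagaSchema;
    }

    @Override
    public PreparedStatement sql_loadSaga(Connection connection, String sagaId) throws SQLException {
        if (shouldLock) {
            // Select the columns the saga repository reads, plus FOR UPDATE to take a row-level lock.
            final String sql =
                    "SELECT serializedSaga, sagaType, revision"
                    + " FROM "
                    + sagaSchema.sagaEntryTable()
                    + " WHERE sagaId = ?"
                    + " FOR UPDATE";
            PreparedStatement preparedStatement = connection.prepareStatement(sql);
            preparedStatement.setString(1, sagaId);
            return preparedStatement;
        } else {
            return super.sql_loadSaga(connection, sagaId);
        }
    }
}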
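One thing worth spelling out: a PostgreSQL row lock taken by SELECT ... FOR UPDATE is held until the surrounding transaction ends. With auto-commit on, it is released as soon as the SELECT finishes. So the lock only keeps other hosts out if the saga load and the later update run on the same connection inside one transaction. The plain-JDBC sketch below shows that shape. `SagaRowLock`, `withLockedSaga`, and `LockedWork` are hypothetical names, and the sketch does not show how Axon’s repository, ConnectionProvider, or transaction setup shares a connection between load and save.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Hypothetical helper (not part of Axon): a FOR UPDATE lock only protects the
 * saga while the transaction that took it is still open.
 */
final class SagaRowLock {

    /** Work done while the row is locked, e.g. deserialize, handle the event, write back. */
    interface LockedWork {
        void run(Connection connection, byte[] serializedSaga) throws SQLException;
    }

    private SagaRowLock() {
    }

    static void withLockedSaga(Connection connection, String sagaTable, String sagaId,
                               LockedWork work) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        // With auto-commit on, the row lock would be released right after the SELECT.
        connection.setAutoCommit(false);
        try {
            byte[] serializedSaga;
            try (PreparedStatement ps = connection.prepareStatement(
                    "SELECT serializedSaga, sagaType, revision FROM " + sagaTable
                            + " WHERE sagaId = ? FOR UPDATE")) {
                ps.setString(1, sagaId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("No saga with id " + sagaId);
                    }
                    serializedSaga = rs.getBytes("serializedSaga");
                }
            }
            // The row stays locked until commit or rollback; another host's
            // SELECT ... FOR UPDATE on the same sagaId waits until then.
            work.run(connection, serializedSaga);
            connection.commit();
        } catch (SQLException | RuntimeException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```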
I’m currently porting our application to Axon 3.1 and plan to keep the same setup. Only a subset of our events are persisted to an event store, so we can’t run our sagas in tracking event processors. If all your events are persisted, though, Axon 3.1’s distributed saga support would probably be a cleaner and simpler solution.
-Steve