Multinode setup with cloud environments

Does this release candidate update the sample to show a multinode setup?

I just watched the Amazon CTO’s AWS re:Invent keynote, which featured some very large-scale web/business applications doing very cool things with multi-node scalability and fault tolerance, along with rules of thumb for new application architectures. One point they were really insistent on, and kept repeating, is that your app should not hold state tied to particular nodes, at any layer of the architecture.

One of the challenges of multinode environment management is keeping components small and loosely coupled, and autoscaling with cloud features like Amazon AWS (spinning up new JVMs, DB shards, etc. dynamically based on business and usage metrics for a seamless experience; some of the demos I saw, e.g. Pinterest, were very impressive). Tools like Puppet or Chef cookbooks help turn the entire infrastructure into a “programmable unit”, as they called it.

One question is how Axon can help with this strategy: is command processing state tied to a node/JVM? And if so, how can that state be extracted into a DB/cache persistence layer so that processing becomes more stateless? I apologize, as I am not very current with 2.0 and this may well be a non-question, but I wanted to clarify. I plan on diving in soon to get a multinode setup going.

Thanks and Cheers

Hi,

the samples are in a different repository and do not have the same release cycle as the framework itself. However, for the 2.0 release itself, we do want to have the sample in a state where it showcases (at least some of) the scalability features of Axon 2. It won’t be a showstopper, though, if timing doesn’t permit that. That’s also the reason why it’s separate: I don’t want the sample to slow down the release cycle of the framework.

Axon’s distributed command bus with the JGroups connector follows the principles you’re outlining here. There is no shared state, and you can dynamically scale up and down depending on the load on your application. Do note that this requires an Event Store that can handle the load of the additional appends. Typically, you’d want to use a cache to keep aggregates in memory; otherwise you’d continuously read from the Event Store, making it an obvious bottleneck.
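For reference, wiring this up looks roughly like the sketch below. This is based on the Axon 2 reference guide, but exact constructor signatures may differ per release candidate, and the cluster name, JGroups config file, and load factor value are placeholders:

```java
import org.jgroups.JChannel;
import org.axonframework.commandhandling.CommandBus;
import org.axonframework.commandhandling.SimpleCommandBus;
import org.axonframework.commandhandling.distributed.DistributedCommandBus;
import org.axonframework.commandhandling.distributed.jgroups.JGroupsConnector;
import org.axonframework.serializer.xml.XStreamSerializer;

// The local SimpleCommandBus handles the commands routed to this node;
// the JGroups connector shares membership and load-factor information
// with the other nodes in the cluster.
JChannel channel = new JChannel("udp_config.xml");   // your JGroups config
CommandBus localSegment = new SimpleCommandBus();
JGroupsConnector connector = new JGroupsConnector(
        channel, "axon-commands", localSegment, new XStreamSerializer());

DistributedCommandBus commandBus = new DistributedCommandBus(connector);
connector.connect(100);   // join the cluster with a load factor of 100
```

Every node runs this same wiring; there is no coordinator node to configure.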

The only shared state between machines is the load factor of each node, and that is relatively static information. Based on the load factor (and the name of the node), each member of the Distributed Command Bus can calculate the routing of each command independently. When a node drops or joins, every node recalculates the routing automatically. No shared state at all…
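To illustrate the idea (this is a toy sketch, not Axon’s actual implementation): with consistent hashing, every node can compute the same routing table locally from nothing but the member names and their load factors, so no routing state needs to be shared:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hash router: each node occupies one ring position per
// unit of load factor, so nodes with a higher load factor receive
// proportionally more aggregates. All names here are made up.
public class ConsistentRouter {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String nodeName, int loadFactor) {
        for (int i = 0; i < loadFactor; i++) {
            ring.put(hash(nodeName + "#" + i), nodeName);
        }
    }

    public void removeNode(String nodeName, int loadFactor) {
        for (int i = 0; i < loadFactor; i++) {
            ring.remove(hash(nodeName + "#" + i));
        }
    }

    // Route by aggregate identifier: the first ring position at or after
    // the identifier's hash owns the aggregate; wrap around if needed.
    public String route(String aggregateId) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(aggregateId));
        return tail.isEmpty() ? ring.get(ring.firstKey())
                              : tail.get(tail.firstKey());
    }

    private static int hash(String value) {
        // Extra bit-mixing; String.hashCode alone clusters similar keys.
        int h = value.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
}
```

Because every member runs the same deterministic calculation, a joining or leaving node only moves its own share of the aggregates; the rest keep routing to the node whose cache already holds them.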

Hope this clarifies it a bit.
Cheers,

Allard

Thanks, Allard, for the clarification.

Any pointers on which event stores in the Axon lib are compatible for this use? I have not worked with a cache layer that stores aggregates in memory and then moves them to a persistence layer. An example, or some architecture documentation of these options with pros/cons, would be really helpful.

Currently we use the JPA event store with MySQL. We are evaluating something like Amazon Relational Database Service (RDS) for scaling the MySQL layer, and Amazon CloudFront with a load balancer in front of the Tomcat JVMs for the Axon app layer. Would a config like that satisfy the need outlined for statelessness and get multinode autoscaling going?

The setup would be Axon-based JVM WAR files running on Tomcat, plus MySQL instances. Both of these layers would have to scale and be fault tolerant in something like the Amazon ecosystem above. In the future, AMQP servers for offline processing, and other JVMs for caching, integration, or an API, may also get added as layers with different WARs or JVMs in the cloud.

With all of this, another question concerns the node name registry: hypothetically, the JVMs will spin up dynamically in the cloud, and we may not know their identities in advance during the autoscaling process. Would the load factor feature work for all the Axon-based JVMs being spun up, with new nodes joining the cluster without any config changes describing them?

Another question: can the load factor be turned off, given that a cloud management infrastructure already does intelligent routing, spinning up new virtual servers and routing requests to them based on infrastructure and business metrics about JVM load? I guess, if the app is getting dynamic routing from a CloudFront-like technology, can it become even more stateless by muting the load-based routing within Axon and relying on an external load balancer?

There is a lot more to understand and digest, but I want to continue the dialog on the bigger picture conceptually.

Thanks for your prompt responses as always and extreme innovation on this platform.

Hi,

first of all, it is important to keep in mind that the DistributedCommandBus only does routing of commands. General-purpose load balancers route requests either (more or less) randomly or based on request properties; sticky sessions are a common strategy. That will not help command processing, however, unless you can guarantee that no two users are ever working on the same aggregate. And if you can guarantee that, you probably don’t need Axon :wink:
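The difference is the routing key: an external load balancer sees only the HTTP request, while Axon routes on the command’s target aggregate identifier, so all commands for one aggregate land on the same node. With Axon 2’s default annotation-based routing, that looks roughly like the sketch below (the command class and field names are made up for illustration):

```java
import org.axonframework.commandhandling.annotation.TargetAggregateIdentifier;

// The DistributedCommandBus inspects this annotation (via its routing
// strategy) to decide which node should handle the command.
public class ConfirmOrderCommand {

    @TargetAggregateIdentifier
    private final String orderId;   // the routing key for this command

    public ConfirmOrderCommand(String orderId) {
        this.orderId = orderId;
    }

    public String getOrderId() {
        return orderId;
    }
}
```

Two commands carrying the same `orderId` will always be dispatched to the same node, which is exactly what a generic load balancer cannot guarantee.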

As for event stores, MySQL will suit fine, and Amazon RDS is a fine solution to host it. Each node just needs a MySQL configuration that points its SQL operations at the RDS instance. This way, each node has access to the entire event store. In practice, though, when operating in “all lights green” mode, the event store will only be written to. I’ll get to the caching a bit later on.

The fact that different nodes have randomly generated names is absolutely no problem. The command-bus is designed for exactly that purpose. It is the node’s name in combination with the load factor that’s sent to other nodes, so that each can calculate destinations independently. There is no central configuration needed.

Do be careful when not using a single, common Event Bus, though. If you use the SimpleEventBus with a DistributedCommandBus, events are only dispatched locally. This may work fine, but Sagas that act on multiple aggregates may not receive all the events they require. Those Sagas will need to be deployed on a machine that receives the events from all the other nodes. A distributed Saga Manager is on the wishlist, but it won’t be included in the first 2.0 release.

Then, caching… there is no need for a complicated “cache layer”. All you need to do is configure a simple single-node cache instance (for example, EhCache) in your EventSourcingRepository. Axon will use the cache when possible. Since aggregates are routed consistently, a command will always hit the node that has its aggregate in the cache. This removes the need for cache replication between the nodes.
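A minimal wiring sketch, assuming Axon 2’s CachingEventSourcingRepository with EhCache; the cache-adapter API changed between 2.x versions, so check your release. `eventStore` and `ToDoItem` are placeholders for your own event store and aggregate:

```java
import net.sf.ehcache.CacheManager;
import org.axonframework.cache.EhCacheAdapter;
import org.axonframework.eventsourcing.CachingEventSourcingRepository;
import org.axonframework.eventsourcing.GenericAggregateFactory;

// A plain, node-local EhCache instance; no replication configured.
CacheManager cacheManager = CacheManager.getInstance();
cacheManager.addCacheIfAbsent("aggregateCache");

// The repository checks the cache before replaying events from the store.
CachingEventSourcingRepository<ToDoItem> repository =
        new CachingEventSourcingRepository<>(
                new GenericAggregateFactory<>(ToDoItem.class), eventStore);
repository.setCache(new EhCacheAdapter(cacheManager.getCache("aggregateCache")));
```

Because command routing is consistent, each aggregate is only ever loaded on one node, so this local cache behaves as if it were shared without any replication traffic.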

Cheers,

Allard