Axon JGroups configuration

Hi All,
In our Axon-based application, the JGroups connector is used for node identification. Please find the tcp_gossip.xml we use attached. We noticed that most of the time during command processing is spent in node identification, and we are not sure whether there is an issue with the tcp_gossip configuration. For the first few requests the delay is somewhere around 10-12 seconds, and it keeps increasing as the load test progresses.

Please suggest.

Thanks,
Vijaya

tcp_gossip.xml (2.07 KB)

Hi Vijaya,

What do you mean by 10-12 seconds for node identification? Is that a one-time identification each time a new node is connected, or is that time spent each time a command is sent? The latter would be surprising, as the node is chosen based on information available in-memory. How many nodes do you have, and what is the average loadFactor you have configured for each of them?

Cheers,

Allard

This is not a one-time cost; every command dispatch takes this long. We have configured two nodes, with a loadFactor of 50 on both.
Another issue we noticed is that we frequently get a ConcurrencyModificationException for the aggregate. Here are the details of the application:

  1. The application runs on two nodes and is configured to use the distributed command bus, with the JGroups connector, to distribute commands (roughly as sketched after this list).
  2. The application receives messages in a MessageListener over JMS.
  3. In the listener, a command is created and dispatched to the command bus through the command gateway.
  4. The node that executes the command is chosen by the JGroups connector based on the AnnotationRoutingStrategy.
  5. For the initial requests we don't see much delay between the moment the listener receives a request and the moment the chosen node executes it, but later this delay keeps growing and reaches 10-15 seconds.
  6. We also notice that we frequently get a ConcurrencyModificationException for the aggregate, even though the application is not executing two commands at the same time.
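
For reference, this is roughly how the pieces are wired; the snippet below is a simplified sketch assuming the Axon 2.x API (exact class and package names may differ in your version, and names such as "command-cluster" are placeholders):

import org.axonframework.commandhandling.CommandBus;
import org.axonframework.commandhandling.SimpleCommandBus;
import org.axonframework.commandhandling.distributed.AnnotationRoutingStrategy;
import org.axonframework.commandhandling.distributed.DistributedCommandBus;
import org.axonframework.commandhandling.distributed.jgroups.JGroupsConnector;
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.commandhandling.gateway.DefaultCommandGateway;
import org.axonframework.serializer.xml.XStreamSerializer;
import org.jgroups.JChannel;

public class CommandBusWiring {

    public static CommandGateway wire() throws Exception {
        // JGroups channel built from the attached tcp_gossip.xml stack
        JChannel channel = new JChannel("tcp_gossip.xml");

        // Local segment that handles the commands routed to this node
        CommandBus localSegment = new SimpleCommandBus();

        // Connector joining the cluster; a loadFactor of 50 on both nodes
        // spreads commands roughly evenly between them
        JGroupsConnector connector = new JGroupsConnector(
                channel, "command-cluster", localSegment, new XStreamSerializer());
        connector.connect(50);

        // Commands are routed on their @TargetAggregateIdentifier field, so all
        // commands for the same aggregate end up on the same node
        DistributedCommandBus commandBus =
                new DistributedCommandBus(connector, new AnnotationRoutingStrategy());

        // The JMS listener dispatches commands through this gateway
        return new DefaultCommandGateway(commandBus);
    }
}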

We tried tcp_gossip with the dedicated thread pool configuration both enabled and disabled.

Do you see an issue in our implementation or configuration, or do you have any suggestions for fixing this delay in command execution and the ConcurrencyModificationException?

Thanks,
Vijaya

Hi Vijaya,

I found an earlier question about the exception, to which this was my response: https://groups.google.com/d/msg/axonframework/0cBm83ZGVq0/0AK5IOqpctEJ

Regarding the load test and the hanging: it may very well be that resources on the receiving end are filling up. If the DisruptorCommandBus is busy and its buffer is completely filled, processes that deliver new commands must wait for a slot on the buffer to become available. The same will probably happen in the communication thread pools of JGroups, eventually leading to processes waiting on the publishing side.
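
To illustrate what a full buffer means: if your local segment is a DisruptorCommandBus, its ring buffer has a fixed capacity, and dispatching threads block once every slot is taken. A sketch, assuming the Axon 2.x API (the buffer size shown is purely illustrative):

import org.axonframework.commandhandling.disruptor.DisruptorCommandBus;
import org.axonframework.commandhandling.disruptor.DisruptorConfiguration;
import org.axonframework.eventhandling.EventBus;
import org.axonframework.eventstore.EventStore;

public class LocalSegmentFactory {

    public static DisruptorCommandBus create(EventStore eventStore, EventBus eventBus) {
        DisruptorConfiguration configuration = new DisruptorConfiguration();
        // Capacity of the ring buffer (must be a power of two). When all slots
        // are in use, dispatching threads wait here until a slot frees up.
        configuration.setBufferSize(4096);
        return new DisruptorCommandBus(eventStore, eventBus, configuration);
    }
}
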
The best way to figure out what is blocking what is to measure the behavior of the components with a profiler. I don't think it is the routing process itself that takes 10 seconds; it's more likely that getting the message out takes that long because of a full backlog.
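
A cheap way to narrow it down before attaching a profiler is to timestamp the dispatch itself, for example as below (a sketch against the Axon 2.x gateway API); comparing this round-trip time with a timestamp logged inside the command handler shows how much time is spent before the handler even starts.

import org.axonframework.commandhandling.CommandCallback;
import org.axonframework.commandhandling.gateway.CommandGateway;

public class TimedDispatch {

    // Measures the time from handing a command to the gateway until its result
    // (or failure) comes back from whichever node handled it.
    public static void send(CommandGateway gateway, Object command) {
        final long start = System.nanoTime();
        gateway.send(command, new CommandCallback<Object>() {
            @Override
            public void onSuccess(Object result) {
                System.out.println("Round trip took " + (System.nanoTime() - start) / 1000000 + " ms");
            }

            @Override
            public void onFailure(Throwable cause) {
                System.out.println("Failed after " + (System.nanoTime() - start) / 1000000 + " ms: " + cause);
            }
        });
    }
}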

Cheers,

Allard