Hi Paul,
Axon-enabled applications usually communicate with each other by exchanging messages (commands, events, queries). These messages represent the core API of these services. The messages are shared via distributed buses/gateways: CommandBus/CommandGateway, EventBus/EventGateway, QueryBus/QueryGateway. Axon Server, being a message broker, implements all three types of buses and routes messages out of the box. By default, it uses a consistent hashing algorithm to make sure that commands belonging to the same aggregate are handled by the same instance of your application (for example, by the command handler in the aggregate).
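To give a feel for why consistent hashing keeps commands for one aggregate on one instance, here is a minimal, self-contained sketch of a hash ring. It is only an illustration and not Axon Server's actual implementation: the class name `ConsistentHashRouter`, the instance names, the use of MD5, and the choice of the aggregate identifier as routing key are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.TreeMap;

// Illustrative consistent-hash ring (hypothetical; NOT Axon Server's code).
public class ConsistentHashRouter {

    // Position on the ring -> application instance that owns that position.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public ConsistentHashRouter(List<String> instances, int virtualNodes) {
        // Several virtual nodes per instance spread the load more evenly.
        for (String instance : instances) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(instance + "#" + i), instance);
            }
        }
    }

    // The routing key is assumed to be the aggregate identifier.
    public String route(String routingKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no instances registered");
        }
        // First node at or after the key's position, wrapping around the ring.
        Long point = ring.ceilingKey(hash(routingKey));
        return ring.get(point != null ? point : ring.firstKey());
    }

    // Folds the first 8 bytes of an MD5 digest into a long.
    private static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRouter router = new ConsistentHashRouter(
                List.of("instance-a", "instance-b", "instance-c"), 100);
        // Both calls print the same instance name, because the key is the same.
        System.out.println(router.route("order-42"));
        System.out.println(router.route("order-42"));
    }
}
```

The useful property is that when an instance leaves the ring, only the keys that were mapped to that instance move; all other aggregates stay where they were.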
As I understand it, you have a WebSocket adapter/component surrounding your core API and exposing WebSocket endpoints. Personally, I see this WebSocket API as a convenient, user-friendly API that is exposed to your customers; it does not have to be used for inter-service communication (you can use the core API for this).
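A rough sketch of that layering, assuming a simple text protocol: the adapter only translates an incoming WebSocket frame into a core API command and hands it to a gateway. Everything here is hypothetical and chosen for illustration: `CommandDispatcher` is a stand-in for a command gateway (not the Axon interface itself), and `RenameItemCommand` and the `itemId:newName` frame format are made up.

```java
import java.util.concurrent.CompletableFuture;

public class WebSocketAdapterSketch {

    // Hypothetical core API message shared between services.
    public static final class RenameItemCommand {
        public final String itemId;
        public final String newName;

        public RenameItemCommand(String itemId, String newName) {
            this.itemId = itemId;
            this.newName = newName;
        }
    }

    // Stand-in for a command gateway: dispatch a command, get a future result.
    public interface CommandDispatcher {
        CompletableFuture<String> send(Object command);
    }

    // Customer-facing edge: turns a text frame into a core command.
    public static final class ItemWebSocketAdapter {
        private final CommandDispatcher dispatcher;

        public ItemWebSocketAdapter(CommandDispatcher dispatcher) {
            this.dispatcher = dispatcher;
        }

        // Assumed frame format: "itemId:newName".
        public CompletableFuture<String> onTextFrame(String payload) {
            String[] parts = payload.split(":", 2);
            if (parts.length != 2) {
                return CompletableFuture.completedFuture("error: expected itemId:newName");
            }
            return dispatcher.send(new RenameItemCommand(parts[0], parts[1]))
                    .thenApply(result -> "ok: " + result);
        }
    }

    public static void main(String[] args) {
        // Fake dispatcher that completes immediately; a real one would route the command.
        ItemWebSocketAdapter adapter = new ItemWebSocketAdapter(
                command -> CompletableFuture.completedFuture("accepted"));
        System.out.println(adapter.onTextFrame("item-1:New name").join());
    }
}
```

Other services would skip the adapter entirely and send the same command through the core API.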
This is a related topic that you might find interesting:
Load balancing WebSocket services is not an Axon-specific topic. You should be able to use service discovery and registration services (Eureka, for example) to achieve this.
Best,
Ivan