Yes, you should use the EmbeddedEventStore. It is initialized with an EventStorageEngine, which has several implementations for different storage backends. You could have a look at those implementations for inspiration on how to implement one for Cassandra.
The key to performance is not in the EmbeddedEventStore, but rather in the EventStorageEngine that it is using. The EmbeddedEventStore uses the EventStorageEngine to perform the actual storage of events, to ensure they can be accessed on other machines.
The EmbeddedEventStore will manage the streams from consumers, combining them into a single stream when multiple consumers are reading the same events. This optimizes I/O when many consumers are reading from the HEAD of the Event Store.
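To illustrate the idea of combining consumers into a single stream, here is a minimal, self-contained sketch. It is not Axon's implementation: SketchStorageEngine, CountingEngine, SharedStream and StreamConsumer are hypothetical names made up for this example, events are plain strings, and the shared buffer is unbounded for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a storage backend; NOT Axon's EventStorageEngine interface.
interface SketchStorageEngine {
    // Returns all events stored at positions >= fromIndex.
    List<String> readFrom(int fromIndex);
}

// In-memory backend that counts reads, so the effect of sharing is visible.
class CountingEngine implements SketchStorageEngine {
    private final List<String> stored = new ArrayList<>();
    int reads = 0;

    void append(String event) {
        stored.add(event);
    }

    @Override
    public List<String> readFrom(int fromIndex) {
        reads++;
        return new ArrayList<>(stored.subList(fromIndex, stored.size()));
    }
}

// One shared buffer serves every consumer; only the buffer reads from the backend.
class SharedStream {
    private final SketchStorageEngine engine;
    private final List<String> buffer = new ArrayList<>();

    SharedStream(SketchStorageEngine engine) {
        this.engine = engine;
    }

    // A single backend read, regardless of how many consumers are attached.
    synchronized void fetch() {
        buffer.addAll(engine.readFrom(buffer.size()));
    }

    // Returns the buffered event at the given index, or null if not yet fetched.
    synchronized String get(int index) {
        return index < buffer.size() ? buffer.get(index) : null;
    }

    StreamConsumer openConsumer() {
        return new StreamConsumer(this);
    }
}

// Each consumer only tracks its own position in the shared buffer.
class StreamConsumer {
    private final SharedStream stream;
    private int position = 0;

    StreamConsumer(SharedStream stream) {
        this.stream = stream;
    }

    // Returns the next event, or null when this consumer has caught up.
    String next() {
        String event = stream.get(position);
        if (event != null) {
            position++;
        }
        return event;
    }
}
```

With two consumers opened on the same SharedStream, one call to fetch() makes the new events available to both, so the backend is read once rather than once per consumer.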
If you really care about performance, I’d reconsider Cassandra first. While Cassandra can append data very fast, you pay a massive performance penalty if you want to perform duplicate key checks. We have seen very good results in a properly tuned RDBMS, and of course in AxonIQ’s Event Store Server.
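As a rough illustration of what a duplicate key check means for an event store, the sketch below rejects a second event for the same aggregate identifier and sequence number. It assumes that this pair is the key being checked; UniqueKeyEventLog and DuplicateEventKeyException are hypothetical names, and the in-memory map only stands in for a real backend.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical exception type for this sketch.
class DuplicateEventKeyException extends RuntimeException {
    DuplicateEventKeyException(String key) {
        super("Event already stored for key " + key);
    }
}

// In-memory stand-in for a backend that enforces a unique (aggregateId, sequenceNumber) key.
class UniqueKeyEventLog {
    private final ConcurrentMap<String, String> events = new ConcurrentHashMap<>();

    void append(String aggregateId, long sequenceNumber, String payload) {
        String key = aggregateId + "#" + sequenceNumber;
        // putIfAbsent is an atomic check-and-insert: it returns the existing
        // value when the key is already taken, and null when the insert succeeded.
        if (events.putIfAbsent(key, payload) != null) {
            throw new DuplicateEventKeyException(key);
        }
    }

    int size() {
        return events.size();
    }
}
```

An RDBMS typically enforces this with a unique index as part of the insert. In Cassandra a plain insert silently overwrites an existing row, so the check requires a conditional write (a lightweight transaction), which is considerably slower than a plain append.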
Generally, you don’t need to tune the EmbeddedEventStore much. It’s mainly the EventStorageEngine that needs tuning. One thing that can be tuned in the EmbeddedEventStore is the delay between two “polls” to the EventStorageEngine to check for data that has been appended to the event store by another instance (data inserted locally will trigger an immediate poll).
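The polling behaviour described above can be sketched as follows: a fetcher waits for at most the configured delay before polling again, and a local append wakes it up immediately. This is only an illustration under stated assumptions. PollGate and its methods are hypothetical names, not part of Axon's API, and the real EmbeddedEventStore may implement this differently.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that decides when the next poll of the storage backend happens.
class PollGate {
    private final Semaphore signal = new Semaphore(0);
    private final long delayMillis;

    PollGate(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Called after a local append, so the fetcher polls without waiting out the delay.
    void notifyLocalAppend() {
        signal.release();
    }

    // Blocks until a local append is signalled or the delay elapses.
    // Returns true when woken early by a local append, false on timeout or interrupt.
    boolean awaitNextPoll() {
        try {
            boolean woken = signal.tryAcquire(delayMillis, TimeUnit.MILLISECONDS);
            // Collapse any further pending signals into this single poll.
            signal.drainPermits();
            return woken;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A fetcher thread would call awaitNextPoll() in a loop and read from the storage engine after each return. A shorter delay lowers the latency for events appended by other instances, at the cost of more frequent reads on the backend.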