Data corruption after unexpected failures of Axon Server - [AXONIQ-9200] Validation exception

Hi everyone.

While generating a large number of events for test purposes, I ran into two problems.
First, I ran out of storage. Second, after fixing that and starting all over again, I eventually got an out-of-memory error.
Both times, after these critical errors, I couldn’t start Axon Server and got the following error:

[AXONIQ-9200] Validation exception: segment 563052 ending at 1126464

00…01126464.events is the name of an events file in my event store location.
At that point I had around 80 000 000 events, so the latest events file was far beyond the one named in the error (1 126 464).
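
To see where the file named in the error sits relative to the newest segment, the event store directory can be listed in numeric order. Below is a minimal Python sketch. It assumes every segment file is named as a zero-padded number followed by .events, like the file above. The function names list_segments and locate are made up for illustration and are not part of Axon Server or axon-cli.

```python
from pathlib import Path


def list_segments(store_dir):
    """Return (number, path) pairs for every *.events file, lowest number first."""
    segments = []
    for path in Path(store_dir).glob("*.events"):
        try:
            # Assumption: the file name (without extension) is a zero-padded number.
            segments.append((int(path.stem), path))
        except ValueError:
            continue  # ignore files whose name is not numeric
    return sorted(segments)


def locate(segments, number):
    """Return (index, total) for the segment file named after `number`, or None."""
    for index, (first, _path) in enumerate(segments):
        if first == number:
            return index, len(segments)
    return None
```

Calling locate(list_segments(store_dir), 1126464), with store_dir pointing at the events directory, shows how many segment files precede and follow the one mentioned in the exception. The sketch only reads the directory and does not modify or repair anything.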

Nevertheless, I tried removing some of the latest files, but I still cannot launch the server.

Are there any tools to fix corrupted files after a server crash? I couldn’t find any suitable commands in the axon-cli documentation.

Thanks!

Oh and here is the stack trace I’m getting

`
2020-01-26 08:32:33.021 ERROR 8 --- [ main] o.s.boot.SpringApplication : Application run failed

org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name ‘axonHubEventService’ defined in URL [jar:file:/opt/axonserver/axonserver.jar!/BOOT-INF/classes!/io/axoniq/axonserver/grpc/axonhub/AxonHubEventService.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name ‘EventDispatcher’ defined in URL [jar:file:/opt/axonserver/axonserver.jar!/BOOT-INF/classes!/io/axoniq/axonserver/message/event/EventDispatcher.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name ‘eventStoreLocator’: Invocation of init method failed; nested exception is io.axoniq.axonserver.exception.MessagingPlatformException: [AXONIQ-9200] Validation exception: segment 563052 ending at 1126464
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:769) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:218) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1341) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1187) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:845) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:877) ~[spring-context-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:549) ~[spring-context-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:140) ~[spring-boot-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:742) ~[spring-boot-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:389) ~[spring-boot-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:311) ~[spring-boot-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1213) ~[spring-boot-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1202) ~[spring-boot-2.1.6.RELEASE.jar!/:2.1.6.RELEASE]
at io.axoniq.axonserver.AxonServer.main(AxonServer.java:32) ~[classes!/:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48) ~[axonserver.jar:na]
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87) ~[axonserver.jar:na]
at org.springframework.boot.loader.Launcher.launch(Launcher.java:50) ~[axonserver.jar:na]
at org.springframework.boot.loader.PropertiesLauncher.main(PropertiesLauncher.java:593) ~[axonserver.jar:na]
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name ‘EventDispatcher’ defined in URL [jar:file:/opt/axonserver/axonserver.jar!/BOOT-INF/classes!/io/axoniq/axonserver/message/event/EventDispatcher.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name ‘eventStoreLocator’: Invocation of init method failed; nested exception is io.axoniq.axonserver.exception.MessagingPlatformException: [AXONIQ-9200] Validation exception: segment 563052 ending at 1126464
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:769) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:218) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1341) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1187) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:277) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1251) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1171) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:857) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:760) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
… 27 common frames omitted
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name ‘eventStoreLocator’: Invocation of init method failed; nested exception is io.axoniq.axonserver.exception.MessagingPlatformException: [AXONIQ-9200] Validation exception: segment 563052 ending at 1126464
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:139) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:414) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1770) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:593) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:277) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1251) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1171) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:857) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:760) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
… 41 common frames omitted
Caused by: io.axoniq.axonserver.exception.MessagingPlatformException: [AXONIQ-9200] Validation exception: segment 563052 ending at 1126464
at io.axoniq.axonserver.localstorage.file.SegmentBasedEventStore.validate(SegmentBasedEventStore.java:284) ~[classes!/:na]
at io.axoniq.axonserver.localstorage.file.SegmentBasedEventStore.init(SegmentBasedEventStore.java:236) ~[classes!/:na]
at io.axoniq.axonserver.localstorage.LocalEventStore$Workers.init(LocalEventStore.java:449) ~[classes!/:na]
at io.axoniq.axonserver.localstorage.LocalEventStore.initContext(LocalEventStore.java:96) ~[classes!/:na]
at io.axoniq.axonserver.topology.DefaultEventStoreLocator.init(DefaultEventStoreLocator.java:31) ~[classes!/:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:363) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:307) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:136) ~[spring-beans-5.1.8.RELEASE.jar!/:5.1.8.RELEASE]
… 54 common frames omitted

`

To be more exact: between crashes I fully cleared the event store data and restarted the server, each time starting all over again.
When the server crashed, the client application continued to spam the server with commands for some time.
I used caching and snapshots and sent commands asynchronously, with Amazon EBS as storage. Throughput was about 5000 events / second.

For now, the only way to start it was to roll back to a previous snapshot.

Hi,

The Standard Edition of Axon Server doesn’t have the same guarantees as the clustered version. The latter has mechanisms to automatically recover from failure using the data on the other nodes. It also has a transaction log to which it adds commits before moving them to the event log, making it more resilient against data corruption.

We did make some improvements to the Standard edition in the upcoming 4.3 version. But still, there are no absolute guarantees on the standard edition. Just to be sure, are you running the latest available version?

Kind regards,

I’m currently using Axon Server 4.2.4.
Does this also affect event store solutions other than Axon Server, such as a relational DB or Mongo?

Also, regarding backups: is it enough to take regular snapshots of the file system, or could a snapshot sometimes catch
“some state in the middle” and result in corrupted data? I’ve tried many times and couldn’t catch such a state.

Thanks

Hi Стас,

the standalone version of Axon appends entries directly to the event log, once a commit arrives. In the Enterprise Edition, there is a commit log in between, from which transactions are replicated. That gives that version much more resilience against failure.
In both versions, taking backups by copying the event files over is a reasonable approach. On startup, they look for the last written position and continue from there. Once a file has been completely written, you only need to include it in one last backup. Subsequent backups won’t need to include that file anymore.
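
The backup routine described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not an official tool: segment files are assumed to be *.events files whose zero-padded names sort from oldest to newest, only the newest file is assumed to still be written to, and backup_segments is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def backup_segments(store_dir, backup_dir):
    """Copy *.events files into backup_dir; return the names of the files copied."""
    store, backup = Path(store_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    # The newest file already in the backup may have been incomplete when it
    # was copied, so it and every newer file are copied again.
    already = sorted(p.name for p in backup.glob("*.events"))
    resume_from = already[-1] if already else ""

    copied = []
    for path in sorted(store.glob("*.events")):
        # Assumption: names are zero-padded to equal length, so comparing them
        # as strings matches their numeric order.
        if path.name < resume_from and (backup / path.name).exists():
            continue  # completed segment that is already backed up
        shutil.copy2(path, backup / path.name)
        copied.append(path.name)
    return copied
```

With this approach a segment is copied for the last time on the first run after it has been completed. Later runs skip it and copy only the segment that was newest in the previous backup plus anything created since.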

Cheers,

Hi Allard,

> In both versions, taking backups by copying the event files over is a reasonable approach. On startup, they look for the last written position and continue from there. Once a file has been completely written, you only need to include it in one last backup. Subsequent backups won’t need to include that file anymore.

If I understand correctly, in case of any failure, having a full copy of the file system at a particular moment would be enough to launch Axon Server?
Only the latest file, which is not fully written and not closed, could be broken, so removing it should help.

What worries me right now is not the lack of high availability in Axon Server Standard Edition, but the fact that if it goes down, as with the out-of-heap error I had previously, Axon Server might
not be able to restart because some of the oldest segments have become corrupted, and thus the full history is lost. I had assumed that only recent, not yet closed segments
could be broken. In my situation it is OK to have some downtime and a partial loss of recent data in case of any software / hardware issues.
Restoring from backup in such cases is an option too, but much less desirable.
Probably I’m missing something and there is a way to start it after unexpected errors while losing only recent data.

Thanks!