We’re trying to get our environment production-ready and, in doing so, encountered an OutOfMemoryError (OOME). My main concern is not the root cause (although I do have a question about that below) but the fact that the Consumer API thread dies silently while the service continues to run. As a result, our PaaS does not detect that the service has crashed and does not attempt to restart it. Would it be possible to tie exceptions thrown in the KafkaConsumer/Producer threads more closely to the stability of the whole application, or to provide a way to catch those exceptions?
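To illustrate the behaviour we are after, here is a minimal sketch in plain Java with no Kafka dependency. It assumes the poll loop runs on a thread the application starts itself. `FatalThreadGuard`, `startGuarded` and `HALT_JVM` are hypothetical names made up for this example, and the poll loop is simulated by a task that throws. The idea is to attach an uncaught-exception handler to the thread so that a fatal error is reported to the process instead of only ending that one thread.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical helper: starts a worker thread whose uncaught errors are
// forwarded to a callback instead of silently ending only that thread.
public class FatalThreadGuard {

    // Reaction intended for production: log and terminate the JVM so the
    // platform (PaaS, supervisor, ...) sees the process exit and restarts it.
    public static final Consumer<Throwable> HALT_JVM = error -> {
        System.err.println("Fatal error in worker thread, halting JVM: " + error);
        Runtime.getRuntime().halt(1);
    };

    public static Thread startGuarded(String name, Runnable task, Consumer<Throwable> onFatal) {
        Thread thread = new Thread(task, name);
        // The handler runs on the dying thread, before the thread terminates.
        thread.setUncaughtExceptionHandler((t, error) -> onFatal.accept(error));
        thread.start();
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the real poll loop (assumption): fails immediately.
        Runnable pollLoop = () -> { throw new OutOfMemoryError("simulated"); };

        // For the demo we only record the error; a service would pass HALT_JVM.
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = startGuarded("consumer-poll-loop", pollLoop, seen::set);
        worker.join();
        System.out.println("handler saw: " + seen.get());
    }
}
```

In a real service the third argument would be `FatalThreadGuard.HALT_JVM`. This only covers threads the application creates; it does not change how the client library's own internal threads behave. On HotSpot JVMs that support it, the `-XX:+ExitOnOutOfMemoryError` flag is another way to make the process exit on the first OOME.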
The second question concerns the root cause. We noticed that the OOME occurs even when there is no consumer lag on the service, meaning there are no outstanding events for it to consume, and yet it continues to crash with an OOME each time the service is restarted. If anyone has any clues as to why this might be happening, I would very much appreciate your input.