PDF microservice decoupling and data duplication

I’m working on a microservice that other microservices can request a PDF file from. Here is the flow I am considering:

  • The user calls the PDF generation endpoint (a GraphQL mutation) on the microservice that needs the PDF, such as the Sales microservice. That endpoint dispatches a CreateInvoicePDFCommand or CreateQuotePDFCommand containing the related aggregate id.

  • Separately, the client subscribes to updates via a GraphQL subscription (backed by an Axon Query Bus subscription query) to get notified when the PDF is ready.

  • Should I use a saga to coordinate the flow in the PDF microservice? It would listen for InvoicePDFRequestedEvent and dispatch a command to generate the PDF.

  • The read model in the PDF microservice is updated and uses the QueryUpdateEmitter to push either a byte[] containing the PDF or an S3 URL to the client subscription.

Is my flow correct, and how does the PDF microservice obtain the data it uses to generate PDFs? I’ve read that one approach is to set up a view that duplicates data (invoices, quotes, etc.) from the Sales microservice and other related services into the PDF microservice. I’m worried that this will cause the PDF microservice database to become enormous over time.

Thank you in advance

To be honest, requesting a PDF generated from some data sounds more like a typical client/server use case – send the data, receive the PDF. Do you have a specific reason to model it with events?

Although I am wondering the same thing as @sebastian.hans does in his reply, I can still offer some guidance on the flow.

  1. I don’t think you need a saga for this process. At least, I don’t see what “slightly complex business flow” it would need to manage. My rule of thumb is: if a regular event handler can do the job of listening to an event and triggering a command, I pick that. If I need to carry information from several aggregates or contexts, then a saga may make sense. Although I understand you have several microservices, your example does not explain how they interact with this PDF generator, so it’s tough to give a conclusive answer.
  2. I would not send the entire PDF as a query update. I would only send a link to some form of data store for generated PDFs. My reasoning is that your PDFs might get rather large. Although you can certainly use Axon’s messaging flow for that, large payloads might needlessly hold up other messages. If the update message is just a link for the UI to download the PDF from, you’d be a bit leaner there.
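
To make both points concrete, here is a minimal sketch in plain Java, deliberately without Axon dependencies. All names in it (InvoicePdfRequestedEvent, PdfReadyUpdate, InMemoryPdfStore, the pdf-store.example URL) are hypothetical stand-ins. In an Axon application the handler method would typically be an @EventHandler and the Consumer would be replaced by the QueryUpdateEmitter; the point is only that a stateless handler is enough, and that the update carries a link rather than the bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class PdfRequestHandlerSketch {

    // Hypothetical event, mirroring the InvoicePDFRequestedEvent from the question.
    record InvoicePdfRequestedEvent(String invoiceId) {}

    // Hypothetical update pushed to subscribers: a link only, never the PDF bytes.
    record PdfReadyUpdate(String invoiceId, String downloadUrl) {}

    // Stand-in for an object store such as S3; keeps the bytes in memory for this sketch.
    static class InMemoryPdfStore {
        private final Map<String, byte[]> objects = new HashMap<>();

        String put(String key, byte[] content) {
            objects.put(key, content);
            return "https://pdf-store.example/" + key; // made-up URL scheme
        }

        byte[] get(String key) {
            return objects.get(key);
        }
    }

    // A stateless handler: one event in, one stored file and one small update out.
    // Nothing here needs to be remembered across events, so no saga is involved.
    static class InvoicePdfRequestedHandler {
        private final InMemoryPdfStore store;
        private final Consumer<PdfReadyUpdate> updateEmitter;

        InvoicePdfRequestedHandler(InMemoryPdfStore store, Consumer<PdfReadyUpdate> updateEmitter) {
            this.store = store;
            this.updateEmitter = updateEmitter;
        }

        void on(InvoicePdfRequestedEvent event) {
            byte[] pdf = render(event.invoiceId());
            String url = store.put("invoices/" + event.invoiceId() + ".pdf", pdf);
            updateEmitter.accept(new PdfReadyUpdate(event.invoiceId(), url));
        }

        // Placeholder for real PDF rendering.
        private byte[] render(String invoiceId) {
            return ("placeholder PDF for invoice " + invoiceId).getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        List<PdfReadyUpdate> updates = new ArrayList<>();
        InvoicePdfRequestedHandler handler =
                new InvoicePdfRequestedHandler(new InMemoryPdfStore(), updates::add);
        handler.on(new InvoicePdfRequestedEvent("inv-1"));
        System.out.println(updates.get(0).downloadUrl());
    }
}
```

The subscriber receives a few dozen bytes regardless of how large the PDF is, and the download itself goes straight to the store.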

Besides your flow question, you are concerned about how the PDF generator will get its data. It can use simple queries, just like the query used to return the result to the original requester of the PDF.

You can also “duplicate” information between the services instead of using queries. Doing so is not necessarily a bad thing. It comes down to a trade-off: how quickly does the data need to be available at the place that uses it?
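
As a rough illustration of the duplication option, the sketch below (plain Java, no Axon dependencies) keeps a local view inside the PDF service. The event and record names are hypothetical, as is the idea that the Sales event carries an internalNotes field. It assumes the local copy stores only the fields a PDF actually needs, which is one way to keep the duplicated data small.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class LocalInvoiceViewSketch {

    // Hypothetical event published by the Sales service.
    record InvoiceIssuedEvent(String invoiceId, String customerName,
                              long totalCents, String internalNotes) {}

    // Local copy in the PDF service: only what ends up on the printed invoice.
    record InvoicePdfData(String invoiceId, String customerName, long totalCents) {}

    // Projection that is updated as events arrive, so rendering needs no remote call.
    static class InvoicePdfProjection {
        private final Map<String, InvoicePdfData> view = new ConcurrentHashMap<>();

        void on(InvoiceIssuedEvent event) {
            // internalNotes is dropped on purpose; it is not needed for the PDF.
            view.put(event.invoiceId(),
                    new InvoicePdfData(event.invoiceId(), event.customerName(), event.totalCents()));
        }

        Optional<InvoicePdfData> find(String invoiceId) {
            return Optional.ofNullable(view.get(invoiceId));
        }
    }

    public static void main(String[] args) {
        InvoicePdfProjection projection = new InvoicePdfProjection();
        projection.on(new InvoiceIssuedEvent("inv-1", "ACME", 12_500, "call before shipping"));
        System.out.println(projection.find("inv-1").orElseThrow());
    }
}
```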

My personal stance is to use queries when possible, as it gives decoupling throughout. Thus, you are able to move the code handling/sending queries from service to service, as the messages form the connection instead of database dependencies.
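
A small sketch of that decoupling, again in plain Java: SimpleQueryBus is a hypothetical in-process stand-in, not Axon’s API, and FindInvoiceQuery and InvoiceView are made-up message types. In a real Axon setup the PDF side would send the query through the QueryGateway and the Sales side would answer it in a @QueryHandler method. What the sketch shows is that the renderer depends only on the message types, not on where the invoice data is stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class QueryInsteadOfCopySketch {

    // Hypothetical query and response messages shared between the services.
    record FindInvoiceQuery(String invoiceId) {}
    record InvoiceView(String invoiceId, String customerName, long totalCents) {}

    // Minimal stand-in for a query bus: routes a query to a handler by its type.
    static class SimpleQueryBus {
        private final Map<Class<?>, Function<Object, Object>> handlers = new HashMap<>();

        <Q, R> void register(Class<Q> queryType, Function<Q, R> handler) {
            handlers.put(queryType, q -> handler.apply(queryType.cast(q)));
        }

        <R> R query(Object query, Class<R> responseType) {
            Function<Object, Object> handler = handlers.get(query.getClass());
            if (handler == null) {
                throw new IllegalStateException(
                        "No handler for " + query.getClass().getSimpleName());
            }
            return responseType.cast(handler.apply(query));
        }
    }

    // PDF side: asks for the data when it renders, and keeps no copy of it.
    static class InvoicePdfRenderer {
        private final SimpleQueryBus queryBus;

        InvoicePdfRenderer(SimpleQueryBus queryBus) {
            this.queryBus = queryBus;
        }

        String renderText(String invoiceId) {
            InvoiceView invoice = queryBus.query(new FindInvoiceQuery(invoiceId), InvoiceView.class);
            return "Invoice " + invoice.invoiceId() + " for " + invoice.customerName()
                    + ": " + invoice.totalCents() + " cents";
        }
    }

    public static void main(String[] args) {
        SimpleQueryBus bus = new SimpleQueryBus();
        // Sales side: owns the data and answers the query.
        Map<String, InvoiceView> salesData = Map.of("inv-1", new InvoiceView("inv-1", "ACME", 12_500));
        bus.register(FindInvoiceQuery.class, q -> salesData.get(q.invoiceId()));

        System.out.println(new InvoicePdfRenderer(bus).renderText("inv-1"));
    }
}
```

Because the two sides meet only at FindInvoiceQuery and InvoiceView, the handler can move to another service without the renderer changing.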


All in all, I hope this helps you further, @redubr!