
Background
After replacing PDFKit with Grover in a long-running Rails application (described here: simplethread.com/replacing-pdfkit-with-grover-for-rails-pdf-generation) and being satisfied with the results, I had the chance to start a new project. One of the requirements of this new application was generating a combined PDF for a resource that could have multiple PDF, JPG, and PNG attachments.
Having had such a positive experience with Grover, I reached for it once again to implement this new PDF generation.
Dependencies
After implementing the PDF generation with Grover and testing it locally, I was ready to deploy the feature to our staging environment. This is when a discussion of dependencies came up. Grover is a wrapper around the Puppeteer NPM package, which drives a headless Chrome browser to actually create the PDF. This requires a working Node.js installation in the container, as well as Chrome. In the dependency discussion, I was introduced to a relatively new gem, github.com/excid3/ferrum_pdf, that would remove the dependency on Puppeteer. The ferrum_pdf gem uses github.com/rubycdp/ferrum under the hood to interface directly with Chrome’s API. It is also inspired by github.com/Studiosity/grover.
Implementation
Because ferrum_pdf was inspired by Grover, it has a similar API. Since I needed to combine other documents with the PDF version of the resource, I decided once again not to use the middleware implementation. Instead, I generated the initial resource PDF by rendering a view to HTML directly and passing that HTML into the FerrumPdf.render_pdf call. This turned out to be slightly different from Grover: the pdf_options are passed in as arguments to the call rather than set as defaults in an initializer. There is also a difference in the preprocessing. Rather than running the generated HTML through a preprocessor yourself, you pass in the host and the protocol, and the preprocessor uses those values to convert relative paths to full paths.
Having generated the initial resource PDF, it was now time to combine all of the resource's attached documents.
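To make the preprocessing idea more concrete, here is a minimal sketch of how a host and protocol can be used to turn root-relative paths into full URLs. The method name absolutize_paths and the example host are hypothetical, and this is a simplified stand-in for illustration, not ferrum_pdf's own code.

```ruby
# Illustrative only: rewrites root-relative href/src values into absolute URLs.
# This is a simplified stand-in, not the gem's actual preprocessor.
def absolutize_paths(html, host:, protocol:)
  base = "#{protocol}://#{host}".chomp("/")
  # Match href/src attributes whose value starts with a single "/",
  # skipping protocol-relative values such as "//cdn.example.com/app.css".
  html.gsub(%r{\b(href|src)=(["'])/(?!/)}) do
    "#{Regexp.last_match(1)}=#{Regexp.last_match(2)}#{base}/"
  end
end

html = %(<link href="/assets/app.css"><img src="/logo.png">)
puts absolutize_paths(html, host: "example.com", protocol: "https")
```

Without a rewrite like this, headless Chrome has no base URL to resolve paths such as /assets/app.css against when it is handed a raw HTML string, so stylesheets and images would fail to load.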
I ended up using github.com/boazsegev/combine_pdf to do the actual merging.
This was interesting because the documents to be combined could be PDFs, JPGs, or PNGs, but they all needed to end up in PDF format for the merging to work. The PDFs were relatively straightforward, but the images needed a little massaging before I could merge them. I ended up doing something so simple that I was surprised when it worked out so well. ferrum_pdf requires HTML and would not parse the image formats directly. So, I just turned the images into <img> tags and passed that HTML into the call. The only caveat is that the images will be rendered full-size unless the width and height are set on the <img> tag.
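As a rough sketch of the image step, the helper below wraps image bytes in a minimal HTML page with an <img> tag and optional dimensions. The helper name image_page_html is hypothetical, and embedding the image as a base64 data URI is an assumption made for this example; pointing src at a URL would work on the same principle.

```ruby
# Hypothetical helper: wraps raw image bytes in a minimal HTML page.
def image_page_html(bytes, content_type:, width: nil, height: nil)
  # Embed the image as a base64 data URI (an assumption for this sketch).
  encoded = [bytes].pack("m0") # "m0" produces base64 without line breaks
  attrs = [%(src="data:#{content_type};base64,#{encoded}")]
  # Without explicit dimensions the image renders at its full size.
  attrs << %(width="#{Integer(width)}") if width
  attrs << %(height="#{Integer(height)}") if height
  "<html><body><img #{attrs.join(' ')}></body></html>"
end

png_bytes = "\x89PNG\r\n".b # placeholder bytes standing in for a real attachment
puts image_page_html(png_bytes, content_type: "image/png", width: 600)
```

The resulting string is ordinary HTML, so it can be handed to the same rendering call used for the resource view, and the PDF that comes back can be merged like any other attachment.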
Containerization
With the removal of the Puppeteer dependency, the only addition to the container image was now Chrome. After deploying to the staging environment, I was seeing timeouts when trying to reach the headless Chrome browser. This led me to the helpful document in ferrum_pdf here: github.com/excid3/ferrum_pdf/blob/main/FLY_HOSTING_GUIDE.md. These instructions were written for hosting on fly.io, but with minimal changes I was able to adapt them to work with our application environment. The only real change was that instead of passing in FerrumPdf.browser(options), I needed to pass in FerrumPdf.browser(**options). Another deploy and everything was working smoothly.
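The difference between options and **options comes down to how Ruby 3 separates positional and keyword arguments. The sketch below uses a hypothetical stand-in method, launch_browser, and illustrative option keys. It assumes the real method accepts keyword arguments only, which is what the fix above suggests.

```ruby
# Hypothetical stand-in for a method that accepts only keyword arguments.
def launch_browser(**options)
  options
end

options = { headless: true, timeout: 30 } # illustrative keys only

begin
  # On Ruby 3.0+, a hash passed positionally is no longer converted to keywords.
  launch_browser(options)
rescue ArgumentError => e
  puts "positional hash rejected: #{e.message}"
end

# The double splat spreads the hash into keyword arguments.
p launch_browser(**options)
```

On Ruby 2.7 the positional form still worked, with a deprecation warning, which may be why older instructions can show it without the double splat.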
Conclusion
I have been very happy with this replacement so far. Even though I very much enjoyed using Grover, not having Node.js and Puppeteer as dependencies for PDF generation has made this much more straightforward. I would recommend trying ferrum_pdf out for yourself if you ever need to build a feature like this.