Generate PDF from React components

Posted on 2021-09-26 in Programming

How do you generate a PDF from any React component? I am not talking about just letting your users print a page into a PDF with their browser; I am talking about programmatically transforming a page into a PDF and saving it server side. The goal is for our users to see what the page looked like before they moved forward in the application and the page became inaccessible.

Note

Since the final solution I describe relies on a browser, it can be applied to any other framework, such as Angular, Aurelia or Vue.

There are several options to generate PDFs from React:

  • @react-pdf/renderer generates beautiful PDFs, but only from its own dedicated components. It's the best solution if you want a beautiful PDF and have the time (and the need) for a dedicated template.
  • jsPDF with its html method, which can transform any HTML element into a PDF. However, in my experience, you need to adjust the scale of the page so it fits into the PDF, and you can run into rendering issues (with non-breaking spaces, for instance).
  • A combination of html2canvas and jsPDF, which transforms any HTML element into an image that you can then embed into a PDF document. The main problems: it's an image (so text cannot be selected), you cannot easily split it across multiple pages, and if you put it into one very long page, it won't print properly. It's also quite slow.

All these libraries work client side. @react-pdf/renderer is also compatible with NodeJS, so it can run server side too. So, where should the PDF be rendered: client side or server side?

Rendering it client side is appealing: there is no need for a dedicated service and you can leverage the browser to render the components. But you depend on the user's browser (the screen size will impact how the document is rendered and how it is printed) and you must trust your users not to change anything.

Rendering it server side is more complex: you need a dedicated service, but the document is rendered in an environment under your control. You inject the data you want and render it the way you want. The service itself is complex, though: it has to launch a browser to render the components.

In our use case, we wanted to be sure the PDF was generated with our values, without any influence from media queries. So this meant rendering it server side. Since we couldn't use @react-pdf/renderer (we wanted to render arbitrary components), what could we do? We decided to create a NodeJS service that runs a headless Chrome managed by Puppeteer to render the components. We then use Puppeteer's printing functions and some CSS rules to print the page into a PDF with margins and a custom footer. The result isn't perfect: just like when you print a page from the browser, you can get page breaks in the middle of words, but it's a fairly good compromise between time and quality.
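
Puppeteer's page.pdf() takes the footer as an HTML string through its footerTemplate option, and Chrome fills in elements carrying the classes date, title, url, pageNumber and totalPages. As a minimal sketch, assuming the footer shows a timestamp sent by the caller (escapeHtml and buildFooterTemplate are hypothetical helper names, not code from our service), the footer string could be built like this. Escaping the timestamp keeps caller-supplied text from injecting markup, and left: 50% combined with translateX(-50%) centers the paragraph.

```typescript
// Escape the characters that have a special meaning in HTML so a value sent
// by the caller cannot inject markup into the footer.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper building the string given to page.pdf({ footerTemplate }).
// Chrome replaces the content of the "pageNumber" and "totalPages" spans.
function buildFooterTemplate(timestamp: string): string {
  const style = "font-size: 2mm; position: absolute; left: 50%; transform: translateX(-50%)";
  return (
    `<p style="${style}">${escapeHtml(timestamp)} - ` +
    '<span class="pageNumber"></span>/<span class="totalPages"></span></p>'
  );
}

const footer = buildFooterTemplate("2021-09-26 10:00 <b>");
console.log(footer);
```

In the service, it would be passed as footerTemplate: buildFooterTemplate(timestamp) in the page.pdf() options.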

Here is how we did it:

  1. We created a custom index-pdf-server.ts file containing a dedicated entry point for the app. It configures our MaterialUI theme and our translations, and selects which components to render on the page. It is also responsible for getting the data. To avoid round trips, we inject the data into the page with Puppeteer. It works like this:

    1. We launch Puppeteer.
    2. It loads the React app.
    3. We inject data into a dedicated element in the page with Puppeteer.
    4. The React app loads it and renders.
    5. We wait for the page to render in Puppeteer.
    6. We transform the page into a PDF.

    Here is the gist of this entry point (imports omitted):

    const rootEl = document.getElementById('root');

    const Component: React.FC = () => {
        const intl = useIntl();

        const [reactData, setReactData] = useState<ReactData>();

        useEffect(() => {
            if (reactData) {
                return;
            }

            // Poll until Puppeteer has injected the JSON data into the page.
            const interval = setInterval(() => {
                const reactDataElement = document.getElementById('react-data');
                if (reactDataElement && reactDataElement.textContent) {
                    clearInterval(interval);
                    setReactData(JSON.parse(reactDataElement.textContent));
                }
            }, 100);

            return () => clearInterval(interval);
        }, [reactData]);

        const page = useMemo(() => {
            if (!reactData) {
                return;
            }

            // Switch to get the component to render.
        }, [intl, reactData]);

        // We can only display the page once we have both the data and a component.
        const canDisplayPage = Boolean(reactData && page);

        if (!canDisplayPage) {
            return null;
        }

        return (
            <div>
                <div>{page}</div>
                {/* We will wait for this element to detect the end of render. */}
                <div id="print" />
            </div>
        );
    };

    if (rootEl) {
        ReactDOM.render(
            <ThemeProvider theme={theme}>
                <RawIntlProvider value={intlData}>
                    <Component />
                </RawIntlProvider>
            </ThemeProvider>,
            rootEl,
        );
    }
    
  2. Our app uses react-app-rewired to compile the app. To use this index-pdf-server.ts file, we had to modify our config-overrides.js file a bit. In a nutshell, when we compile the app for the server, we set a SERVER environment variable to change the entry point and disable CSP and SRI. The most important part is this:

    if (process.env.SERVER) {
        config.entry = config.entry.replace('index', 'index-pdf-server');
    }
    

    This allows us to reuse almost all our code and configuration while being able to compile an app dedicated to our server. This is done with SERVER=true env-cmd -f .env.${REACT_APP_ENV} react-app-rewired build && cp -R build pdf-server/.

  3. We added some CSS rules inside @media print media queries to improve the display and hide some buttons when printing. As a nice side effect, if your users print a page themselves, it will look better.

    Tip

    You can use the break-after or break-before properties to force page breaks. For instance, break-after: page; forces a page break after the element it is applied to.

  4. We then created a pdf-server/server.js file in our project to handle the backend side of things. It has two main routes: one to serve the built application and one to receive the PDF request with its data (and respond with the PDF). It looks like this:

    const puppeteer = require('puppeteer');
    const express = require('express');
    const bodyParser = require('body-parser');
    const path = require('path');
    const winston = require('winston');
    const pdf = require('pdf-parse');

    // Allowed pages: fill this with the names of the pages index-pdf-server can render.
    const allowedPages = [];

    const port = process.env.PORT || 4000;
    const logLevel = process.env.LOG_LEVEL || 'info';
    const waitForSelectorTimeout = process.env.WAIT_FOR_SELECTOR_TIMEOUT
        ? parseInt(process.env.WAIT_FOR_SELECTOR_TIMEOUT, 10)
        : 15_000;
    const pdfGenerationRetryCount = process.env.PDF_GENERATION_RETRY_COUNT
        ? parseInt(process.env.PDF_GENERATION_RETRY_COUNT, 10)
        : 3;

    const logger = winston.createLogger({
        level: logLevel,
        format: winston.format.json(),
        defaultMeta: { service: 'frontend-pdf-server' },
    });

    if (process.env.NODE_ENV === 'production') {
        logger.add(
            new winston.transports.Console({
                format: winston.format.json(),
            }),
        );
    } else {
        logger.add(
            new winston.transports.Console({
                format: winston.format.simple(),
            }),
        );
    }

    const app = express();
    app.use(bodyParser.urlencoded({ extended: true }));
    app.use(bodyParser.json());

    const injectReactDataIntoPage = async (page, requestBody) => {
        logger.debug('Injecting data into the page.');
        await page.evaluate(reactData => {
            const node = document.createElement('script');
            node.setAttribute('type', 'application/json');
            node.setAttribute('id', 'react-data');
            // The React app reads this element's textContent.
            node.textContent = JSON.stringify(reactData);
            document.body.appendChild(node);
        }, requestBody);
    };

    const print = async (page, timestamp) => {
        const margin = 30;
        return await page.pdf({
            format: 'A4',
            printBackground: true,
            omitBackground: true,
            margin: { top: margin, bottom: margin, left: margin, right: margin },
            displayHeaderFooter: true,
            // left: 50% combined with translateX(-50%) centers the timestamp.
            footerTemplate: `<p style="font-size: 2mm; position: absolute; left: 50%; transform: translateX(-50%)">${timestamp}</p>`,
            headerTemplate: '',
        });
    };

    const checkPDF = async (pdfBuffer, timestamp) => {
        const data = await pdf(pdfBuffer);
        // Do we have text besides the timestamp repeated in each page footer? If yes, it's good.
        // Otherwise, the PDF is invalid (we generated it before the render completed).
        const textWithoutFooter = timestamp ? data.text.split(timestamp).join('') : data.text;
        return textWithoutFooter.trim().length > 100;
    };

    const printAndCheckPDF = async (page, timestamp) => {
        let isValid = false;
        let retryCount = 0;
        let pdfBuffer = null;

        // Most of the time, waiting for this element is enough for the page to render correctly and for
        // us to get a proper PDF. Once in a while, no text is rendered and the PDF is empty. So we always
        // check the generated PDF.
        await page.waitForSelector('#print', { timeout: waitForSelectorTimeout });
        do {
            retryCount += 1;
            // Wait a bit before each try: to be sure to get a full render (first try) or to let the
            // render finish (after an invalid PDF). The wait grows quadratically: 100, 400, 900 ms...
            await page.waitForTimeout(retryCount * retryCount * 100);
            pdfBuffer = await print(page, timestamp);
            isValid = await checkPDF(pdfBuffer, timestamp);
        } while (!isValid && retryCount < pdfGenerationRetryCount);

        if (!isValid) {
            throw new Error(`Failed to generate PDF, even after ${retryCount} tries.`);
        }

        return pdfBuffer;
    };

    /**
     * Generate a PDF from React frontend components with a headless Chrome
     * managed by Puppeteer.
     */
    app.post('/react-to-pdf', async (req, res) => {
        if (!req.body.page || !allowedPages.includes(req.body.page) || !req.body.pdfData) {
            res.writeHead(400);
            res.end();
            return;
        }

        const browserConsoleMessages = [];
        let browser;
        try {
            browser = await puppeteer.launch({
                headless: true,
                dumpio: true,
                args: [
                    '--disable-gpu',
                    '--disable-dev-shm-usage',
                    '--disable-setuid-sandbox',
                    '--no-sandbox',
                    '--disable-software-rasterizer',
                ],
            });
            const page = await browser.newPage();
            await page.setViewport({ width: 1980, height: 768 });
            // Force language for translation and number formatting.
            await page.evaluateOnNewDocument(() => {
                Object.defineProperty(navigator, 'language', {
                    get: () => 'fr-FR',
                });
                Object.defineProperty(navigator, 'languages', {
                    get: () => ['fr-FR', 'fr'],
                });
            });
            // Listen before navigating so messages logged while the app loads are kept too.
            page.on('console', message => {
                browserConsoleMessages.push(message.text());
            });
            await page.goto(`http://localhost:${port}`);

            logger.debug(`Handling print request for page ${req.body.page}.`);

            await injectReactDataIntoPage(page, req.body);
            const pdfBuffer = await printAndCheckPDF(page, req.body.timestamp);

            logger.debug('Sending response.');
            res.writeHead(200, {
                'Content-Type': 'application/pdf',
                'Content-Length': pdfBuffer.length,
            });
            res.end(pdfBuffer);
            logger.debug('Print succeeded.');
        } catch (e) {
            logger.error(e);
            logger.error(browserConsoleMessages.join('\n'));
            // Express 4 does not handle rejected promises, so we answer here instead of rethrowing.
            res.writeHead(500);
            res.end();
        } finally {
            // Always close Chrome, even when the render failed, to avoid leaking processes.
            if (browser) {
                await browser.close();
            }
        }
    });

    app.get('/', (req, res) => {
        res.sendFile(path.join(__dirname, 'build/index.html'));
    });

    app.get('/health', (req, res) => {
        res.writeHead(200);
        res.end();
    });

    app.use(express.static('public'));
    app.use(express.static('build'));

    app.listen(port, () => logger.info(`Listening on port ${port}`));
    
  5. To deploy this service with Docker, you need to install several libraries as well as run the service as a user other than root. This Dockerfile can help you get started:

    FROM node:14-slim AS builder
    WORKDIR /app

    RUN apt-get update && \
        apt-get install -y python3 make gcc g++ openssl ca-certificates

    ARG REACT_APP_ENV=undefined
    ARG REACT_APP_WEBSITE_BASE_URL=undefined
    ARG REACT_APP_COMMIT_SHA=undefined
    ARG COMMIT_SHA=undefined

    ENV REACT_APP_ENV=$REACT_APP_ENV
    ENV REACT_APP_WEBSITE_BASE_URL=$REACT_APP_WEBSITE_BASE_URL
    ENV REACT_APP_COMMIT_SHA=$REACT_APP_COMMIT_SHA
    ENV COMMIT_SHA=$COMMIT_SHA

    COPY . ./

    RUN yarn install --frozen-lockfile
    RUN yarn build-pdf-server


    # Run the pdf-server in node.
    FROM node:14-slim AS runner
    RUN mkdir -p /var/www/frontend-pdf-server/build/
    WORKDIR /var/www/frontend-pdf-server/

    RUN apt-get update && \
        apt-get upgrade -y && \
        apt-get install -y dumb-init \
            fonts-liberation \
            gconf-service \
            libappindicator1 \
            libasound2 \
            libatk1.0-0 \
            libcairo2 \
            libcups2 \
            libfontconfig1 \
            libgbm-dev \
            libgdk-pixbuf2.0-0 \
            libgtk-3-0 \
            libicu-dev \
            libjpeg-dev \
            libnspr4 \
            libnss3 \
            libpango-1.0-0 \
            libpangocairo-1.0-0 \
            libpng-dev \
            libx11-6 \
            libx11-xcb1 \
            libxcb1 \
            libxcomposite1 \
            libxcursor1 \
            libxdamage1 \
            libxext6 \
            libxfixes3 \
            libxi6 \
            libxrandr2 \
            libxrender1 \
            libxss1 \
            libxtst6 \
            xdg-utils && \
        apt-get clean

    RUN groupadd --gid 1001 noderunner && \
        useradd noderunner --create-home --uid 1001 --gid 1001

    COPY --from=builder /app/build /var/www/frontend-pdf-server/build/
    COPY --from=builder /app/pdf-server/ /var/www/frontend-pdf-server/

    RUN yarn install --frozen-lockfile

    USER noderunner
    ENTRYPOINT ["/usr/bin/dumb-init", "--"]
    CMD ["node", "server.js"]
    
  6. As a bonus, if you also need to generate documents from a @react-pdf/renderer template, I suggest that you create the template in a JSX file and then import it with the import-jsx library like this:

    const importJsx = require('import-jsx');
    const { renderToStream } = require('@react-pdf/renderer');
    
    const Document = importJsx('./documents/my-document');
    
    const documentsToRenderFunctions = {
        MyDocument: Document,
    };
    const allowedDocuments = Object.keys(documentsToRenderFunctions);
    
    /**
     * Generate documents based on dedicated React components from @react-pdf/renderer.
     *
     * All the rendering is done in the NodeJS process.
     */
    app.post('/document', async (req, res) => {
        if (!req.body.document || !allowedDocuments.includes(req.body.document) || !req.body.pdfData) {
            res.writeHead(400);
            res.end();
            return;
        }
    
        logger.debug(`Handling document generation request for ${req.body.document}.`);
        try {
            const pdfStream = await renderToStream(
                documentsToRenderFunctions[req.body.document]({
                    ...req.body.pdfData,
                    timestamp: req.body.timestamp,
                }),
            );
            res.setHeader('Content-Type', 'application/pdf');
            pdfStream.pipe(res);
            pdfStream.on('end', () => logger.debug('Done streaming, PDF generation succeeded.'));
        } catch (e) {
            logger.error(e);
            // Express 4 does not handle rejected promises, so we answer here instead of rethrowing.
            res.writeHead(500);
            res.end();
        }
    });
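
In step 2, the entry point is switched with config.entry.replace('index', 'index-pdf-server'). With a string pattern, replace only rewrites the first occurrence of index in the whole path, so a directory whose name contains index (for instance a project checked out under /home/me/index-projects/) would be rewritten instead of the file name. The sketch below is a slightly more defensive variant that only touches the file name. toPdfServerEntry is a hypothetical helper, and it assumes config.entry is either a single path or an array of paths (which one you get depends on the react-scripts version). It is written in TypeScript; in config-overrides.js the type annotations would simply be dropped.

```typescript
type Entry = string | string[];

// Rewrite ".../index.tsx" (or .ts, .js, .jsx) into ".../index-pdf-server.tsx",
// matching only the file name so directories containing "index" are left alone.
// [\\/] accepts both POSIX and Windows path separators.
function toPdfServerEntry(entry: Entry): Entry {
  const rewrite = (file: string): string =>
    file.replace(/([\\/])index(\.[jt]sx?)$/, "$1index-pdf-server$2");
  return Array.isArray(entry) ? entry.map(rewrite) : rewrite(entry);
}

const entryPath = "/home/me/index-projects/app/src/index.tsx";
// Naive version: prints /home/me/index-pdf-server-projects/app/src/index.tsx
console.log(entryPath.replace("index", "index-pdf-server"));
// Defensive version: prints /home/me/index-projects/app/src/index-pdf-server.tsx
console.log(toPdfServerEntry(entryPath));
```

In config-overrides.js, the assignment would then become config.entry = toPdfServerEntry(config.entry) inside the if (process.env.SERVER) block.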
    

To conclude, it's not as obvious as it seems. After all the time I spent creating and stabilizing this service, I even wonder whether it would have been quicker to just create a proper template to render the PDFs. Maybe, maybe not. Our product team really wanted the PDF to look almost the same as the page, so for them it was best this way. I also think the way I inject data is not optimal: I could try to use a template and render a page that already contains the data, instead of injecting it and relying on a setInterval to read it. But making that work without interfering with the build process would involve more work.
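
The template-based alternative mentioned above could look roughly like the following sketch. It assumes the server reads the built index.html and serves a copy with the data embedded right after the opening body tag, so the data is already parsed when the bundle scripts at the end of the body run. embedReactData and readReactData are hypothetical names, and the ReactData shape (page, pdfData, timestamp) is inferred from the fields the /react-to-pdf route reads. Writing < as \u003c keeps a string such as </script> inside the data from closing the tag early; JSON.parse turns it back into <.

```typescript
interface ReactData {
  page: string;
  pdfData: unknown;
  timestamp?: string;
}

// Server side: insert the data as a JSON script tag right after <body ...>.
// "<" only appears inside JSON strings, where \u003c is a valid escape.
function embedReactData(html: string, data: ReactData): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  const tag = `<script type="application/json" id="react-data">${json}</script>`;
  const bodyOpen = /<body[^>]*>/i.exec(html);
  if (!bodyOpen) {
    throw new Error("No <body> tag found in the HTML template.");
  }
  const insertAt = bodyOpen.index + bodyOpen[0].length;
  return html.slice(0, insertAt) + tag + html.slice(insertAt);
}

// Client side: parse the embedded JSON once, without polling, and check its shape.
function readReactData(text: string | null): ReactData | undefined {
  if (!text) {
    return undefined;
  }
  const parsed = JSON.parse(text);
  if (typeof parsed !== "object" || parsed === null || typeof parsed.page !== "string") {
    return undefined;
  }
  return parsed as ReactData;
}

const template = '<html><body><div id="root"></div></body></html>';
const served = embedReactData(template, { page: "summary", pdfData: { note: "</script>" } });
console.log(served);
```

On the client, the polling useEffect could then give way to a single read before rendering, passing the textContent of the react-data element to readReactData. The / route would also have to serve this modified HTML rather than build/index.html as is, which is the part that touches the build and deployment setup.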

It's also hard to detect when React is done rendering the page. My method is probably not the best (wait for an element, then wait a bit longer, then print the page and check that the PDF contains text), but it seems to work.
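
The detection method can be isolated from Puppeteer and pdf-parse into two small pure functions, which makes it easier to reason about and to unit test. This is a sketch, not code from our service: retryDelaysMs and looksRendered are hypothetical names, and the text argument stands for what pdf-parse extracts from the generated PDF. With the default of three attempts, the quadratic wait adds up to at most 100 + 400 + 900 = 1,400 ms on top of waiting for the #print element.

```typescript
// Delay, in milliseconds, waited before each attempt: attempt n waits n * n * baseMs,
// the same quadratic schedule as page.waitForTimeout(retryCount * retryCount * 100).
function retryDelaysMs(maxAttempts: number, baseMs = 100): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    delays.push(attempt * attempt * baseMs);
  }
  return delays;
}

// The completeness heuristic: once every occurrence of the footer timestamp
// (repeated on each page) is removed, more than minChars characters must remain.
function looksRendered(text: string, timestamp: string, minChars = 100): boolean {
  const withoutFooter = timestamp ? text.split(timestamp).join("") : text;
  return withoutFooter.trim().length > minChars;
}

const delays = retryDelaysMs(3);
console.log(delays.join(", ")); // prints "100, 400, 900"
console.log(looksRendered("\n2021-09-26 10:00\n", "2021-09-26 10:00")); // prints "false"
```

The threshold is a trade-off: a legitimately short page would also be rejected by a 100-character minimum, so minChars may need tuning per page type.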

I'd say that, despite all the issues this solution may have, it currently serves its purpose and allows us to move forward. I hope you enjoyed this post, and if you have any comments or remarks, please leave a comment!