Mastering PDF Generation in Node.js: Puppeteer vs. PDFKit

Generating PDFs is one of those requirements that inevitably lands on a backend developer’s desk. Whether it’s generating dynamic invoices, downloadable reports, or shipping labels, the ability to convert data into a portable, uneditable document is a staple of enterprise applications.

As we step into 2025, the Node.js ecosystem has matured significantly, but the core debate remains: Should you draw PDFs programmatically or render them via a headless browser?

In this guide, we aren’t just writing scripts; we are going to build a production-ready PDF Generation Microservice using Express. We will implement two distinct strategies:

The “Design-First” Approach using Puppeteer (HTML-to-PDF).
The “Performance-First” Approach using PDFKit (Programmatic Construction).

By the end of this article, you will have a clear understanding of the trade-offs, performance implications, and the code to implement both.

Prerequisites and Environment Setup

Before we dive into the code, let’s ensure your environment is ready. We are assuming you are running Node.js v20 (LTS) or v22.

Initialize the Project

Let’s create a dedicated directory for our service. We will use npm for package management.

mkdir node-pdf-service
cd node-pdf-service
npm init -y

Install Dependencies

We need express for the server, puppeteer for HTML rendering, and pdfkit for manual drawing. We’ll also add nodemon for development convenience.

npm install express puppeteer pdfkit
npm install --save-dev nodemon

Note: Installing Puppeteer downloads a local copy of Chrome (~170MB). If you plan to deploy with Docker later, you’ll need a suitable base image, which we discuss in the “Best Practices and Production Pitfalls” section.

Project Structure

Your project structure should look like this:

node-pdf-service/
├── templates/
│   └── invoice.html   # For Puppeteer
├── src/
│   ├── app.js         # Main Express App
│   ├── htmlService.js # Puppeteer Logic
│   └── rawService.js  # PDFKit Logic
├── package.json
└── README.md

Architectural Overview

Before coding, let’s visualize how our service will handle requests. We want a unified API that can switch strategies based on the endpoint or complexity of the request.

graph TD
    User(Client Request) -->|GET /pdf/invoice| Router{API Gateway}
    subgraph "Node.js Application"
        Router -->|Strategy: HTML| PuppeteerService[Puppeteer Service]
        Router -->|Strategy: Raw| PDFKitService[PDFKit Service]
        PuppeteerService -->|Launch Browser| ChromeHeadless[Headless Chrome]
        ChromeHeadless -->|Render CSS/HTML| Buffer1[PDF Buffer]
        PDFKitService -->|Draw Text/Lines| Stream[PDF Stream]
    end
    Buffer1 -->|Response| User
    Stream -->|Pipe Response| User
    style User fill:#f9f,stroke:#333,stroke-width:2px
    style ChromeHeadless fill:#bbf,stroke:#333,stroke-width:2px
    style Router fill:#dfd,stroke:#333,stroke-width:2px

Strategy 1: The Design-First Approach (Puppeteer)

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium. This is the go-to solution when your design requirements are complex (flexbox, grids, custom fonts) or when you want to reuse existing frontend components.

1. Create the HTML Template

Create a file named templates/invoice.html. In a real-world scenario, you would use a template engine like Handlebars or EJS, but for this demo, we’ll inject data using simple string replacement.

<!-- templates/invoice.html -->
<!DOCTYPE html>
<html>
<head>
    <style>
        body { font-family: 'Helvetica', sans-serif; padding: 40px; }
        .header { display: flex; justify-content: space-between; margin-bottom: 20px; }
        .title { font-size: 24px; font-weight: bold; color: #333; }
        .table { width: 100%; border-collapse: collapse; margin-top: 20px; }
        .table th, .table td { border: 1px solid #ddd; padding: 8px; text-align: left; }
        .table th { background-color: #f2f2f2; }
        .total { text-align: right; margin-top: 20px; font-size: 18px; }
    </style>
</head>
<body>
    <div class="header">
        <div class="title">INVOICE #{{invoiceId}}</div>
        <div>Date: {{date}}</div>
    </div>
    <p>Billed to: <strong>{{customerName}}</strong></p>
    
    <table class="table">
        <thead>
            <tr>
                <th>Item</th>
                <th>Cost</th>
            </tr>
        </thead>
        <tbody>
            {{rows}}
        </tbody>
    </table>

    <div class="total">Total: ${{total}}</div>
</body>
</html>

2. Implement the Puppeteer Service

Create src/htmlService.js.

Crucial Performance Tip: Launching a browser is expensive. In production, reuse a single browser instance (or a pool of them) rather than launching a new browser for every request. This implementation uses a singleton browser.

// src/htmlService.js
const puppeteer = require('puppeteer');
const fs = require('fs');
const path = require('path');

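// Holds the launch promise (not the resolved browser), so concurrent first
// requests share one launch instead of each starting their own Chrome.
let browserInstance = null;

function getBrowser() {
    if (!browserInstance) {
        console.log('Launching new browser instance...');
        browserInstance = puppeteer.launch({
            headless: true, // In Puppeteer v22+, `true` selects the new headless mode
            // Often needed when Chrome runs as root inside a container. It weakens
            // isolation, so only render HTML you trust with these flags.
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        }).then((browser) => {
            // If Chrome exits or crashes, forget it so the next call relaunches.
            browser.on('disconnected', () => { browserInstance = null; });
            return browser;
        }).catch((err) => {
            browserInstance = null; // Allow a retry after a failed launch
            throw err;
        });
    }
    return browserInstance;
}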

const generatePdfFromHtml = async (data) => {
    const browser = await getBrowser();
    const page = await browser.newPage();

    try {
        // 1. Read Template
        const templatePath = path.join(__dirname, '../templates/invoice.html');
        let html = fs.readFileSync(templatePath, 'utf8');

        // 2. Hydrate Template (Manual string replace for simplicity)
        // Values are inserted verbatim, so only pass trusted data here.
        // In production, use Handlebars/EJS, which can HTML-escape values for you.
        const rows = data.items.map(item => `<tr><td>${item.name}</td><td>$${item.price}</td></tr>`).join('');
        const total = data.items.reduce((acc, item) => acc + item.price, 0);

        // Replacer functions insert each value literally; a plain replacement
        // string would give sequences such as "$&" a special meaning.
        html = html
            .replace('{{invoiceId}}', () => String(data.id))
            .replace('{{date}}', () => new Date().toISOString().split('T')[0])
            .replace('{{customerName}}', () => String(data.customer))
            .replace('{{rows}}', () => rows)
            .replace('{{total}}', () => String(total));

        // 3. Set Content
        await page.setContent(html, { waitUntil: 'networkidle0' });

        // 4. Generate PDF
        const pdf = await page.pdf({
            format: 'A4',
            printBackground: true,
            margin: { top: '20px', bottom: '20px' }
        });

        // Recent Puppeteer versions return a Uint8Array here; wrapping it
        // guarantees that callers such as res.send() receive a Node.js Buffer.
        return Buffer.from(pdf);
    } finally {
        await page.close(); // Only close the page, keep browser alive (runs on errors too)
    }
};

module.exports = { generatePdfFromHtml };
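
The hydration step above inserts values into the HTML as-is, so a customer or item name containing `<` or `&` would change the markup that Chrome renders. A replacement string passed to `String.prototype.replace` also treats sequences such as `$&` specially, which is why a replacer function is the safer form. Below is a minimal sketch of a more defensive approach. The helper names `escapeHtml` and `renderTemplate` are hypothetical and not part of Puppeteer; a real template engine would normally take their place.

```javascript
// Hypothetical helpers for illustration; not part of Puppeteer or the service above.

// Escape the five characters that are significant in HTML text and attributes.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

function escapeHtml(value) {
    return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Replace every {{key}} placeholder. A replacer function (rather than a
// replacement string) means "$" sequences in the values are inserted literally.
// Keys listed in `rawKeys` are inserted unescaped (e.g. pre-built table rows).
function renderTemplate(template, values, rawKeys = []) {
    return template.replace(/\{\{(\w+)\}\}/g, (match, key) => {
        if (!Object.hasOwn(values, key)) return match; // leave unknown placeholders untouched
        return rawKeys.includes(key) ? String(values[key]) : escapeHtml(values[key]);
    });
}

// Usage: escape each cell while building the rows, then mark `rows` as raw HTML.
const items = [{ name: 'Tea & <Biscuits>', price: 5 }];
const rows = items
    .map((item) => `<tr><td>${escapeHtml(item.name)}</td><td>$${escapeHtml(item.price)}</td></tr>`)
    .join('');

const html = renderTemplate(
    '<p>{{customerName}}</p><table>{{rows}}</table><div>${{total}}</div>',
    { customerName: 'A$&B <script>', rows, total: 5 },
    ['rows']
);
console.log(html);
```

Because the regular expression is global, this version also replaces a placeholder that appears more than once in the template, which a string pattern passed to `.replace()` would not.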

Strategy 2: The Performance-First Approach (PDFKit)

PDFKit is a PDF generation library for Node that lets you build documents by adding text, images, and vector graphics manually. It does not use a browser. It is incredibly fast and memory-efficient but requires manual coordinate calculation (drawing text at X, Y).

Implement the PDFKit Service

Create src/rawService.js. Notice that PDFKit works with Streams. This is excellent for Node.js performance as we can pipe the data directly to the HTTP response without buffering the whole file in RAM.

// src/rawService.js
const PDFDocument = require('pdfkit');

const generatePdfRaw = (data, res) => {
    // Create a document
    const doc = new PDFDocument({ margin: 50 });

    // Pipe the output directly to the HTTP response
    doc.pipe(res);

    // Header
    doc.fontSize(20).text(`INVOICE #${data.id}`, { align: 'left' });
    doc.fontSize(10).text(`Date: ${new Date().toISOString().split('T')[0]}`, { align: 'left' });
    doc.moveDown();

    // Customer info
    doc.text(`Billed to: ${data.customer}`);
    doc.moveDown();

    // Table Header (Manual drawing)
    const tableTop = 150;
    const itemX = 50;
    const costX = 400;

    doc.font('Helvetica-Bold');
    doc.text('Item', itemX, tableTop);
    doc.text('Cost', costX, tableTop);
    
    // Draw a line
    doc.moveTo(itemX, tableTop + 15).lineTo(550, tableTop + 15).stroke();

    // Table Rows
    let y = tableTop + 25;
    doc.font('Helvetica');
    
    let total = 0;
    
    data.items.forEach(item => {
        doc.text(item.name, itemX, y);
        doc.text(`$${item.price}`, costX, y);
        total += item.price;
        y += 20;
    });

    // Total
    doc.moveDown();
    doc.font('Helvetica-Bold').text(`Total: $${total}`, costX, y + 20);

    // Finalize the PDF and end the stream
    doc.end();
};

module.exports = { generatePdfRaw };
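
One thing the row loop above does not handle is an item list longer than a single page: `y` simply grows by 20 points per row. Since manual coordinates are the price of PDFKit, here is a rough sketch of how the positions could be computed with page breaks. The helper `layoutRows` and its option names are hypothetical, and the numbers assume PDFKit's default US Letter page (792 points tall) with the 50-point margin used above.

```javascript
// Hypothetical helper: assigns each row a page index and a y coordinate,
// starting a new page whenever the next row would cross `bottomLimit`.
function layoutRows(rowCount, { firstPageTop, nextPageTop, rowHeight, bottomLimit }) {
    const positions = [];
    let page = 0;
    let y = firstPageTop;
    for (let row = 0; row < rowCount; row++) {
        if (y + rowHeight > bottomLimit) {
            page += 1;       // this row no longer fits: move to a fresh page
            y = nextPageTop; // and restart at the top margin
        }
        positions.push({ row, page, y });
        y += rowHeight;
    }
    return positions;
}

// 40 rows, first row at tableTop + 25 = 175, page bottom at 792 - 50 = 742.
const positions = layoutRows(40, {
    firstPageTop: 175,
    nextPageTop: 50,
    rowHeight: 20,
    bottomLimit: 742
});
console.log(positions[27], positions[28]);
```

Inside `generatePdfRaw`, one could call `doc.addPage()` whenever the page index increases and derive `bottomLimit` from `doc.page.height - doc.page.margins.bottom` instead of hard-coding it.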

Bringing It Together: The Express Server

Now, let’s expose these services via an API in src/app.js.

// src/app.js
const express = require('express');
const { generatePdfFromHtml } = require('./htmlService');
const { generatePdfRaw } = require('./rawService');

const app = express();
const PORT = process.env.PORT || 3000;

// Dummy Data Generator
const getInvoiceData = () => ({
    id: Math.floor(Math.random() * 10000),
    customer: 'Acme Corp',
    items: [
        { name: 'Consulting Services', price: 1200 },
        { name: 'Server Maintenance', price: 300 },
        { name: 'Coffee Supply', price: 50 }
    ]
});

// Route 1: Puppeteer (Buffer based)
app.get('/pdf/html', async (req, res) => {
    try {
        const data = getInvoiceData();
        const pdfBuffer = await generatePdfFromHtml(data);

        res.set({
            'Content-Type': 'application/pdf',
            'Content-Length': pdfBuffer.length,
            'Content-Disposition': 'inline; filename="invoice-html.pdf"'
        });
        
        res.send(pdfBuffer);
    } catch (error) {
        console.error(error);
        res.status(500).send('Error generating PDF');
    }
});

// Route 2: PDFKit (Stream based)
app.get('/pdf/raw', (req, res) => {
    const data = getInvoiceData();
    
    res.set({
        'Content-Type': 'application/pdf',
        'Content-Disposition': 'inline; filename="invoice-raw.pdf"'
    });

    generatePdfRaw(data, res);
});

app.listen(PORT, () => {
    console.log(`PDF Service running on http://localhost:${PORT}`);
    console.log(`Test HTML-to-PDF: http://localhost:${PORT}/pdf/html`);
    console.log(`Test Raw PDF:     http://localhost:${PORT}/pdf/raw`);
});

To run the application:

node src/app.js
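
A quick way to confirm that an endpoint returns real PDF bytes, rather than an error page or a JSON-serialized array, is to check the file signature: every PDF starts with the ASCII characters `%PDF-`. The sketch below assumes Node.js 18+ (for the global `fetch`); the function names `looksLikePdf` and `checkEndpoint` are made up for this example.

```javascript
// PDF files begin with the ASCII signature "%PDF-".
// Accepts a Buffer or any Uint8Array.
function looksLikePdf(bytes) {
    const signature = Buffer.from('%PDF-', 'ascii');
    return bytes.length >= signature.length &&
        Buffer.from(bytes.subarray(0, signature.length)).equals(signature);
}

// Hypothetical smoke test: fetch an endpoint and report what came back.
async function checkEndpoint(url) {
    const response = await fetch(url);
    const body = new Uint8Array(await response.arrayBuffer());
    return {
        status: response.status,
        contentType: response.headers.get('content-type'),
        isPdf: looksLikePdf(body)
    };
}

// Usage (with the server running in another terminal):
// checkEndpoint('http://localhost:3000/pdf/html').then(console.log);
// checkEndpoint('http://localhost:3000/pdf/raw').then(console.log);
```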

Comparative Analysis: Which Should You Use?

Choosing the right tool is more about constraints than preference. Let’s break down the metrics.

Feature	Puppeteer (HTML-to-PDF)	PDFKit (Programmatic)
Development Speed	🚀 Fast. Use CSS/HTML.	🐢 Slow. Manual coordinates (X, Y).
Styling Capabilities	⭐ Excellent. Supports Flexbox, Grid, CSS.	⚠️ Limited. Basic text and shapes.
Performance (CPU)	🔴 Heavy. Renders a full web page.	🟢 Light. Direct binary writing.
Memory Usage	🔴 High (50MB - 200MB+ per page).	🟢 Low (Streaming buffers).
Output File Size	Larger (embeds fonts/metadata).	Smaller (highly optimized).
Best Use Case	Complex Invoices, Reports with Charts.	High-volume Tickets, Labels, Simple Lists.

The “Hybrid” Recommendation

For most modern Node.js applications, I recommend starting with Puppeteer for developer velocity. CSS is simply too powerful to ignore. However, if your service needs to generate thousands of PDFs per minute (e.g., event ticketing), switch to PDFKit to save infrastructure costs.

Best Practices and Production Pitfalls

Implementing this locally is easy. Deploying it to production (AWS Lambda, Docker, Kubernetes) brings specific challenges.

1. Dockerizing Puppeteer

Puppeteer requires specific system dependencies (libraries for Chromium) that aren’t present in standard Node.js Alpine images.

You should use the official Puppeteer Docker image or install dependencies manually in your Dockerfile:

FROM ghcr.io/puppeteer/puppeteer:21.5.0
# Skip chromium download since we use the installed one
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable

WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci
COPY . .
CMD [ "node", "src/app.js" ]

2. Managing Memory Leaks

In the Puppeteer example above, we keep the browser instance open (browserInstance). This is great for speed, but a long-lived Chrome process can accumulate memory over time, and if Chrome crashes, a cached instance is useless until it is replaced.

Solution: Implement a restart strategy.

Close and recreate the browser instance every ~100 requests.
Or use a library like puppeteer-cluster which manages concurrency and browser restarts automatically.
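
To make the first option concrete, here is a rough sketch of a browser holder that retires its browser after a fixed number of uses and closes it once the last in-flight caller is done. Everything here is an assumption for illustration: `createRecyclingBrowser`, `acquire`, and `release` are hypothetical names, and `launch` stands for any async function that returns an object with a `close()` method (for Puppeteer, something like `() => puppeteer.launch(options)`).

```javascript
// Hypothetical wrapper around a browser launcher. Not part of Puppeteer.
function createRecyclingBrowser(launch, maxUses = 100) {
    let current = null; // { promise, uses, active, retired }

    // Close a retired browser only when nobody is still rendering with it.
    function closeIfIdle(slot) {
        if (slot.retired && slot.active === 0) {
            slot.promise.then((browser) => browser.close()).catch(() => {});
        }
    }

    return async function acquire() {
        if (!current || current.uses >= maxUses) {
            if (current) {
                current.retired = true;
                closeIfIdle(current);
            }
            current = { promise: launch(), uses: 0, active: 0, retired: false };
        }
        const slot = current;
        slot.uses += 1;
        slot.active += 1;

        let released = false;
        const release = () => {
            if (released) return; // make release() safe to call twice
            released = true;
            slot.active -= 1;
            closeIfIdle(slot);
        };

        try {
            const browser = await slot.promise;
            return { browser, release };
        } catch (err) {
            release();
            if (current === slot) current = null; // retry the launch on the next call
            throw err;
        }
    };
}

// Usage sketch inside a request handler:
// const { browser, release } = await acquire();
// try { /* browser.newPage(), page.pdf(), page.close() */ } finally { release(); }
```

The in-flight counter is the important design choice: closing the old browser immediately on the 100th request would abort any page that is still rendering. For anything beyond a single process, puppeteer-cluster remains the less hand-rolled option.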

3. Don’t Block the Event Loop

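Generating a PDF with Puppeteer can take 500ms to several seconds. Strictly speaking, awaiting the result does not block the event loop, but it holds the HTTP connection open for the entire render, and every in-flight render ties up a Chrome page. Under a burst of requests, that can exhaust memory and cause timeouts.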

Architecture Tip: For heavy loads, do not generate PDFs synchronously in the HTTP request.

1. Client requests PDF.
2. Server pushes job to a queue (e.g., BullMQ + Redis).
3. Server returns 202 Accepted with a jobId.
4. A background worker processes the PDF and uploads it to S3.
5. Client polls for completion or receives a Webhook.
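
The flow above can be sketched without any infrastructure by using an in-memory Map as a stand-in for the queue. This is only an illustration of the request/poll contract: it assumes a single process, loses jobs on restart, and has no retries or concurrency limit, all of which BullMQ + Redis would provide. The names `createPdfJobQueue`, `enqueue`, and `getStatus`, as well as the route paths in the comments, are hypothetical.

```javascript
const crypto = require('crypto');

// In-memory stand-in for a real queue. `render` is any async function that
// produces the PDF and returns where it was stored (e.g. an S3 URL).
function createPdfJobQueue(render) {
    const jobs = new Map();

    function enqueue(payload) {
        const jobId = crypto.randomUUID();
        const job = { status: 'queued', result: null, error: null };
        jobs.set(jobId, job);

        // Run outside the request: the caller gets the jobId immediately.
        setImmediate(async () => {
            job.status = 'processing';
            try {
                job.result = await render(payload);
                job.status = 'done';
            } catch (err) {
                job.error = err instanceof Error ? err.message : String(err);
                job.status = 'failed';
            }
        });
        return jobId;
    }

    // Returns a copy of the job state, or null for an unknown id.
    function getStatus(jobId) {
        const job = jobs.get(jobId);
        return job ? { ...job } : null;
    }

    return { enqueue, getStatus };
}

// Sketch of matching Express routes (paths are hypothetical):
// app.post('/pdf/jobs', (req, res) => {
//     res.status(202).json({ jobId: queue.enqueue(getInvoiceData()) });
// });
// app.get('/pdf/jobs/:id', (req, res) => {
//     const status = queue.getStatus(req.params.id);
//     return status ? res.json(status) : res.sendStatus(404);
// });
```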

Conclusion

Building PDF generation services in Node.js requires balancing developer experience with runtime performance.

Use Puppeteer when the visual layout is complex or changes frequently. The ability to use CSS grids and standard HTML templates outweighs the CPU cost for most business applications.
Use PDFKit when you need raw speed, low memory footprint, or are generating simple documents at massive scale.

By following the code provided in this guide, you now have a working foundation for both strategies. As you move to production, remember to containerize carefully and consider offloading generation to background queues to keep your API responsive.

Happy Coding
