Text with pypdf2 rendered incorrectly #1570

@raffaem

Describe the bug

Error details

input.pdf

output.pdf

On page 7 of output.pdf, in the bottom left corner, Acrobat Reader renders

Image

instead of rendering "Version of 2025-09-09", as it does on the other pages.

The exact artifact and the page on which it appears seem to change from run to run.

For instance, in another run, page 7 is fine, but on page 11, at the upper center border, I get

Image

instead of "WORKING PAPER - PLEASE DO NOT DISTRIBUTE".

In yet another run, on page 19, in the bottom right corner, I get

Image

instead of the page number.

Minimal code

This script takes input.pdf (shared above) and generates output.pdf by inserting, on every page:

  1. "WORKING PAPER - PLEASE DO NOT DISTRIBUTE" at the upper center border
  2. The current date in the bottom left corner
  3. The page number in the bottom right corner

To reproduce, download input.pdf (shared above) and put it in the same folder as the Python script.

from datetime import datetime
from contextlib import contextmanager
from fpdf import FPDF, get_scale_factor
from pypdf import PdfWriter, PdfReader
import io
from dataclasses import dataclass
from pathlib import Path
from tqdm import tqdm
import requests, zipfile

# Download Aptos font from Microsoft
zip_file_url = "https://download.microsoft.com/download/8/6/0/860a94fa-7feb-44ef-ac79-c072d9113d69/Microsoft%20Aptos%20Fonts.zip"
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("./aptos")
font_path = Path("./aptos/Aptos.ttf")
assert font_path.is_file()

# Combine fpdf2 with pypdf, see https://py-pdf.github.io/fpdf2/CombineWithPypdf.html#combine-with-pypdf
@contextmanager
def add_to_page(reader_page, unit="mm"):
    k = get_scale_factor(unit)
    format = (reader_page.mediabox[2] / k, reader_page.mediabox[3] / k)
    pdf = FPDF(format=format, unit=unit)
    pdf.add_page()
    yield pdf
    page_overlay = PdfReader(io.BytesIO(pdf.output())).pages[0]
    reader_page.merge_page(page2=page_overlay)

# Text object dataclass
@dataclass
class TextObj:
    text: str
    coords: list

# Add text objects to a page
def add_text_objs(text_objs, writer, pagei):
    for text_obj in text_objs:
        with add_to_page(writer.pages[pagei]) as pdf:
            pdf.add_font(family="my_aptos", fname=font_path)
            pdf.set_font(family="my_aptos", size=9)
            pdf.text(x=text_obj.coords[0], y=text_obj.coords[1], text=text_obj.text)

# Convert coordinates in mm
def pdfbox2mm(box):
    return [float(coord) / get_scale_factor("mm") for coord in box]

reader = PdfReader("input.pdf")
writer = PdfWriter()
pages = list(reader.pages)

for pagei, page in tqdm(enumerate(pages), total=len(pages)):

    writer.add_page(page)
    mediabox_mm = pdfbox2mm(page.mediabox)
    anns = list()

    # Add page numbers
    # In pypdf, (0,0) is the bottom-left corner.
    # In fpdf2, (0,0) is the upper-left corner.
    coords = [mediabox_mm[2] * 0.9, mediabox_mm[3] * 0.96]
    text = f"{pagei + 1} / {len(reader.pages)}"
    ann = TextObj(text=text, coords=coords)
    anns.append(ann)

    # Add compilation time
    coords = [mediabox_mm[2] * 0.05, mediabox_mm[3] * 0.96]
    text = f"Version of {datetime.now().strftime('%Y-%m-%d')}"
    ann = TextObj(text=text, coords=coords)
    anns.append(ann)

    # Add disclaimer
    coords = [mediabox_mm[2] * 0.35, mediabox_mm[3] * 0.05]
    text = "WORKING PAPER - PLEASE DO NOT DISTRIBUTE"
    ann = TextObj(text=text, coords=coords)
    anns.append(ann)

    add_text_objs(anns, writer, pagei)

writer.write("output.pdf")
print("Output PDF created")
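For readers checking where each text should land, the placement arithmetic in the script can be reproduced without fpdf2 or pypdf. This is only a sketch: `PT_PER_MM`, `box_pt_to_mm`, `text_positions_mm` and `top_left_to_bottom_left` are hypothetical names, the constant is assumed to equal what `get_scale_factor("mm")` returns, and, like the script, it assumes the media box starts at (0, 0).

```python
# Sketch only: mirrors the coordinate arithmetic of the script, no PDF libraries needed.
PT_PER_MM = 72 / 25.4  # PDF points per millimetre (assumed equal to get_scale_factor("mm"))


def box_pt_to_mm(box):
    """Convert a [x0, y0, x1, y1] box from points to millimetres."""
    return [float(c) / PT_PER_MM for c in box]


def text_positions_mm(mediabox_pt):
    """Return the three (x, y) anchors used by the script, in mm from the top-left corner."""
    # As in the script, the upper-right corner is treated as (width, height),
    # which only holds when the box's lower-left corner is (0, 0).
    _, _, width, height = box_pt_to_mm(mediabox_pt)
    return {
        "disclaimer": (width * 0.35, height * 0.05),
        "date": (width * 0.05, height * 0.96),
        "page_number": (width * 0.90, height * 0.96),
    }


def top_left_to_bottom_left(y_mm, page_height_mm):
    """Flip a y value measured from the top edge into one measured from the bottom edge."""
    return page_height_mm - y_mm


# Hypothetical A4 page (210 mm x 297 mm), expressed in points
a4_pt = [0, 0, 210 * PT_PER_MM, 297 * PT_PER_MM]
positions = text_positions_mm(a4_pt)
```

On this hypothetical A4 page the date lands about 10.5 mm from the left and 285.1 mm from the top, which is roughly 11.9 mm above the bottom edge in the bottom-left-origin convention. The computed positions do not depend on the run, so the arithmetic alone does not account for artifacts that move between pages.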

Environment
Please provide the following information:

  • Operating System: Windows
  • Python version: 3.13.3
  • fpdf2 version used: fpdf2==2.8.4, pypdf==5.8.0

The problem still occurs with the git version of fpdf2.

    Labels

    bug, question, research needed, too complicated to implement without careful study of official specifications
