Explanation
We currently use the `xml.dom.minidom` stdlib package to work with XMP metadata. This does not completely fulfill the recommendations from Annex H, section H.2, of the PDF 2.0 specification regarding XMP metadata:
- `Document.toxml()` adds the `<?xml version="1.0" ?>` header.
- Between `</x:xmpmeta>` and `<?xpacket end="w"?>`, there should be "white-space padding to permit in-place updating of metadata", which we do not emit.
Section H.7.5 has an additional note, although it may be considered outdated:
> Note that applications which fully understand PDF updating do not usually update in-place
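Both deviations can be reproduced with the standard library alone. The following is a minimal sketch, assuming a hand-written, hypothetical XMP packet (`PACKET` is not taken from a real file); it only inspects what `Document.toxml()` produces.

```python
from xml.dom.minidom import parseString

# Hypothetical minimal XMP packet, used only for illustration.
PACKET = (
    '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
    "</x:xmpmeta>"
    '<?xpacket end="w"?>'
)

document = parseString(PACKET)
serialized = document.toxml()

# minidom prepends an XML declaration that the input did not contain.
has_declaration = serialized.startswith("<?xml")

# The closing tag is immediately followed by the trailing processing
# instruction, so there is no white-space padding in between.
has_padding = '</x:xmpmeta><?xpacket end="w"?>' not in serialized

print(f"XML declaration added: {has_declaration}")
print(f"Padding before trailer: {has_padding}")
```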
Code Example
Exactly like in the following snippet (lines 139 to 162 in 0b64266):
from pypdf import PdfWriter
writer = PdfWriter(clone_from="example.pdf")
metadata = writer.xmp_metadata
assert metadata # Ensure that it is not `None`.
rdf_root = metadata.rdf_root
xmp_meta = rdf_root.parentNode
xmp_document = xmp_meta.parentNode
# Please note that without a text node, the corresponding elements might
# be omitted completely.
pdfuaid_description = xmp_document.createElement("rdf:Description")
pdfuaid_description.setAttribute("rdf:about", "")
pdfuaid_description.setAttribute("xmlns:pdfuaid", "http://www.aiim.org/pdfua/ns/id/")
pdfuaid_part = xmp_document.createElement("pdfuaid:part")
pdfuaid_part_text = xmp_document.createTextNode("1")
pdfuaid_part.appendChild(pdfuaid_part_text)
pdfuaid_description.appendChild(pdfuaid_part)
rdf_root.appendChild(pdfuaid_description)
metadata.stream.set_data(xmp_document.toxml().encode("utf-8"))
writer.write("output.pdf")
I currently use monkey-patching, which does not feel right:
def writexml(minidom_writer, indent='', addindent='', newl='', encoding=None, standalone=None):
    for _child_node in xmp_document.childNodes:
        _child_node.writexml(minidom_writer, indent, addindent, newl)
xmp_document.writexml = writexml
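One possible alternative to monkey-patching is a small helper that serializes the child nodes of the document directly and inserts padding before the packet trailer. This is only a sketch: `serialize_xmp_packet`, `padding_lines`, and `PACKET` are hypothetical names, and the amount of padding (about 2 KB here) is an assumption rather than something taken from pypdf.

```python
from xml.dom import Node
from xml.dom.minidom import Document, parseString

# Hypothetical minimal XMP packet, used only for illustration.
PACKET = (
    '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
    "</x:xmpmeta>"
    '<?xpacket end="w"?>'
)


def serialize_xmp_packet(document: Document, padding_lines: int = 20) -> str:
    """Serialize without the XML declaration and pad before the trailer."""
    # Assumption: 20 lines of 100 spaces each, roughly 2 KB of padding.
    padding = (" " * 100 + "\n") * padding_lines
    parts = []
    for child in document.childNodes:
        is_trailer = (
            child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
            and child.target == "xpacket"
            and child.data.startswith("end=")
        )
        if is_trailer:
            parts.append("\n" + padding)
        # Serializing each child on its own skips the declaration that
        # Document.toxml() would add.
        parts.append(child.toxml())
    return "".join(parts)


result = serialize_xmp_packet(parseString(PACKET))
```

With the example above, the last step would presumably become `metadata.stream.set_data(serialize_xmp_packet(xmp_document).encode("utf-8"))`.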
According to section 7.3.2 of the XMP Specification Part 1, we might have to use `end="r"` to disallow in-place modifications, as we leave no whitespace:
> The end="w" or end="r" portion shall be used by packet scanning processors to determine whether the XMP may be modified in-place. The end="w" form indicates writable and the end="r" form indicates read-only. The writable or read-only indication should be ignored by all “smart” (not packet scanning) processors.
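If no padding is written, switching the trailer to read-only could look roughly like this. The helper name `mark_packet_read_only` and the sample `PACKET` are hypothetical; the sketch assumes the trailer is a top-level `xpacket` processing instruction of the minidom document.

```python
from xml.dom import Node
from xml.dom.minidom import Document, parseString

# Hypothetical minimal XMP packet, used only for illustration.
PACKET = (
    '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
    "</x:xmpmeta>"
    '<?xpacket end="w"?>'
)


def mark_packet_read_only(document: Document) -> bool:
    """Rewrite a writable packet trailer to end="r"; return True if changed."""
    for child in document.childNodes:
        if (
            child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
            and child.target == "xpacket"
            and child.data.strip() in ('end="w"', "end='w'")
        ):
            child.data = 'end="r"'
            return True
    return False


document = parseString(PACKET)
changed = mark_packet_read_only(document)
serialized = document.toxml()
```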