Supported PDF Subset
This project intentionally targets a narrow, explicit MVP.
Supported now
- Unencrypted PDFs
- Classic xref tables, including incremental update chains (multiple xref sections linked via `Prev`)
- Full-document rewrites on save (incremental updates are flattened into a single revision on output)
- Unfiltered or `FlateDecode` streams
- Page tree traversal with inherited resources, media boxes, crop boxes, and page rotation
- Inline images (`BI`/`ID`/`EI`) are safely skipped during content stream parsing
- Dictionary operands in content streams (e.g., `BDC` with `<</MCID 0>>`)
- Common text operators
- Common path, paint, and graphics-state operators (`q`, `Q`, `cm`, `gs`, `w`, `J`, `j`, `M`, `d`, `ri`, `i`)
- Clipping path operators (`W`, `W*`)
- Color operators for device and general color spaces (`rg`/`RG`, `g`/`G`, `k`/`K`, `cs`/`CS`, `sc`/`SC`, `scn`/`SCN`)
- Curve segment operators (`v`, `y`)
- Marked-content operators as safe pass-through (`BMC`, `BDC`, `EMC`, `MP`, `DP`)
- ExtGState font entries (fonts set via `gs` operator)
- Image XObject invocation detection
- `Type1` and `TrueType` fonts in the current text path
- `Type0` with `Identity-H`, two-byte CIDs, and `ToUnicode` maps
- Rectangle, quad, and quad-group redaction targets
- Three redaction modes: `strip` (remove bytes), `redact` (blank space + overlay), `erase` (blank space, no overlay)
- Metadata stripping for supported document layouts
- Attachment stripping for supported embedded-file layouts
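
To illustrate how an incremental update chain can be flattened into a single revision, the sketch below walks xref sections from newest to oldest via `Prev` and keeps the newest offset for each object number. The `XrefSection` type, the `flatten_xref_chain` function, and the pre-parsed `sections` mapping are assumptions made for this example, not this project's API, and free entries and generation numbers are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class XrefSection:
    """One parsed xref section (hypothetical shape, for illustration only)."""
    entries: Dict[int, int]      # object number -> byte offset
    prev: Optional[int] = None   # offset of the older section, taken from /Prev


def flatten_xref_chain(sections: Dict[int, XrefSection], start: int) -> Dict[int, int]:
    """Merge a /Prev-linked chain into one table; newer entries win."""
    merged: Dict[int, int] = {}
    seen = set()
    offset: Optional[int] = start
    while offset is not None:
        if offset in seen:
            raise ValueError(f"xref /Prev cycle at offset {offset}")
        if offset not in sections:
            raise ValueError(f"no xref section at offset {offset}")
        seen.add(offset)
        section = sections[offset]
        for obj_num, obj_offset in section.entries.items():
            # We walk newest to oldest, so the first offset seen is kept.
            merged.setdefault(obj_num, obj_offset)
        offset = section.prev
    return merged


sections = {
    900: XrefSection(entries={3: 700}, prev=400),       # newest update
    400: XrefSection(entries={1: 15, 2: 80, 3: 200}),   # original revision
}
flat = flatten_xref_chain(sections, start=900)
print(flat)
```

Object 3 resolves to the offset from the newest section, and objects 1 and 2 fall through to the original revision. The cycle check matters because a malformed `Prev` pointer could otherwise loop forever.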
Explicitly unsupported or incomplete
- Encrypted PDFs
- Object streams and xref streams
- Incremental update preservation (output is always a flat rewrite)
- Form XObjects on targeted pages
- Type3 fonts
- Broad CID font support beyond the current `Identity-H` path
- Partial image rewriting
- Optional-content group sanitization
- Overlay text stamping
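
A caller may want a rough preflight check before handing a file to the engine. The sketch below scans raw bytes for a few of the unsupported features listed above. It is only a heuristic and an assumption of this example: the `preflight` function and `UNSUPPORTED_MARKERS` table are hypothetical, a byte scan can produce false positives (for instance when a marker appears inside string data), and real detection requires parsing the file.

```python
import re
from typing import List

# Heuristic byte patterns (assumed for illustration, not the engine's logic).
UNSUPPORTED_MARKERS = {
    "encryption": re.compile(rb"/Encrypt\b"),
    "object stream": re.compile(rb"/Type\s*/ObjStm\b"),
    "xref stream": re.compile(rb"/Type\s*/XRef\b"),
    "Type3 font": re.compile(rb"/Subtype\s*/Type3\b"),
}


def preflight(pdf_bytes: bytes) -> List[str]:
    """Return the names of unsupported features that appear to be present."""
    return [
        name
        for name, pattern in UNSUPPORTED_MARKERS.items()
        if pattern.search(pdf_bytes)
    ]


encrypted = b"trailer\n<< /Size 4 /Root 1 0 R /Encrypt 5 0 R >>"
print(preflight(encrypted))
```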
Failure model
When unsupported content affects correctness, the engine returns an explicit error instead of pretending to succeed.
Typical behavior:
- unsupported operators on redacted pages return `Unsupported` (the operator allow-list covers common text, path, color, clipping, graphics-state, and marked-content operators)
- unimplemented plan options return `UnsupportedOption`
- malformed structure returns parse or corruption errors
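
The behavior above can be sketched as an allow-list check that raises instead of guessing. This is a minimal illustration, not the engine's implementation: modeling `Unsupported` and `UnsupportedOption` as Python exceptions is an assumption, the text operators in the set are an assumed sample of "common text operators", and the option names `mode` and `targets` are hypothetical.

```python
from typing import Iterable, Mapping


class Unsupported(Exception):
    """Content the engine cannot rewrite safely."""


class UnsupportedOption(Exception):
    """A plan option that is not implemented."""


ALLOWED_OPERATORS = frozenset({
    # Graphics-state, clipping, color, curve, and marked-content operators
    # named in the supported list.
    "q", "Q", "cm", "gs", "w", "J", "j", "M", "d", "ri", "i",
    "W", "W*",
    "rg", "RG", "g", "G", "k", "K", "cs", "CS", "sc", "SC", "scn", "SCN",
    "v", "y",
    "BMC", "BDC", "EMC", "MP", "DP",
    # Assumed sample of common text operators (not enumerated in the source).
    "BT", "ET", "Tf", "Td", "Tm", "Tj", "TJ",
})

IMPLEMENTED_MODES = frozenset({"strip", "redact", "erase"})
IMPLEMENTED_OPTIONS = frozenset({"mode", "targets"})  # hypothetical names


def check_operators(operators: Iterable[str]) -> None:
    """Raise Unsupported on the first operator outside the allow-list."""
    for op in operators:
        if op not in ALLOWED_OPERATORS:
            raise Unsupported(f"operator {op!r} is not on the allow-list")


def check_plan(options: Mapping[str, object]) -> None:
    """Raise UnsupportedOption for unknown options or redaction modes."""
    for name in options:
        if name not in IMPLEMENTED_OPTIONS:
            raise UnsupportedOption(f"option {name!r} is not implemented")
    mode = options.get("mode", "redact")
    if mode not in IMPLEMENTED_MODES:
        raise UnsupportedOption(f"mode {mode!r} is not implemented")


check_operators(["q", "BT", "Tf", "Tj", "ET", "Q"])  # passes silently
check_plan({"mode": "erase"})                        # passes silently
```

Rejecting anything outside the list keeps the default outcome a visible error, so an operator the engine has never seen cannot slip through a redacted page unexamined.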
Security posture
This subset is intentionally conservative. If the engine cannot safely rewrite the targeted content, it should fail rather than emit a misleading "sanitized" PDF.
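
One way for a caller to honor this posture is to write the output file only after sanitization has fully succeeded. The sketch below assumes a hypothetical `sanitize` callable that returns the rewritten bytes or raises; `write_sanitized` is likewise an illustrative name and not part of this project.

```python
import os
import tempfile
from typing import Callable


def write_sanitized(src_path: str, dst_path: str,
                    sanitize: Callable[[bytes], bytes]) -> None:
    """Write dst_path only if sanitize() succeeds; never leave partial output."""
    with open(src_path, "rb") as src:
        data = src.read()

    # If this raises, nothing has been written anywhere yet.
    result = sanitize(data)

    # Write to a temporary file in the target directory, then swap it in.
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(result)
        os.replace(tmp_path, dst_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the destination is replaced in a single step, a failed run leaves either the previous file or no file at that path, never a half-written PDF that could be mistaken for a sanitized one.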