Open Redact PDF
Open Redact PDF is a browser-first PDF redaction engine implemented in Rust and exposed to browsers through WebAssembly. The project operates on PDF structure instead of flattening pages into images, removes targeted content for a constrained but real subset of PDFs, and preserves unredacted text where the supported subset allows it.
Start here
Reference
- Rust API
- TypeScript and WASM API
- Canonical target model
- Supported PDF subset and failure model
- Workspace crate map
Design and security
Guides
Engine Internals
Deep technical documentation covering PDF spec concepts, implementation decisions, tradeoffs, and code-level explanations. Start with the reading order guide.
- Reading order for new contributors
- Architecture overview
- PDF primer
- Parsing model
- Object model and serialization
- Graphics state and coordinate systems
- Text system and extraction
- Search geometry and match modeling
- Redaction target model
- Redaction application pipeline
- Writer and deterministic output
- WASM/JS boundary design
- Security and correctness model
- Known limitations
- Glossary
- PDF spec to code map
- Top 10 implementation decisions
Current MVP scope
- Unencrypted PDFs, plus Standard Security Handler decryption at V = 1/2 (RC4), V = 4 (AES-128), and V = 5 (AES-256 / R = 5 or R = 6) under either the user or owner password. Classic xref tables, PDF 1.5+ cross-reference streams, object streams, and the hybrid `XRefStm` form are all handled
- Unfiltered or `FlateDecode` streams, including PNG and TIFF `DecodeParms` predictors
- Deterministic full-document rewrites with FlateDecode-compressed content streams
- Form XObjects traversed for text extraction, search, and copy-on-write redaction (text, vector paint, and Image `Do` invocations inside the Form), with nested Forms handled recursively
- `Type1`, `TrueType`, and `Type0` / `Identity-H` text with `ToUnicode`, `WinAnsiEncoding`, `MacRomanEncoding`, `StandardEncoding`, and `/Encoding /Differences` decoding
- Rectangle, quad, and quad-group redaction targets in canonical page space
- Three redaction modes: `strip`, `redact` (default), and `erase`, with optional `overlayText` labels in `redact` mode
- Conservative image redaction at invocation level
- Hidden-by-default Optional Content Groups are refused by default; callers can opt in via `sanitizeHiddenOcgs: true` to strip `BDC /OC /<name> ... EMC` content gated by hidden layers before redaction
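To make the mode and option surface above concrete, here is a minimal Rust sketch of how the three modes, the optional overlay label, and the hidden-layer opt-in could be modeled. Every name in it (`RedactionMode`, `RedactionOptions`, `OptionsError`, `validate`) is hypothetical and is not taken from the project's API. Rejecting an overlay label outside `redact` mode is also an assumption made for illustration; the list above only says that labels are available in that mode.

```rust
/// Hypothetical mirror of the three documented redaction modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum RedactionMode {
    Strip,
    /// `redact` is documented as the default mode.
    #[default]
    Redact,
    Erase,
}

/// Hypothetical options bag. The field names echo the documented
/// `overlayText` and `sanitizeHiddenOcgs` options in Rust naming style.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct RedactionOptions {
    pub mode: RedactionMode,
    pub overlay_text: Option<String>,
    /// Off by default, matching "refused by default" for hidden layers.
    pub sanitize_hidden_ocgs: bool,
}

/// Hypothetical validation error for inconsistent options.
#[derive(Debug, PartialEq, Eq)]
pub enum OptionsError {
    OverlayTextRequiresRedactMode(RedactionMode),
}

impl RedactionOptions {
    /// Assumption: an overlay label only makes sense in `Redact` mode,
    /// so any other combination is reported instead of being ignored.
    pub fn validate(&self) -> Result<(), OptionsError> {
        match (&self.overlay_text, self.mode) {
            (Some(_), mode) if mode != RedactionMode::Redact => {
                Err(OptionsError::OverlayTextRequiresRedactMode(mode))
            }
            _ => Ok(()),
        }
    }
}
```

With this shape, `RedactionOptions::default()` selects `Redact` with no label and leaves hidden-layer sanitization off, so a caller has to opt in explicitly.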
Fail-explicit design
Unsupported features return an explicit error instead of being silently ignored or producing incorrect output.
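The fail-explicit rule can be pictured as a preflight check that either accepts a document or names the exact feature it cannot handle. The sketch below is illustrative only: `UnsupportedFeature`, `DocumentFacts`, and `preflight` are hypothetical names, and it assumes that any stream filter outside the listed subset (unfiltered or `FlateDecode`) is rejected rather than passed through.

```rust
use std::fmt;

/// Hypothetical error type: each unsupported feature gets its own variant,
/// so callers can tell exactly why a document was refused.
#[derive(Debug, PartialEq, Eq)]
pub enum UnsupportedFeature {
    StreamFilter(String),
    HiddenOptionalContent,
}

impl fmt::Display for UnsupportedFeature {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::StreamFilter(name) => write!(f, "unsupported stream filter: {name}"),
            Self::HiddenOptionalContent => {
                write!(f, "hidden Optional Content Groups present; opt in to sanitize them")
            }
        }
    }
}

impl std::error::Error for UnsupportedFeature {}

/// Hypothetical summary a parser might collect before redaction starts.
pub struct DocumentFacts<'a> {
    /// One entry per stream; `None` means the stream is unfiltered.
    pub stream_filters: Vec<Option<&'a str>>,
    pub has_hidden_ocgs: bool,
}

/// Returns `Ok(())` only when every observed feature is in the supported
/// subset. The first unsupported feature found is returned as an error.
pub fn preflight(
    facts: &DocumentFacts<'_>,
    sanitize_hidden_ocgs: bool,
) -> Result<(), UnsupportedFeature> {
    for &filter in &facts.stream_filters {
        match filter {
            None | Some("FlateDecode") => {}
            Some(other) => return Err(UnsupportedFeature::StreamFilter(other.to_string())),
        }
    }
    if facts.has_hidden_ocgs && !sanitize_hidden_ocgs {
        return Err(UnsupportedFeature::HiddenOptionalContent);
    }
    Ok(())
}
```

Returning a typed error instead of logging and continuing means a caller never receives output that looks redacted but was produced from a document the engine only partly understood.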