Redaction Workflow
End-to-end pipeline
- Parse the PDF structure
- Traverse the page tree and normalize page boxes
- Parse page content streams into operations
- Extract glyph geometry and searchable text
- Normalize authoring input into canonical page-space targets
- Remove or neutralize intersecting text, vectors, images, and annotations
- Paint visible redaction fills
- Save a new deterministic PDF
Manual rectangles
Manual rectangle authoring is a UI convenience layer. The engine still receives canonical page-space targets, not DOM coordinates.
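As an illustration of what that normalization involves, here is a minimal sketch that converts a rectangle drawn in DOM coordinates into a page-space rectangle. The names `PageRect` and `dom_rect_to_page_rect` are hypothetical and not part of the engine's API. The sketch assumes the page is rendered unrotated at a uniform scale (CSS pixels per PDF point), with DOM coordinates measured from the top-left corner of the rendered page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRect:
    """Axis-aligned rectangle in PDF page space (points, origin bottom-left)."""
    x0: float
    y0: float
    x1: float
    y1: float


def dom_rect_to_page_rect(left, top, width, height, *, scale, page_height,
                          origin_x=0.0, origin_y=0.0):
    """Convert a DOM rect (CSS px, y pointing down) to page space (points, y up).

    Assumptions (not taken from the engine): no page rotation, uniform `scale`,
    and (origin_x, origin_y) is the lower-left corner of the visible page box.
    """
    x0 = origin_x + left / scale
    x1 = origin_x + (left + width) / scale
    # Flip the vertical axis: the DOM top edge becomes the larger page-space y.
    y1 = origin_y + page_height - top / scale
    y0 = origin_y + page_height - (top + height) / scale
    return PageRect(x0, y0, x1, y1)


# A 200x20 px selection on a US Letter page (792 pt tall) rendered at 2x.
rect = dom_rect_to_page_rect(100, 50, 200, 20, scale=2.0, page_height=792.0)
# rect == PageRect(x0=50.0, y0=757.0, x1=150.0, y1=767.0)
```

Doing this conversion once, at the authoring boundary, means zoom level and scroll position never reach the engine.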
Search-driven redaction
Search works in visual glyph order and returns quad groups. These can be passed directly into apply_redactions.
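To make the shape of that data concrete, the sketch below models a quad as four corner points and a quad group as a list of quads. This layout is an assumption for illustration (for example, one group per match, with one quad per line the match spans); the engine's real types may differ. `Point`, `Quad`, `quad_bounds`, and `group_bounds` are hypothetical names.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


class Quad(NamedTuple):
    """Four corners in page space; need not be axis-aligned (rotated text)."""
    p0: Point
    p1: Point
    p2: Point
    p3: Point


def quad_bounds(quad):
    """Axis-aligned bounding box (x0, y0, x1, y1) of one quad."""
    xs = [p.x for p in quad]
    ys = [p.y for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))


def group_bounds(group):
    """Bounding box covering every quad in a group (assumed: one search match)."""
    boxes = [quad_bounds(q) for q in group]
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# A match that wraps across two lines yields two quads in one group.
line1 = Quad(Point(10, 700), Point(60, 700), Point(60, 712), Point(10, 712))
line2 = Quad(Point(10, 686), Point(40, 686), Point(40, 698), Point(10, 698))
match_group = [line1, line2]
```

Quads, rather than rectangles, keep rotated or skewed text tightly bounded; the bounding-box helpers are only useful for coarse checks such as hit-testing in a UI.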
Redaction modes
The mode field on RedactionPlan controls the visual and structural output:
| Mode | Bytes removed | Overlay painted | Surrounding text |
|---|---|---|---|
| strip | yes | no | shifts to fill gap |
| redact | yes (blank space) | yes | stays in place |
| erase | yes (blank space) | no | stays in place |
redact is the default when mode is omitted. The fill color for the overlay defaults to black and can be overridden via fill_color / fillColor.
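The mode table and defaults can be summarized in a short sketch. It treats the plan as a plain dict, which is an assumption: the actual shape of RedactionPlan is not shown here. `Mode`, `ModeEffect`, `EFFECTS`, `resolve_mode`, and `resolve_fill_color` are hypothetical names, and the RGB tuple in the 0 to 1 range is an assumed color representation.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STRIP = "strip"
    REDACT = "redact"
    ERASE = "erase"


@dataclass(frozen=True)
class ModeEffect:
    removes_bytes: bool
    paints_overlay: bool
    keeps_layout: bool  # True when surrounding text stays in place


# Mirrors the mode table: one row per mode.
EFFECTS = {
    Mode.STRIP: ModeEffect(removes_bytes=True, paints_overlay=False, keeps_layout=False),
    Mode.REDACT: ModeEffect(removes_bytes=True, paints_overlay=True, keeps_layout=True),
    Mode.ERASE: ModeEffect(removes_bytes=True, paints_overlay=False, keeps_layout=True),
}


def resolve_mode(plan):
    """Return the plan's mode, falling back to redact when mode is omitted."""
    return Mode(plan.get("mode", "redact"))


def resolve_fill_color(plan):
    """Accept either spelling of the fill color key; default to black."""
    for key in ("fill_color", "fillColor"):
        if key in plan:
            return tuple(plan[key])
    return (0.0, 0.0, 0.0)  # assumed representation: RGB, each channel 0..1


mode = resolve_mode({})          # Mode.REDACT
effect = EFFECTS[mode]           # overlay painted, layout preserved
```

All three modes remove the underlying bytes; they differ only in whether an overlay is painted and whether the surrounding text reflows.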
Apply semantics
- text glyphs intersecting a target are removed or replaced according to the selected mode
- intersecting path paints are neutralized
- intersecting image draws are removed conservatively at invocation level
- annotation removal is optional; when enabled, it strips intersecting annotation objects from touched pages
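The intersection rule for text glyphs can be sketched as a simple rectangle overlap test. This is an illustrative simplification, not the engine's implementation: it uses axis-aligned boxes instead of quads, and it assumes that boxes which merely share an edge do not count as intersecting. `Rect`, `intersects`, and `partition_glyphs` are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def intersects(a, b):
    """True when two rects overlap with positive area (shared edges excluded)."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def partition_glyphs(glyphs, targets):
    """Split (text, box) glyphs into (kept, removed) against the target rects."""
    kept, removed = [], []
    for text, box in glyphs:
        hit = any(intersects(box, target) for target in targets)
        (removed if hit else kept).append((text, box))
    return kept, removed


glyphs = [
    ("A", Rect(0, 0, 10, 10)),
    ("B", Rect(10, 0, 20, 10)),
    ("C", Rect(20, 0, 30, 10)),
]
kept, removed = partition_glyphs(glyphs, [Rect(12, 2, 18, 8)])
# removed holds only "B"; "A" and "C" are kept
```

What happens to the removed glyphs afterwards (dropped, or replaced with blank space) depends on the selected mode.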
Save semantics
The writer emits a new PDF with a full save rather than an incremental update. The output does not rely on hidden references back to the original file.