Canonical Target Model
The engine accepts page-space geometry targets, not screen-space UI artifacts.
Coordinate system
- Units are PDF user-space units after page normalization
- Coordinates are page-local
- The origin is the normalized page origin
- Page rotation and crop translation are normalized before redaction logic runs
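As a rough illustration of what normalizing crop translation and page rotation can involve, here is a minimal sketch. It is not the engine's code. The function name `normalize_point`, the lower-left origin, the y-up axis, and the clockwise interpretation of `rotation` are all assumptions made for this example.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in raw user space


def normalize_point(point: Point, crop_box: Box, rotation: int) -> Point:
    """Map a raw user-space point to page-local coordinates.

    Assumptions (hypothetical, not taken from the engine): the normalized
    origin is the lower-left corner of the displayed crop box, y grows
    upward, and `rotation` is a clockwise page rotation in degrees.
    """
    x0, y0, x1, y1 = crop_box
    width, height = x1 - x0, y1 - y0
    # Crop translation: move the crop-box corner to (0, 0).
    u, v = point[0] - x0, point[1] - y0
    # Page rotation: rotate clockwise, then shift back into the positive quadrant.
    turn = rotation % 360
    if turn == 0:
        return (u, v)
    if turn == 90:
        return (v, width - u)
    if turn == 180:
        return (width - u, height - v)
    if turn == 270:
        return (height - v, u)
    raise ValueError("rotation must be a multiple of 90 degrees")
```

With a 100 by 200 crop box whose corner sits at (10, 20), the raw point (10, 20) maps to the page origin when the page is unrotated, and to the top-left corner of the displayed page when it is rotated 90 degrees clockwise.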
Target types
Rectangle targets
Useful for drag-based authoring and coarse manual redaction.
Quad targets
Useful for text-aligned authoring where a single four-point region is sufficient.
Quad-group targets
Useful for:
- multi-line text matches
- discontinuous text
- search and regex results
- future OCR or entity-driven workflows
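One way to picture the three target types is as small immutable records. The sketch below is illustrative only: the class names `RectTarget`, `QuadTarget`, and `QuadGroupTarget` and their fields are hypothetical and are not taken from the engine's API.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]


@dataclass(frozen=True)
class RectTarget:
    """Axis-aligned rectangle, e.g. from a drag interaction."""
    page_index: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class QuadTarget:
    """A single four-point region, e.g. one run of text."""
    page_index: int
    quad: Quad


@dataclass(frozen=True)
class QuadGroupTarget:
    """Several quads that belong to one logical target."""
    page_index: int
    quads: Tuple[Quad, ...]


Target = Union[RectTarget, QuadTarget, QuadGroupTarget]

# A two-line text match expressed as one group of two quads (made-up coordinates).
two_line_match = QuadGroupTarget(
    page_index=0,
    quads=(
        ((72.0, 700.0), (300.0, 700.0), (300.0, 712.0), (72.0, 712.0)),
        ((72.0, 686.0), (180.0, 686.0), (180.0, 698.0), (72.0, 698.0)),
    ),
)
```

Keeping the group as a tuple of separate quads, rather than one enclosing rectangle, avoids covering the unmatched text between the end of the first line and the start of the second.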
Normalization rules
Normalization is handled by pdf_targets.
The normalization layer:
- validates page indices
- rejects empty or non-intersecting targets
- converts rectangles to canonical quads
- preserves quad groups as independent geometry regions
- computes merged bounds for efficient intersection checks
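The rules above can be sketched as a single function. This is not the `pdf_targets` implementation; `CanonicalTarget`, `rect_to_quad`, `quad_bounds`, and `normalize` are hypothetical names. The sketch also assumes that "empty" means a zero-area bounding box, that "non-intersecting" means lying entirely outside the page, and that a target is rejected only when none of its quads survive.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class CanonicalTarget:
    page_index: int
    quads: Tuple[Quad, ...]  # each quad stays an independent region
    bounds: Box              # merged bounds over all quads


def rect_to_quad(x0: float, y0: float, x1: float, y1: float) -> Quad:
    """Convert a rectangle (corners in any order) to a counter-clockwise quad."""
    lo_x, hi_x = min(x0, x1), max(x0, x1)
    lo_y, hi_y = min(y0, y1), max(y0, y1)
    return ((lo_x, lo_y), (hi_x, lo_y), (hi_x, hi_y), (lo_x, hi_y))


def quad_bounds(quad: Quad) -> Box:
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))


def normalize(
    page_index: int,
    quads: Sequence[Quad],
    page_sizes: Sequence[Tuple[float, float]],
) -> CanonicalTarget:
    # Validate the page index against the document's page list.
    if not 0 <= page_index < len(page_sizes):
        raise ValueError(f"page index {page_index} out of range")
    page_w, page_h = page_sizes[page_index]
    kept: List[Quad] = []
    for quad in quads:
        x0, y0, x1, y1 = quad_bounds(quad)
        is_empty = x1 <= x0 or y1 <= y0  # assumed meaning of "empty"
        off_page = x1 <= 0 or y1 <= 0 or x0 >= page_w or y0 >= page_h
        if not (is_empty or off_page):
            kept.append(quad)
    if not kept:
        raise ValueError("target is empty or does not intersect the page")
    # Merged bounds: one box enclosing every surviving quad.
    boxes = [quad_bounds(q) for q in kept]
    bounds = (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
    return CanonicalTarget(page_index, tuple(kept), bounds)
```

A rectangle target would pass through `rect_to_quad` first, so `normalize` only ever handles quads. The merged `bounds` lets a caller discard page content that is far from the target with one box comparison before testing individual quads.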
Why the model is geometry-first
The apply pipeline does not need to know whether a target came from:
- a drag interaction
- a text selection
- a search term
- a regex pipeline
- a future OCR pass
Everything is compiled into page-space geometry before redaction is applied.
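To make the separation concrete, the sketch below compiles two different sources into the same quad list and passes both to one geometry-only function. All names here (`from_drag`, `from_text_match`, `count_covered`) are hypothetical, and `count_covered` is only a stand-in for the real apply step.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def box_to_quad(box: Box) -> Quad:
    x0, y0, x1, y1 = box
    return ((x0, y0), (x1, y0), (x1, y1), (x0, y1))


def from_drag(start: Point, end: Point) -> List[Quad]:
    """Drag interaction: two page-space corners, in either order."""
    box = (
        min(start[0], end[0]),
        min(start[1], end[1]),
        max(start[0], end[0]),
        max(start[1], end[1]),
    )
    return [box_to_quad(box)]


def from_text_match(line_boxes: Sequence[Box]) -> List[Quad]:
    """Search or regex match: one quad per matched line."""
    return [box_to_quad(b) for b in line_boxes]


def count_covered(quads: Sequence[Quad], glyph_boxes: Sequence[Box]) -> int:
    """Stand-in for the apply step: count glyph boxes overlapped by any quad.

    It receives only geometry, so it cannot tell which adapter produced it.
    Overlap is tested against each quad's axis-aligned bounding box, which
    is a conservative approximation for rotated quads.
    """
    def overlaps(quad: Quad, box: Box) -> bool:
        xs = [p[0] for p in quad]
        ys = [p[1] for p in quad]
        return (
            min(xs) < box[2] and box[0] < max(xs)
            and min(ys) < box[3] and box[1] < max(ys)
        )

    return sum(1 for g in glyph_boxes if any(overlaps(q, g) for q in quads))
```

Supporting a new source, such as an OCR pass, would then mean writing another adapter that returns quads, with no change to the function that consumes them.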