Skip to content

PDF Spec Concepts Mapped to Code

This page is a quick-reference index mapping PDF specification concepts to their implementation locations in this repository. Use it when reading the spec alongside the source, or when tracing a bug back to the relevant code.

Spec references use ISO 32000-1:2008 section numbers.


Concept Map

PDF Concept Spec Reference Repository Location Key Types / Functions
File structure ISO 32000 7.5 crates/pdf_objects/src/parser.rs parse_pdf, parse_header, find_startxref
Cross-reference table ISO 32000 7.5.4 crates/pdf_objects/src/parser.rs parse_xref_table, parse_xref_section, XrefEntry
Incremental updates ISO 32000 7.5.6 crates/pdf_objects/src/parser.rs parse_xref_table (Prev chain loop)
Indirect objects ISO 32000 7.3.10 crates/pdf_objects/src/types.rs ObjectRef, PdfObject, PdfFile::get_object
Object types ISO 32000 7.3 crates/pdf_objects/src/types.rs PdfValue enum
Stream objects ISO 32000 7.3.8 crates/pdf_objects/src/types.rs, stream.rs PdfStream, decode_stream
FlateDecode ISO 32000 7.4.4 crates/pdf_objects/src/stream.rs decode_stream (inflate)
Document catalog ISO 32000 7.7.2 crates/pdf_objects/src/document.rs DocumentCatalog, build_document
Page tree ISO 32000 7.7.3 crates/pdf_objects/src/document.rs collect_pages, PageInfo
Page objects ISO 32000 7.7.3.3 crates/pdf_objects/src/document.rs PageInfo { page_ref, resources, page_box, content_refs }
MediaBox / CropBox ISO 32000 7.7.3.3 crates/pdf_graphics/src/geometry.rs PageBox { media_box, crop_box, rotate }
Coordinate systems ISO 32000 8.3 crates/pdf_graphics/src/geometry.rs Matrix, transform_point, PageBox::normalized_transform
CTM ISO 32000 8.3.4 crates/pdf_text/src/text.rs, crates/pdf_redact/src/redact.rs ctm variable, cm operator handler
cm operator ISO 32000 8.4.4 crates/pdf_text/src/text.rs cm handler: ctm = matrix.multiply(ctm)
Graphics state stack ISO 32000 8.4.2 crates/pdf_text/src/text.rs q/Q handlers, ctm_stack
Content streams ISO 32000 7.8.2 crates/pdf_content/src/content.rs parse_content_stream, Operation
Text state ISO 32000 9.3 crates/pdf_text/src/text.rs RuntimeTextState
Text-showing operators ISO 32000 9.4.3 crates/pdf_text/src/text.rs show_text function
BT/ET ISO 32000 9.4.1 crates/pdf_text/src/text.rs BT handler (resets matrices only)
Text matrix (Tm) ISO 32000 9.4.2 crates/pdf_text/src/text.rs text_state.text_matrix
Text rendering mode ISO 32000 9.3.6 crates/pdf_text/src/text.rs text_state.text_render_mode, visible flag
Glyph positioning ISO 32000 9.4.4 crates/pdf_text/src/text.rs show_text advance computation
TJ operator ISO 32000 9.4.3 crates/pdf_text/src/text.rs, crates/pdf_redact/src/redact.rs TJ handler, build_compensated_array
Simple fonts ISO 32000 9.6 crates/pdf_text/src/text.rs SimpleFont, load_single_font
Type0 fonts ISO 32000 9.7 crates/pdf_text/src/text.rs CompositeFont, load_composite_font
ToUnicode ISO 32000 9.10.3 crates/pdf_text/src/text.rs parse_to_unicode_cmap
CID widths ISO 32000 9.7.4.3 crates/pdf_text/src/text.rs parse_cid_widths
ExtGState ISO 32000 8.4.5 crates/pdf_text/src/text.rs ExtGStateFontMap, gs operator handler
Inline images ISO 32000 8.9.7 crates/pdf_content/src/content.rs skip_inline_image
XObjects ISO 32000 8.8 crates/pdf_redact/src/redact.rs load_xobjects, XObjectKind
Annotations ISO 32000 12.5 crates/pdf_redact/src/redact.rs remove_annotations
Path construction ISO 32000 8.5.2 crates/pdf_content/src/content.rs PathSegment enum
Path painting ISO 32000 8.5.3 crates/pdf_content/src/content.rs PaintOperator enum
Marked content ISO 32000 14.6 crates/pdf_content/src/content.rs BMC/BDC/EMC passthrough

Concepts This Engine Does NOT Implement

The following PDF features are explicitly out of scope. Encountering any of these in a document will result in an explicit error or silent passthrough, depending on the subsystem. No partial or best-effort support is provided.

  • Encryption (ISO 32000 7.6) — standard security handler, public-key security handler, and crypt filters are all unsupported. Any Encrypt key in the trailer causes rejection.
  • Digital signatures (ISO 32000 12.8) — signature fields and signature dictionaries are not validated or preserved in any meaningful way.
  • Optional content / layers (ISO 32000 8.11) — optional content groups (OCGs) and optional content membership dictionaries (OCMDs) are not tracked. Visibility state is ignored.
  • JavaScript actions (ISO 32000 12.6.4.16) — no JavaScript engine is present. JavaScript actions are passed through unexamined.
  • Forms / AcroForms (ISO 32000 12.7) — interactive form fields, widget annotations, and form data are not extracted, filled, or redacted at the field level.
  • Multimedia (ISO 32000 13) — sound, video, 3D, and rich media annotations are not interpreted.
  • Transparency and blending (ISO 32000 11) — transparency groups, soft masks, and blend modes are parsed as content operators but their effect on rendering is not modeled.
  • Type3 fonts (ISO 32000 9.6.5) — glyph procedures defined as content streams are not executed.
  • CFF / OpenType fonts — CFF (Compact Font Format) embedded font programs are not parsed for advance widths or outlines.
  • Pattern and shading fills (ISO 32000 8.7) — tiling patterns and shading patterns are passed through uninterpreted; their geometry is not included in path bounds calculations.