WASM and JavaScript Boundary Design¶
1. Architecture¶
Rust crates (open_redact_pdf, pdf_*)
↓ wasm-pack --target bundler
pdf_wasm crate (wasm_api.rs)
↓ wasm-bindgen generates
packages/ts-sdk/vendor/pdf-wasm/ (JS glue + .wasm binary)
↓ dynamic import()
packages/ts-sdk/src/index.ts
↓ workspace dependency
apps/demo-web/
2. PdfHandle and interior mutability¶
PdfHandle wraps RefCell<PdfDocument>. Why RefCell?
wasm_bindgenexports take&PdfHandle(shared reference)- But
apply_redactionsneeds&mutaccess to modify the document RefCellprovides runtime-checked interior mutability- This is safe because JS is single-threaded — no concurrent borrows are possible
3. Serde boundary¶
- Input (JS → Rust):
serde_wasm_bindgen::from_value::<RedactionPlan>(plan)deserializes a JS object into a typed Rust struct - Output (Rust → JS):
serde_wasm_bindgen::to_value(...)serializes Rust structs to JS objects - Direct types:
Vec<u8>becomesUint8Array,usizebecomes JS number,Stringstays string
4. Error handling¶
All errors become JS strings via PdfError::to_string(). There are no structured error objects on the JS side. The TS SDK does not wrap these — they propagate as exceptions thrown from the WASM call.
5. TS SDK normalization layer¶
The WASM layer returns snake_case keys. The TS SDK normalizes these to camelCase:
| WASM key | TS SDK key |
|---|---|
page_index |
pageIndex |
char_start |
charStart |
text_glyphs_removed |
textGlyphsRemoved |
{ points: [...] } quad shape |
[Point, Point, Point, Point] |
6. Memory management¶
freePdf(handle)calls the generated.free()method to release WASM heap memorywasm-bindgengenerates aFinalizationRegistryfallback for GC-driven cleanup- Callers should explicitly call
freePdf— GC timing is unpredictable and the WASM heap is not managed by the JS GC
7. PdfHandle as opaque branded type¶
In TypeScript: type PdfHandle = { readonly __brand: "PdfHandle" }. Callers cannot construct one directly. This prevents accidental misuse such as passing an arbitrary object to SDK functions that expect a loaded PDF handle.
8. No caching¶
extract_text and search_text re-run full analysis on each call. Each call performs: parse content stream → load fonts → interpret operators → decode glyphs. This is a conscious MVP simplicity choice. Caching would require invalidation logic after apply_redactions modifies the document.
9. Why it was coded this way¶
- Bundler target (not
web/nodejs): works with Vite, Webpack, and Rollup without extra configuration RefCelloverMutex: simpler, and JS is single-threaded soMutexadds overhead with no benefit- Branded type: type safety at the TS layer without any runtime cost
- No caching: correctness over performance for MVP; extraction results would be stale after redaction
10. What would break¶
| Change | Consequence |
|---|---|
Mutex instead of RefCell |
Unnecessary overhead; potential deadlock if a panic occurs inside a lock |
| Not normalizing snake_case | TS callers receive wrong property names; all field accesses return undefined |
Not calling freePdf |
Memory leak; WASM heap grows unbounded across PDF loads |