Fire-PDF Overview
What is Fire-PDF?
Fire-PDF is a Rust-based PDF parsing engine designed to eliminate the typical tradeoff between speed and accuracy.
It converts any PDF — scanned, text-based, or mixed — into structured Markdown with:
- Correct reading order
- Preserved tables (as Markdown tables)
- Preserved formulas (as LaTeX)
- Proper handling of multi-column layouts
Why it’s fast
Fire-PDF’s performance comes from making GPU usage conditional instead of mandatory:
- Text-based pages use native extraction and never touch the GPU.
- Only scanned or image-heavy pages go through a neural layout model and OCR.
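The conditional routing described above can be sketched in a few lines. This is an illustration only, not Fire-PDF’s actual code: `PageKind` and `route_pages` are hypothetical names, and the per-page labels are assumed to come from an earlier classification step.

```python
from enum import Enum


class PageKind(Enum):
    TEXT = "text"        # embedded text layer is usable
    SCANNED = "scanned"  # needs layout detection and OCR


def route_pages(kinds):
    """Split page indices into a CPU-only list and a GPU list.

    `kinds` holds one PageKind per page (hypothetical input for this sketch).
    """
    native, gpu = [], []
    for index, kind in enumerate(kinds):
        # Only pages without a usable text layer are sent down the GPU path.
        (native if kind is PageKind.TEXT else gpu).append(index)
    return native, gpu


kinds = [PageKind.TEXT, PageKind.TEXT, PageKind.SCANNED, PageKind.TEXT]
native, gpu = route_pages(kinds)
print("native extraction:", native)
print("layout model + OCR:", gpu)
```

In this example only one of the four pages reaches the GPU queue, which is the source of the latency and cost savings on mixed documents.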
Millisecond page classification with pdf-inspector
pdf-inspector is an open-source Rust library that classifies pages by analyzing PDF internals (font encodings, text operators, image coverage) in milliseconds, without rendering.
For mixed documents, this avoids GPU processing for the majority of pages, which directly translates into lower latency and lower cost.
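To make the idea of classifying from PDF internals concrete, here is a minimal sketch of a rule-based decision over the three signals named above. It is not pdf-inspector’s API: the `PageStats` fields, the `needs_ocr` function, and both thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    text_operators: int    # count of text-showing operators on the page
    image_coverage: float  # fraction of the page area covered by images, 0..1
    fonts_decodable: bool  # whether font encodings map glyphs to readable text


def needs_ocr(stats, min_text_ops=5, max_image_coverage=0.8):
    """Return True when a page should go through OCR.

    The threshold defaults are illustrative assumptions, not tuned values.
    """
    if not stats.fonts_decodable:
        return True  # text exists but would extract as garbage
    if stats.text_operators < min_text_ops:
        return True  # little or no text layer: likely a scan
    return stats.image_coverage > max_image_coverage


born_digital = PageStats(text_operators=120, image_coverage=0.1, fonts_decodable=True)
scan = PageStats(text_operators=0, image_coverage=1.0, fonts_decodable=True)
broken_fonts = PageStats(text_operators=120, image_coverage=0.1, fonts_decodable=False)

for page in (born_digital, scan, broken_fonts):
    print(page, "-> OCR" if needs_ocr(page) else "-> native")
```

Because every input is a counter or flag read from the page’s content stream and resources, a check like this needs no rasterization, which is why it can run in milliseconds.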
Layout-aware accuracy
For complex documents, speed alone isn’t enough. Fire-PDF uses a neural document layout model to detect regions like text blocks, tables, formulas, images, headers, and footers, then processes each region with tuned parameters.
Region-specific handling includes:
- Tables get higher budgets to produce accurate Markdown tables.
- Formulas are preserved as LaTeX.
- Reading order is predicted by the neural model, with an XY-cut fallback for multi-column layouts.
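For readers unfamiliar with XY-cut, the sketch below shows a simplified recursive variant: it looks for empty vertical gaps (columns) first, then empty horizontal gaps (bands), and recurses into each group. It is a generic illustration of the technique, not Fire-PDF’s implementation. Boxes are assumed to be `(x0, y0, x1, y1)` tuples with y increasing downward, and `split_on_gaps` and `xy_cut` are hypothetical names.

```python
def split_on_gaps(boxes, lo, hi):
    """Group boxes separated by empty gaps along one axis.

    `lo` and `hi` are the tuple indices of a box's start and end on that axis.
    """
    ordered = sorted(boxes, key=lambda b: b[lo])
    groups = [[ordered[0]]]
    reach = ordered[0][hi]  # furthest extent covered so far
    for box in ordered[1:]:
        if box[lo] > reach:
            groups.append([box])  # empty gap found: start a new group
        else:
            groups[-1].append(box)
        reach = max(reach, box[hi])
    return groups


def xy_cut(boxes):
    """Return boxes in reading order using recursive XY-cuts."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try vertical cuts first (columns, left to right).
    groups = split_on_gaps(boxes, 0, 2)
    if len(groups) == 1:
        # No column gap: try horizontal cuts (bands, top to bottom).
        groups = split_on_gaps(boxes, 1, 3)
    if len(groups) == 1:
        # No clean cut on either axis: sort top-to-bottom, then left-to-right.
        return sorted(boxes, key=lambda b: (b[1], b[0]))
    return [box for group in groups for box in xy_cut(group)]


title = (50, 40, 550, 80)       # full-width heading
left_a = (50, 100, 290, 300)    # left column, first block
left_b = (50, 320, 290, 700)    # left column, second block
right_a = (310, 100, 550, 450)  # right column, first block
right_b = (310, 470, 550, 700)  # right column, second block

order = xy_cut([right_b, left_a, title, right_a, left_b])
print(order)
```

Here the heading comes first, then the left column is read top to bottom before the right column. This simple variant can interleave columns when paragraph gaps line up across both columns under a full-width element, which is one reason a learned reading-order model is the primary path.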
The five-stage pipeline
- Classify: scan the PDF’s internal structure and classify each page as text-based or needing OCR.
- Render: render OCR pages at 200 DPI, automatically capping or slicing oversized pages.
- Layout detection: run the neural layout model on GPU to get bounding boxes, region types, and reading order.
- Extraction: text pages use native extraction; regions on OCR pages are handled by a vision-language model (GLM-OCR).
- Assembly: sort regions by reading order and assemble Markdown. Tables become Markdown tables, formulas stay LaTeX, and geometric deduplication removes overlapping regions.
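The assembly stage can be illustrated with a short sketch that sorts regions by reading order and drops boxes that overlap an already-kept region. The source does not specify the deduplication criterion, so the intersection-over-union test, the 0.5 threshold, the keep-the-earlier-region rule, and the region dictionary shape are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # boxes do not overlap
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def assemble(regions, max_iou=0.5):
    """Join region Markdown in reading order, skipping near-duplicate boxes.

    Each region is a dict with "order", "bbox", and "markdown" keys
    (a hypothetical shape chosen for this sketch).
    """
    kept = []
    for region in sorted(regions, key=lambda r: r["order"]):
        # Keep a region only if it does not heavily overlap a kept one.
        if all(iou(region["bbox"], k["bbox"]) <= max_iou for k in kept):
            kept.append(region)
    return "\n\n".join(r["markdown"] for r in kept)


regions = [
    {"order": 1, "bbox": (50, 100, 550, 200),
     "markdown": "| a | b |\n|---|---|\n| 1 | 2 |"},
    {"order": 0, "bbox": (50, 40, 550, 80), "markdown": "# Title"},
    # Second detection of the same table area, extracted as plain text.
    {"order": 2, "bbox": (52, 102, 548, 198), "markdown": "a b 1 2"},
    {"order": 3, "bbox": (50, 220, 550, 260), "markdown": "$$E = mc^2$$"},
]

document = assemble(regions)
print(document)
```

The duplicate plain-text detection of the table is dropped, so the output contains the heading, the Markdown table, and the LaTeX formula, in that order.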