Fire-PDF Overview

What is Fire-PDF?

Fire-PDF is a Rust-based PDF parsing engine designed to eliminate the typical tradeoff between speed and accuracy.

It converts any PDF — scanned, text-based, or mixed — into structured Markdown with:

Correct reading order
Preserved tables (as Markdown tables)
Preserved formulas (as LaTeX)
Proper handling of multi-column layouts

Why it’s fast

Fire-PDF’s performance comes from making GPU usage conditional instead of mandatory:

Text-based pages use native extraction and never touch the GPU.
Only scanned or image-heavy pages go through a neural layout model and OCR.
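
The routing described above can be pictured as two queues. The following is a minimal sketch, not Fire-PDF's actual code: the names PageKind, Page, route_pages, and needs_gpu are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class PageKind(Enum):
    TEXT = "text"        # page has an extractable text layer
    SCANNED = "scanned"  # scanned or image-heavy page that needs OCR


@dataclass
class Page:
    number: int
    kind: PageKind


def route_pages(pages):
    """Split page numbers into a CPU-only native queue and a GPU OCR queue."""
    native, ocr = [], []
    for page in pages:
        (native if page.kind is PageKind.TEXT else ocr).append(page.number)
    return native, ocr


def needs_gpu(pages):
    """The GPU is required only if at least one page lands in the OCR queue."""
    return any(page.kind is not PageKind.TEXT for page in pages)
```

Under this sketch, a document made up entirely of text-based pages never schedules any GPU work.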

Millisecond page classification with pdf-inspector

pdf-inspector is an open-source Rust library that classifies pages by analyzing PDF internals (font encodings, text operators, image coverage) in milliseconds, without rendering.
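
To make the idea concrete, here is a rough sketch of a classifier that works from those three kinds of signals. It is not pdf-inspector's API: the PageSignals fields, the classify_page function, and both thresholds are assumptions chosen for illustration, and the real library may weigh these signals very differently.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    text_operator_count: int    # text-showing operators (e.g. Tj, TJ) in the content stream
    fonts_map_to_unicode: bool  # whether the font encodings can be decoded to Unicode
    image_coverage: float       # fraction of the page area covered by images, 0.0 to 1.0


def classify_page(signals, min_text_ops=5, max_image_coverage=0.8):
    """Return "text" when native extraction looks safe, otherwise "ocr".

    Both thresholds are hypothetical defaults, not values from pdf-inspector.
    """
    if signals.text_operator_count < min_text_ops:
        return "ocr"  # almost no text operators: probably a scan
    if not signals.fonts_map_to_unicode:
        return "ocr"  # text is present but cannot be decoded reliably
    if signals.image_coverage > max_image_coverage:
        return "ocr"  # image-heavy page; any text may be a thin overlay
    return "text"
```

Every input here can be read from the PDF's internal structure, which is why a check of this kind needs no rendering step.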

For mixed documents, this avoids GPU processing for the majority of pages, which directly translates into lower latency and lower cost.
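
A back-of-the-envelope model shows why this matters. The function below is a simple sketch, and the per-page timings used in the note after it are made-up placeholders, not Fire-PDF measurements.

```python
def estimate_seconds(total_pages, ocr_fraction, native_s_per_page, ocr_s_per_page):
    """Estimate processing time when only a fraction of pages takes the OCR path."""
    ocr_pages = round(total_pages * ocr_fraction)
    text_pages = total_pages - ocr_pages
    return text_pages * native_s_per_page + ocr_pages * ocr_s_per_page


# Hypothetical timings: 0.01 s per native page, 1.0 s per OCR page.
conditional = estimate_seconds(100, 0.10, 0.01, 1.0)  # 10% of pages need OCR
everything_on_gpu = estimate_seconds(100, 1.0, 0.01, 1.0)
```

With these illustrative numbers, a 100-page document where 10% of pages need OCR takes about 10.9 seconds, versus 100 seconds if every page went through OCR. The actual ratio depends entirely on the real per-page costs and the document mix.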

Layout-aware accuracy

For complex documents, speed alone isn’t enough. Fire-PDF uses a neural document layout model to detect regions like text blocks, tables, formulas, images, headers, and footers, then processes each region with tuned parameters.

Typical strategy highlights:

Tables get higher budgets to produce accurate Markdown tables.
Formulas are preserved as LaTeX.
Reading order is predicted neurally with an XY-cut fallback for multi-column layouts.
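
For readers unfamiliar with XY-cut, the sketch below shows one common recursive variant. It is a generic illustration, not Fire-PDF's implementation. The box format (x0, y0, x1, y1) with a top-left origin, the widest-gap rule, and the choice to try column splits before row splits are all assumptions.

```python
def widest_gap(intervals):
    """Return the midpoint of the widest empty gap between intervals, or None."""
    best = None   # (gap width, gap midpoint)
    reach = None  # right-most end seen so far
    for start, end in sorted(intervals):
        if reach is not None and start > reach:
            width = start - reach
            if best is None or width > best[0]:
                best = (width, (reach + start) / 2)
        reach = end if reach is None else max(reach, end)
    return None if best is None else best[1]


def xy_cut(boxes):
    """Order boxes (x0, y0, x1, y1) by recursively splitting on whitespace gaps."""
    if len(boxes) <= 1:
        return list(boxes)
    # Axis 0 splits into columns (left before right); axis 1 splits into rows.
    # Columns are tried first so a whole column is read before the next one.
    for axis in (0, 1):
        cut = widest_gap([(b[axis], b[axis + 2]) for b in boxes])
        if cut is not None:
            before = [b for b in boxes if b[axis] < cut]
            after = [b for b in boxes if b[axis] >= cut]
            return xy_cut(before) + xy_cut(after)
    # No clean gap on either axis: fall back to top-to-bottom, left-to-right.
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

On a page with a full-width header above two columns, this sketch first cuts below the header, then splits the remainder into left and right columns, and reads each column top to bottom. Purely geometric rules like this struggle when elements overlap or span columns irregularly, which is consistent with using it only as a fallback.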

The five-stage pipeline

Classify: scan the PDF’s internal structure and label each page as text-based or needing OCR
Render: render OCR pages at 200 DPI, automatically capping or slicing oversized pages
Layout detection: run the neural layout model on the GPU to get bounding boxes, region types, and reading order
Extraction: text pages use native extraction; regions on OCR pages are read by a vision-language model (GLM-OCR)
Assembly: sort regions by reading order and assemble the Markdown; tables become Markdown tables, formulas stay LaTeX, and geometric deduplication removes overlapping regions
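
The assembly stage can be sketched as follows. This is an illustrative approximation, not the engine's code. The Region type, the use of intersection-over-union (IoU) as the overlap test, the 0.8 threshold, and the `$$` delimiters around formulas are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Region:
    order: int    # position in the predicted reading order
    kind: str     # "text", "table", or "formula"
    bbox: tuple   # (x0, y0, x1, y1)
    content: str  # Markdown text, a Markdown table, or LaTeX source


def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def deduplicate(regions, threshold=0.8):
    """Walk regions in reading order, dropping any that nearly coincide with a kept one."""
    kept = []
    for region in sorted(regions, key=lambda r: r.order):
        if all(iou(region.bbox, k.bbox) < threshold for k in kept):
            kept.append(region)
    return kept


def assemble_markdown(regions):
    """Join deduplicated regions into one Markdown string, in reading order."""
    parts = []
    for region in deduplicate(regions):
        if region.kind == "formula":
            parts.append(f"$$\n{region.content}\n$$")  # assumed display-math delimiters
        else:
            parts.append(region.content)
    return "\n\n".join(parts)
```

In this sketch, two detections of the same paragraph whose boxes overlap almost entirely collapse into one, so the paragraph appears only once in the output.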