PDF Parsing Modes
Firecrawl supports three PDF parsing modes. The core tradeoff is speed vs. robustness across different PDF types.
Picking the right mode
| Mode | Best for | Strength | Tradeoff |
|---|---|---|---|
| `fast` | Text-native PDFs | Fast | Won't extract text from scanned or image-heavy pages |
| `auto` | Default for most PDFs | Text-first with OCR fallback | More complex, but more robust for mixed PDFs |
| `ocr` | Scans, photos, and PDFs that `auto` misclassifies | Most robust | Higher cost and latency |
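
The table can be read as a small decision rule. The sketch below is only an illustration of that rule: `PdfTraits` and `pickPdfMode` are hypothetical names that are not part of Firecrawl, and it assumes you already know (or can guess) whether a PDF is a scan and whether it has a clean text layer.

```typescript
// The three modes named in the table above.
type PdfMode = 'fast' | 'auto' | 'ocr';

// Hypothetical description of what you know about a PDF up front.
// Leave a field undefined when you are not sure.
interface PdfTraits {
  isScanOrPhoto?: boolean;
  hasCleanTextLayer?: boolean;
}

// Illustrative helper (not a Firecrawl API): maps known traits to a mode.
function pickPdfMode(traits: PdfTraits): PdfMode {
  // Scans and photos need OCR regardless of anything else.
  if (traits.isScanOrPhoto === true) return 'ocr';
  // A known clean embedded text layer makes the fast path safe.
  if (traits.hasCleanTextLayer === true) return 'fast';
  // Unknown or mixed content: text-first with OCR fallback.
  return 'auto';
}
```

When nothing is known about a file, the helper falls through to `auto`, which matches the table's "default for most PDFs" row.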
Example: force OCR
`parsers: [{ type: 'pdf', mode: 'ocr', maxPages: 20 }]`

Example: default behavior
`parsers: [{ type: 'pdf' }]`

Practical guidance
- Start with `auto` and switch to `ocr` only for the PDFs that need it
- Use `fast` when you know the PDFs have a clean embedded text layer
- For batch pipelines, sample your PDF distribution first and then choose defaults
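
For the batch-pipeline advice, one possible way to turn a sample into a default is sketched below. Everything here is an assumption made for illustration: the `SampleKind` labels (assigned by hand or by your own tooling), the `chooseDefaultMode` and `pdfParsers` helpers, and the 0.9 share threshold are hypothetical and not part of Firecrawl. Only the shape of the `parsers` entry (`type`, `mode`, `maxPages`) comes from the examples above.

```typescript
type PdfMode = 'fast' | 'auto' | 'ocr';

// Hypothetical label for each sampled PDF.
type SampleKind = 'text' | 'scanned' | 'mixed';

// Shape of a parsers entry as used in the examples above.
interface PdfParser {
  type: 'pdf';
  mode?: PdfMode;
  maxPages?: number;
}

// Picks a pipeline default from a labelled sample.
// The 0.9 threshold is an arbitrary assumption; tune it for your data.
function chooseDefaultMode(sample: SampleKind[], threshold = 0.9): PdfMode {
  if (sample.length === 0) return 'auto'; // nothing known yet
  const share = (kind: SampleKind): number =>
    sample.filter((k) => k === kind).length / sample.length;
  if (share('text') >= threshold) return 'fast';
  if (share('scanned') >= threshold) return 'ocr';
  return 'auto'; // mixed distribution: text-first with OCR fallback
}

// Builds the parsers array; fields left undefined are omitted entirely.
function pdfParsers(mode?: PdfMode, maxPages?: number): PdfParser[] {
  const parser: PdfParser = { type: 'pdf' };
  if (mode !== undefined) parser.mode = mode;
  if (maxPages !== undefined) parser.maxPages = maxPages;
  return [parser];
}
```

A pipeline could call `pdfParsers(chooseDefaultMode(sample))` once at startup and still override individual files with `pdfParsers('ocr')` when the default output comes back empty or garbled.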