PDF Parsing Modes

Firecrawl supports three PDF parsing modes. The core tradeoff is speed vs. robustness across different PDF types.

| Mode | Best for | Strength | Tradeoff |
| --- | --- | --- | --- |
| fast | Text-native PDFs | Fast | Won’t extract from scanned/image-heavy pages |
| auto | Default for most PDFs | Text-first with OCR fallback | More complex but more robust for mixed PDFs |
| ocr | Scans, photos, and auto misclassification | Most robust | Higher cost and latency |

// Force OCR and cap parsing at 20 pages
parsers: [{ type: 'pdf', mode: 'ocr', maxPages: 20 }]
// No mode specified
parsers: [{ type: 'pdf' }]
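
To keep these options consistent across calls, a small typed helper can build the `parsers` entry. This is a minimal sketch: the `PdfMode` and `PdfParser` types and the `pdfParser` function are illustrative names, not part of the Firecrawl SDK. Only the `type`, `mode`, and `maxPages` fields come from the snippets above.

```typescript
// Hypothetical types mirroring the option shape shown above.
type PdfMode = "fast" | "auto" | "ocr";

interface PdfParser {
  type: "pdf";
  mode?: PdfMode;
  maxPages?: number;
}

// Build a PDF parser entry, omitting any field the caller did not set.
// The positive-integer check on maxPages is an assumption of this sketch.
function pdfParser(mode?: PdfMode, maxPages?: number): PdfParser {
  if (maxPages !== undefined && (!Number.isInteger(maxPages) || maxPages < 1)) {
    throw new RangeError("maxPages must be a positive integer");
  }
  const parser: PdfParser = { type: "pdf" };
  if (mode !== undefined) parser.mode = mode;
  if (maxPages !== undefined) parser.maxPages = maxPages;
  return parser;
}

// Equivalent to the two snippets above.
const ocrOptions = { parsers: [pdfParser("ocr", 20)] };
const defaultOptions = { parsers: [pdfParser()] };
```

Leaving unset fields out of the object, rather than sending `undefined`, keeps the request body identical to the hand-written form.
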
  • Start with auto and switch to ocr only for the PDFs that need it
  • Use fast when you know the PDFs have a clean embedded text layer
  • For batch pipelines, sample your PDF distribution first and then choose defaults
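
The guidance above can be sketched as a sampling step plus a per-document escalation check. Everything here is an assumption for illustration: `PdfSample`, `chooseDefaultMode`, `needsOcrRetry`, and the 50-characters-per-page floor are hypothetical and should be tuned against your own data. How you determine `hasTextLayer` for a sampled PDF is left to your own inspection step.

```typescript
type PdfMode = "fast" | "auto" | "ocr";

// Hypothetical summary of one sampled PDF.
interface PdfSample {
  hasTextLayer: boolean;
}

// Pick a batch default from a sample of the PDF distribution:
// all text-native -> fast, none text-native -> ocr, otherwise auto.
function chooseDefaultMode(samples: PdfSample[]): PdfMode {
  if (samples.length === 0) return "auto"; // no evidence, keep the robust default
  const withText = samples.filter((s) => s.hasTextLayer).length;
  if (withText === samples.length) return "fast";
  if (withText === 0) return "ocr";
  return "auto";
}

// Flag a result whose text is too sparse for its page count,
// so only that document is retried with mode "ocr".
function needsOcrRetry(
  extractedText: string,
  pageCount: number,
  minCharsPerPage = 50, // assumed floor, not a documented threshold
): boolean {
  if (pageCount <= 0) return false;
  return extractedText.trim().length / pageCount < minCharsPerPage;
}
```

For example, a sample where some files are scans and some are text-native yields `auto`, and any individual document that still comes back nearly empty can be re-submitted with `ocr`.
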

Related: