Errors & Troubleshooting

Most document parsing failures come from one of three areas: inaccessible inputs, complex document structures, or parameter/resource constraints.

Download failures usually trace back to the input being inaccessible:

  • the URL requires a login to download
  • the site blocks bots or automated clients
  • the link has expired or returns a 4xx/5xx status

Tip: first verify that the URL downloads in a normal browser. For restricted sources, run your own download/proxy step and then parse the file from a publicly reachable URL.
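A minimal pre-flight sketch (plain Python, no Firecrawl API assumed) that maps the HTTP status of a download attempt to the causes listed above, so you can separate "fix the input" problems from "tune the parser" problems:

```python
def classify_download_status(status: int) -> str:
    """Map an HTTP status code to a likely failure cause."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "auth-or-bot-blocked"   # login required, or the site blocks bots
    if status in (404, 410):
        return "expired-or-missing"    # the link is gone
    if status == 429:
        return "rate-limited"
    if 500 <= status < 600:
        return "server-error"
    return "other"
```

Anything other than "ok" is worth fixing at the source before touching parser settings.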

When you set minAge, Firecrawl performs a cache-only lookup and never triggers a fresh scrape. If no cached data exists, it returns 404 with error code SCRAPE_NO_CACHED_DATA.
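A sketch of handling the cache-only behavior described above. The payload shape and helper names are illustrative assumptions, not the exact Firecrawl schema; only the minAge semantics (cache-only lookup, 404 with SCRAPE_NO_CACHED_DATA on a miss) come from the text:

```python
def build_scrape_request(url: str, min_age=None) -> dict:
    """Build a hypothetical scrape payload; minAge makes it cache-only."""
    payload = {"url": url}
    if min_age is not None:
        # Cache-only lookup: the service will NOT trigger a fresh scrape.
        payload["minAge"] = min_age
    return payload

def should_retry_without_min_age(status: int, error_code: str) -> bool:
    """On a cache miss, fall back to a normal (fresh) scrape request."""
    return status == 404 and error_code == "SCRAPE_NO_CACHED_DATA"
```

The fallback keeps the cache-miss case explicit instead of treating every 404 as a dead link.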

Empty or garbled output is often caused by scanned or image-heavy PDFs:

  • switch the PDF mode to ocr
  • sample the first pages with maxPages before scaling up

See: Scanned PDFs & OCR
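The sample-then-scale pattern can be sketched as below. The "mode"/"maxPages" field names follow the parameters named in this section but the config shape itself is an assumption:

```python
def pdf_config(mode: str, max_pages=None) -> dict:
    """Hypothetical PDF parsing config; mode is one of: fast, auto, ocr."""
    cfg = {"mode": mode}
    if max_pages is not None:
        cfg["maxPages"] = max_pages  # parse only the first N pages
    return cfg

sample = pdf_config("ocr", max_pages=5)  # cheap first pass on 5 pages
full = pdf_config("ocr")                 # scale up after the sample looks right
```

Running the five-page sample first makes an OCR misconfiguration cheap to detect.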

For table extraction issues:

  • identify whether the table is text-native or scanned
  • treat table-heavy PDFs as a separate pipeline and keep regression samples

See: Tables & Formulas
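One simple heuristic for the text-native vs scanned check: a page with almost no extractable text is probably an image. The threshold below is a guess to tune on your own corpus:

```python
def looks_scanned(chars_per_page: float, threshold: int = 50) -> bool:
    """True if the average extracted characters per page suggest a scan."""
    return chars_per_page < threshold

def classify_pdf(page_char_counts: list) -> str:
    """Classify a PDF from per-page extracted-text lengths."""
    avg = sum(page_char_counts) / max(len(page_char_counts), 1)
    return "scanned" if looks_scanned(avg) else "text-native"
```

Feed it the per-page character counts from any text extractor; "scanned" routes the document to the OCR pipeline, "text-native" to the fast one.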

Common HTTP error codes:

  • 402: typically indicates credit/billing issues
  • 429: rate limiting (reduce concurrency, increase timeouts, or use a batch/queue)
  • 500: server-side error or target-site instability; reproduce with smaller samples and adjust the strategy
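A sketch of turning those codes into actions, with exponential backoff for 429. The action labels and delay values are illustrative:

```python
# Map status codes to a recovery action (labels are illustrative).
ACTIONS = {
    402: "check-billing",
    429: "backoff-and-retry",
    500: "reduce-sample-and-retry",
}

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(base * (2 ** attempt), cap)
```

For 429 specifically, sleeping `backoff_delay(attempt)` between retries reduces concurrency pressure without hand-tuning per site.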

A repeatable debugging workflow:

  1. start with sampling: maxPages: 5
  2. classify the document: text-native vs scanned
  3. pick the mode: fast / auto / ocr
  4. codify the strategy: turn “document traits → config” into reusable rules
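Step 4 can be sketched as a small rule table. The trait keys and config fields here are assumptions chosen to mirror the parameters named in this section:

```python
def choose_config(traits: dict) -> dict:
    """Turn document traits into a parsing config (hypothetical rule table)."""
    cfg = {"mode": "auto"}               # safe default
    if traits.get("scanned"):
        cfg["mode"] = "ocr"              # image-only pages need OCR
    elif traits.get("text_native"):
        cfg["mode"] = "fast"             # text layer present: cheapest mode
    if traits.get("sampling"):
        cfg["maxPages"] = 5              # step 1: sample before scaling up
    return cfg
```

Keeping the rules in one function makes them easy to extend (e.g. a table-heavy trait routing to a separate pipeline) and to cover with the regression samples mentioned above.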