Errors & Troubleshooting

Most document parsing failures come from one of three areas: inaccessible inputs, complex document structures, or parameter/resource constraints.

Download failures usually trace back to the input being inaccessible:

  • the URL requires a login to download
  • the site blocks bots or automated clients
  • the link has expired or returns a 4xx/5xx status

Tip: first verify that the URL downloads in a normal browser. For restricted sources, run your own download/proxy step and then parse the file from a publicly reachable URL.
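A minimal pre-flight sketch (plain Python, no Firecrawl API assumed) that maps the HTTP status of a download attempt to the causes listed above, so you can separate "fix the input" problems from "tune the parser" problems:

```python
def classify_download_status(status: int) -> str:
    """Map an HTTP status code to a likely failure cause."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "auth-or-bot-blocked"   # login required, or the site blocks bots
    if status in (404, 410):
        return "expired-or-missing"    # the link is gone
    if status == 429:
        return "rate-limited"
    if 500 <= status < 600:
        return "server-error"
    return "other"
```

Anything other than "ok" is worth fixing at the source before touching parser settings.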

When you set minAge, Firecrawl performs a cache-only lookup and never triggers a fresh scrape. If no cached data exists, it returns 404 with error code SCRAPE_NO_CACHED_DATA.
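A sketch of handling the cache-only behavior described above. The payload shape and helper names are illustrative assumptions, not the exact Firecrawl schema; only the minAge semantics (cache-only lookup, 404 with SCRAPE_NO_CACHED_DATA on a miss) come from the text:

```python
def build_scrape_request(url: str, min_age=None) -> dict:
    """Build a hypothetical scrape payload; minAge makes it cache-only."""
    payload = {"url": url}
    if min_age is not None:
        # Cache-only lookup: the service will NOT trigger a fresh scrape.
        payload["minAge"] = min_age
    return payload

def should_retry_without_min_age(status: int, error_code: str) -> bool:
    """On a cache miss, fall back to a normal (fresh) scrape request."""
    return status == 404 and error_code == "SCRAPE_NO_CACHED_DATA"
```

The fallback keeps the cache-miss case explicit instead of treating every 404 as a dead link.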

Empty or garbled output is often caused by scanned or image-heavy PDFs:

  • switch the PDF mode to ocr
  • sample the first pages with maxPages before scaling up

See: Scanned PDFs & OCR
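The sample-then-scale pattern can be sketched as below. The "mode"/"maxPages" field names follow the parameters named in this section but the config shape itself is an assumption:

```python
def pdf_config(mode: str, max_pages=None) -> dict:
    """Hypothetical PDF parsing config; mode is one of: fast, auto, ocr."""
    cfg = {"mode": mode}
    if max_pages is not None:
        cfg["maxPages"] = max_pages  # parse only the first N pages
    return cfg

sample = pdf_config("ocr", max_pages=5)  # cheap first pass on 5 pages
full = pdf_config("ocr")                 # scale up after the sample looks right
```

Running the five-page sample first makes an OCR misconfiguration cheap to detect.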

For table extraction issues:

  • identify whether the table is text-native or scanned
  • treat table-heavy PDFs as a separate pipeline and keep regression samples

See: Tables & Formulas
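One simple heuristic for the text-native vs scanned check: a page with almost no extractable text is probably an image. The threshold below is a guess to tune on your own corpus:

```python
def looks_scanned(chars_per_page: float, threshold: int = 50) -> bool:
    """True if the average extracted characters per page suggest a scan."""
    return chars_per_page < threshold

def classify_pdf(page_char_counts: list) -> str:
    """Classify a PDF from per-page extracted-text lengths."""
    avg = sum(page_char_counts) / max(len(page_char_counts), 1)
    return "scanned" if looks_scanned(avg) else "text-native"
```

Feed it the per-page character counts from any text extractor; "scanned" routes the document to the OCR pipeline, "text-native" to the fast one.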

Common HTTP error codes:

  • 402: typically indicates credit/billing issues
  • 429: rate limiting (reduce concurrency, increase timeouts, or use a batch/queue)
  • 500: server-side error or target-site instability; reproduce with smaller samples and adjust the strategy
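A sketch of turning those codes into actions, with exponential backoff for 429. The action labels and delay values are illustrative:

```python
# Map status codes to a recovery action (labels are illustrative).
ACTIONS = {
    402: "check-billing",
    429: "backoff-and-retry",
    500: "reduce-sample-and-retry",
}

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(base * (2 ** attempt), cap)
```

For 429 specifically, sleeping `backoff_delay(attempt)` between retries reduces concurrency pressure without hand-tuning per site.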

A repeatable debugging workflow:

  1. start with sampling: maxPages: 5
  2. classify the document: text-native vs scanned
  3. pick the mode: fast / auto / ocr
  4. codify the strategy: turn “document traits → config” into reusable rules
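Step 4 can be sketched as a small rule table. The trait keys and config fields here are assumptions chosen to mirror the parameters named in this section:

```python
def choose_config(traits: dict) -> dict:
    """Turn document traits into a parsing config (hypothetical rule table)."""
    cfg = {"mode": "auto"}               # safe default
    if traits.get("scanned"):
        cfg["mode"] = "ocr"              # image-only pages need OCR
    elif traits.get("text_native"):
        cfg["mode"] = "fast"             # text layer present: cheapest mode
    if traits.get("sampling"):
        cfg["maxPages"] = 5              # step 1: sample before scaling up
    return cfg
```

Keeping the rules in one function makes them easy to extend (e.g. a table-heavy trait routing to a separate pipeline) and to cover with the regression samples mentioned above.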