Errors & Troubleshooting
Most document parsing failures come from one of three areas: inaccessible inputs, complex document structures, or parameter/resource constraints.
Common issues
1) URL access or permission failures
- the URL requires login to download
- the site blocks bots
- the link is expired or returns 4xx/5xx
Tip: verify the URL downloads in a normal browser. For restricted sources, consider using your own download/proxy step and then parse the file from a public URL.
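The causes listed above can often be told apart from the HTTP status and content type you see when downloading the file yourself. The following is a minimal sketch of that triage, not part of Firecrawl: the function name diagnose_download and the wording of its messages are hypothetical, and the status-to-cause mapping is a rule of thumb rather than a guarantee.

```python
def diagnose_download(status: int, content_type: str = "") -> str:
    """Map a manual download attempt to a likely cause (heuristic only)."""
    if status in (401, 403):
        # Typical for login walls and bot blocking.
        return "access denied: login required or bots blocked"
    if status in (404, 410):
        return "link missing or expired"
    if 400 <= status < 500:
        return "client error: check the URL"
    if status >= 500:
        return "server error: target site unstable"
    if "text/html" in content_type.lower():
        # A 200 with HTML where a document was expected is often a login
        # or interstitial page rather than the file itself.
        return "got an HTML page instead of a document"
    return "ok"
```

For example, diagnose_download(200, "text/html; charset=utf-8") flags a page that loads fine but is not the file you meant to parse.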
1.5) Cache-only lookups returning 404
When you set minAge, Firecrawl performs a cache-only lookup and never triggers a fresh scrape. If no cached data exists, it returns 404 with error code SCRAPE_NO_CACHED_DATA.
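It helps to distinguish this cache miss from a 404 caused by a dead link, because the remedy differs: a cache miss calls for a request without minAge, while a dead link calls for fixing the URL. Below is a small sketch of that check. The helper is_cache_miss is hypothetical, and the exact JSON field that carries the error code is an assumption, so the sketch scans every top-level value instead of relying on one key.

```python
from typing import Any, Mapping

# Error code named in the text above.
NO_CACHE_CODE = "SCRAPE_NO_CACHED_DATA"


def is_cache_miss(status: int, body: Mapping[str, Any]) -> bool:
    """Return True when a 404 looks like a cache-only miss, not a dead URL."""
    if status != 404:
        return False
    # Assumption: the code appears in some top-level field of the error body.
    return any(NO_CACHE_CODE in str(value) for value in body.values())
```

When is_cache_miss returns True, the lookup itself worked and there was simply nothing cached; retrying the same request with minAge still set would not be expected to help.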
2) Empty output or missing body text
Often caused by scanned/image-heavy PDFs:
- switch the PDF mode to ocr
- sample first pages with maxPages before scaling up
See: Scanned PDFs & OCR
3) Broken tables
- identify whether the table is text-native or scanned
- treat table-heavy PDFs as a separate pipeline and keep regression samples
See: Tables & Formulas
4) 402 / 429 / 500
- 402: typically indicates credit/billing issues
- 429: rate limiting (reduce concurrency, increase timeouts, or use batch/queue)
- 500: server-side error or target-site instability; reproduce with smaller samples and adjust strategy
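One way to act on these codes is to retry only the ones that can succeed later (429 and 500) and to stop immediately on 402, which a retry cannot fix. The sketch below shows that policy with exponential backoff. It is an illustration under stated assumptions: call_with_retry, the delay values, and the send callable (any function that performs one request and returns its HTTP status) are hypothetical and not part of any SDK.

```python
import time
from typing import Callable

RETRYABLE = {429, 500}   # may succeed on a later attempt
MAX_DELAY = 30.0         # cap for the backoff, in seconds (arbitrary choice)


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break  # success, or an error such as 402 that retrying cannot fix
        # Wait 1s, 2s, 4s, ... (capped) before trying again.
        sleep(min(MAX_DELAY, base_delay * 2 ** (attempt - 1)))
        status = send()
    return status
```

The sleep parameter is injectable so the policy can be exercised in tests without real waiting. If the server sends a Retry-After header on 429 responses, honoring it would be preferable to a fixed schedule.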
Recommended debugging flow
- start with sampling: maxPages: 5
- classify: text-native vs scanned
- pick the mode: fast/auto/ocr
- codify the strategy: turn “document traits → config” into reusable rules
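The last step, turning document traits into config, can be sketched as a small rule function. The mode names and maxPages come from the flow above; everything else is an assumption made for illustration: the DocTraits class, the choice of fast for purely text-native files and auto for mixed ones, and the shape of the returned dict, which is not necessarily how the options are nested in a real request.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class DocTraits:
    has_text_layer: bool     # at least some pages are text-native
    has_scanned_pages: bool  # at least some pages are images of text


def choose_mode(traits: DocTraits) -> str:
    """Pick a PDF mode from the document's traits (assumed mapping)."""
    if traits.has_scanned_pages and not traits.has_text_layer:
        return "ocr"
    if traits.has_text_layer and not traits.has_scanned_pages:
        return "fast"
    return "auto"  # mixed or unknown: let the parser decide


def build_config(traits: DocTraits, sampling: bool = True) -> Dict[str, Any]:
    """Build parser options; cap pages while still in the sampling phase."""
    config: Dict[str, Any] = {"mode": choose_mode(traits)}
    if sampling:
        config["maxPages"] = 5  # sample size suggested in the flow above
    return config
```

Keeping the rules in one function makes them easy to pin down with regression samples: each known document gets a traits entry and an expected config, and a change to the rules shows up as a failing comparison.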