Affects: Docling 2.92.0 (Docling Core 2.74.1, IBM Models 3.13.2,
Parse 5.10.1) on Python 3.10, Linux, NVIDIA CUDA.
TL;DR
Running Docling on a small (1.2 KB) single-page PDF that contains an
FDA electronic-signature manifestation page produces a 0-byte
markdown file while Docling logs "Processed 1 docs, of which 0 failed"
and exits with return code 0. The success log is misleading:
no markdown content is actually written.
pdftotext extracts the page's text cleanly and pikepdf reads the page
without errors, so the input is well-formed.
Source
The bug was triggered by page 46 of an FDA Drug Approval review
package fetched from:
https://www.accessdata.fda.gov/drugsatfda_docs/nda/2007/016608s098,020712s029,021710_ClinRev.pdf
(FDA NDA 016608 / 020712 / 021710 Clinical Review, 46 pages,
1,014,458 bytes)
Page 46 is the standard FDA "electronic-signature manifestation"
page that closes many review documents.
We extracted page 46 alone via pikepdf (preserving original bytes,
no re-encoding) so the reproducer is minimal:
import pikepdf
with pikepdf.open("016608s098,020712s029,021710_ClinRev.pdf") as src, \
        pikepdf.new() as out:
    out.pages.append(src.pages[45])  # page 46, 0-indexed
    out.save("input.pdf")
The resulting input.pdf (attached, 1,228 bytes) is the reproducer.
Reproducer
Easiest: pip-installed Docling
pip install docling==2.92.0
mkdir -p in out
cp input.pdf in/
docling in/ --to md --output out --device cuda --no-abort-on-error
# OR for the same flags our production worker uses:
docling in/ --to md --output out --device cuda --num-threads 4 \
    --no-abort-on-error --image-export-mode placeholder
ls -la out/
# Expected: a non-empty .md file
# Actual: a 0-byte .md file
The bug reproduces with or without --no-abort-on-error and with or
without --image-export-mode placeholder. The smallest flag set we have
confirmed is --device cuda. We have not tested CPU-only runs (our
production deployment is GPU), so they may behave differently.
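Since CPU-only behavior is untested, one way to close that gap is to run the same conversion once per device and compare the size of the markdown each run writes. The sketch below is a hypothetical helper, not part of the reproducer bundle. It assumes the docling CLI is on PATH, that cpu is accepted as a --device value, and that the in/ directory from the steps above exists; the out-<device> directory names are ours.

```python
import subprocess
from pathlib import Path


def build_cmd(device: str, in_dir: str, out_dir: str) -> list[str]:
    """Build the docling CLI call from this report, varying only --device."""
    return ["docling", in_dir, "--to", "md", "--output", out_dir, "--device", device]


def compare_devices(
    in_dir: str = "in", devices: tuple[str, ...] = ("cpu", "cuda")
) -> dict[str, dict[str, int]]:
    """Run docling once per device and record the byte size of each .md written."""
    sizes: dict[str, dict[str, int]] = {}
    for device in devices:
        out_dir = Path(f"out-{device}")  # hypothetical per-device output dir
        out_dir.mkdir(exist_ok=True)
        subprocess.run(build_cmd(device, in_dir, str(out_dir)), check=True)
        sizes[device] = {p.name: p.stat().st_size for p in out_dir.glob("*.md")}
    return sizes
```

If compare_devices() reports a non-zero size for cpu and 0 for cuda, the problem is likely specific to the GPU path; 0 bytes on both would point at the pipeline itself.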
Docker (minimal stock image, no custom build)
# Build an ad-hoc image with docling 2.92.0 from pip, then run it.
docker build -t docling-bug-test - <<'DOCKERFILE'
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
RUN apt-get update -qq && apt-get install -y -qq python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir docling==2.92.0
ENTRYPOINT ["docling"]
DOCKERFILE
mkdir -p in out
cp input.pdf in/
docker run --rm --gpus all -v "$(pwd):/data" docling-bug-test \
    /data/in --to md --output /data/out \
    --device cuda --num-threads 4 --no-abort-on-error \
    --image-export-mode placeholder
ls -la out/
This recipe deliberately uses a stock CUDA image and pip install docling==2.92.0 — no project-specific Dockerfile or pre-cached models,
so any reader can reproduce on a GPU host with Docker + nvidia-toolkit.
First run takes ~2 minutes to build; subsequent runs are instant.
(The bug was originally observed inside our own pre-built image
docling-gpu:latest — a CUDA 12.4 base with Docling 2.92.0 and all
its models pre-cached for cold-start speed — but the bug is in
Docling itself, not in our packaging. The pip-based reproducer above
runs the same Docling 2.92.0 from upstream.)
Observed behavior
Verbose output (-vv):
2026-05-08 12:01:21 INFO docling.pipeline.base_pipeline: Processing document <sha>.pdf
2026-05-08 12:01:21 DEBUG docling.pipeline.standard_pdf_pipeline: PIPELINE_PROFILING Stage preprocess: ... duration=0.049s
2026-05-08 12:01:21 DEBUG docling.pipeline.standard_pdf_pipeline: PIPELINE_PROFILING Stage ocr: ... duration=0.124s
2026-05-08 12:01:23 DEBUG docling.pipeline.standard_pdf_pipeline: PIPELINE_PROFILING Stage layout: ... duration=1.175s
2026-05-08 12:01:23 DEBUG docling.pipeline.standard_pdf_pipeline: PIPELINE_PROFILING Stage table: ... duration=0.000s
2026-05-08 12:01:23 DEBUG docling.pipeline.standard_pdf_pipeline: PIPELINE_PROFILING Stage assemble: ... duration=0.000s
2026-05-08 12:01:23 INFO docling.document_converter: Finished converting document <sha>.pdf in 3.88 sec.
2026-05-08 12:01:23 INFO docling.cli.main: writing Markdown output to /data/out/<sha>.md
2026-05-08 12:01:23 INFO docling.cli.main: Processed 1 docs, of which 0 failed
2026-05-08 12:01:23 INFO docling.cli.main: All documents were converted in 3.89 seconds.
$ ls -l /data/out/<sha>.md
-rw-r--r-- 1 bill bill 0 May 8 12:21 <sha>.md
The full pipeline ran (preprocess → ocr → layout → table → assemble),
no exception was raised, the output file was created — but it's
0 bytes.
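To narrow down where the text disappears, the same file can be converted through Docling's Python API and the resulting document inspected before export. This is a minimal sketch, assuming the default DocumentConverter settings (which select the accelerator automatically and so may not match the --device cuda CLI run exactly). The classify labels and the inspect_pdf helper are hypothetical names for illustration, not Docling APIs.

```python
def classify(n_texts: int, markdown: str) -> str:
    """Label where content went missing (labels are ours, not Docling's)."""
    if n_texts == 0:
        return "empty document: no text items after conversion"
    if not markdown.strip():
        return "empty export: text items exist but markdown is blank"
    return "ok"


def inspect_pdf(pdf_path: str) -> str:
    # Imported lazily so classify() stays usable without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    doc = result.document
    markdown = doc.export_to_markdown()
    # Print the counts that distinguish "nothing detected" from "nothing exported".
    print(
        f"status={result.status} texts={len(doc.texts)} "
        f"tables={len(doc.tables)} pictures={len(doc.pictures)} "
        f"markdown_chars={len(markdown)}"
    )
    return classify(len(doc.texts), markdown)
```

Running inspect_pdf("input.pdf") and attaching the printed counts would show whether the page yields no text items at all or whether items exist and are dropped at export.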
Expected behavior
Either:
- Docling produces non-empty markdown for this page (the input
has extractable content — pdftotext returns a clean record
below), OR
- Docling surfaces an error if it cannot render the page — the
silent 0-byte success is the worst-of-both-worlds failure mode (no
error to act on, but no usable output either, and a downstream
pipeline trusting the "0 failed" signal will silently ingest empty
content).
pdftotext output for the same PDF (proves the input is renderable):
--------------------------------------------------------------------------------------------------------------------This is a representation of an electronic record that was signed electronically and
this page is the manifestation of the electronic signature.
--------------------------------------------------------------------------------------------------------------------/s/
--------------------Ronald Farkas
10/11/2007 05:28:47 PM
MEDICAL OFFICER
John Feeney
10/26/2007 10:34:59 PM
MEDICAL OFFICER
Concur
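Until Docling reports this case itself, a downstream pipeline can decline to trust the "0 failed" log line alone and check the files that were written. The sketch below uses only the standard library. find_empty_markdown is a hypothetical helper name, and treating whitespace-only files as empty is our assumption about what counts as unusable output.

```python
from pathlib import Path


def find_empty_markdown(out_dir: str) -> list[Path]:
    """Return .md files under out_dir that are 0 bytes or whitespace-only."""
    empty = []
    for md in sorted(Path(out_dir).rglob("*.md")):
        # Whitespace-only output is as unusable downstream as a 0-byte file.
        if not md.read_text(encoding="utf-8", errors="replace").strip():
            empty.append(md)
    return empty
```

Calling find_empty_markdown("out") after the conversion and failing the job when the list is non-empty turns the silent 0-byte success into an error that can be acted on.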
What's in doc17189_p46/
| File | Description |
| --- | --- |
| input.pdf | The minimal 1,228-byte 1-page PDF reproducer. Also re-derivable from the FDA URL above by extracting page 46. |
| command.sh | Self-contained reproducer script. Works on any GPU host with Docker + NVIDIA toolkit; builds a stock docling image and runs the failing case. Exits non-zero with a clear message if the bug reproduces (.md is 0 bytes). |
| info.json | Machine-readable provenance: source FDA URL, source doc, observed behavior, expected behavior |
| stderr.log | Full stderr of a verbose-mode (-vv) Docling run on the input |
| stdout.log | Stdout (empty; Docling logs to stderr) |
| empty_output.md | The 0-byte .md Docling produced |
Environment where originally observed
- Image: docling-gpu:latest (custom local build: CUDA 12.4 base
  with Docling 2.92.0 and all its models pre-cached, equivalent to
  the pip-based reproducer above)
- Host: 31 GiB system RAM, 8 GiB GPU (NVIDIA), AMD Ryzen Threadripper PRO 3955WX
- Docker memory limit: --memory 12g --memory-swap 12g (well under
  what Docling needs for full-document conversions; not a memory
  issue, since the bug also reproduces with no memory cap)
- Docling CLI flags: --device cuda --num-threads 4 --no-abort-on-error --image-export-mode placeholder
docling-reprex-doc17189_p46.tar.gz