Run BrowserTemplate fallback browser in headless mode by apetros · Pull Request #2967 · lncrawl/lightnovel-crawler

apetros · 2026-05-13T12:12:57Z

Hit this while running a long --all crawl of readnovelfull.com/godly-stay-home-dad-v1.html. Around chapter 1017 the HTTP scraper started failing (almost certainly Cloudflare kicking in after ~40 minutes of sustained requests). BrowserTemplate._override_scraper_get_soup caught the ScraperErrorGroup and tried to fall back to Selenium-driven Chrome — at which point everything wedged.

Cause: BrowserTemplate.browser constructs the fallback as Browser(cookie_store=self.scraper.cookies) with no headless flag, so it defaults to a visible window. My session is GNOME on Wayland with Xwayland (XAUTHORITY=/run/user/1000/.mutter-Xwaylandauth.XXXXX); the Chrome subprocess that Selenium spawns can't authenticate to the X server, prints "Authorization required, but no authorization protocol specified", and the driver dies with NoSuchDriverException. The fallback holds self._lock (an EventLock) while Chrome retries with a 120 s page-load timeout, so concurrent worker threads pile up and trip TimeoutError('Failed to acquire semaphore') from TaskManager. Throughput collapsed from 2.3 s/chapter to 122 s/chapter and the run was effectively dead.
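The pile-up described above can be reproduced in miniature with nothing but the standard library. This is a rough sketch, not the project's code: `count_semaphore_timeouts` and all of its parameters are hypothetical, a plain `threading.Lock` stands in for `self._lock`, and a `threading.Semaphore` stands in for whatever limit TaskManager enforces. The timings are scaled down from the 120 s seen in the real run.

```python
import threading
import time


def count_semaphore_timeouts(workers: int, slots: int,
                             fallback_seconds: float,
                             acquire_timeout: float) -> int:
    """Return how many workers gave up waiting for a free slot."""
    fallback_lock = threading.Lock()        # stands in for self._lock
    slots_sem = threading.Semaphore(slots)  # stands in for TaskManager's limit
    lock_held = threading.Event()
    timed_out = []                          # list.append is thread-safe in CPython

    def stuck_fallback() -> None:
        # One thread holds the lock for as long as the browser keeps retrying.
        with fallback_lock:
            lock_held.set()
            time.sleep(fallback_seconds)

    def worker(index: int) -> None:
        if not slots_sem.acquire(timeout=acquire_timeout):
            # The PR reports a TimeoutError at the equivalent point.
            timed_out.append(index)
            return
        try:
            with fallback_lock:             # every fetch now needs the fallback
                pass
        finally:
            slots_sem.release()

    holder = threading.Thread(target=stuck_fallback)
    holder.start()
    lock_held.wait()                        # make sure the lock is taken first

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads + [holder]:
        t.join()
    return len(timed_out)
```

With 6 workers, 2 slots, a 1.0 s stuck fallback and a 0.2 s acquire timeout, two workers take the slots and block on the lock, and the other four should give up. With a fallback that releases immediately, none should.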

I couldn't find anywhere in the codebase that actually needs a visible window for the fallback — _override_scraper_get_soup, _override_scraper_get_image, and _override_scraper_get_json all just want the resulting HTML / screenshot / JSON. Forcing headless=True here side-steps the X11 dance and matches what already happens automatically on machines with no display (webdriver/local.py:42-43 flips headless on when Platform.has_display is false — but on Wayland that check returns true via tkinter even when Chrome itself can't reach the server).
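Why the automatic switch did not help on Wayland can be shown with a tiny truth table. The sketch below assumes the logic at webdriver/local.py:42-43 is equivalent to "headless if requested, or if no display was detected"; `effective_headless` is a hypothetical name for illustration, not a function in the codebase.

```python
def effective_headless(requested: bool, has_display: bool) -> bool:
    """Headless is used when asked for, or when no display was detected."""
    return requested or not has_display


# Each entry: scenario -> whether Chrome ends up running headless.
cases = {
    # No display detected: the automatic switch already forces headless.
    "server, no display": effective_headless(False, has_display=False),
    # Display detected (via tkinter) but Chrome cannot reach the X server:
    # nothing requests headless, so a visible window is attempted and fails.
    "Wayland, before fix": effective_headless(False, has_display=True),
    # The fallback now requests headless explicitly.
    "Wayland, after fix": effective_headless(True, has_display=True),
}
```

Under that assumption, only the middle case tries to open a window, which is the configuration this PR removes for the fallback.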

One-line fix:

browser = Browser(cookie_store=self.scraper.cookies, headless=True)

Test plan

On a Linux Wayland desktop, re-run a --all crawl long enough to trigger the fallback. Confirm Chrome starts (headless) and the run continues instead of dying with NoSuchDriverException + semaphore timeouts.
Sanity-check that nothing else (sources that explicitly drive the browser) is broken by the forced headless mode.
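The manual plan above could be backed by a small automated check that needs neither Chrome nor a display. This is only a sketch under stated assumptions: `Browser` and `FallbackOwner` are simplified stand-ins written for this example (the real classes have a different shape), and the injectable `browser_factory` argument is a hypothetical convenience for testing.

```python
from unittest import mock


class Browser:
    """Stand-in for the project's Browser wrapper (hypothetical shape)."""

    def __init__(self, cookie_store=None, headless=False):
        self.cookie_store = cookie_store
        self.headless = headless


class FallbackOwner:
    """Stand-in for BrowserTemplate; only the fallback construction is modelled."""

    def __init__(self, cookies, browser_factory=Browser):
        self.cookies = cookies
        self._browser_factory = browser_factory

    def make_fallback(self):
        # Mirrors the one-line fix: always ask for a headless browser.
        return self._browser_factory(cookie_store=self.cookies, headless=True)


def check_fallback_is_headless() -> dict:
    """Build the fallback with a mocked factory and return the kwargs it saw."""
    factory = mock.Mock(name="Browser")
    cookies = {"session": "example"}
    owner = FallbackOwner(cookies, browser_factory=factory)
    owner.make_fallback()
    # Fails with AssertionError if headless=True is ever dropped again.
    factory.assert_called_once_with(cookie_store=cookies, headless=True)
    return factory.call_args.kwargs
```

A check like this only guards the constructor arguments; it does not replace the real Wayland run, which is what exercises the X server authentication path.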

When the HTTP scraper raises a ScraperErrorGroup, BrowserTemplate falls back to a real Chrome session. It was constructed without a headless flag, so it defaulted to a visible window. There's no codepath in CLI or server runs that actually interacts with that window, and on Wayland/Xwayland sessions the subprocess often can't authenticate to the X server, which kills the fallback entirely.

dipu-bd · 2026-05-13T13:05:28Z

soon the entire browser will be replaced by https://github.com/ultrafunkamsterdam/nodriver

dipu-bd merged commit a92013b into lncrawl:dev May 13, 2026
6 checks passed

dipu-bd merged 1 commit into lncrawl:dev from apetros:fix/headless-browser-fallback
