Skip to content

[enh] add multi-language subtitle support to yt-dlp extractor #389

Merged
asciimoo merged 1 commit into asciimoo:master from dummylabs:feat/ytdlp-multilang-subtitles
May 4, 2026

@dummylabs (Contributor) commented May 4, 2026

First of all, I really like your project; I use it actively and will follow its development with interest.

Disclaimer

I am not a Go expert; however, I have sufficient experience in other programming languages, so I used Claude Code for AI-assisted development, not vibe-coding. This text was written by a human and checked for mistakes by an LLM.

Problem Statement

If the sub_language setting in the ytdlp extractor is set to a language that does not match the video's default language, the video transcript is not saved in Hister because yt-dlp returns 201 and an empty string.

A workaround is to request auto-translated captions in the configured language, but YouTube applies stricter checks to such requests and may require cookies; otherwise it may respond with 429: Too Many Requests. Therefore, in my experiments, the most reliable approach (without extra impersonation or cookie import) was to request captions in the video's default language (info.Language).
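For context, info.Language is the video's default language as reported in yt-dlp's JSON metadata, which carries a `language` field. The following is a minimal sketch, assuming the extractor decodes the JSON that `yt-dlp -J` prints; the type `videoInfo` and the function `defaultLanguage` are hypothetical names, not Hister's actual code. It shows that a missing or `null` language decodes to an empty string, i.e. the "no default language" case.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// videoInfo keeps only the part of yt-dlp's info JSON needed here.
// Hypothetical type; the real extractor's struct may look different.
type videoInfo struct {
	Title    string `json:"title"`
	Language string `json:"language"` // default language; absent or null when unknown
}

// defaultLanguage decodes yt-dlp's JSON output and returns the video's
// default language, or "" if yt-dlp does not report one.
func defaultLanguage(raw []byte) (string, error) {
	var info videoInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return "", fmt.Errorf("decode yt-dlp JSON: %w", err)
	}
	return info.Language, nil
}

func main() {
	samples := [][]byte{
		[]byte(`{"title": "Exemple", "language": "fr"}`),
		[]byte(`{"title": "Example", "language": null}`),
		[]byte(`{"title": "Example"}`),
	}
	for _, raw := range samples {
		lang, err := defaultLanguage(raw)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("default language: %q\n", lang)
	}
}
```

With `encoding/json`, a JSON `null` leaves a plain `string` field untouched, so "missing" and "null" look the same to the caller.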

Core Changes

In the ytdlp extractor configuration, the sub_language parameter can now take the value auto, or contain one or more subtitle languages separated by commas.
Logic:

  • if auto is specified: download subtitles in the default language for this video
  • if en is specified: download subtitles in English or nothing
  • if en, fr are specified: try the listed languages in order of priority and download subtitles in the one that matches the video's default language, or nothing if none of them does.

If en is specified as sub_language, the current behavior (before the change) is fully preserved.
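The selection rules above can be sketched in Go as follows. This is an illustration under stated assumptions, not the code merged in this PR: `pickSubtitleLanguage` and `splitLanguages` are hypothetical names, the video's default language is assumed to be already decoded (empty when yt-dlp reports none), `auto` is only recognized as the whole value, and languages are compared as exact strings after trimming spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLanguages turns a setting such as "fr, en" into []string{"fr", "en"},
// trimming spaces and dropping empty entries.
func splitLanguages(setting string) []string {
	var langs []string
	for _, l := range strings.Split(setting, ",") {
		if l = strings.TrimSpace(l); l != "" {
			langs = append(langs, l)
		}
	}
	return langs
}

// pickSubtitleLanguage returns the subtitle language to request for a video
// whose default language is videoLang ("" if unknown). ok is false when no
// subtitles should be downloaded.
func pickSubtitleLanguage(setting, videoLang string) (lang string, ok bool) {
	langs := splitLanguages(setting)
	switch {
	case len(langs) == 0:
		return "", false
	case len(langs) == 1 && langs[0] == "auto":
		// auto: follow the video's own language; nothing to request without one.
		return videoLang, videoLang != ""
	case len(langs) == 1:
		// Single language: always request it, as before the change.
		return langs[0], true
	}
	// Several languages: walk the list in priority order and take the first
	// entry that equals the video's default language.
	for _, l := range langs {
		if l == videoLang {
			return l, true
		}
	}
	return "", false
}

func main() {
	cases := []struct{ setting, videoLang string }{
		{"auto", "fr"},  // Auto
		{"fr,en", "fr"}, // MultiLangMatch
		{"auto", ""},    // AutoNoLanguage
		{"de,es", "fr"}, // MultiLangNoMatch
		{"en", "fr"},    // single language, pre-change behavior
	}
	for _, c := range cases {
		lang, ok := pickSubtitleLanguage(c.setting, c.videoLang)
		fmt.Printf("sub_language=%q video=%q -> lang=%q download=%t\n",
			c.setting, c.videoLang, lang, ok)
	}
}
```

The first four cases in `main` correspond to the Auto, MultiLangMatch, AutoNoLanguage and MultiLangNoMatch tests; the last one shows the unchanged single-language path.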

Changes to the default configuration

```yaml
extractors:
    ytdlp:
        fetch_subtitles: true
        sub_language: "auto"      # detect original language automatically
```

Testing

Tested manually on videos in different languages with different values of sub_language.

Added new test fixtures:

  • sampleJSONTemplateFr for a non-English language test
  • sampleJSONTemplateNoLang for a video without a default language (a case I have encountered in practice)

Four new tests added:

| Test | Fixture | sub_language | Expectation |
|---|---|---|---|
| Auto | Fr | auto | subtitles present |
| MultiLangMatch | Fr | fr,en | subtitles present |
| AutoNoLanguage | NoLang | auto | no subtitles |
| MultiLangNoMatch | Fr | de,es | no subtitles |

sub_language now accepts:
- "auto": download subtitles in the video's original language (via the yt-dlp language field)
- "de": single language, always download in that language
- "de,fr": download only if the video's original language is in the list

@asciimoo (Owner) commented May 4, 2026

Thank you both for the contribution and for being transparent about AI usage.

The pull request looks good to me.

I just realized that we do not document extractor-specific configuration options anywhere. Would you be open to adding the yt-dlp config options to the extractor documentation (webui/website/src/content/docs/extractors.md) in a follow-up pull request?

@asciimoo (Owner) commented May 4, 2026

Please fix the Go lint error and it's ready to merge.

@asciimoo (Owner) commented May 4, 2026

@dummylabs doh, sorry, the lint error is not even in your code. My fault, I'm fixing it. Not sure how my local linter let that slip.

@asciimoo (Owner) left a comment

Thanks!

@asciimoo asciimoo merged commit aa6edcb into asciimoo:master May 4, 2026
8 of 10 checks passed
2 participants