[enh] add multi-language subtitle support to yt-dlp extractor #389
Merged
asciimoo merged 1 commit into asciimoo:master on May 4, 2026
sub_language now accepts:
- "auto": download subtitles in the video's original language (via the yt-dlp language field)
- "de": single language, always download in that language
- "de,fr": download only if the video's original language is in the list
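The three cases above can be sketched as a small selection function. This is only an illustration, not the code merged in this pull request: pickSubLang is a hypothetical name, and the handling of "auto" when yt-dlp reports no language (request nothing) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// pickSubLang is a hypothetical helper, not the extractor's actual function.
// setting is the sub_language value; videoLang is the language yt-dlp reports.
// It returns the language to request and whether to request subtitles at all.
func pickSubLang(setting, videoLang string) (string, bool) {
	setting = strings.TrimSpace(setting)
	if setting == "auto" {
		// Assumption: without a known original language, request nothing.
		return videoLang, videoLang != ""
	}

	// Split the comma-separated list, ignoring blanks and stray spaces.
	var langs []string
	for _, l := range strings.Split(setting, ",") {
		if l = strings.TrimSpace(l); l != "" {
			langs = append(langs, l)
		}
	}

	switch len(langs) {
	case 0:
		return "", false
	case 1:
		// Single language: always request it, regardless of the video.
		return langs[0], true
	}

	// Several languages: request one only if it is the video's original language.
	for _, l := range langs {
		if l == videoLang {
			return l, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(pickSubLang("auto", "fr"))
	fmt.Println(pickSubLang("de", "en"))
	fmt.Println(pickSubLang("de,fr", "fr"))
	fmt.Println(pickSubLang("de,fr", "en"))
}
```

Returning a boolean alongside the language keeps "nothing to download" distinct from an empty language string, which makes the skip case explicit for the caller.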
Owner
Thank you both for the contribution and for being transparent about AI usage. The pull request looks good to me. I just realized that we do not document extractor-specific configuration options anywhere. Would you be open to add
Owner
Please fix the Go lint error and it's ready to merge.
Owner
@dummylabs doh, sorry, the lint error is not even in your code. My fault, I'm fixing it. Not sure how my local linter let that slip.
First of all, I really like your project; I use it actively and will follow its development with interest.
Disclaimer
I am not a golang expert; however, I have sufficient experience in other programming languages, so I used Claude Code for AI-assisted development, not vibe-coding. This text was written by a human and verified by an LLM for mistakes.
Problem Statement
If the sub_language setting in the ytdlp extractor is set to a language that does not match the video's default language, the video transcript is not saved in Hister because yt-dlp returns 201 and an empty string.
A workaround is to request an auto-translation of captions into the specified language, but in this case YouTube applies stricter checks and cookies may be required; otherwise it may return 429: Too Many Requests. Therefore, in my experiments, the most reliable approach (without extra impersonation or cookie import) was to request captions in the language that matches the video's default language (info.Language).
Core Changes
In the ytdlp extractor configuration, the sub_language parameter can now take the value auto, or contain one or more subtitle languages separated by commas.
Logic:
- auto is specified: download subtitles in the default language for this video
- en is specified: download subtitles in English or nothing
- en,fr are specified: try downloading subtitles in the listed languages in order of priority, or nothing if none of the specified languages is the video's default.
If en is specified as sub_language, the current behavior (before the change) is fully preserved.
Changes to the default configuration
Testing
Tested manually on videos with different languages and different values of sub_language.
Added new test fixtures:
- sampleJSONTemplateFr for a non-English language test
- sampleJSONTemplateNoLang for a video without a default language (this is the real case in my experience)
Four new tests added:
- auto
- fr,en
- auto
- de,es
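To illustrate why a fixture without a default language matters, here is a minimal sketch of reading the language field from yt-dlp's JSON metadata. It assumes the extractor decodes that JSON with encoding/json; videoInfo and originalLang are hypothetical names and the sample documents are made up for the example, not the fixtures from this pull request.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// videoInfo is a hypothetical subset of yt-dlp's JSON metadata.
type videoInfo struct {
	Language string `json:"language"`
}

// originalLang returns the video's language, or "" when the field
// is missing or null. Unknown JSON fields are ignored by encoding/json.
func originalLang(raw []byte) (string, error) {
	var info videoInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return "", err
	}
	return info.Language, nil
}

func main() {
	samples := []string{
		`{"id": "a", "language": "fr"}`, // non-English video
		`{"id": "b"}`,                   // language field absent
		`{"id": "c", "language": null}`, // language field present but null
	}
	for _, raw := range samples {
		lang, err := originalLang([]byte(raw))
		fmt.Printf("%q %v\n", lang, err)
	}
}
```

Both the absent and the null case decode to an empty string, so any code that consumes the language has to treat "" as "unknown" rather than as a valid language code.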