[YouTube] Add support for extracting auto-translated captions #997
base: dev
Resolved review threads:
- ...in/java/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeStreamExtractor.java
- extractor/src/main/java/org/schabi/newpipe/extractor/stream/SubtitlesStream.java (outdated)
- extractor/src/main/java/org/schabi/newpipe/extractor/stream/SubtitlesStream.java (outdated)
- extractor/src/main/java/org/schabi/newpipe/extractor/stream/SubtitlesStream.java

    .setAutoGenerated(isAutoGenerated)
    .setAutoTranslated(false)
    .build());
    if (i == 0 && caption.getBoolean("isTranslatable")
I would not base the extraction on the index, but rather on whether the subtitles are auto-generated:
    - if (i == 0 && caption.getBoolean("isTranslatable")
    + if (isAutoGenerated && caption.getBoolean("isTranslatable")
Also, this PR doesn't add support for translating uploaded subtitles. For instance, see https://www.youtube.com/watch?v=_cMxraX_5RE: you can translate from German to French and from English to French, and the translations are different.
We may need another property in SubtitlesStream for this.
I don't understand why we should use isAutoGenerated here. For better quality, it should be !isAutoGenerated. Manually added captions should be exact.
I was also wondering whether we should provide the auto-translated captions by default. Extracting the data for and generating ~100 SubtitlesStream objects takes some time. I definitely would not recommend doing this for all available languages by default. On the other hand, we could provide a method that does this when needed.
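A minimal sketch of the on-demand idea above, assuming a hypothetical holder class that is not part of the extractor: the translated entries are only built the first time they are requested, then cached. The class name, the field names and the string-based caption representation are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder, not an extractor class: builds translated entries lazily.
class LazyTranslatedCaptions {
    private final List<String> baseTags;   // languages that have a real caption track
    private final List<String> targetTags; // languages the service can translate into
    private List<String> translatedCache;  // stays null until first requested
    int buildCount = 0;                    // exposed only to show the work happens once

    LazyTranslatedCaptions(List<String> baseTags, List<String> targetTags) {
        this.baseTags = baseTags;
        this.targetTags = targetTags;
    }

    // Cheap: returns only the tracks that exist as-is.
    List<String> getCaptions() {
        return baseTags;
    }

    // Expensive part, deferred until a caller actually asks for it.
    List<String> getAutoTranslatedCaptions() {
        if (translatedCache == null) {
            buildCount++;
            List<String> result = new ArrayList<>();
            for (String base : baseTags) {
                for (String target : targetTags) {
                    if (!target.equals(base)) {
                        result.add(base + "->" + target);
                    }
                }
            }
            translatedCache = result;
        }
        return translatedCache;
    }

    public static void main(String[] args) {
        LazyTranslatedCaptions captions = new LazyTranslatedCaptions(
                Arrays.asList("de", "en"), Arrays.asList("de", "en", "fr"));
        System.out.println(captions.getCaptions());
        System.out.println(captions.getAutoTranslatedCaptions());
    }
}
```

The trade-off is that callers need to know about the second method; returning everything in one list keeps the API simpler at the cost of doing the work up front.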
I decided to extract all available subtitles, but made sure to speed up the process. It's up to the frontends to filter the subtitles.
        
          
Resolved review thread:
- ...in/java/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeStreamExtractor.java

Force-pushed from 2bcc0a9 to efce384.

What happened to this?
Closes #977
Based on and addresses TeamNewPipe/NewPipe#8023
Faster and ordered: captions provided by the user are at the beginning of the list, auto-translated captions are at the end
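The ordering described above could be sketched as follows. This is an illustration only: the PR may well produce the order by construction rather than by sorting, the `Caption` class is a hypothetical stand-in for SubtitlesStream, and placing auto-generated captions between the other two groups is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CaptionOrderSketch {

    // Hypothetical stand-in for SubtitlesStream, reduced to the fields used here.
    static final class Caption {
        final String languageTag;
        final boolean autoGenerated;
        final boolean autoTranslated;

        Caption(String languageTag, boolean autoGenerated, boolean autoTranslated) {
            this.languageTag = languageTag;
            this.autoGenerated = autoGenerated;
            this.autoTranslated = autoTranslated;
        }
    }

    // 0 = provided by the user, 1 = auto-generated (assumed position), 2 = auto-translated.
    static int rank(Caption c) {
        if (c.autoTranslated) {
            return 2;
        }
        return c.autoGenerated ? 1 : 0;
    }

    // Returns a sorted copy; List.sort is stable, so order within a group is preserved.
    static List<Caption> ordered(List<Caption> captions) {
        List<Caption> copy = new ArrayList<>(captions);
        copy.sort(Comparator.comparingInt(CaptionOrderSketch::rank));
        return copy;
    }

    public static void main(String[] args) {
        List<Caption> input = Arrays.asList(
                new Caption("fr", false, true),
                new Caption("en", true, false),
                new Caption("de", false, false));
        for (Caption c : ordered(input)) {
            System.out.println(c.languageTag);
        }
    }
}
```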
Force-pushed from efce384 to 9730de2.

Extract auto-translated captions for YouTube videos.
API changes 🟢
SubtitlesStream

This adds isAutoTranslated() next to isAutoGenerated() to distinguish between auto-generated subtitles, which use speech-to-text, and auto-translated captions, which are based on Google Translate.

Additionally, getBaseLocale(), getDisplayBaseLanguageName() and getBaseLanguageTag() were added to access information about the language that was used as the source for auto-translations.

Issues closed by this PR
Closes #977
Based on and addresses TeamNewPipe/NewPipe#8023
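As a rough illustration of how a frontend might use the SubtitlesStream additions listed under API changes, here is a sketch that builds a menu label per caption track. The `Caption` class is a hypothetical stand-in that only mirrors the method names from the description; the constructor, the `getLocale()` accessor, the null return for non-translated tracks and the English display names are assumptions, not the extractor's actual behavior.

```java
import java.util.Locale;

class CaptionLabelSketch {

    // Hypothetical stand-in mirroring the method names from the PR description.
    static final class Caption {
        private final Locale locale;
        private final boolean autoGenerated;
        private final boolean autoTranslated;
        private final Locale baseLocale; // assumed null when the track is not auto-translated

        Caption(Locale locale, boolean autoGenerated, boolean autoTranslated, Locale baseLocale) {
            this.locale = locale;
            this.autoGenerated = autoGenerated;
            this.autoTranslated = autoTranslated;
            this.baseLocale = baseLocale;
        }

        Locale getLocale() { return locale; }
        boolean isAutoGenerated() { return autoGenerated; }
        boolean isAutoTranslated() { return autoTranslated; }
        Locale getBaseLocale() { return baseLocale; }

        String getBaseLanguageTag() {
            return baseLocale == null ? null : baseLocale.toLanguageTag();
        }

        String getDisplayBaseLanguageName() {
            return baseLocale == null ? null : baseLocale.getDisplayLanguage(Locale.ENGLISH);
        }
    }

    // Builds a label that tells the user where the caption text comes from.
    static String describe(Caption c) {
        String name = c.getLocale().getDisplayLanguage(Locale.ENGLISH);
        if (c.isAutoTranslated()) {
            return name + " (auto-translated from " + c.getDisplayBaseLanguageName() + ")";
        }
        if (c.isAutoGenerated()) {
            return name + " (auto-generated)";
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Caption(Locale.FRENCH, false, true, Locale.GERMAN)));
        System.out.println(describe(new Caption(Locale.ENGLISH, true, false, null)));
        System.out.println(describe(new Caption(Locale.GERMAN, false, false, null)));
    }
}
```

Keeping the base language on the stream lets a frontend distinguish, for example, a French track translated from German from one translated from English.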