new-contrib: Audio Whisper API with Local Device Microphones #1271

Open
wants to merge 14 commits into
base: main
Changes from 3 commits
5 changes: 5 additions & 0 deletions authors.yaml
@@ -97,3 +97,8 @@ justonf:
  name: "Juston Forte"
  website: "https://www.linkedin.com/in/justonforte/"
  avatar: "https://avatars.githubusercontent.com/u/96567547?s=400&u=08b9757200906ab12e3989b561cff6c4b95a12cb&v=4"

carlkho-minerva:
  name: "Carl Vincent Ladres Kho"
  website: "https://www.carlkho.com/"
  avatar: "https://avatars.githubusercontent.com/u/106736711?v=4"
500 changes: 500 additions & 0 deletions examples/Whisper_transcribe_device_microphone.ipynb
Collaborator

Unless the reader speaks Filipino, they can't test this part out – how about translating from a more common second language like Spanish?

Also, an indefinite recording makes many notebooks crash – set a 5-10 second limit as well.

# Demo: Transcribe lengthy Filipino speech and translate into English with proper grammar and punctuation
result = transcribe_audio(
    debug=False,
    prompt="Filipino spoken. Proper grammar and punctuation. Skip fillers.",
    timed_recording=False,
    record_seconds=0,
    is_english=False,
)

print("\nTranscription/Translation:", result)
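A minimal sketch of how the recording limit suggested above could be enforced. `bounded_record_seconds` and `MAX_RECORD_SECONDS` are hypothetical names, not part of the PR, and the sketch assumes that `record_seconds=0` with `timed_recording=False` means "record until stopped" in the notebook.

```python
# Assumed upper bound, taken from the 5-10 second suggestion in the review.
MAX_RECORD_SECONDS = 10


def bounded_record_seconds(requested: float, limit: float = MAX_RECORD_SECONDS) -> float:
    """Return a recording duration that is always finite and at most `limit`.

    A non-positive request (assumed to mean "record until stopped")
    falls back to the limit instead of recording indefinitely.
    """
    if requested <= 0:
        return limit
    return min(requested, limit)


# Keyword arguments for the notebook's transcribe_audio(), with a bounded,
# timed recording. The prompt is left empty here on purpose.
demo_kwargs = dict(
    debug=False,
    prompt="",
    timed_recording=True,
    record_seconds=bounded_record_seconds(0),
    is_english=False,
)

# transcribe_audio is defined in the notebook under review, so the call is
# left commented out to keep this sketch self-contained:
# result = transcribe_audio(**demo_kwargs)
```

With this in place the demo would record for at most ten seconds regardless of what the caller requests.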

Collaborator

Combining transcription and translation in this one function is a bit odd, and it also drops the prompt param for translations. (The prompt should be in English for a translation and in the audio's language for a transcription.) I'd split this out into two clear helper functions, one for translate and one for transcribe.

def process_audio(file_name, is_english=True, prompt=""):
    with open(file_name, "rb") as audio_file:
        if is_english:
            response = client.audio.transcriptions.create(
                model="whisper-1", file=audio_file, prompt=prompt
            )
        else:
            response = client.audio.translations.create(
                model="whisper-1", file=audio_file
            )

        return response.text.strip()
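One way the split suggested above might look, as a sketch rather than the PR's actual code. `transcribe_file` and `translate_file` are hypothetical names, and the client is passed in explicitly (an assumption made here so the helpers can be exercised without a network call). Both endpoints receive the prompt, so it is no longer dropped for translations.

```python
from typing import Any


def transcribe_file(client: Any, file_name: str, prompt: str = "") -> str:
    """Transcribe speech to text in the language that was spoken.

    `prompt` should be written in the same language as the audio.
    """
    with open(file_name, "rb") as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file, prompt=prompt
        )
    return response.text.strip()


def translate_file(client: Any, file_name: str, prompt: str = "") -> str:
    """Translate non-English speech into English text.

    `prompt` should be written in English.
    """
    with open(file_name, "rb") as audio_file:
        response = client.audio.translations.create(
            model="whisper-1", file=audio_file, prompt=prompt
        )
    return response.text.strip()
```

In the notebook these would be called with the existing `client` object, for example `translate_file(client, "recording.wav")`, where the file name is only a placeholder.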

Collaborator

I don't think this is how we intend for the prompt parameter to be used – looking at our docs, it is meant as an example (or examples) of the expected text rather than an instruction.
[Image: Screenshot 2024-11-26 at 4 03 42 PM]
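To illustrate the difference, here is a hedged sketch of a prompt built from sample text instead of an instruction. `build_style_prompt` is a hypothetical helper, not part of the PR or the API; it only assembles a string that reads like the desired output (punctuated sentences, correctly spelled terms), on the assumption that the model tends to follow the style of the prompt text.

```python
from typing import Iterable, Sequence


def build_style_prompt(example_sentences: Iterable[str], glossary: Sequence[str] = ()) -> str:
    """Join sample sentences and domain terms into one prompt string.

    The result is meant to look like the transcript we want back,
    not to tell the model what to do.
    """
    # Keep only non-empty sentences, trimmed of surrounding whitespace.
    parts = [s.strip() for s in example_sentences if s.strip()]
    if glossary:
        # Spell out names and jargon the way they should appear in the output.
        parts.append(", ".join(glossary) + ".")
    return " ".join(parts)


# Instruction-style prompt, as in the demo under review:
instruction_prompt = "Filipino spoken. Proper grammar and punctuation. Skip fillers."

# Example-style prompt: punctuated sample text in the audio's language.
example_prompt = build_style_prompt(
    ["Hola, bienvenidos a la reunión.", "Hoy revisaremos el presupuesto."]
)
```

The example-style string would then be passed as `prompt` in place of the instruction-style one.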

16 changes: 13 additions & 3 deletions registry.yaml
@@ -1251,8 +1251,8 @@
    - teomusatoiu
  tags:
    - moderation


- title: Summarizing Long Documents
  path: examples/Summarizing_long_documents.ipynb
  date: 2024-04-19
@@ -1323,4 +1323,14 @@
    - maxreid-openai
  tags:
    - completions
    - chatgpt
    - chatgpt

- title: Using Whisper API to transcribe text using your Device Microphone
  path: examples/Whisper_transcribe_device_microphone.ipynb
  date: 2024-07-06
  authors:
    - carlkho-minerva
  tags:
    - whisper
    - audio
    - transcribe