livekeet

Real-time audio transcription to markdown. Optimized for Apple Silicon.

Uses NVIDIA's Parakeet speech recognition model via MLX for fast, accurate, on-device transcription.

Features

Real-time transcription with voice activity detection (VAD)
Speaker detection - automatically labels who's speaking in calls
Speaker diarization - identify multiple speakers on the same audio channel
Multiple speaker names - comma-separated --with "Tom,Bob,Alice" with auto-diarization
System audio capture for transcribing calls, meetings, videos
Microphone capture for voice notes and dictation
Markdown output with timestamps and speaker labels
Speaker relabeling - interactively rename speakers after recording
Configurable speaker names, output location, filename patterns, diarization default

Requirements

macOS 13+ (Ventura or later)
Apple Silicon (M1/M2/M3/M4)
Python 3.12+

Installation

uv tool install git+https://github.com/LucaDeLeo/livekeet.git

The audio capture binary is downloaded automatically on first run.

From source (for development)

git clone https://github.com/LucaDeLeo/livekeet.git
cd livekeet
uv sync && make build
uv run livekeet

Screen Recording Permission

For system audio capture, macOS requires Screen Recording permission. On first run, you'll be prompted to grant this in System Settings > Privacy & Security > Screen Recording.

Usage

# Start transcribing (system audio + mic, outputs to timestamped file)
livekeet

# Output to specific file
livekeet meeting.md

# Output into a directory (uses config filename pattern)
livekeet meetings/

# Name the other speaker (for 1:1 calls)
livekeet --with "John"

# Multiple speakers (auto-enables diarization)
livekeet --with "Tom,Bob,Alice"

# Multilingual transcription
livekeet --multilingual

# Mic only (no system audio)
livekeet --mic-only

# List audio devices
livekeet --devices

# Use specific audio device
livekeet --device "BlackHole 2ch"

# Identify multiple speakers per channel (e.g. group calls)
livekeet --diarize

# Show periodic status updates
livekeet --status

# Rename speakers in an existing transcript
livekeet relabel meeting.md

Flags

--with, -w Other speaker name(s), comma-separated (enables diarization when multiple)
--mic-only, -m Record microphone only (no system audio)
--multilingual Use the multilingual model (parakeet-tdt-0.6b-v3)
--model Choose a model explicitly
--diarize Identify individual speakers per audio channel
--device, -d Select input device by number or name (mic-only)
--devices List available audio input devices
relabel <file> Interactively rename speakers in a transcript
init Create the default config file (same as --init)
--config Show the config file location
--status Show periodic status updates while recording Note: --multilingual overrides --model when both are set.

Configuration

Create a config file to customize defaults:

livekeet init

This creates ~/.config/livekeet/config.toml:

[output]
# Directory for transcripts (empty = current directory)
directory = ""
# Filename pattern: {date}, {time}, {datetime}
filename = "{datetime}.md"

[speaker]
# Your name in transcripts
name = "Me"

[defaults]
# Available models (downloaded automatically on first use):
#   mlx-community/parakeet-tdt-0.6b-v2  - Fast, English only (default)
#   mlx-community/parakeet-tdt-0.6b-v3  - Fast, multilingual
model = "mlx-community/parakeet-tdt-0.6b-v2"
# diarize = false

Set diarize = true under [defaults] to always enable speaker diarization without the --diarize flag.

Output Patterns

The filename pattern supports these variables:

{date} - Current date (2024-01-15)
{time} - Current time (14-30-25)
{datetime} - Combined (2024-01-15-143025)

Examples:

"{datetime}.md" → 2024-01-15-143025.md
"{date}-meeting.md" → 2024-01-15-meeting.md
"transcript.md" → transcript.md (auto-suffixes if exists) If the resolved filename already exists, livekeet will save to name-2.md, name-3.md, and so on.

How It Works

Audio Capture: Uses a Swift tool with ScreenCaptureKit to capture system audio and microphone as separate stereo channels
Voice Activity Detection: WebRTC VAD detects when speech starts/stops
Speaker Detection: Labels by channel (mic = you, system = other). With --diarize, extracts speaker embeddings to identify individuals within each channel.
Transcription: When speech ends, audio is transcribed using Parakeet-MLX
Output: Text is written to markdown with timestamps and speaker labels

Speaker Detection

When using system audio capture (the default), livekeet labels speakers by audio channel: mic = you, system = other.

With --diarize, livekeet uses speaker embeddings (WeSpeaker ResNet34 via MLX) to identify individual speakers within each channel. This is useful for group calls where multiple remote participants share the system audio channel.

Diarization is enabled automatically when:

--diarize flag is passed
diarize = true is set in config
Multiple names are given via --with "Tom,Bob"

Default mode

Microphone  ────▶  Your name ("Me")
System Audio ────▶  Other speaker ("Other" or --with name)

Diarize mode

Microphone   ────▶  Me, Local 2, Local 3, ...
System Audio ────▶  Other, Remote 2, Remote 3, ...

When using --with "Tom,Bob,Alice", the speakers are named in order:

System Audio ────▶  Tom, Bob, Alice, Remote 4, ...

The first speaker on each channel gets the primary name. Additional speakers use the provided names, then fall back to numbered labels. The embedding model (~25MB) is downloaded on first use.

Example Output

[14:32:15] **Me**: Hey, how's the project going?
[14:32:22] **John**: Pretty good, we just finished the API integration.
[14:32:28] **Alice**: I have some questions about the endpoint naming.
[14:32:35] **Me**: Sure, let's discuss.

Configuration

Set your name: edit ~/.config/livekeet/config.toml (run livekeet --init first)
Set other speaker: livekeet --with "John"

Limitations

Diarization works best on system audio where each speaker has a clean direct signal
Mic-captured audio from external sources (e.g. phone speakers) may not separate reliably due to shared acoustic characteristics
Works best with clear turn-taking (one person speaks at a time)

Speaker Relabeling

After a recording, livekeet automatically prompts you to rename generic speaker labels (like "Other", "Remote 2") when multiple speakers were detected. You can also relabel any existing transcript:

livekeet relabel meeting.md

The interactive flow shows sample quotes from each speaker so you can identify who's who:

Speaker "Remote 2" (8 lines):
  > "so the deadline is next Friday"
  > "I'll send the design doc after this"
  > "yeah that works for me"

  (n) Name  (m) More  (s) Skip:

Press n to rename, m to see more quotes, or s to skip. Press Ctrl+C to cancel relabeling.

Models

Model	Size	Speed	Notes
parakeet-tdt-0.6b-v2	600M	Fast	English only (default)
parakeet-tdt-0.6b-v3	600M	Fast	Multilingual (`--multilingual`)

Models are downloaded automatically on first use.

Troubleshooting

"Audio capture tool not found"

Build the Swift audio capture tool:

make build

"Screen Recording permission required"

Go to System Settings > Privacy & Security > Screen Recording and enable your terminal app.

No audio captured

If you see "No audio detected yet", check Screen Recording or microphone permissions.

Check audio devices: livekeet --devices
Try mic-only mode: livekeet --mic-only
Ensure the audio source is playing

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
audiocapture		audiocapture
tests		tests
.gitignore		.gitignore
.python-version		.python-version
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
diarization.py		diarization.py
diarization_pyannote.py		diarization_pyannote.py
livekeet.py		livekeet.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

livekeet

Features

Requirements

Installation

From source (for development)

Screen Recording Permission

Usage

Configuration

Output Patterns

How It Works

Speaker Detection

Default mode

Diarize mode

Example Output

Configuration

Limitations

Speaker Relabeling

Models

Troubleshooting

"Audio capture tool not found"

"Screen Recording permission required"

No audio captured

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

livekeet

Features

Requirements

Installation

From source (for development)

Screen Recording Permission

Usage

Configuration

Output Patterns

How It Works

Speaker Detection

Default mode

Diarize mode

Example Output

Configuration

Limitations

Speaker Relabeling

Models

Troubleshooting

"Audio capture tool not found"

"Screen Recording permission required"

No audio captured

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages