
Update README for modular SDK architecture and remove Kokoro TTS#11

Merged
Blaizzy merged 2 commits into main from copilot/update-readme-sdk-architecture
Jan 31, 2026

Conversation

Contributor

Copilot AI commented Jan 31, 2026

The README documented the old monolithic example app structure and referenced Kokoro TTS, which is no longer part of the SDK. Updated to reflect the new modular architecture introduced in PR #1.

Changes

Removed

  • Kokoro TTS section and all references
  • Manual file copying instructions
  • nSpeak framework mentions
  • Marvis model documentation (superseded by modular architecture)

Added

  • Architecture: Documented 6 modules (Core, Codecs, TTS, STT, STS, UI)
  • Installation: Swift Package Manager integration with selective imports
  • Quick Start: Code examples for TTS, STT, and streaming generation
  • Supported Models: Table listing Soprano, Qwen3, LlamaTTS, GLMASR with HuggingFace repos
  • Advanced Usage: Custom parameters, codec usage, multi-voice selection
  • Requirements: Platform/SDK minimums and Apple Silicon recommendation

Example Usage

The new README guides developers to use the modular SDK:

```swift
// Import only what you need
import MLXAudioTTS
import MLXAudioCore

// Models download automatically from HuggingFace
let model = try await SopranoModel.fromPretrained("mlx-community/Soprano-80M-bf16")

// Generate with typed parameters
let audio = try await model.generate(
    text: "Hello from MLX Audio Swift!",
    parameters: GenerateParameters(
        maxTokens: 200,
        temperature: 0.7,
        topP: 0.95
    )
)
```
Original prompt

Summary

Update the README.md to match the new modular Swift SDK architecture introduced in PR #1 and remove all references to Kokoro TTS.

Changes Required

1. Remove Kokoro TTS References

  • Remove the entire "Kokoro" section (lines 14-22)
  • Remove Kokoro from the list of required files (line 8)
  • Remove the mention of "nSpeak framework is embedeed for Kokoro already" (line 12)

2. Replace with New SDK Documentation

Replace the old README content with comprehensive documentation for the new modular SDK:

Header Section

  • Change title to: # MLX Audio Swift
  • Add tagline: "A modular Swift SDK for audio processing with MLX on Apple Silicon"
  • Add badges for platforms (macOS 14+, iOS 17+)

Architecture Section

Add a section explaining the modular architecture:

```markdown
## Architecture

MLXAudio follows a modular design allowing you to import only what you need:

- **MLXAudioCore**: Base types, protocols, and utilities
- **MLXAudioCodecs**: Audio codec implementations (SNAC, Vocos, Mimi)
- **MLXAudioTTS**: Text-to-Speech models (Soprano, Qwen3, LlamaTTS)
- **MLXAudioSTT**: Speech-to-Text models (GLMASR, Whisper)
- **MLXAudioSTS**: Speech-to-Speech (future)
- **MLXAudioUI**: SwiftUI components for audio interfaces
```

Installation Section

Add Swift Package Manager installation instructions:

```swift
dependencies: [
    .package(url: "https://github.com/Blaizzy/mlx-audio-swift.git", branch: "main")
]

// Import only what you need
.product(name: "MLXAudioTTS", package: "mlx-audio-swift"),
.product(name: "MLXAudioCore", package: "mlx-audio-swift")
```
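For context, the two fragments above could be combined into a full manifest roughly as follows. This is a sketch, not the SDK's documented manifest: the package name, target name, and platform minimums here are assumptions based on the Requirements section, and only the package URL and product names come from the snippets above.

```swift
// swift-tools-version:5.9
// Hypothetical Package.swift combining the dependency and product fragments.
// "MyAudioApp" and the platform versions are illustrative assumptions.
import PackageDescription

let package = Package(
    name: "MyAudioApp",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/Blaizzy/mlx-audio-swift.git", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyAudioApp",
            dependencies: [
                // Import only what you need
                .product(name: "MLXAudioTTS", package: "mlx-audio-swift"),
                .product(name: "MLXAudioCore", package: "mlx-audio-swift")
            ]
        )
    ]
)
```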

Quick Start Section

Add example code for:

Text-to-Speech:

```swift
import MLXAudioTTS
import MLXAudioCore

// Load a TTS model from HuggingFace
let model = try await SopranoModel.fromPretrained("mlx-community/Soprano-80M-bf16")

// Generate audio
let audio = try await model.generate(
    text: "Hello from MLX Audio Swift!",
    parameters: GenerateParameters(
        maxTokens: 200,
        temperature: 0.7,
        topP: 0.95
    )
)

// Save to file
try saveAudioArray(audio, sampleRate: Double(model.sampleRate), to: outputURL)
```

Speech-to-Text:

```swift
import MLXAudioSTT
import MLXAudioCore

// Load audio file
let (sampleRate, audioData) = try loadAudioArray(from: audioURL)

// Load STT model
let model = try await GLMASRModel.fromPretrained("mlx-community/GLM-ASR-Nano-2512-4bit")

// Transcribe
let output = model.generate(audio: audioData)
print(output.text)
```

Streaming Generation:

```swift
for try await event in model.generateStream(text: text, parameters: parameters) {
    switch event {
    case .token(let token):
        print("Generated token: \(token)")
    case .audio(let audio):
        print("Final audio shape: \(audio.shape)")
    case .info(let info):
        print(info.summary)
    }
}
```

Supported Models Section

Add a table:

| Model | Type | HuggingFace Repo |
|-------|------|------------------|
| Soprano | TTS | mlx-community/Soprano-80M-bf16 |
| Qwen3 | TTS | mlx-community/VyvoTTS-EN-Beta-4bit |
| LlamaTTS (Orpheus) | TTS | mlx-community/orpheus-3b-0.1-ft-bf16 |
| GLMASR | STT | mlx-community/GLM-ASR-Nano-2512-4bit |

Features Section

  • Modular architecture for minimal app size
  • Automatic model downloading from HuggingFace
  • Native async/await support
  • Streaming audio generation
  • Type-safe Swift API
  • Optimized for Apple Silicon with MLX

Advanced Usage Section

Add examples for:

  • Custom generation parameters
  • Audio codec usage (SNAC encoding/decoding)
  • Voice selection for multi-voice models
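The three bullets above might be illustrated along these lines. Everything in this sketch is an assumption for illustration: `SNACCodec`, its `encode`/`decode` methods, the codec repo name, and the `voice:` parameter are hypothetical names, not confirmed API from the SDK.

```swift
import MLXAudioCodecs
import MLXAudioTTS

// Hypothetical sketch -- SNACCodec, the repo string, and the voice
// parameter are illustrative names, not confirmed SDK API.

// Audio codec usage: round-trip a waveform through SNAC
let codec = try await SNACCodec.fromPretrained("mlx-community/snac_24khz")
let codes = codec.encode(audioData)      // waveform -> discrete codes
let reconstructed = codec.decode(codes)  // codes -> waveform

// Voice selection for a multi-voice model (parameter name assumed)
let audio = try await model.generate(
    text: "Choosing a specific voice.",
    voice: "voice_1",
    parameters: GenerateParameters(maxTokens: 200, temperature: 0.7)
)
```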

Requirements

  • macOS 14+ or iOS 17+
  • Apple Silicon (M1 or later) recommended
  • Xcode 15+
  • Swift 5.9+

Examples Section

  • Reference the Examples/VoicesApp directory
  • Link to test files for more usage examples

Credits Section

Add credits:

License

Add license information if available

Implementation Notes

  • Use proper markdown formatting
  • Include code blocks with syntax highlighting
  • Keep the tone professional and developer-friendly
  • Ensure all code examples are accurate based on the actual API from PR #1 (MLX-Audio Swift SDK v1)
  • Remove ALL mentions of Kokoro TTS

This pull request was created from Copilot chat.



Co-authored-by: Blaizzy <23445657+Blaizzy@users.noreply.github.com>
Copilot AI changed the title [WIP] Update README.md for new modular Swift SDK architecture Update README for modular SDK architecture and remove Kokoro TTS Jan 31, 2026
Copilot AI requested a review from Blaizzy January 31, 2026 19:44
Owner

@Blaizzy Blaizzy left a comment


LGTM!

@Blaizzy Blaizzy marked this pull request as ready for review January 31, 2026 19:46
@Blaizzy Blaizzy merged commit 9b4b304 into main Jan 31, 2026
Contributor

ekryski commented Feb 22, 2026

@Blaizzy any reason why you wanted to remove Kokoro? I've been finding it to still be one of the fastest and most accurate TTS models, and I've been testing everything bleeding edge I can get my hands on.

I was planning on putting up a PR for it because I brought in a native implementation into one of my apps. But if you guys don't want to include it, then I won't bother creating a PR for it. I know there are some concerns about licensing because of some of the underlying libraries that it uses, but there is a way around that.

@Blaizzy Blaizzy deleted the copilot/update-readme-sdk-architecture branch March 5, 2026 00:35
