chore: add SwiftLint and SwiftFormat configuration #140
Closed
beshkenadze wants to merge 17 commits into Blaizzy:main
* Update README.md (Blaizzy#104)

* Add Fish Audio S2 Pro model. (Blaizzy#106)

* Add README for Qwen3 TTS. (Blaizzy#107)

* Fix Parakeet multilingual recognition (Russian/non-English)

  Root cause: BatchNorm running in training mode during inference. MLX Module defaults to training=true, causing BatchNorm to compute statistics from the current batch (size=1) instead of using stored running_mean/running_var. This produced noisy encoder features that confused Russian with Polish/Latin transliteration.

  Fix: call model.train(false) after loading weights.

  Also align preprocessing with NeMo reference:
  - Mel scale: HTK → Slaney
  - Window padding: right-pad → center-pad (matching torch.stft)
  - STFT pad mode: reflect → constant (matching NeMo)
  - Std normalization: ddof=0 → ddof=1 (Bessel's correction)
  - Log zero guard: 1e-5 → 2^-24 (matching NeMo default)
  - Mel filterbank norm: use "slaney" explicitly
  - Filter special tokens (<|...|>) from decoded output

  Verified: Swift output matches NeMo CUDA reference on Russian audio.

* Add KittenTTS text-to-speech model

  Port of KittenTTS (StyleTTS2-based, 15M params, 24kHz) to MLX Swift. Produces near-identical output to Python mlx-audio reference.

  Architecture:
  - PL-BERT (ALBERT) text encoder for semantic understanding
  - Bidirectional LSTM prosody predictor (duration, F0, noise)
  - iSTFT-Net vocoder with Snake activations and AdaIN conditioning
  - 8 built-in voices (Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo)

  G2P (Grapheme-to-Phoneme):
  - MisakiSwift ported inline (Apache 2.0), no external deps
  - Dictionary lookup (gold/silver) + BART neural fallback for OOV words
  - Apple NaturalLanguage framework for POS tagging

  Files:
  - Sources/MLXAudioTTS/Models/KittenTTS/ (8 model files + 16 G2P files)
  - Sources/MLXAudioTTS/Resources/KittenTTS/ (~9MB US dictionaries + BART weights)
  - scripts/convert_voices_npz.py (voices.npz → safetensors converter)
  - Tests: 7 unit tests for config, text cleaner, voice aliases

  Usage:
      let model = try await TTS.loadModel(modelRepo: "mlx-community/kitten-tts-nano-0.8")
      let audio = try await model.generate(text: "Hello world", voice: "Bella", ...)

  HF models: mlx-community/kitten-tts-{nano,micro,mini}-0.8[-{4,5,6,8}bit]

---------

Co-authored-by: Prince Canuma <prince.gdt@gmail.com>
Co-authored-by: Lucas Newman <lucas@future.fit>
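The "Mel scale: HTK → Slaney" item in the Parakeet fix above can be illustrated with a small, self-contained sketch. The two functions below follow the commonly used definitions of each scale (the Slaney constants are the ones used by librosa); the function names are hypothetical, and whether the PR's Swift code uses exactly these constants is an assumption.

```swift
import Foundation

/// HTK mel scale: a single logarithmic curve over the whole frequency range.
func hzToMelHTK(_ hz: Double) -> Double {
    2595.0 * log10(1.0 + hz / 700.0)
}

/// Slaney mel scale: linear below 1 kHz, logarithmic above it.
func hzToMelSlaney(_ hz: Double) -> Double {
    let linearStep = 200.0 / 3.0        // Hz per mel in the linear region
    let breakHz = 1000.0                // frequency where the log region starts
    let breakMel = breakHz / linearStep // mel value at the break point (about 15)
    let logStep = log(6.4) / 27.0       // step size in the logarithmic region
    if hz < breakHz {
        return hz / linearStep
    }
    return breakMel + log(hz / breakHz) / logStep
}
```

Because the two scales place filterbank centers at different frequencies, a model trained on Slaney features sees shifted inputs when fed HTK features, which is consistent with the degraded recognition described in the commit.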
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- skip 3D weight transposition in sanitize() for quantized checkpoints (MLX-converted weights are already in (out, kernel, in) layout)
- guard duration pipeline against NaN propagation from quantized encoder
- cap duration values at 100 to prevent OOM from garbage int32 casts
- return silence instead of crashing on empty indices

Fixes Blaizzy#135
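The duration guards listed in this commit can be sketched in plain Swift as follows. This is a minimal illustration, not the PR's implementation: the function names, the choice to map non-finite values to 0, the silence length, and the placeholder decoder are all assumptions. Only the cap of 100 and the "return silence on empty indices" behaviour come from the commit message.

```swift
/// Converts raw predicted durations into safe frame counts.
/// Non-finite values (NaN, ±infinity) become 0 (assumed policy), and all
/// values are clamped to 0...maxFrames before the integer conversion.
func sanitizeDurations(_ raw: [Float], maxFrames: Float = 100) -> [Int] {
    raw.map { value -> Int in
        guard value.isFinite else { return 0 }
        let clamped = min(max(value.rounded(), 0), maxFrames)
        return Int(clamped)
    }
}

/// Repeats each token index `count` times, as a duration-based aligner would.
func expandIndices(_ durations: [Int]) -> [Int] {
    durations.enumerated().flatMap { index, count in
        Array(repeating: index, count: count)
    }
}

/// Returns silence instead of failing when no frames survive sanitization.
func synthesize(durations raw: [Float], silenceSamples: Int = 2400) -> [Float] {
    let indices = expandIndices(sanitizeDurations(raw))
    guard !indices.isEmpty else {
        return [Float](repeating: 0, count: silenceSamples)
    }
    // Hypothetical stand-in for the real decoder: one sample per index.
    return indices.map { Float($0) }
}
```

Clamping before the integer conversion matters because a huge or NaN value would otherwise turn into an arbitrary frame count and drive the allocation size.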
- Add Cohere Transcribe STT model implementation
- Wire into CLI and docs
- Add Cohere Transcribe tests
- Fix: use max(dim-1, 1) in ParakeetAudio normalization (div-by-zero guard)
- Fix: add textProcessor param and kokoro case to TTSModel factory
- Improve test integration via MLXAUDIO_TEST_MODEL_DIR env var
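The max(dim-1, 1) guard mentioned above can be shown with a small sketch of standard-deviation normalization using Bessel's correction (ddof=1). It operates on a plain Swift array rather than an MLX array, and the function name and the epsilon value are assumptions made for illustration.

```swift
/// Normalizes `values` to zero mean and unit variance using the sample
/// standard deviation (ddof = 1). `max(count - 1, 1)` keeps the divisor
/// non-zero when only a single value is present.
func normalize(_ values: [Float], epsilon: Float = 1e-5) -> [Float] {
    guard !values.isEmpty else { return [] }
    let count = Float(values.count)
    let mean = values.reduce(0, +) / count
    let squaredDeviations = values.map { ($0 - mean) * ($0 - mean) }.reduce(0, +)
    let variance = squaredDeviations / max(count - 1, 1)
    let std = variance.squareRoot()
    // epsilon (assumed value) avoids dividing by a zero standard deviation.
    return values.map { ($0 - mean) / (std + epsilon) }
}
```

Without the guard, a single-frame input would divide by `count - 1 = 0` and produce NaN, which then propagates through the rest of the encoder.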
…ut and NaN durations
Adds .swiftlint.yml (SwiftLint 0.63+) and .swiftformat (SwiftFormat 0.60+) tuned for a Swift 6 ML audio library.

Key decisions documented inline in both files:
- Relaxed body/type/file length thresholds for DSP and model code
- Strict concurrency opt-in rules (async_without_await, unowned_variable_capture)
- preferFinalClasses + acronyms + isEmpty SwiftFormat opt-ins
- force_cast / force_try downgraded to warning for incremental adoption
- trailing_comma disabled in SwiftLint (managed by SwiftFormat)

Adds a "Code Quality" section to README with setup and usage instructions.
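As a rough illustration of what the force_cast / force_try warnings ask for, the sketch below shows non-crashing alternatives to `as!` and `try!`. All names here (ConfigError, parseSampleRate, sampleRate, voiceName) are hypothetical and do not come from the repository.

```swift
enum ConfigError: Error, Equatable {
    case invalidSampleRate(String)
}

/// Parses a positive sample rate or throws a descriptive error.
func parseSampleRate(_ text: String) throws -> Int {
    guard let rate = Int(text), rate > 0 else {
        throw ConfigError.invalidSampleRate(text)
    }
    return rate
}

/// force_try would flag `try! parseSampleRate(text)`;
/// this version falls back to a default instead of trapping.
func sampleRate(orDefault fallback: Int, from text: String) -> Int {
    (try? parseSampleRate(text)) ?? fallback
}

/// force_cast would flag `value as! String`;
/// a conditional cast returns nil for unexpected types.
func voiceName(from value: Any) -> String? {
    value as? String
}
```

Keeping both rules at warning level, as the commit describes, lets existing call sites be migrated gradually rather than failing the lint run.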
Superseded by a clean branch based on upstream/main.
Summary
- .swiftlint.yml (SwiftLint 0.63+) for static analysis
- .swiftformat (SwiftFormat 0.60+) for automatic code formatting

Configuration highlights
Both configs target Swift 6 and are tuned for an ML audio library. Every non-default value is annotated with a reason inline.
SwiftLint
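- file_length.warning, function_body_length.warning/error, type_body_length.warning/error: relaxed for DSP and model code
- cyclomatic_complexity.ignores_case_statements
- force_cast / force_try: downgraded to warning for incremental adoption
- trailing_comma: disabled (managed by SwiftFormat)
- force_unwrapping
- async_without_await, unowned_variable_capture: strict concurrency opt-in rules

SwiftFormat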
- --swiftversion 6.2: swift-tools-version; enables Swift 6 redundancy rules
- --maxwidth 120: line_length.warning
- --wrap auto: --maxwidth actually reformat long lines
- --wraparguments / --wrapparameters / --wrapcollections before-first
- --guardelse next-line
- --importgrouping testable-bottom: @testable import visually separated from production imports
- --enable acronyms: Id → ID, Url → URL per Apple naming guidelines
- --enable preferFinalClasses
- --enable isEmpty: .count == 0 → .isEmpty
- --enable wrapConditionalBodies: guard else { return }

Test plan
- brew install swiftlint swiftformat locally
- swiftformat . --lint: no formatting violations on current codebase
- swiftlint lint: review any new warnings, assess if thresholds need further tuning
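For reviewers who want a feel for the SwiftFormat opt-ins listed under Configuration highlights, here is a small sketch of code in the shape those rules aim for. The AudioChunkQueue type is hypothetical and not part of the repository; the comments describe the intended effect of each rule as stated in this PR, not verified formatter output.

```swift
// preferFinalClasses: a class that is never subclassed is marked `final`.
final class AudioChunkQueue {
    // acronyms: `sessionId` would be renamed to `sessionID`.
    let sessionID: String
    private var chunks: [[Float]] = []

    init(sessionID: String) {
        self.sessionID = sessionID
    }

    func push(_ chunk: [Float]) {
        // wrapConditionalBodies: the body sits on its own line
        // instead of `guard !chunk.isEmpty else { return }`.
        guard !chunk.isEmpty else {
            return
        }
        chunks.append(chunk)
    }

    // isEmpty: `chunks.count == 0` is rewritten to `chunks.isEmpty`.
    var isDrained: Bool {
        chunks.isEmpty
    }
}
```

Running `swiftformat . --lint` from the test plan would report code that deviates from this shape without modifying any files.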