chore: add SwiftLint and SwiftFormat configuration #140

Closed
beshkenadze wants to merge 17 commits into Blaizzy:main from beshkenadze:chore/swift-linting-formatting

@beshkenadze
Contributor

Summary

  • Adds .swiftlint.yml (SwiftLint 0.63+) for static analysis
  • Adds .swiftformat (SwiftFormat 0.60+) for automatic code formatting
  • Adds a Code Quality section to README with setup and usage instructions

Configuration highlights

Both configs target Swift 6 and are tuned for an ML audio library. Every non-default value is annotated with a reason inline.

SwiftLint

| Setting | Default → Value | Reason |
| --- | --- | --- |
| file_length.warning | 400 → 500 | MLX model files are inherently large |
| function_body_length.warning/error | 40/100 → 60/150 | DSP routines (STFT, mel spectrogram) can't always be split further |
| type_body_length.warning/error | 250/350 → 400/800 | Model classes declare all weights as stored properties |
| cyclomatic_complexity.ignores_case_statements | false → true | switch over token/language IDs inflates the metric |
| force_cast / force_try | error → warning | Gradual adoption path; CI stays green |
| trailing_comma | disabled | SwiftFormat owns trailing commas; avoids duplicate diagnostics |
| opt-in: force_unwrapping | off → warning | Public library safety signal |
| opt-in: async_without_await, unowned_variable_capture | off → on | Critical for Swift 6 strict concurrency |
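
As a rough illustration of what the `force_cast` and `force_unwrapping` warnings point at, the sketch below shows a safer pattern. The helper `sampleRate(from:)` and the `"sample_rate"` key are hypothetical and do not come from this repository.

```swift
// Hypothetical config helper, used only to illustrate the lint rules above.
func sampleRate(from config: [String: Any]) -> Int? {
    // Would be flagged by force_cast:       config["sample_rate"] as! Int
    // Would be flagged by force_unwrapping: config["sample_rate"]!
    // A conditional cast reports failure to the caller instead of crashing.
    guard let rate = config["sample_rate"] as? Int else {
        return nil
    }
    return rate
}
```

Because both rules are set to warning rather than error, existing call sites of the flagged forms would still pass CI while they are migrated.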

SwiftFormat

| Setting | Default | Reason |
| --- | --- | --- |
| --swiftversion 6.2 | none | Matches swift-tools-version; enables Swift 6 redundancy rules |
| --maxwidth 120 | none | Consistent with SwiftLint line_length.warning |
| --wrap auto | preserve | Makes --maxwidth actually reformat long lines |
| --wraparguments/parameters/collections before-first | preserve | Readability for MLX initialisers with many named params |
| --guardelse next-line | auto | Async guard chains are easier to read with body on next line |
| --importgrouping testable-bottom | none | @testable import visually separated from production imports |
| --enable acronyms | off | Id → ID, Url → URL per Apple naming guidelines |
| --enable preferFinalClasses | off | All MLX model types are final; helps the compiler devirtualise |
| --enable isEmpty | off | .count == 0 → .isEmpty |
| --enable wrapConditionalBodies | off | Prevents easy-to-miss single-line guard else { return } |
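
For a sense of what the `isEmpty` and `wrapConditionalBodies` rules change, here is a small sketch of code in its "after" form. The function `firstSample(of:)` is a hypothetical example and not part of the codebase; the exact output of SwiftFormat may differ in details such as indentation.

```swift
// Hypothetical helper showing the formatted result of two opt-in rules.
func firstSample(of samples: [Float]) -> Float? {
    // isEmpty: `samples.count == 0` is rewritten to `samples.isEmpty`.
    if samples.isEmpty {
        // wrapConditionalBodies: the body sits on its own line
        // instead of the one-line form `if samples.isEmpty { return nil }`.
        return nil
    }
    return samples[0]
}
```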

Test plan

  • brew install swiftlint swiftformat locally
  • swiftformat . --lint — no formatting violations on current codebase
  • swiftlint lint — review any new warnings, assess if thresholds need further tuning

beshkenadze and others added 17 commits March 21, 2026 20:16
* Update README.md (Blaizzy#104)

* Add Fish Audio S2 Pro model. (Blaizzy#106)

* Add README for Qwen3 TTS. (Blaizzy#107)

* Fix Parakeet multilingual recognition (Russian/non-English)

Root cause: BatchNorm running in training mode during inference.
MLX Module defaults to training=true, causing BatchNorm to compute
statistics from the current batch (size=1) instead of using stored
running_mean/running_var. This produced noisy encoder features that
confused Russian with Polish/Latin transliteration.

Fix: call model.train(false) after loading weights.

Also align preprocessing with NeMo reference:
- Mel scale: HTK → Slaney
- Window padding: right-pad → center-pad (matching torch.stft)
- STFT pad mode: reflect → constant (matching NeMo)
- Std normalization: ddof=0 → ddof=1 (Bessel's correction)
- Log zero guard: 1e-5 → 2^-24 (matching NeMo default)
- Mel filterbank norm: use "slaney" explicitly
- Filter special tokens (<|...|>) from decoded output

Verified: Swift output matches NeMo CUDA reference on Russian audio.
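
To make the root cause concrete, the following is a toy, single-channel batch norm in plain Swift. It is not the MLX implementation; the function name, the scalar running statistics, and the `eps` value are assumptions chosen for illustration. It shows how training mode discards the stored statistics when the input is a single value.

```swift
// Toy batch norm over one channel; illustrative only, not MLX's BatchNorm.
func batchNorm(
    _ x: [Float],
    runningMean: Float,
    runningVar: Float,
    training: Bool,
    eps: Float = 1e-5
) -> [Float] {
    let mean: Float
    let variance: Float
    if training {
        // Training mode: statistics are computed from the current input.
        mean = x.reduce(0, +) / Float(x.count)
        variance = x.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Float(x.count)
    } else {
        // Inference mode: statistics stored with the checkpoint are used.
        mean = runningMean
        variance = runningVar
    }
    return x.map { ($0 - mean) / (variance + eps).squareRoot() }
}

let input: [Float] = [4.0]
// With one value, the batch mean equals the value, so the feature collapses to 0.
let trainOut = batchNorm(input, runningMean: 1, runningVar: 4, training: true)
// With stored statistics, the feature keeps its scale: (4 - 1) / sqrt(4), about 1.5.
let evalOut = batchNorm(input, runningMean: 1, runningVar: 4, training: false)
```

In the real model the statistics are taken over more than one element, so the output is noisy rather than exactly zero, but the mechanism is the same.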

* Add KittenTTS text-to-speech model

Port of KittenTTS (StyleTTS2-based, 15M params, 24kHz) to MLX Swift.
Produces near-identical output to Python mlx-audio reference.

Architecture:
- PL-BERT (ALBERT) text encoder for semantic understanding
- Bidirectional LSTM prosody predictor (duration, F0, noise)
- iSTFT-Net vocoder with Snake activations and AdaIN conditioning
- 8 built-in voices (Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo)

G2P (Grapheme-to-Phoneme):
- MisakiSwift ported inline (Apache 2.0) — no external deps
- Dictionary lookup (gold/silver) + BART neural fallback for OOV words
- Apple NaturalLanguage framework for POS tagging

Files:
- Sources/MLXAudioTTS/Models/KittenTTS/ (8 model files + 16 G2P files)
- Sources/MLXAudioTTS/Resources/KittenTTS/ (~9MB US dictionaries + BART weights)
- scripts/convert_voices_npz.py (voices.npz → safetensors converter)
- Tests: 7 unit tests for config, text cleaner, voice aliases

Usage:
  let model = try await TTS.loadModel(modelRepo: "mlx-community/kitten-tts-nano-0.8")
  let audio = try await model.generate(text: "Hello world", voice: "Bella", ...)

HF models: mlx-community/kitten-tts-{nano,micro,mini}-0.8[-{4,5,6,8}bit]

---------

Co-authored-by: Prince Canuma <prince.gdt@gmail.com>
Co-authored-by: Lucas Newman <lucas@future.fit>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- skip 3D weight transposition in sanitize() for quantized checkpoints
  (MLX-converted weights are already in (out, kernel, in) layout)
- guard duration pipeline against NaN propagation from quantized encoder
- cap duration values at 100 to prevent OOM from garbage int32 casts
- return silence instead of crashing on empty indices

Fixes Blaizzy#135
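
The duration guards described above could look roughly like the sketch below. The helper name `sanitizeDurations`, the `Float` input, and the choice to map non-finite values to zero are assumptions; only the cap of 100 comes from the commit message.

```swift
// Hypothetical helper; the actual pipeline operates on MLX arrays.
func sanitizeDurations(_ raw: [Float], maxDuration: Float = 100) -> [Int32] {
    raw.map { value in
        // NaN or infinity cannot be converted to Int32, so treat it as zero frames.
        guard value.isFinite else {
            return 0
        }
        // Clamp to [0, maxDuration] so a garbage value cannot request a huge buffer.
        let clamped = min(max(value.rounded(), 0), maxDuration)
        return Int32(clamped)
    }
}
```
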
- Add Cohere Transcribe STT model implementation
- Wire into CLI and docs
- Add Cohere Transcribe tests
- Fix: use max(dim-1, 1) in ParakeetAudio normalization (div-by-zero guard)
- Fix: add textProcessor param and kokoro case to TTSModel factory
- Improve test integration via MLXAUDIO_TEST_MODEL_DIR env var
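
The `max(dim-1, 1)` guard can be illustrated with a plain-Swift sketch of standard-deviation normalization using Bessel's correction. The function `normalize` and the `eps` value are assumptions for illustration and are not the ParakeetAudio code.

```swift
// Illustrative normalization over a flat array; not the repository's implementation.
func normalize(_ x: [Float], eps: Float = 1e-5) -> [Float] {
    guard !x.isEmpty else {
        return []
    }
    let n = x.count
    let mean = x.reduce(0, +) / Float(n)
    let sumSquares = x.map { ($0 - mean) * ($0 - mean) }.reduce(0, +)
    // Bessel's correction divides by n - 1; max(_, 1) avoids dividing by zero when n == 1.
    let std = (sumSquares / Float(max(n - 1, 1))).squareRoot()
    return x.map { ($0 - mean) / (std + eps) }
}
```

Without the guard, a single-element input would divide by zero and produce NaN, which would then propagate through the rest of the pipeline.
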
Adds .swiftlint.yml (SwiftLint 0.63+) and .swiftformat (SwiftFormat 0.60+)
tuned for a Swift 6 ML audio library.

Key decisions documented inline in both files:
- Relaxed body/type/file length thresholds for DSP and model code
- Strict concurrency opt-in rules (async_without_await, unowned_variable_capture)
- preferFinalClasses + acronyms + isEmpty SwiftFormat opt-ins
- force_cast / force_try downgraded to warning for incremental adoption
- trailing_comma disabled in SwiftLint (managed by SwiftFormat)

Adds a "Code Quality" section to README with setup and usage instructions.
@beshkenadze
Contributor Author

Superseded by a clean branch based on upstream/main.
