divvunspell

A fast, feature-rich spell checking library and toolset for HFST-based spell checkers. Written in Rust, divvunspell is a modern reimplementation and extension of hfst-ospell with additional features like parallel processing, comprehensive tokenization, case handling, and morphological analysis.

Features

High Performance: Memory-mapped transducers and parallel suggestion generation
ZHFST/BHFST Support: Load standard HFST spell checker archives
Smart Tokenization: Unicode-aware word boundary detection with customizable alphabets
Case Handling: Intelligent case preservation and suggestion recasing
Morphological Analysis: Extract and filter suggestions based on morphological tags
Cross-Platform: Works on macOS, Linux, Windows, iOS and Android
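
To make the case-handling feature above concrete, here is a minimal sketch of what recasing a suggestion can look like. It is not divvunspell's implementation: `CasePattern`, `detect_case` and `recase` are hypothetical names invented for this illustration, and the library's real rules may well be more elaborate.

```rust
/// Coarse case patterns (hypothetical, for illustration only).
#[derive(Debug, PartialEq, Clone, Copy)]
enum CasePattern {
    Lower,
    Upper,
    Title,
    Mixed,
}

/// Classify the casing of the input word, looking only at alphabetic characters.
fn detect_case(word: &str) -> CasePattern {
    let letters: Vec<char> = word.chars().filter(|c| c.is_alphabetic()).collect();
    if letters.is_empty() || letters.iter().all(|c| c.is_lowercase()) {
        return CasePattern::Lower;
    }
    if letters.len() > 1 && letters.iter().all(|c| c.is_uppercase()) {
        return CasePattern::Upper;
    }
    if letters[0].is_uppercase() && letters[1..].iter().all(|c| c.is_lowercase()) {
        return CasePattern::Title;
    }
    CasePattern::Mixed
}

/// Re-apply the input's case pattern to a (lowercase) suggestion.
fn recase(suggestion: &str, pattern: CasePattern) -> String {
    match pattern {
        CasePattern::Upper => suggestion.to_uppercase(),
        CasePattern::Title => {
            let mut chars = suggestion.chars();
            match chars.next() {
                // Uppercase the first character, keep the rest unchanged.
                Some(first) => first.to_uppercase().chain(chars).collect(),
                None => String::new(),
            }
        }
        // Lowercase and mixed-case inputs leave the suggestion untouched.
        CasePattern::Lower | CasePattern::Mixed => suggestion.to_string(),
    }
}

fn main() {
    let input = "Wordd";
    let pattern = detect_case(input);
    println!("{:?} -> {}", pattern, recase("word", pattern));
}
```

The idea is that a misspelled "Wordd" at the start of a sentence should be offered "Word" rather than "word". Per the option list further down, the `--no-recase` flag disables case-aware suggestion handling in the CLI.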

Quick Start

As a Command-Line Tool

# Install the CLI
cargo install divvunspell-cli

# Check spelling and get suggestions
divvunspell suggest --archive speller.zhfst --json "sámi"

As a Rust Library

Add to your Cargo.toml:

[dependencies]
divvunspell = "1.0.0-beta.7"

Basic usage:

use divvunspell::archive::{SpellerArchive, ZipSpellerArchive};
use divvunspell::speller::Speller;

// Load a spell checker archive
let archive = ZipSpellerArchive::open("language.zhfst")?;
let speller = archive.speller();

// Check if a word is correct
if !speller.clone().is_correct("wordd") {
    // Get spelling suggestions
    let suggestions = speller.clone().suggest("wordd");

    for suggestion in suggestions {
        println!("{} (weight: {})", suggestion.value, suggestion.weight);
    }
}

// Morphological analysis
let analyses = speller.analyze_input("running");
for analysis in analyses {
    println!("{}", analysis.value); // e.g., "run+V+PresPartc"
}
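
Analysis strings such as "run+V+PresPartc" usually need to be split before they can be used for filtering. The following is a small standalone sketch, assuming the lemma comes first and tags are separated by `+`, as in the example above. The actual tag inventory and layout depend on the language's transducer, and `ParsedAnalysis`, `parse_analysis` and `has_tag` are hypothetical helpers, not part of the divvunspell API.

```rust
/// A lemma plus its morphological tags (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct ParsedAnalysis {
    lemma: String,
    tags: Vec<String>,
}

/// Split an analysis string on '+': the first field is treated as the lemma,
/// the remaining non-empty fields as tags.
fn parse_analysis(value: &str) -> ParsedAnalysis {
    let mut parts = value.split('+');
    let lemma = parts.next().unwrap_or("").to_string();
    let tags = parts
        .filter(|t| !t.is_empty())
        .map(|t| t.to_string())
        .collect();
    ParsedAnalysis { lemma, tags }
}

/// Check whether an analysis carries a given tag.
fn has_tag(analysis: &ParsedAnalysis, tag: &str) -> bool {
    analysis.tags.iter().any(|t| t == tag)
}

fn main() {
    let parsed = parse_analysis("run+V+PresPartc");
    println!("lemma = {}, tags = {:?}", parsed.lemma, parsed.tags);
    println!("is verb: {}", has_tag(&parsed, "V"));
}
```

With a helper like this, suggestions could be kept or dropped depending on whether their analysis contains a particular tag.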

Command-Line Tools

divvunspell

The main spell checking tool with support for suggestions, analysis, and tokenization.

# Get suggestions for a word
divvunspell suggest --archive language.zhfst "wordd"

# Always show suggestions even for correct words
divvunspell suggest --archive language.zhfst --always-suggest "word"

# Limit number and weight of suggestions
divvunspell suggest --archive language.zhfst --nbest 5 --weight 20.0 "wordd"

# JSON output
divvunspell suggest --archive language.zhfst --json "wordd"

# Tokenize text
divvunspell tokenize --archive language.zhfst "This is some text."

# Analyze word forms morphologically
divvunspell analyze-input --archive language.zhfst "running"
divvunspell analyze-output --archive language.zhfst "runing"

Options:

-a, --archive <FILE> - BHFST or ZHFST archive to use
-S, --always-suggest - Show suggestions even if word is correct
-w, --weight <WEIGHT> - Maximum weight limit for suggestions
-n, --nbest <N> - Maximum number of suggestions to return
--no-reweighting - Disable suggestion reweighting (closer to hfst-ospell behavior)
--no-recase - Disable case-aware suggestion handling
--json - Output results as JSON
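
The effect of `--weight` and `--nbest` can be pictured as a filter followed by a cut-off. The sketch below is an illustration only, not the library's code: it assumes that lower weights mean better suggestions (weights act as penalties), and `Suggestion` here is a local stand-in struct with the same `value` and `weight` fields used in the library example, not the real type. `limit_suggestions` is a hypothetical helper.

```rust
/// Local stand-in for a suggestion (not the divvunspell type).
#[derive(Debug, Clone, PartialEq)]
struct Suggestion {
    value: String,
    weight: f32,
}

/// Drop suggestions heavier than `max_weight`, sort by ascending weight,
/// then keep at most `nbest` entries. `None` disables the respective limit.
fn limit_suggestions(
    mut suggestions: Vec<Suggestion>,
    nbest: Option<usize>,
    max_weight: Option<f32>,
) -> Vec<Suggestion> {
    if let Some(limit) = max_weight {
        suggestions.retain(|s| s.weight <= limit);
    }
    // total_cmp gives a total order on floats, so sorting cannot panic.
    suggestions.sort_by(|a, b| a.weight.total_cmp(&b.weight));
    if let Some(n) = nbest {
        suggestions.truncate(n);
    }
    suggestions
}

fn main() {
    let candidates = vec![
        Suggestion { value: "word".into(), weight: 5.0 },
        Suggestion { value: "world".into(), weight: 12.0 },
        Suggestion { value: "ward".into(), weight: 25.0 },
        Suggestion { value: "wordy".into(), weight: 8.0 },
    ];
    // Roughly what `--nbest 2 --weight 20.0` asks for.
    for s in limit_suggestions(candidates, Some(2), Some(20.0)) {
        println!("{} (weight: {})", s.value, s.weight);
    }
}
```

The real search may apply these limits while it explores the transducer rather than afterwards. The sketch only shows the observable result.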

Debugging:

Set RUST_LOG=trace to enable detailed logging:

RUST_LOG=trace divvunspell suggest --archive language.zhfst "wordd"

thfst-tools

Convert HFST and ZHFST files to optimized THFST and BHFST formats.

THFST (Tromsø-Helsinki FST): A byte-aligned HFST format optimized for fast loading and memory mapping, required for ARM processors.

BHFST (Box HFST): THFST files packaged in a box container with JSON metadata for efficient processing.

# Convert HFST to THFST
thfst-tools hfst-to-thfst acceptor.hfst acceptor.thfst

# Convert ZHFST to BHFST (recommended for distribution)
thfst-tools zhfst-to-bhfst language.zhfst language.bhfst

# Convert THFST pair to BHFST
thfst-tools thfsts-to-bhfst --errmodel errmodel.thfst --lexicon lexicon.thfst output.bhfst

# View BHFST metadata
thfst-tools bhfst-info language.bhfst

accuracy

Test spell checker accuracy against known typo/correction pairs.
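
As a rough illustration of what such a test measures, the sketch below computes a top-N hit rate: the share of typos whose expected correction appears among the first N suggestions. This is an assumption about one useful metric, not a description of what the `accuracy` tool actually reports. `top_n_hit_rate` and `fake_suggest` are hypothetical, and `fake_suggest` merely stands in for a real speller.

```rust
/// Fraction of (typo, expected) pairs where `expected` is among the
/// first `n` suggestions returned by `suggest`.
fn top_n_hit_rate<F>(pairs: &[(&str, &str)], n: usize, suggest: F) -> f64
where
    F: Fn(&str) -> Vec<String>,
{
    if pairs.is_empty() {
        return 0.0;
    }
    let hits = pairs
        .iter()
        .filter(|&&(typo, expected)| {
            suggest(typo).iter().take(n).any(|s| s.as_str() == expected)
        })
        .count();
    hits as f64 / pairs.len() as f64
}

/// Stand-in suggester with canned answers (a real run would call a speller).
fn fake_suggest(word: &str) -> Vec<String> {
    match word {
        "wordd" => vec!["word".to_string()],
        "teh" => vec!["ten".to_string(), "the".to_string()],
        _ => Vec::new(),
    }
}

fn main() {
    let pairs = [("wordd", "word"), ("teh", "the"), ("recieve", "receive")];
    println!("top-1 hit rate: {:.2}", top_n_hit_rate(&pairs, 1, fake_suggest));
    println!("top-2 hit rate: {:.2}", top_n_hit_rate(&pairs, 2, fake_suggest));
}
```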

# Install
cd crates/accuracy
cargo install --path .

# Run accuracy test
accuracy typos.tsv language.zhfst

# Save detailed JSON report
accuracy -o report.json typos.tsv language.zhfst

# Limit test size and save TSV summary
accuracy -w 1000 -t results.tsv typos.tsv language.zhfst

# Use custom config
accuracy -c config.json typos.tsv language.zhfst

Input format (typos.tsv): Tab-separated values with typo in first column, expected correction in second:

wordd    word
recieve  receive
teh      the

Accuracy viewer (prototype web UI):

accuracy -o support/accuracy-viewer/public/report.json typos.tsv language.zhfst
cd support/accuracy-viewer
npm i && npm run dev
# Open http://localhost:5000

Building from Source

Install Rust

curl https://sh.rustup.rs -sSf | sh
source $HOME/.cargo/env
rustup default stable

Build Everything

# Build all crates
cargo build --release

# Install specific tools
cargo install --path ./cli          # divvunspell CLI
cargo install --path ./crates/thfst-tools
cargo install --path ./crates/accuracy

Run Tests

cargo test

Documentation

API Documentation: docs.rs/divvunspell
GitHub Pages: divvun.github.io/divvunspell

License

The divvunspell library is dual-licensed under:

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

You may choose either license for library use.

The command-line tools (divvunspell, thfst-tools, accuracy) are licensed under GPL-3.0 (LICENSE-GPL).
