Scala implementation of the Aho-Corasick algorithm

A byte-oriented Aho-Corasick implementation in pure Scala. Keywords can carry a user-defined property tag.

Features

UTF-8 is used explicitly for string APIs, so behavior is consistent across platforms
Failure links are built automatically on the first search; call build() to construct them eagerly instead
Array-based goto table and vocabulary indexing for better performance and memory use on large dictionaries
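
To make the features above concrete, here is a minimal, self-contained sketch of a byte-oriented Aho-Corasick automaton with lazily built failure links. It is not this library's source: the class name SketchAutomaton, its fields, and the dense 256-entry rows are assumptions chosen for brevity, and the real implementation's vocabulary indexing is likely more compact.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable

// Hypothetical sketch for illustration; not the library's Automaton class.
final class SketchAutomaton {
  // trie(state)(byte & 0xff) is the child state, or -1 if there is none.
  private val trie = mutable.ArrayBuffer(Array.fill(256)(-1))
  // (keyword, property) pairs that end exactly at each trie state.
  private val out = mutable.ArrayBuffer(Set.empty[(String, String)])
  // Completed transition table and merged outputs, filled in by build().
  private var delta = Array.empty[Array[Int]]
  private var outputs = Array.empty[Set[(String, String)]]
  private var built = false

  def addWord(property: String, word: String): Unit = {
    var s = 0
    for (b <- word.getBytes(UTF_8)) {
      val c = b & 0xff
      if (trie(s)(c) < 0) {
        trie += Array.fill(256)(-1)
        out += Set.empty[(String, String)]
        trie(s)(c) = trie.length - 1
      }
      s = trie(s)(c)
    }
    out(s) = out(s) + ((word, property))
    built = false // force a rebuild before the next search
  }

  // Breadth-first pass: compute failure links and fill every missing transition.
  def build(): Unit = {
    val n = trie.length
    val d = Array.tabulate(n)(s => trie(s).clone())
    val fail = new Array[Int](n) // all zero, so depth-1 states fail to the root
    val o = out.toArray
    val queue = mutable.Queue.empty[Int]
    for (c <- 0 until 256) {
      if (d(0)(c) < 0) d(0)(c) = 0 else queue.enqueue(d(0)(c))
    }
    while (queue.nonEmpty) {
      val s = queue.dequeue()
      o(s) = o(s) ++ o(fail(s)) // inherit keywords that end at the fallback state
      for (c <- 0 until 256) {
        val t = d(s)(c)
        if (t < 0) d(s)(c) = d(fail(s))(c)
        else { fail(t) = d(fail(s))(c); queue.enqueue(t) }
      }
    }
    delta = d
    outputs = o
    built = true
  }

  def search(text: String): Set[(String, String)] = {
    if (!built) build() // lazy construction on first search
    val hits = mutable.Set.empty[(String, String)]
    var s = 0
    for (b <- text.getBytes(UTF_8)) {
      s = delta(s)(b & 0xff)
      hits ++= outputs(s)
    }
    hits.toSet
  }
}
```

Because build() completes every row of the transition table, the search loop does one array lookup per input byte and never walks failure links at match time. The cost in this sketch is 256 integers per state, which is why a real implementation would want a more compact indexing scheme.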

Requirements

Java 11+
Maven 3.6+

Build and test

mvn test

Usage

import io.yizhiru.ac.Automaton

val ac = new Automaton
ac.addWords("pronoun", "he", "she", "his", "hers")

// build() is optional; search triggers construction automatically
val results: Set[(String, String)] = ac.search("ushers")
// Set(("he", "pronoun"), ("she", "pronoun"), ("hers", "pronoun"))
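
Since the result is an ordinary Set of (keyword, property) pairs, standard collection operations apply to it. The sketch below groups keywords by property; the object name ResultGrouping and its method are hypothetical helpers, not part of the library.

```scala
// Hypothetical helper for illustration; not part of the library.
object ResultGrouping {
  /** Turns Set((keyword, property), ...) into Map(property -> Set(keyword, ...)). */
  def byProperty(results: Set[(String, String)]): Map[String, Set[String]] =
    results.groupBy(_._2).map { case (property, pairs) => property -> pairs.map(_._1) }
}
```

Passing the search result from the example above, ResultGrouping.byProperty(results) yields a single "pronoun" entry holding "he", "she" and "hers".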

Search raw bytes when you already have UTF-8 (or other byte-level) data:

ac.addWordBytes("tag", "he".getBytes(java.nio.charset.StandardCharsets.UTF_8))
ac.searchBytes("ushers".getBytes(java.nio.charset.StandardCharsets.UTF_8))

API notes

Method	Description
`addWord(property, word)`	Register one UTF-8 keyword
`addWords(property, words*)`	Register multiple keywords with the same property
`addWordBytes(property, bytes)`	Register a keyword from raw bytes
`build()`	Eagerly construct failure links and complete the goto table
`search(text)`	Match UTF-8 text, returns `Set[(keyword, property)]`
`searchBytes(data)`	Match raw bytes

setFailTransitions() is deprecated and kept only for backward compatibility; it delegates to build().

Results are returned as a Set, so a (keyword, property) pair that matches several times in one text appears only once.
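
One consequence is that the result carries no occurrence counts or positions. If you need a count for a keyword that search reported, one option is a separate pass over the text. The helper below is a hypothetical naive scan written for illustration and is not provided by the library.

```scala
// Hypothetical helper for illustration; not part of the library.
object OccurrenceCount {
  /** Counts occurrences of `word` in `text`, including overlapping ones. */
  def count(text: String, word: String): Int = {
    require(word.nonEmpty, "word must be non-empty")
    var n = 0
    var i = text.indexOf(word)
    while (i >= 0) {
      n += 1
      i = text.indexOf(word, i + 1) // step by one so overlapping hits are counted
    }
    n
  }
}
```

For example, "he" occurs twice in "hehe", yet a Set holding the pair ("he", "pronoun") contains it only once.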

Encoding

All string-based methods encode and decode with StandardCharsets.UTF_8. If your input uses another charset, convert to bytes yourself and use addWordBytes / searchBytes.
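
As an illustration of the byte-level route, the sketch below encodes text in a non-UTF-8 charset. The helper CharsetBytes is hypothetical, and ISO-8859-1 is only an example choice. Byte matching works only when keywords and searched text are encoded with the same charset, since the same string yields different bytes under different encodings.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper for illustration; not part of the library.
object CharsetBytes {
  def encode(s: String, charset: Charset): Array[Byte] = s.getBytes(charset)
}

// Intended use with the methods from the API notes (commented out here
// because the library is not on the classpath of this standalone sketch):
//   val latin1 = StandardCharsets.ISO_8859_1
//   ac.addWordBytes("tag", CharsetBytes.encode("caf\u00e9", latin1))
//   ac.searchBytes(CharsetBytes.encode(text, latin1))
```

The string "caf\u00e9" encodes to 4 bytes in ISO-8859-1 and 5 bytes in UTF-8, so a keyword registered through the string API would not match text that was encoded as ISO-8859-1 and passed to searchBytes.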
