@LanguageMachines

Language Machines

NLP Research group at Centre for Language Studies, Radboud University Nijmegen

Popular repositories

  1. frog Public

    Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on TiMBL, the Tilburg memory-based learning software package.

    C++, 79 stars, 12 forks

  2. ucto Public

    Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use…

    C++, 70 stars, 14 forks

  3. timbl Public

    TiMBL implements several memory-based learning algorithms.

    C++, 53 stars, 9 forks

  4. PICCL Public

    A set of workflows for corpus building through OCR, post-correction, and normalisation.

    Python, 49 stars, 7 forks

  5. LuigiNLP Public

    A workflow system for Natural Language Processing.

    Python, 21 stars, 5 forks

  6. libfolia Public

    FoLiA library for C++

    C++, 17 stars, 6 forks

Repositories

Showing 10 of 54 repositories
  • toad Public

    Toad: Trainer Of All Data, the Frog training collection

    C++, 1 star, GPL-3.0, 2 forks, 1 issue, 0 pull requests, updated Dec 12, 2025
  • ticcltools Public

    Tools for TICCL

    C++, 14 stars, GPL-3.0, 4 forks, 17 issues, 0 pull requests, updated Dec 12, 2025
  • foliautils Public

    Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University)

    C++, 4 stars, GPL-3.0, 3 forks, 8 issues, 0 pull requests, updated Dec 11, 2025
  • frog Public

    Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on TiMBL, the Tilburg memory-based learning software package.

    C++, 79 stars, GPL-3.0, 12 forks, 13 issues (1 issue needs help), 0 pull requests, updated Dec 11, 2025
  • ucto Public

    Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …

    C++, 70 stars, GPL-3.0, 14 forks, 12 issues, 0 pull requests, updated Dec 11, 2025
  • foliatest Public

    Test suite for libfolia

    C++, 0 stars, GPL-3.0, 2 forks, 0 issues, 0 pull requests, updated Dec 11, 2025
  • libfolia Public

    FoLiA library for C++

    C++, 17 stars, GPL-3.0, 6 forks, 5 issues, 0 pull requests, updated Dec 11, 2025
  • dimbl Public

    Distributed Tilburg Memory Based Learner

    C++, 2 stars, GPL-3.0, 2 forks, 0 issues, 0 pull requests, updated Dec 11, 2025
  • mbtserver Public
    C++, 1 star, GPL-3.0, 2 forks, 0 issues, 0 pull requests, updated Dec 11, 2025
  • mbt Public

    MBT: Memory-based tagger generation and tagging. MBT is a memory-based tagger-generator and tagger in one.

    C++, 10 stars, GPL-3.0, 1 fork, 1 issue, 0 pull requests, updated Dec 11, 2025
