Tutorial

Nebojša Ćirić edited this page Sep 10, 2025 · 3 revisions

Introduction

Generating a grammatically correct sentence at runtime is difficult in many human languages, because the choice of one word in a sentence can depend on many other words. The Unicode Inflection framework is meant to reduce the linguistic knowledge required of a programmer, and to empower translators to provide grammatically correct sentences in a program's user interface. The framework supports both graphical user interfaces (GUIs) and voice user interfaces (VUIs).

The Unicode Inflection framework is meant to handle simple parts of a sentence, like an article with a noun, an adjective with a noun, or just a verb. It cannot rewrite an entire sentence. It generally does not change the order of words or drastically change the grammar of a sentence. It works best when the target sentence is expressed as a template.
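
As a rough illustration of the template approach, the sketch below substitutes an already inflected phrase into a translator-supplied pattern. The `fillTemplate` helper and the `{welcome}` placeholder syntax are assumptions made for this example; they are not part of the Unicode Inflection API.

```cpp
#include <string>

// Hypothetical helper, not part of the Inflection API: replaces the first
// occurrence of `placeholder` in `pattern` with `phrase`. If the placeholder
// is absent, the pattern is returned unchanged.
std::string fillTemplate(const std::string& pattern,
                         const std::string& placeholder,
                         const std::string& phrase)
{
    std::string result(pattern);
    const auto position = result.find(placeholder);
    if (position != std::string::npos) {
        result.replace(position, placeholder.size(), phrase);
    }
    return result;
}
```

For instance, `fillTemplate("{welcome} a casa", "{welcome}", "bienvenida")` yields `bienvenida a casa`. The translator owns the pattern, and the program only supplies the inflected phrase.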

The following additional resources provide helpful background on grammatical terminology and the scope of the framework:

Inflecting a Word, and Getting the Grammemes of a Word

To change a word from one form to another, you modify the grammemes of the word, typically by adding constraints. This is how you inflect a word. Below is an example of how to do this.

#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/InflectableStringConcept.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/StringViewUtils.hpp>
#include <inflection/util/ULocale.hpp>
#include <unicode/unistr.h>
#include <iostream>
#include <memory>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale spanish("es");
    const auto& model(LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(spanish)->getSemanticFeatureModel());

    // Inflect the word to the feminine form.
    InflectableStringConcept welcome(model, SpeakableString(u"bienvenido"));
    welcome.putConstraintByName(u"gender", u"feminine");
    unique_ptr<SpeakableString> result(welcome.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // bienvenida
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // Inflect the word to the plural form.
    InflectableStringConcept cat(model, SpeakableString(u"gato"));
    cat.putConstraintByName(u"number", u"plural");
    result.reset(cat.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // gatos
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // Add the definite article.
    cat.putConstraintByName(u"definiteness", u"definite");
    result.reset(cat.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // los gatos
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // If you don't know the grammatical category name and the grammeme name is unique, look up the category and its value by alias.
    auto indefiniteFeature = model->getFeatureAlias(u"indefinite");
    auto singularFeature = model->getFeatureAlias(u"singular");
    if (indefiniteFeature.first != nullptr && singularFeature.first != nullptr) {
        cat.putConstraintByName(indefiniteFeature.first->getName(), indefiniteFeature.second);
        cat.putConstraintByName(singularFeature.first->getName(), singularFeature.second);
        result.reset(cat.toSpeakableString());
        if (result != nullptr) {
            // Output:
            // un gato
            cout << StringViewUtils::to_string(result->getPrint()) << endl;
        }
    }

    // Get the grammatical gender of a word.
    result.reset(cat.getFeatureValueByName(u"gender"));
    if (result != nullptr) {
        // Output:
        // masculine
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }
}

The example above also shows how to retrieve the value of a grammatical category of a word. Such a value is called a grammeme. This is helpful when a word depends on the grammemes of another word, because those grammemes can be used as constraints on how the dependent word is inflected.

The equivalent C API can be found in inflection/dialog/InflectableStringConcept.h.
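
To make the dependency between words concrete, here is a minimal standalone sketch in which the gender of a noun constrains the form of an adjective. It does not use the Inflection library. The two lookup tables and the `describe` function are hypothetical toy data invented for this illustration; the real framework obtains such information from its own lexicon.

```cpp
#include <map>
#include <string>

// Toy lexicon for illustration only: noun -> grammatical gender.
static const std::map<std::string, std::string> kNounGender = {
    {"gato", "masculine"},
    {"iguana", "feminine"},
};

// Toy inflection table: adjective lemma -> (gender -> inflected form).
static const std::map<std::string, std::map<std::string, std::string>> kAdjectiveForms = {
    {"negro", {{"masculine", "negro"}, {"feminine", "negra"}}},
};

// Reads the gender grammeme of the noun and uses it as a constraint on the
// adjective. Falls back to the adjective as given when either word is unknown.
std::string describe(const std::string& noun, const std::string& adjective)
{
    std::string inflected(adjective);
    const auto gender = kNounGender.find(noun);
    const auto forms = kAdjectiveForms.find(adjective);
    if (gender != kNounGender.end() && forms != kAdjectiveForms.end()) {
        const auto form = forms->second.find(gender->second);
        if (form != forms->second.end()) {
            inflected = form->second;
        }
    }
    return noun + " " + inflected;
}
```

With this toy data, `describe("iguana", "negro")` returns `iguana negra`, while `describe("gato", "negro")` returns `gato negro`.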

Customizing the Inflection of a Word

There are times when word inflection needs to be customized or disambiguated. In these scenarios, you should use a SemanticConcept, as shown in the example below. Consider externalizing such data into a separate file that can be translated by the localizers, so that they can fix and update any relevant words.

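#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/DisplayValue.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/SemanticConcept.hpp>
#include <inflection/dialog/SemanticFeature.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SemanticFeatureModel_DisplayData.hpp>
#include <inflection/dialog/SemanticValue.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/ULocale.hpp>
#include <unicode/unistr.h>
#include <iostream>
#include <map>
#include <memory>
#include <string>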

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale english("en");
    const auto& baseModel(*LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(english)->getSemanticFeatureModel());
    SemanticValue semanticValue(u"label", u"brushString");
    map<SemanticValue, SemanticFeatureModel_DisplayData> semanticValueMap(
        {
            // This is the uncountable brush that grows in a forest, not the paint brush whose plural is "brushes".
            {semanticValue, SemanticFeatureModel_DisplayData({
                DisplayValue(u"brush", map<SemanticFeature, u16string>({{*baseModel.getFeature(u"number"), u"singular"}})),
                DisplayValue(u"brush", map<SemanticFeature, u16string>({{*baseModel.getFeature(u"number"), u"plural"}}))
                })
            }
        });
    SemanticFeatureModel model(english, semanticValueMap);

    SemanticConcept brush(&model, semanticValue);
    brush.putConstraintByName(u"number", u"plural");
    unique_ptr<SpeakableString> result(brush.toSpeakableString());
    if (result != nullptr) {
        string str;
        UnicodeString::readOnlyAlias(result->getPrint()).toUTF8String(str);
        // Output:
        // brush
        cout << str << endl;
    }
}

There is no equivalent C API of SemanticConcept at this time.

Quantities

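Representing a quantity can be difficult. The noun has to agree with the number, the number can be printed as digits while being spoken as a word, and in some languages the spoken number is itself inflected for the gender and grammatical case of the noun. The example below quantifies a German noun and shows both the printed and the spoken forms.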

#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/InflectableStringConcept.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/NumberConcept.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/ULocale.hpp>
#include <unicode/unistr.h>
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale germany("de_DE");
    const auto conceptFactory = LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(germany);

    InflectableStringConcept message(conceptFactory->getSemanticFeatureModel(), SpeakableString(u"Nachricht"));
    NumberConcept one(static_cast<int64_t>(1), germany, germany);
    unique_ptr<SpeakableString> result(conceptFactory->quantify(one, &message));
    if (result != nullptr) {
        string print;
        string speak;
        UnicodeString::readOnlyAlias(result->getPrint()).toUTF8String(print);
        UnicodeString::readOnlyAlias(result->getSpeak()).toUTF8String(speak);
        // Output:
        // print: 1 Nachricht
        // speak: eine Nachricht
        cout << "print: " << print << endl;
        cout << "speak: " << speak << endl;
    }

    NumberConcept two(static_cast<int64_t>(2), germany, germany);
    result.reset(conceptFactory->quantify(two, &message));
    if (result != nullptr) {
        string print;
        string speak;
        UnicodeString::readOnlyAlias(result->getPrint()).toUTF8String(print);
        UnicodeString::readOnlyAlias(result->getSpeak()).toUTF8String(speak);
        // Output:
        // print: 2 Nachrichten
        // speak: zwei Nachrichten
        cout << "print: " << print << endl;
        cout << "speak: " << speak << endl;
    }

    // Change the grammatical case of the number and noun.
    message.putConstraintByName(u"case", u"dative");
    result.reset(conceptFactory->quantify(one, &message));
    if (result != nullptr) {
        string print;
        string speak;
        UnicodeString::readOnlyAlias(result->getPrint()).toUTF8String(print);
        UnicodeString::readOnlyAlias(result->getSpeak()).toUTF8String(speak);
        // Output:
        // print: 1 Nachricht
        // speak: einer Nachricht
        cout << "print: " << print << endl;
        cout << "speak: " << speak << endl;
    }
}

The equivalent C API can be found in inflection/dialog/CommonConceptFactory.h.
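
The idea of a result that carries separate printed and spoken forms can be sketched without the library. The `Noun` and `PrintAndSpeak` types and the `quantify` function below are hypothetical and exist only for this illustration. The sketch assumes a language such as German or English where only the count 1 takes the singular form, and it expects the caller to supply the already inflected spoken number.

```cpp
#include <string>

// Hypothetical types for illustration; they are not the Inflection API.
struct Noun {
    std::string singular;
    std::string plural;
};

struct PrintAndSpeak {
    std::string print; // what is shown on screen, with digits
    std::string speak; // what is read aloud, with the number as a word
};

// Chooses the noun form that agrees with the count, then builds both the
// printed and the spoken variants of the quantity.
PrintAndSpeak quantify(int count, const std::string& spokenCount, const Noun& noun)
{
    const std::string& form = (count == 1) ? noun.singular : noun.plural;
    return PrintAndSpeak{std::to_string(count) + " " + form, spokenCount + " " + form};
}
```

Given `Noun message{"Nachricht", "Nachrichten"}`, the call `quantify(2, "zwei", message)` produces the printed form `2 Nachrichten` and the spoken form `zwei Nachrichten`. The framework does considerably more than this, including selecting the spoken number word itself.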

Pronouns

While there are several ways to model a pronoun, it's common to inflect an existing pronoun by modifying only the gender. In these scenarios, you can use a PronounConcept. A PronounConcept is meant to model personal and reflexive pronouns. It is not meant to model interrogative pronouns or other non-personal pronouns.

#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/PronounConcept.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/StringViewUtils.hpp>
#include <inflection/util/ULocale.hpp>
#include <unicode/unistr.h>
#include <iostream>
#include <memory>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale english("en");
    const auto& model(*LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(english)->getSemanticFeatureModel());

    // Start with the unconstrained pronoun, then constrain it by gender.
    PronounConcept pronoun(model, u"they");
    unique_ptr<SpeakableString> result(pronoun.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // they
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }
    pronoun.putConstraintByName(u"gender", u"feminine");
    result.reset(pronoun.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // she
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }
    pronoun.putConstraintByName(u"gender", u"masculine");
    result.reset(pronoun.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // he
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }
}

PronounConcept can also take custom pronouns. Any custom pronouns override the defaults for a pronoun. All pronouns not overridden will use the existing default pronouns.

The equivalent C API can be found in inflection/dialog/PronounConcept.h.
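
The override behavior for custom pronouns can be pictured as a lookup that consults the custom entries first and falls back to the defaults. The sketch below is a standalone illustration of that idea only. The `PronounTable` alias, the `choosePronoun` function, and the string keys are assumptions for this example and do not reflect how PronounConcept stores its data.

```cpp
#include <map>
#include <string>

// Hypothetical table for illustration: gender constraint -> pronoun.
using PronounTable = std::map<std::string, std::string>;

// Returns the custom pronoun when one is provided for the given gender,
// otherwise the default pronoun, otherwise an empty string.
std::string choosePronoun(const PronounTable& defaults,
                          const PronounTable& overrides,
                          const std::string& gender)
{
    const auto custom = overrides.find(gender);
    if (custom != overrides.end()) {
        return custom->second;
    }
    const auto fallback = defaults.find(gender);
    return fallback != defaults.end() ? fallback->second : std::string();
}
```

If the defaults map `feminine` to `she`, `masculine` to `he`, and `unspecified` to `they`, and the overrides contain only `unspecified` mapped to `one`, then the lookup returns `one` for `unspecified` and still returns `she` for `feminine`.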

Lists

A SemanticConceptList is very flexible and allows you to update various parts of a list. It is most commonly used for an "and" list (a conjunction), whose wording is context sensitive in some languages. The SemanticConceptList can also change other parts of a list as necessary. Below is a common example:

#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/InflectableStringConcept.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
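#include <inflection/dialog/SemanticConceptList.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/StringViewUtils.hpp>
#include <inflection/util/ULocale.hpp>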
#include <unicode/unistr.h>
#include <iostream>
#include <memory>
#include <vector>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale spanish("es");
    const auto& factory = *LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(spanish);
    const auto& model(factory.getSemanticFeatureModel());

    // Create the list items and join them into an "and" list.
    InflectableStringConcept cat(model, SpeakableString(u"gato"));
    InflectableStringConcept iguana(model, SpeakableString(u"iguana"));
    vector<SemanticFeatureConceptBase*> list({&cat, &iguana});
    unique_ptr<SemanticConceptList> andList(factory.createAndList(list));
    unique_ptr<SpeakableString> result(andList->toSpeakableString());
    if (result != nullptr) {
        // Output:
        // gato e iguana
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    andList->putConstraintByName(u"definiteness", u"definite");
    result.reset(andList->toSpeakableString());
    if (result != nullptr) {
        // Output:
        // el gato y la iguana
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }
}
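
The context sensitivity seen in the output above comes from a Spanish spelling rule: the conjunction "y" becomes "e" when the next word begins with the sound of "i". The standalone sketch below illustrates that rule only. The `startsWithISound` and `joinWithAnd` functions are hypothetical, handle only lowercase ASCII input, and simplify the rule to words starting with "i" or "hi" but not "hie"; this is not how the framework implements lists.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified check: true for words such as "iguana" or "hijos", and false
// for words such as "hielo", where the "i" is part of a diphthong.
bool startsWithISound(const std::string& word)
{
    if (word.rfind("hie", 0) == 0) {
        return false;
    }
    return word.rfind("i", 0) == 0 || word.rfind("hi", 0) == 0;
}

// Joins the items with commas, and puts "y" or "e" before the last item
// depending on how that last item begins.
std::string joinWithAnd(const std::vector<std::string>& items)
{
    std::string result;
    for (std::size_t index = 0; index < items.size(); ++index) {
        if (index > 0) {
            if (index + 1 == items.size()) {
                result += startsWithISound(items[index]) ? " e " : " y ";
            } else {
                result += ", ";
            }
        }
        result += items[index];
    }
    return result;
}
```

Because the check looks at the text that follows the conjunction, adding an article changes the result: `{"gato", "iguana"}` joins as `gato e iguana`, while `{"el gato", "la iguana"}` joins as `el gato y la iguana`.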