Tutorial
Generating a grammatically correct sentence at runtime is difficult in many human languages, because the form of one word often depends on other words in the sentence. The Unicode Inflection framework is meant to reduce the linguistic knowledge required of a programmer, and to empower the translators of a program's user interface to provide grammatically correct sentences. The framework can support a user interface that is either a graphical user interface (GUI) or a voice user interface (VUI).
The Unicode Inflection framework is meant to handle simple parts of a sentence, such as an article with a noun, an adjective with a noun, or a single verb. It cannot rewrite an entire sentence, and it generally does not reorder words or drastically change the grammar of a sentence. It works best when the target sentence is expressed as a template.
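To make the template idea concrete, here is a minimal plain C++ sketch. It does not use the Unicode Inflection API: `fillTemplate` is a hypothetical helper that fills one named placeholder in a translator-owned template with a phrase that, in a real program, would come from the framework already inflected.

```cpp
#include <string>

// Hypothetical helper, not part of Unicode Inflection: replaces the first
// "{placeholder}" in a message template with an already-inflected phrase.
// If the placeholder is absent, the template is returned unchanged.
std::string fillTemplate(std::string messageTemplate,
                         const std::string& placeholder,
                         const std::string& phrase)
{
    const std::string token = "{" + placeholder + "}";
    const auto pos = messageTemplate.find(token);
    if (pos != std::string::npos) {
        messageTemplate.replace(pos, token.size(), phrase);
    }
    return messageTemplate;
}
```

For example, `fillTemplate("Ha llegado {pet}.", "pet", "el gato")` yields `"Ha llegado el gato."`. The sentence skeleton stays fixed; only the phrase in the placeholder changes form.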
The following resources provide background on the grammatical concepts involved and on the scope of the framework:
- UTW 2024 Solving Inflection (2024.10.22)
- Automatic Grammar Agreement in Message Formatting (2023.11.8)
- Authoring Grammatically Correct Conversational Templates for Siri (2020.10.16)
- Let's Come To An Agreement About Our Words (2017.02.16)
To change a word from one form to another, you modify the grammemes of the word, typically by adding constraints. This is how you inflect a word. Below is an example of how to do this.
```cpp
#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/InflectableStringConcept.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/StringViewUtils.hpp>
#include <unicode/unistr.h>
#include <iostream>
#include <memory>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale spanish("es");
    // getSemanticFeatureModel() returns a pointer to the model for this locale.
    const auto& model(LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(spanish)->getSemanticFeatureModel());

    // Inflect the word to the feminine form.
    InflectableStringConcept welcome(model, SpeakableString(u"bienvenido"));
    welcome.putConstraintByName(u"gender", u"feminine");
    unique_ptr<SpeakableString> result(welcome.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // bienvenida
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // Inflect the word to the plural form.
    InflectableStringConcept cat(model, SpeakableString(u"gato"));
    cat.putConstraintByName(u"number", u"plural");
    result.reset(cat.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // gatos
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // Add the definite article.
    cat.putConstraintByName(u"definiteness", u"definite");
    result.reset(cat.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // los gatos
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // If you don't know the grammatical category name, and the grammeme is unique, get the defaults from the alias.
    auto indefiniteFeature = model->getFeatureAlias(u"indefinite");
    auto singularFeature = model->getFeatureAlias(u"singular");
    if (indefiniteFeature.first != nullptr && singularFeature.first != nullptr) {
        cat.putConstraintByName(indefiniteFeature.first->getName(), indefiniteFeature.second);
        cat.putConstraintByName(singularFeature.first->getName(), singularFeature.second);
        result.reset(cat.toSpeakableString());
        if (result != nullptr) {
            // Output:
            // un gato
            cout << StringViewUtils::to_string(result->getPrint()) << endl;
        }
    }

    // Get the grammatical gender of a word.
    result.reset(cat.getFeatureValueByName(u"gender"));
    if (result != nullptr) {
        // Output:
        // masculine
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }
}
```
The example above also shows how to retrieve the value of a grammatical category, such as gender; such a value is called a grammeme. This is helpful when a word depends on the grammemes of another word, because those grammemes can serve as constraints on how the dependent word should be inflected.
The equivalent C API can be found in inflection/dialog/InflectableStringConcept.h.
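The relationship between grammatical categories, grammemes, and constraints can be illustrated with a toy model in standard C++. This is a hypothetical sketch, not how the framework stores its dictionaries: `WordForm`, `inflect`, and `grammemeOf` are illustrative names, and a word is reduced to a hand-written list of its forms.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model, not the Unicode Inflection API.
// Maps a grammatical category (e.g. "number") to a grammeme (e.g. "plural").
using Grammemes = std::map<std::string, std::string>;

struct WordForm {
    std::string text;     // surface form, e.g. "gatos"
    Grammemes grammemes;  // every grammeme this form carries
};

// Returns the first form whose grammemes satisfy every constraint, if any.
std::optional<std::string> inflect(const std::vector<WordForm>& forms,
                                   const Grammemes& constraints)
{
    for (const auto& form : forms) {
        bool matches = true;
        for (const auto& [category, grammeme] : constraints) {
            const auto it = form.grammemes.find(category);
            if (it == form.grammemes.end() || it->second != grammeme) {
                matches = false;
                break;
            }
        }
        if (matches) {
            return form.text;
        }
    }
    return std::nullopt;
}

// Reads back the grammeme of one category, such as "gender".
std::optional<std::string> grammemeOf(const WordForm& form, const std::string& category)
{
    const auto it = form.grammemes.find(category);
    if (it == form.grammemes.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

With forms for *gato* and *gatos*, constraining `number` to `plural` selects *gatos*, and reading `gender` back from either form gives `masculine`, which mirrors the constraint and lookup calls in the example above.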
There are times when word inflection needs to be customized or disambiguated. In these scenarios, you should use a SemanticConcept. This is how you customize word inflection. Consider externalizing such data into a separate file that the localizers can translate, so that they can fix and update the relevant words themselves.
```cpp
#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/SemanticConcept.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <unicode/unistr.h>
#include <iostream>
#include <map>
#include <memory>
#include <string>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale english("en");
    const auto& baseModel(*LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(english)->getSemanticFeatureModel());
    SemanticValue semanticValue(u"label", u"brushString");
    map<SemanticValue, SemanticFeatureModel_DisplayData> semanticValueMap(
        {
            // This is the uncountable brush found in a forest, not the paintbrush, whose plural is "brushes".
            {semanticValue, SemanticFeatureModel_DisplayData({
                DisplayValue(u"brush", map<SemanticFeature, u16string>({{*baseModel.getFeature(u"number"), u"singular"}})),
                DisplayValue(u"brush", map<SemanticFeature, u16string>({{*baseModel.getFeature(u"number"), u"plural"}}))
            })
            }
        });
    // Build a model that uses the custom display data above.
    SemanticFeatureModel model(english, semanticValueMap);
    SemanticConcept brush(&model, semanticValue);
    brush.putConstraintByName(u"number", u"plural");
    unique_ptr<SpeakableString> result(brush.toSpeakableString());
    if (result != nullptr) {
        string str;
        UnicodeString::readOnlyAlias(result->getPrint()).toUTF8String(str);
        // Output:
        // brush
        cout << str << endl;
    }
}
```
There is no equivalent C API of SemanticConcept at this time.
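The idea behind customization, a localizer-supplied entry that takes precedence over the default inflection, can be sketched in standard C++. The naive pluralization rule and the `pluralize` override table here are hypothetical simplifications; they do not reflect how SemanticConcept or SemanticFeatureModel work internally.

```cpp
#include <map>
#include <string>

// Returns true if s ends with suffix.
bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical default rule: a naive English plural that appends "es"
// after s, x, z, ch, or sh, and "s" otherwise.
std::string defaultPlural(const std::string& noun)
{
    for (const char* suffix : {"s", "x", "z", "ch", "sh"}) {
        if (endsWith(noun, suffix)) {
            return noun + "es";
        }
    }
    return noun + "s";
}

// Hypothetical override table supplied by localizers: an entry for a noun
// wins over the default rule; every other noun falls back to the rule.
std::string pluralize(const std::string& noun,
                      const std::map<std::string, std::string>& overrides)
{
    const auto it = overrides.find(noun);
    return it != overrides.end() ? it->second : defaultPlural(noun);
}
```

Without an override, the naive rule turns *brush* into *brushes*; with an override entry mapping *brush* to *brush*, the uncountable sense from the example above is preserved.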
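Representing a quantity can be difficult, because the noun must agree with the number, and in some languages the spoken form of the number must also agree with the noun. The CommonConceptFactory can quantify a noun with a NumberConcept, producing both a printed and a spoken form.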
```cpp
#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/InflectableStringConcept.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/NumberConcept.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <unicode/unistr.h>
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale germany("de_DE");
    const auto conceptFactory = LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(germany);
    InflectableStringConcept message(conceptFactory->getSemanticFeatureModel(), SpeakableString(u"Nachricht"));

    // Quantify the noun with the number 1.
    NumberConcept one(static_cast<int64_t>(1), germany, germany);
    unique_ptr<SpeakableString> result(conceptFactory->quantify(one, &message));
    if (result != nullptr) {
        string print;
        string speak;
        UnicodeString::readOnlyAlias(result->getPrint()).toUTF8String(print);
        UnicodeString::readOnlyAlias(result->getSpeak()).toUTF8String(speak);
        // Output:
        // print: 1 Nachricht
        // speak: eine Nachricht
        cout << "print: " << print << endl;
        cout << "speak: " << speak << endl;
    }

    // Quantify the noun with the number 2.
    NumberConcept two(static_cast<int64_t>(2), germany, germany);
    result.reset(conceptFactory->quantify(two, &message));
    if (result != nullptr) {
        string print;
        string speak;
        UnicodeString::readOnlyAlias(result->getPrint()).toUTF8String(print);
        UnicodeString::readOnlyAlias(result->getSpeak()).toUTF8String(speak);
        // Output:
        // print: 2 Nachrichten
        // speak: zwei Nachrichten
        cout << "print: " << print << endl;
        cout << "speak: " << speak << endl;
    }

    // Change the grammatical case of the number and noun.
    message.putConstraintByName(u"case", u"dative");
    result.reset(conceptFactory->quantify(one, &message));
    if (result != nullptr) {
        string print;
        string speak;
        UnicodeString::readOnlyAlias(result->getPrint()).toUTF8String(print);
        UnicodeString::readOnlyAlias(result->getSpeak()).toUTF8String(speak);
        // Output:
        // print: 1 Nachricht
        // speak: einer Nachricht
        cout << "print: " << print << endl;
        cout << "speak: " << speak << endl;
    }
}
```
The equivalent C API can be found in inflection/dialog/CommonConceptFactory.h.
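The printed forms above depend on selecting the noun form that matches the number. Below is a minimal sketch in standard C++, assuming the simplified CLDR plural rule that English and German share for integers: category "one" for exactly 1, "other" otherwise. `NounForms` and `quantifyPrint` are hypothetical names, and the sketch does not attempt the spoken form, where the number word itself (*eine*, *einer*, *zwei*) must also be inflected.

```cpp
#include <cstdint>
#include <string>

// Simplified integer-only plural rule assumed for English and German.
std::string pluralCategory(std::int64_t n)
{
    return n == 1 ? "one" : "other";
}

// Hypothetical pair of noun forms, one per plural category.
struct NounForms {
    std::string one;    // e.g. "Nachricht"
    std::string other;  // e.g. "Nachrichten"
};

// Pairs the digits with the noun form that matches the number's plural category.
std::string quantifyPrint(std::int64_t n, const NounForms& noun)
{
    const std::string& form = pluralCategory(n) == "one" ? noun.one : noun.other;
    return std::to_string(n) + " " + form;
}
```

With `NounForms{"Nachricht", "Nachrichten"}`, this yields "1 Nachricht" and "2 Nachrichten", matching the printed output in the example above. Other languages have more plural categories, which is one reason to rely on the framework rather than a hand-written rule.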
While there are several ways to model a pronoun, a common task is to inflect an existing pronoun by changing only its gender. In these scenarios, you can use a PronounConcept. A PronounConcept is meant to model personal and reflexive pronouns. It is not meant to model interrogative or non-personal pronouns.
```cpp
#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/PronounConcept.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/StringViewUtils.hpp>
#include <unicode/unistr.h>
#include <iostream>
#include <memory>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale english("en");
    const auto& model(*LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(english)->getSemanticFeatureModel());

    // Start from the gender-neutral pronoun "they".
    PronounConcept pronoun(model, u"they");
    unique_ptr<SpeakableString> result(pronoun.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // they
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // Inflect the pronoun to the feminine form.
    pronoun.putConstraintByName(u"gender", u"feminine");
    result.reset(pronoun.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // she
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // Inflect the pronoun to the masculine form.
    pronoun.putConstraintByName(u"gender", u"masculine");
    result.reset(pronoun.toSpeakableString());
    if (result != nullptr) {
        // Output:
        // he
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }
}
```
A PronounConcept can also take custom pronouns. Custom pronouns override the defaults, and any pronoun that is not overridden uses the existing default.
The equivalent C API can be found in inflection/dialog/PronounConcept.h.
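The override behavior described above can be sketched as a map merge in standard C++. This is a hypothetical model, not the PronounConcept API: pronouns are keyed by an illustrative form name, and custom entries replace only the defaults that have the same key.

```cpp
#include <map>
#include <string>

// Hypothetical toy table, not the PronounConcept API: form name -> pronoun.
using PronounTable = std::map<std::string, std::string>;

// Custom entries replace defaults with the same key; every other default stays.
PronounTable mergePronouns(PronounTable defaults, const PronounTable& custom)
{
    for (const auto& [form, pronoun] : custom) {
        defaults[form] = pronoun;
    }
    return defaults;
}
```

For example, overriding only `reflexive` with the singular-they variant *themself* leaves the default *they* and *them* in place.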
A SemanticConceptList is very flexible: it allows you to update various parts of a list. Its most common use is an "and" (conjunction) list, whose wording is context-sensitive in some languages. For example, in Spanish the conjunction "y" becomes "e" before a word that starts with an "i" sound. The SemanticConceptList can change other parts of the list as necessary. Below is a common example:
```cpp
#include <inflection/dialog/CommonConceptFactory.hpp>
#include <inflection/dialog/InflectableStringConcept.hpp>
#include <inflection/dialog/LocalizedCommonConceptFactoryProvider.hpp>
#include <inflection/dialog/SemanticConceptList.hpp>
#include <inflection/dialog/SemanticFeatureModel.hpp>
#include <inflection/dialog/SpeakableString.hpp>
#include <inflection/util/StringViewUtils.hpp>
#include <unicode/unistr.h>
#include <iostream>
#include <memory>
#include <vector>

using namespace icu;
using namespace inflection::dialog;
using namespace inflection::util;
using namespace std;

int main(const int argc, const char* const argv[])
{
    ULocale spanish("es");
    const auto& factory = *LocalizedCommonConceptFactoryProvider::getDefaultCommonConceptFactoryProvider()->getCommonConceptFactory(spanish);
    const auto& model(factory.getSemanticFeatureModel());

    // Create two nouns and combine them into an "and" list.
    InflectableStringConcept cat(model, SpeakableString(u"gato"));
    InflectableStringConcept iguana(model, SpeakableString(u"iguana"));
    vector<SemanticFeatureConceptBase*> list({&cat, &iguana});
    unique_ptr<SemanticConceptList> andList(factory.createAndList(list));
    unique_ptr<SpeakableString> result(andList->toSpeakableString());
    if (result != nullptr) {
        // "y" becomes "e" before "iguana".
        // Output:
        // gato e iguana
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }

    // Add the definite article to every item in the list.
    andList->putConstraintByName(u"definiteness", u"definite");
    result.reset(andList->toSpeakableString());
    if (result != nullptr) {
        // The conjunction now precedes "la", so it stays "y".
        // Output:
        // el gato y la iguana
        cout << StringViewUtils::to_string(result->getPrint()) << endl;
    }
}
```
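The context sensitivity in the Spanish output above comes from the rule that "y" becomes "e" before a word starting with the /i/ sound, spelled *i* or *hi* (but not *hi* followed by a vowel, as in *hielo*). Here is a minimal sketch of that rule in standard C++. `startsWithISound` and `joinSpanishAnd` are hypothetical helpers, the check handles only unaccented ASCII spellings, and the framework's own implementation may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified check for a word starting with the /i/ sound: "i...", or "hi..."
// not followed by a vowel (so "hijo" qualifies but "hielo" does not).
bool startsWithISound(const std::string& word)
{
    if (word.empty()) {
        return false;
    }
    if (word[0] == 'i' || word[0] == 'I') {
        return true;
    }
    if (word.size() >= 2 && (word[0] == 'h' || word[0] == 'H') &&
        (word[1] == 'i' || word[1] == 'I')) {
        static const std::string vowels = "aeiouAEIOU";
        return word.size() == 2 || vowels.find(word[2]) == std::string::npos;
    }
    return false;
}

// Joins items as "a, b y c", choosing "e" or "y" from the last item.
std::string joinSpanishAnd(const std::vector<std::string>& items)
{
    std::string result;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) {
            if (i + 1 < items.size()) {
                result += ", ";
            } else {
                result += startsWithISound(items[i]) ? " e " : " y ";
            }
        }
        result += items[i];
    }
    return result;
}
```

`joinSpanishAnd({"gato", "iguana"})` gives "gato e iguana", while `joinSpanishAnd({"el gato", "la iguana"})` gives "el gato y la iguana", because the conjunction now precedes the article *la*. This is why the conjunction must be chosen after the items are inflected, not before.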