Language model files

ISC-SDE edited this page Jan 29, 2020 · 6 revisions

KB file overview

A KB consists of seven required CSV files and one optional CSV file:

| Contents | Filename (source) | Filename (compiled) | Description |
| --- | --- | --- | --- |
| Abbreviations | XX_dev_xx_acro.csv | acro.csv | A list of abbreviations that should not be treated as sentence endings and, where needed, words that conversely do mark a sentence ending |
| Filter | XX_dev_xx_filter.csv | filter.csv | Transcription rules that are applied to the concept clusters at the end of the Smart Indexing process to optimize the clusters |
| Grammatical labels | XX_dev_xx_labels.csv | labels.csv | A list of all labels that are used in the lexreps file |
| Lexical representations | XX_dev_xx_lexreps.csv | lexreps.csv | A list of words and word groups with (grammatical) labels |
| Metadata | XX_dev_xx_metadata.csv | metadata.csv | Language-specific settings for the language model |
| Pre-processor | XX_dev_xx_prepro.csv | prepro.csv | Transcription rules that are applied to the input text before the actual indexing starts |
| Rules | XX_dev_xx_rules.csv | rules.csv | A series of rules to disambiguate elements that can be a Concept or a Relation depending on their context, and to detect attributes and their scope |
| Regular expressions (optional) | XX_dev_xx_regex.csv | regex.csv | Extra lexical representations with counterparts in the lexreps file |
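
The naming convention in the table can be checked mechanically. The following is a minimal Python sketch, not part of the actual build process: the helper names `compiled_name` and `missing_required` are hypothetical, and the pattern assumes that the `XX` and `xx` placeholders in the source filenames are plain alphabetic prefixes.

```python
import re

# File kinds taken from the table above; "regex" is the only optional one.
REQUIRED = {"acro", "filter", "labels", "lexreps", "metadata", "prepro", "rules"}
OPTIONAL = {"regex"}

# Assumption: source names look like <prefix>_dev_<prefix>_<kind>.csv,
# where each prefix (XX / xx in the table) consists of letters only.
SOURCE_NAME = re.compile(r"^[A-Za-z]+_dev_[A-Za-z]+_(?P<kind>[a-z]+)\.csv$")


def compiled_name(source_name):
    """Return the compiled filename for a source filename, or None if the
    name does not follow the assumed convention or names an unknown kind."""
    match = SOURCE_NAME.match(source_name)
    if match is None:
        return None
    kind = match.group("kind")
    if kind not in REQUIRED | OPTIONAL:
        return None
    return kind + ".csv"


def missing_required(filenames):
    """Return the sorted list of required file kinds absent from filenames."""
    found = set()
    for name in filenames:
        compiled = compiled_name(name)
        if compiled is not None:
            found.add(compiled[: -len(".csv")])
    return sorted(REQUIRED - found)
```

Calling `missing_required` on the listing of a KB source directory gives a quick indication of whether all seven required files are present before a build is attempted.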

A full description of the contents of these files can be found in /docs/KB-file-formats.doc.

For more on how these files are translated into runnable code, see the corresponding section on the Build Process.