Checker info - SEGM and XPOSTAG fields #18

@epageperron

As of now, these are the rules for the SEGM and XPOSTAG fields of the CDLI-CoNLL format. They may still change: we keep running into problems while annotating, and those problems require decisions about the rules.

SEGM
Contains the lemma, which is composed of a dictionary word followed by its sense in square brackets, e.g., udu[sheep] or dab[seize].
For all word types except verbs, there will only be suffixed morphemes, no prefixed ones.
Every element except the first consists of a dash followed by the morpheme.
Only verbs can have prefixes; in a verb, the first element is the first prefix, written without a dash, e.g., i[-n]-dab[seize][-ø].
Any morpheme may optionally be enclosed in square brackets, with its dash inside the brackets, e.g., [-n].
There are also rules about morpheme "slots", but since we are not annotating slots, the checker will not verify morpheme order for now. We should open a backlog issue for it: since we want to democratize use of the tool, checking the possible order of morphemes would be an asset for inexperienced annotators.
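The SEGM rules can be checked mechanically. Below is a minimal Python sketch, not the project's actual checker: `parse_segm` and `check_segm` are hypothetical names, the caller is assumed to know from another column whether the token is a verb (`is_verb`), and words and morphemes are assumed never to contain dashes, brackets, or whitespace. It deliberately does not check morpheme order.

```python
import re
from dataclasses import dataclass

# Assumption: words and morphemes never contain "-", "[", "]" or whitespace,
# and senses never contain "-" or brackets. The lemma is written word[sense];
# every other element is a morpheme, optionally enclosed in [...].
ELEMENT = re.compile(
    r"""
      (?P<lemma>-?[^-\[\]\s]+\[[^-\[\]]+\])   # dab[seize] or -dab[seize]
    | \[(?P<bracketed>-?[^-\[\]\s]+)\]        # [-n] or [i]
    | (?P<bare>-?[^-\[\]\s]+)                 # -n or i
    """,
    re.VERBOSE,
)


@dataclass
class Element:
    text: str       # the element as written, e.g. "[-n]" or "-dab[seize]"
    dashed: bool    # True if it starts with "-" (inside or outside the [])
    is_lemma: bool  # True for the dictionary word + [sense]


def parse_segm(segm: str) -> list[Element]:
    """Split a SEGM value into elements, rejecting anything unparseable."""
    elements, pos = [], 0
    while pos < len(segm):
        m = ELEMENT.match(segm, pos)
        if m is None:
            raise ValueError(f"cannot parse SEGM at {segm[pos:]!r}")
        body = m.group("lemma") or m.group("bracketed") or m.group("bare")
        is_lemma = m.group("lemma") is not None
        elements.append(Element(m.group(0), body.startswith("-"), is_lemma))
        pos = m.end()
    return elements


def check_segm(segm: str, is_verb: bool) -> list[str]:
    """Return a list of rule violations (empty if the SEGM value passes)."""
    try:
        elements = parse_segm(segm)
    except ValueError as err:
        return [str(err)]
    errors = []
    lemmas = [i for i, e in enumerate(elements) if e.is_lemma]
    if len(lemmas) != 1:
        errors.append(f"expected exactly one word[sense] lemma, found {len(lemmas)}")
    if elements and elements[0].dashed:
        errors.append("the first element must not start with a dash")
    for e in elements[1:]:
        if not e.dashed:
            errors.append(f"{e.text!r} must start with a dash")
    if not is_verb and lemmas and lemmas[0] != 0:
        errors.append("only verbs may have prefixes; the lemma must come first")
    return errors
```

For instance, `check_segm("i[-n]-dab[seize][-ø]", True)` returns an empty list, while the same value with `is_verb=False` yields a single error saying that only verbs may have prefixes. Returning a list of messages rather than raising lets a checker report every problem in a token at once.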

XPOSTAG
This field will display the ETCSRI/ORACC POS tag OR the named entity tag instead of the lemma in the SEGM field.
For all word types except verbs, there will only be suffixed morphological tags, no prefixed ones.
Every morphological tag except the first element consists of a period followed by the tag.
Only verbs can have prefixes, e.g., FIN.3-SG-H-A.V.3-SG-P
Tags can contain dashes (e.g., 3-SG-P). These dashes are not separators in this context, since the checker should use a map of morphemes and morph tags.
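A matching Python sketch for XPOSTAG, under the same caveats. Elements are split on periods only, so dashes stay inside tags, and tags are recognized whole by lookup, in the spirit of the map of morphemes and morph tags mentioned above. `POS_TAGS` and `MORPHEME_TO_TAGS` are hypothetical placeholders: the morpheme-to-tag pairs are simply read off the verb examples in this issue, not a real ETCSRI/ORACC inventory. `check_alignment` further assumes that SEGM and XPOSTAG line up element by element, as the two verb examples appear to (i/FIN, -n/3-SG-H-A, dab/V, -ø/3-SG-P).

```python
from typing import Optional

# Hypothetical placeholder vocabularies. A real checker would load the full
# ETCSRI/ORACC POS and named-entity tag lists and the project's morpheme map.
POS_TAGS = {"N", "V", "PN"}
MORPHEME_TO_TAGS = {"i": {"FIN"}, "n": {"3-SG-H-A"}, "ø": {"3-SG-P"}}
MORPH_TAGS = set().union(*MORPHEME_TO_TAGS.values())


def check_xpostag(xpostag: str, is_verb: bool) -> list[str]:
    """Return a list of rule violations (empty if the XPOSTAG value passes)."""
    # Split on periods only: dashes belong to the tag itself ("3-SG-P").
    tags = xpostag.split(".")
    errors = []
    if "" in tags:
        errors.append("empty element (stray or doubled period)")
    pos_at = [i for i, t in enumerate(tags) if t in POS_TAGS]
    if len(pos_at) != 1:
        errors.append(f"expected exactly one POS/NE tag, found {len(pos_at)}")
    elif not is_verb and pos_at[0] != 0:
        errors.append("only verbs may have prefixes; the POS/NE tag must come first")
    # Tags are recognized whole, by lookup, never by parsing their dashes.
    for t in tags:
        if t and t not in POS_TAGS and t not in MORPH_TAGS:
            errors.append(f"unknown morph tag {t!r}")
    return errors


def check_alignment(morphemes: list[Optional[str]], xpostag: str) -> list[str]:
    """Compare SEGM morphemes (dashes and brackets stripped, lemma as None)
    with XPOSTAG elements position by position (assumed one-to-one)."""
    tags = xpostag.split(".")
    if len(morphemes) != len(tags):
        return [f"SEGM has {len(morphemes)} elements, XPOSTAG has {len(tags)}"]
    errors = []
    for m, t in zip(morphemes, tags):
        if m is None:
            if t not in POS_TAGS:
                errors.append(f"lemma position holds {t!r}, not a POS/NE tag")
        elif t not in MORPHEME_TO_TAGS.get(m, set()):
            errors.append(f"morpheme {m!r} cannot be tagged {t!r}")
    return errors
```

For example, `check_alignment(["i", "n", None, "ø"], "FIN.3-SG-H-A.V.3-SG-P")` returns an empty list, while swapping the first two morphemes produces two errors. Looking tags up whole, rather than splitting on dashes, is what allows a tag like 3-SG-H-A to be treated as a single unit.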