As of now, these are the rules concerning the SEGM and XPOSTAG fields of the CDLI-CoNLL format. They may still change, since we keep running into problems while annotating that require decisions on the rules.
SEGM
Contains the lemma, which is composed of a dictionary word with its sense appended in square brackets, e.g. udu[sheep] or dab[seize]
For all word types except verbs, there will only be suffixed morphemes, no prefixed ones.
All morphemes except the first element are composed of a dash, followed by the morpheme.
Only in the case of verbs, the first prefix is written without a dash, e.g. i[-n]-dab[seize][-ø]
Every morpheme may be enclosed in [], or not.
There are also rules concerning the "slots" for morphemes, but since we are not noting them, we will not check morpheme order at this time. We should open a backlog issue for this: since we want to democratize the usage of the tool, checking the possible order of morphemes would be an asset for inexperienced annotators.
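The SEGM rules above (one lemma with a bracketed sense, dash-separated morphemes, optional brackets around a morpheme, prefixes only on verbs) can be sketched as a small parser. This is only an illustration, not the checker's actual code: the names parse_segm and Segm and the is_verb parameter are hypothetical, and the sketch assumes that a bracketed morpheme always carries its dash inside the brackets, as in the example. Slot order is not checked.

```python
import re
from dataclasses import dataclass

# One SEGM element: either a bracketed morpheme such as [-n], or a form with
# an optional leading dash and an optional bracketed sense such as dab[seize].
_TOKEN = re.compile(r"""
    \[-(?P<bracketed>[^\[\]-]+)\]                      # [-n], [-ø]
  | (?P<dash>-?)(?P<form>[^\[\]-]+)                    # i, -dab, -ø
    (?:\[(?P<sense>[^\[\]-][^\[\]]*)\])?               # optional [seize]
""", re.VERBOSE)


@dataclass
class Segm:
    lemma: str
    sense: str
    prefixes: list
    suffixes: list


def parse_segm(segm, is_verb=False):
    """Split a SEGM value into lemma, sense, prefixes and suffixes."""
    prefixes, suffixes = [], []
    lemma = sense = None
    pos = 0
    while pos < len(segm):
        m = _TOKEN.match(segm, pos)
        if m is None:
            raise ValueError(f"cannot parse {segm!r} at offset {pos}")
        first = pos == 0
        pos = m.end()
        if m.group("bracketed") is None:
            # Unbracketed elements: only the very first one lacks a dash.
            if first == bool(m.group("dash")):
                raise ValueError(f"misplaced or missing dash in {segm!r}")
        if m.group("sense") is not None:
            if lemma is not None:
                raise ValueError(f"more than one lemma in {segm!r}")
            lemma, sense = m.group("form"), m.group("sense")
        else:
            morpheme = m.group("bracketed") or m.group("form")
            (prefixes if lemma is None else suffixes).append(morpheme)
    if lemma is None:
        raise ValueError(f"no lemma found in {segm!r}")
    if prefixes and not is_verb:
        raise ValueError(f"only verbs may have prefixes: {segm!r}")
    return Segm(lemma, sense, prefixes, suffixes)
```

Calling parse_segm("i[-n]-dab[seize][-ø]", is_verb=True) yields the lemma dab with sense seize, the prefixes i and n, and the suffix ø. The same string with is_verb=False is rejected, since non-verbs may not carry prefixes.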
XPOSTAG
This field displays the ETCSRI/ORACC POS tag or the named entity tag in place of the lemma that appears in the SEGM field.
For all word types except verbs, there will only be suffixed morphological tags, no prefixed ones.
All morphological tags except the first element are composed of a period, followed by the tag.
Only verbs can have prefixes, e.g. FIN.3-SG-H-A.V.3-SG-P
Tags can contain dashes; these are not meaningful in this context, since the checker should use a map of morphemes and morph tags.
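The XPOSTAG rules above can be sketched in the same way: split on periods only (dashes stay inside a tag), locate the single POS or named entity tag, and compare the remaining tags against a morpheme map. Everything named here is an assumption made for illustration: POS_TAGS and NE_TAGS are small illustrative subsets, MORPH_TAGS is a hypothetical map built from the two examples in this note, and parse_xpostag and check_alignment are not existing functions of the checker.

```python
# Illustrative subsets only; the real tag inventories are assumed, not quoted.
POS_TAGS = {"N", "V", "AJ", "NU"}
NE_TAGS = {"PN", "DN", "GN", "RN"}

# Hypothetical morpheme -> allowed morph tags map, taken from the examples
# i[-n]-dab[seize][-ø] and FIN.3-SG-H-A.V.3-SG-P.
MORPH_TAGS = {
    "i": {"FIN"},
    "n": {"3-SG-H-A"},
    "ø": {"3-SG-P"},
}


def parse_xpostag(xpostag):
    """Split an XPOSTAG value into (head tag, prefixed tags, suffixed tags)."""
    parts = xpostag.split(".")  # periods separate tags; dashes are not split
    if any(not part for part in parts):
        raise ValueError(f"empty tag in {xpostag!r}")
    heads = [i for i, part in enumerate(parts) if part in POS_TAGS | NE_TAGS]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one POS or NE tag in {xpostag!r}")
    head_index = heads[0]
    head = parts[head_index]
    prefixes, suffixes = parts[:head_index], parts[head_index + 1:]
    if prefixes and head != "V":
        raise ValueError(f"only verbs may have prefixed tags: {xpostag!r}")
    return head, prefixes, suffixes


def check_alignment(morphemes, tags):
    """Return a list of problems found when pairing morphemes with tags."""
    if len(morphemes) != len(tags):
        return [f"{len(morphemes)} morphemes but {len(tags)} morph tags"]
    problems = []
    for morpheme, tag in zip(morphemes, tags):
        if tag not in MORPH_TAGS.get(morpheme, set()):
            problems.append(f"{morpheme!r} is not mapped to {tag!r}")
    return problems
```

With these assumptions, parse_xpostag("FIN.3-SG-H-A.V.3-SG-P") returns the head V with prefixed tags FIN and 3-SG-H-A and the suffixed tag 3-SG-P, and check_alignment(["i", "n", "ø"], ["FIN", "3-SG-H-A", "3-SG-P"]) reports no problems. Looking tags up whole in the map is what makes the dashes inside them irrelevant to the check.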