
Is it appropriate to use a self-built dataset with heavily nested entities? #1

Open
Aenchanteda opened this issue Apr 11, 2022 · 1 comment

Comments

@Aenchanteda

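Can a dataset with nested entities be used? For example, I want to recognize medical-domain terms in categories such as "surgery", "body part", "surgical procedure", "symptom", and others. Body part, symptom, and drug entities neither belong to nor contain one another, but a "surgery" entity contains "body part", "surgical procedure", "drug", and similar entities. This co-occurrence with surgery produces a lot of tight nesting in the corpus, unlike ORG, PERSON, and NAME in this project, which are cleanly separated. So can a corpus like this be used? If so, how exactly do I produce index-annotated data like what is in train.json? I would appreciate your help with this.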

@geekhch
Owner

geekhch commented Apr 13, 2022

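Yes, this is a typical nested-entity problem. When annotating the data, just record a _(start character position, end position, entity type)_ triple for each entity. Nested or not, each triple uniquely identifies one entity.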
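
A minimal sketch of what recording those triples could look like for a nested medical example. The field names (`text`, `entities`), the label names, the helper `make_record`, and the inclusive end index are all assumptions for illustration; the thread does not show the schema of train.json, so check them against the actual file before use.

```python
import json

def make_record(text, mentions):
    """Build one annotated record from (surface string, label) pairs.

    Hypothetical helper: the output layout is an assumption, not the
    project's confirmed train.json format.
    """
    entities = []
    for surface, label in mentions:
        start = text.find(surface)      # first occurrence only; enough for a sketch
        if start < 0:
            raise ValueError(f"{surface!r} not found in text")
        end = start + len(surface) - 1  # assumption: end position is inclusive
        entities.append((start, end, label))
    return {"text": text, "entities": entities}

# "Laparoscopic cholecystectomy": the surgery span contains a body part
# ("胆囊", gallbladder) and a procedure ("切除术", resection).
text = "腹腔镜胆囊切除术"
mentions = [
    ("腹腔镜胆囊切除术", "SURGERY"),
    ("胆囊", "BODY_PART"),
    ("切除术", "PROCEDURE"),
]

record = make_record(text, mentions)
print(json.dumps(record, ensure_ascii=False))
```

The nested spans simply become separate triples with overlapping index ranges, so the outer "surgery" entity and the inner entities coexist without conflict. If the same surface string appears more than once in a sentence, `str.find` only locates the first occurrence, so real annotation should take the offsets from the labeling tool instead.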


2 participants