This Jupyter notebook is part of the labs for the Text Mining course.
Information extraction (IE) is the task of identifying named entities, and the semantic relations between them, in text data. The notebook covers two IE sub-tasks:
- Named entity recognition (identifying mentions of entities)
- Entity linking (matching these mentions to entities in a knowledge base)

The notebook is organized as follows:
- Defining evaluation measures
- Span recognition
  - Implementing a generator function that yields the gold-standard spans in a given data frame
  - Error analysis of the spans in a data frame, including printing the false positives and false negatives
  - Writing code to post-process the output produced by spaCy; to filter out specific labels, it is useful to know the named entity label scheme
    - Filtered labels: Filtered_label = ["CARDINAL", "DATE", "ORDINAL", "MONEY", "TIME", "QUANTITY", "PERCENT"]
- Entity linking
  - Extending the training data using the knowledge base
  - Context-sensitive disambiguation
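
The span-recognition steps listed above (gold-span generator, label filtering, evaluation measures, error analysis) could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the row fields `sentence_id`, `beg`, `end`, and `label`, and all function names, are hypothetical, and the example works on plain tuples so that it runs without spaCy or pandas.

```python
from collections import namedtuple

# Hypothetical row layout; the notebook's real column names may differ.
Row = namedtuple("Row", ["sentence_id", "beg", "end", "label"])

# Labels to drop from the predictions (the list given in this README).
FILTERED_LABELS = {"CARDINAL", "DATE", "ORDINAL", "MONEY",
                   "TIME", "QUANTITY", "PERCENT"}


def gold_spans(rows):
    """Yield a (sentence_id, start, end) triple for every gold-standard mention."""
    for row in rows:
        yield (row.sentence_id, row.beg, row.end)


def filter_spans(labelled_spans, excluded=FILTERED_LABELS):
    """Yield predicted spans whose label is not in `excluded`.

    Each input item is assumed to be (sentence_id, start, end, label).
    """
    for sentence_id, start, end, label in labelled_spans:
        if label not in excluded:
            yield (sentence_id, start, end)


def evaluate(gold, predicted):
    """Return exact-match precision, recall, and F1 over two span collections."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def error_analysis(gold, predicted):
    """Return (false_positives, false_negatives) as sorted lists of spans."""
    gold, predicted = set(gold), set(predicted)
    return sorted(predicted - gold), sorted(gold - predicted)


# Toy data standing in for the gold data frame and the tagger output.
gold_rows = [
    Row(0, 0, 2, "ORG"),
    Row(0, 5, 6, "GPE"),
    Row(1, 3, 4, "PERSON"),
]
raw_predictions = [
    (0, 0, 2, "ORG"),      # exact match with a gold span
    (0, 3, 4, "DATE"),     # removed by the label filter
    (1, 3, 5, "PERSON"),   # wrong boundary, so a false positive
]

gold = list(gold_spans(gold_rows))
predicted = list(filter_spans(raw_predictions))
precision, recall, f1 = evaluate(gold, predicted)
false_positives, false_negatives = error_analysis(gold, predicted)

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
print("False positives:", false_positives)
print("False negatives:", false_negatives)
```

With a pandas data frame, rows of this shape could be obtained from `df.itertuples(index=False)`, provided the columns are named accordingly. The sketch scores exact span matches only; partial overlaps count as one false positive plus one false negative, which may or may not match the measure used in the notebook.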