spaCy French NER transformer-based model fr_dep_news_trf not working #13276
-
Hello, we want to use spaCy to do NER extraction for French texts. The transformer-based model fr_dep_news_trf seems to be broken: the list of entities is always empty.

How to reproduce the behaviour:
We created a minimal example in Google Colab to reproduce the issue. The model doesn't detect anything.

Your Environment:
It's the default Colab environment.
Replies: 4 comments 2 replies
-
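Hi! You're using fr_dep_news_trf, which doesn't have an NER component. In the documentation you can find which components it does have. So it's not broken - the NER feature simply isn't available in this model. You'll have to use fr_core_news_lg instead.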
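A minimal sketch of how one might guard against this situation: check that a loaded pipeline actually contains an "ner" component before reading doc.ents. The helper names missing_components and extract_entities are made up for illustration; only spacy.load, nlp.pipe_names, doc.ents, ent.text and ent.label_ are spaCy's own API, and the example sentence is arbitrary.

```python
def missing_components(pipe_names, required=("ner",)):
    """Return the required component names that are absent from a pipeline."""
    return [name for name in required if name not in pipe_names]


def extract_entities(model_name, text):
    """Load a spaCy pipeline and return (text, label) pairs for its entities.

    Raises ValueError if the pipeline has no "ner" component, instead of
    silently returning an empty list.
    """
    import spacy  # imported lazily so the helper above works without spaCy

    nlp = spacy.load(model_name)  # the model package must already be installed
    if missing_components(nlp.pipe_names):
        raise ValueError(
            f"{model_name} has no 'ner' component; it has: {nlp.pipe_names}"
        )
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


# Example usage (requires: python -m spacy download fr_core_news_lg):
# extract_entities("fr_core_news_lg", "Emmanuel Macron est né à Amiens.")
```

With this guard, loading a pipeline without NER fails loudly rather than returning an empty entity list.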
-
Hello Sofie, thank you for the quick response! It makes sense now. I have a separate, related question (I can open a separate issue if you see fit). We actually started with the
The model outputs
I don't know how the NER model is trained, but I find this error really weird:
I understand that it's normal for a model to make mistakes, but I see multiple errors like this and I can't understand how they would happen (for an extreme case, with
Is French NER somehow more difficult? Should we perform some sort of pre-processing or post-processing to improve the accuracy? Thank you
-
Hi! It's fine to keep this in this thread; on the discussion forum we're less strict about one-topic-per-thread. That does sound a bit odd, yeah. There's no difference between the machinery used for training NER on the different languages - the only difference would be in the training data. WikiNER is used for both German and French, but I wouldn't rule out that the annotations for one language are of different quality than those for another language within the same dataset (depending on annotators and/or dataset size). So to get to the bottom of why these weird cases are happening, it might be worth looking at the French WikiNER corpus... (sorry, not exactly a satisfying answer, I realise!)
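As a rough starting point for inspecting the corpus, one could count how often each entity label occurs. This is only a sketch under an assumption about the file layout: one sentence per line, tokens separated by spaces, and each token written as word|POS|tag with IOB-style tags such as I-PER or O. The actual layout should be checked against the downloaded files. The function name count_token_labels and the file name in the usage comment are hypothetical.

```python
from collections import Counter


def count_token_labels(lines):
    """Count token-level entity labels in WikiNER-style lines.

    Assumed layout (verify against the real corpus): space-separated
    tokens, each of the form word|POS|tag, with tags like I-PER or O.
    """
    counts = Counter()
    for line in lines:
        for token in line.split():
            parts = token.rsplit("|", 2)
            if len(parts) != 3:
                continue  # skip tokens that do not match the assumed layout
            tag = parts[2]
            # Strip the IOB prefix ("I-PER" -> "PER"); plain "O" stays as is.
            label = tag.split("-", 1)[1] if "-" in tag else tag
            counts[label] += 1
    return counts


# Example usage with a hypothetical file name:
# with open("wikiner-fr.txt", encoding="utf-8") as f:
#     print(count_token_labels(f).most_common())
```

Comparing these counts between the French and German files would give a first impression of differences in dataset size and label balance.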
-
By the way - one more comment. The German and French trf models don't currently have an NER component in their pipeline because they use different datasets for NER and syntax, and we didn't want to have two separate transformers in the pipeline... This might be one reason why English performs better: its transformer layer is trained on both tasks, in a sort of transfer learning setting.