How to use lemmatizer with span categorizer? #9201
-
Hi, After a struggle with config.cfg, I end up with this message: ValueError: [E143] Labels for component 'tagger' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's initialize method. I know I could add labels, but why? In every other use of the tagger I did not have to do anything (I assume a callback did that for me). Is there an example of a config.cfg that uses spancat along with the (English) lemmatizer? Here is my config.cfg
Replies: 2 comments 5 replies
-
Do you want to train the tagger, or just use the pre-trained tagger from an existing pipeline? I assume you want to do the latter, in which case you should have something like this:
And you can repeat that for the attribute ruler and lemmatizer. That will load all those components from the existing pipeline without changes. See this section in the docs. (You probably also want to freeze the components since you don't need them when training the spancat.) The error is happening because you are using a blank tagger and your pipeline has no tok2vec for it to get input from - your only tok2vec is the one included inside the spancat, which isn't accessible from other components.
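A minimal sketch of what sourcing and freezing these components could look like in config.cfg. It assumes the en_core_web_lg pipeline mentioned elsewhere in this thread is installed, and that the sourced tagger listens to that pipeline's shared tok2vec, which is why tok2vec is sourced as well. The pipeline order and the list of frozen components are illustrative assumptions, and the spancat block itself is left out.

```ini
# Sketch only: the source pipeline name and component list are assumptions.
[nlp]
lang = "en"
pipeline = ["tok2vec","tagger","attribute_ruler","lemmatizer","spancat"]

# Sourced tok2vec, so the sourced tagger has something to listen to.
[components.tok2vec]
source = "en_core_web_lg"

[components.tagger]
source = "en_core_web_lg"

[components.attribute_ruler]
source = "en_core_web_lg"

[components.lemmatizer]
source = "en_core_web_lg"

# [components.spancat] stays as in your own config (with its own tok2vec).

[training]
# Keep the sourced components unchanged while the spancat is trained.
frozen_components = ["tok2vec","tagger","attribute_ruler","lemmatizer"]
```

Because the sourced components are frozen, they keep their pre-trained labels and weights, so the tagger no longer needs add_label or an initialization batch.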
-
Well, I think I have my spancat pipeline using the lemmatizer working. I am not sure how to prove it - perhaps there is something to see in the saved model (or perhaps I have to look for attributes in the doc?). I used the suggestions in #7149 by @svlandeg, ending with a pipeline: I could not use @svlandeg's suggestion to simply re-train the source tok2vec, because en_core_web_lg tok2vec uses That (of course) puts a strain on my resources, especially since my spans are up to 14 tokens. I run out of 64 GB of memory unless I play games with batch_size=400 (I assume it only affects validation) and training.batcher.size (going down to 100-800). My training run has now exceeded 24 hours, and it seems that lemmatization only made the results somewhat worse (~1-2%). Disappointing...