Running @Language.factory a second time in Databricks fails #13491
larrymccutchan asked this question in Help: Other Questions
I am developing some new pipelines in my NLP project for work. While creating them, I re-run a Jupyter-like notebook over and over as I develop the logic (I am very new to spaCy). When I run the following code a second time, I get a "source code not available" error, raised from inside spaCy's E004 duplicate-factory check.
I believe that this is related to another discussion here: #7316
However, creating a pipeline this way seems like a very realistic scenario. I believe you mentioned that rerunning it in the same process is unlikely (I read it as undesired). How would you suggest I do it?
I have been using dbutils.library.restartPython() (see here: https://docs.databricks.com/en/libraries/restart-python-process.html) to get it to work for now.
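A possible alternative to restarting Python, sketched under assumptions rather than taken from spaCy's documentation: keep the component as a plain class that can be redefined freely, and register a small factory function only while `Language.has_factory("rest_countries")` (a spaCy v3 classmethod) is still false. The decorator then runs at most once per process, so the duplicate check shown in the traceback is never reached on a re-run. Because Python resolves the global name `RESTCountriesComponent` each time the factory is called, `add_pipe` picks up the most recently executed class definition. This assumes a fresh process where the name was not already registered by the decorated class; the function name `create_rest_countries` and the stub method bodies are hypothetical.

```python
from spacy.lang.en import English
from spacy.language import Language
from spacy.tokens import Doc


class RESTCountriesComponent:
    # A plain, unregistered class: safe to edit and re-run in a notebook cell.
    def __init__(self, nlp, name, label="GPE"):
        self.label = label
        # force=True overwrites the extension left over from a previous run
        Doc.set_extension("has_country", getter=self.has_country, force=True)

    def __call__(self, doc):
        return doc  # stub: real entity logic would go here

    def has_country(self, doc):
        return False  # stub getter for doc._.has_country


# Register the factory at most once per Python process.
if not Language.has_factory("rest_countries"):

    @Language.factory("rest_countries", default_config={"label": "GPE"})
    def create_rest_countries(nlp: Language, name: str, label: str):
        # Global name lookup happens at call time, so the latest class is used.
        return RESTCountriesComponent(nlp, name, label=label)


nlp = English()
nlp.add_pipe("rest_countries", config={"label": "GPE"})
doc = nlp("Some text about Colombia and the Czech Republic")
print("Pipeline", nlp.pipe_names)  # Pipeline ['rest_countries']
```

One trade-off of this design: changing the factory's own signature or `default_config` still needs a restart (or a new factory name), because the registered function itself is never replaced.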
I would appreciate any help,
Larry
Code below (a stripped-down version of the example provided here: https://spacy.io/usage/processing-pipelines#custom-components-attributes):
```python
from spacy.lang.en import English
from spacy.language import Language
from spacy.tokens import Doc


@Language.factory("rest_countries")
class RESTCountriesComponent:
    def __init__(self, nlp, name, label="GPE"):
        # force=True lets a re-run cell overwrite the extension instead of raising
        Doc.set_extension("has_country", getter=self.has_country, force=True)
        print("init")

    def __call__(self, doc):
        # Stripped down: the full example adds country entities here
        return doc

    def has_country(self, doc):
        # Stripped down: the full example checks each token's is_country attribute
        return False


nlp = English()
# nlp.add_pipe("rest_countries", config={"label": "GPE"})
doc = nlp("Some text about Colombia and the Czech Republic")
print("Pipeline", nlp.pipe_names)  # contains "rest_countries" once add_pipe is uncommented
```
Error details:

```
File /databricks/python/lib/python3.10/site-packages/spacy/language.py:514, in Language.factory.<locals>.add_factory(factory_func)
    508 if internal_name in registry.factories:
    509     # We only check for the internal name here – it's okay if it's a
    510     # subclass and the base class has a factory of the same name. We
    511     # also only raise if the function is different to prevent raising
    512     # if module is reloaded.
    513     existing_func = registry.factories.get(internal_name)
--> 514     if not util.is_same_func(factory_func, existing_func):
    515         err = Errors.E004.format(
    516             name=name, func=existing_func, new_func=factory_func
    517         )
    518         raise ValueError(err)

File /databricks/python/lib/python3.10/site-packages/spacy/util.py:1125, in is_same_func(func1, func2)
   1123     return False
   1124 same_name = func1.__qualname__ == func2.__qualname__
-> 1125 same_file = inspect.getfile(func1) == inspect.getfile(func2)
   1126 same_code = inspect.getsourcelines(func1) == inspect.getsourcelines(func2)
   1127 return same_name and same_file and same_code

File /databricks/python/lib/python3.10/site-packages/torch/package/package_importer.py:691, in _patched_getfile(object)
    689     if object.__module__ in _package_imported_modules:
    690         return _package_imported_modules[object.__module__].__file__
--> 691 return _orig_getfile(object)

File /usr/lib/python3.10/inspect.py:785, in getfile(object)
    783             return module.__file__
    784         if object.__module__ == '__main__':
--> 785             raise OSError('source code not available')
    786     raise TypeError('{!r} is a built-in class'.format(object))
    787 if ismethod(object):
```
The e