add spacy_stanza into stanza_nlp_engine as it is no longer maintained #1522

Merged
SharonHart merged 8 commits into main from omri/stanza_new_version
Feb 5, 2025

Conversation

omri374
Contributor

@omri374 omri374 commented Feb 4, 2025

Change Description

spacy_stanza is a spaCy wrapper for stanza. It only supports stanza versions below 1.7, and because of changes in torch, stanza < 1.7 no longer works unless torch is downgraded.

This change adds the code from spacy_stanza into stanza_nlp_engine, as spacy_stanza is no longer maintained.
Tests were also added to the test_stanza_nlp_engine.py file to ensure that the code works as expected.
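
For readers unfamiliar with what a wrapper like this has to do, here is a minimal sketch of one piece of the pattern, not the code merged in this PR. spaCy's `Doc` can be built from parallel `words` and `spaces` lists, so a wrapper around an external tokenizer needs to turn offset-based tokens into that shape. The `Token` class, `fake_external_tokenize`, and `to_words_and_spaces` are hypothetical names used only for illustration, and the whitespace tokenizer is a stand-in so the example runs without stanza or spaCy installed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for a token produced by an external tokenizer."""

    text: str
    start_char: int
    end_char: int


def fake_external_tokenize(text: str) -> List[Token]:
    """Hypothetical whitespace tokenizer standing in for the wrapped library."""
    tokens = []
    pos = 0
    for piece in text.split():
        # Search from the end of the previous token so repeated words
        # map to the correct occurrence.
        start = text.index(piece, pos)
        end = start + len(piece)
        tokens.append(Token(piece, start, end))
        pos = end
    return tokens


def to_words_and_spaces(
    text: str, tokens: List[Token]
) -> Tuple[List[str], List[bool]]:
    """Convert offset-based tokens into parallel words/spaces lists.

    spaces[i] is True when token i is directly followed by whitespace.
    """
    words = [t.text for t in tokens]
    spaces = [
        t.end_char < len(text) and text[t.end_char].isspace() for t in tokens
    ]
    return words, spaces


text = "Hello world, this is a test."
words, spaces = to_words_and_spaces(text, fake_external_tokenize(text))

# Joining the two lists back together recovers the original text here,
# because this input uses only single spaces between tokens.
rebuilt = "".join(w + (" " if s else "") for w, s in zip(words, spaces))
```

In a real wrapper, the resulting lists would be handed to spaCy to construct a document, and the stand-in tokenizer would be replaced by the stanza pipeline's output.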

@omri374 omri374 requested a review from a team as a code owner February 4, 2025 21:34
@omri374
Contributor Author

omri374 commented Feb 4, 2025

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@omri374 omri374 requested a review from Copilot February 4, 2025 21:37

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

presidio-analyzer/presidio_analyzer/nlp_engine/stanza_nlp_engine.py:136

  • [nitpick] The function name 'tokenizer_factory' is ambiguous. It should be renamed to 'create_stanza_tokenizer' to better reflect its purpose.
def tokenizer_factory(
@omri374
Contributor Author

omri374 commented Feb 4, 2025

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

SharonHart
SharonHart previously approved these changes Feb 5, 2025
@SharonHart SharonHart merged commit 65eabd4 into main Feb 5, 2025
32 checks passed
@SharonHart SharonHart deleted the omri/stanza_new_version branch February 5, 2025 08:34
2 participants