I am using Presidio to anonymize text in Polish, but it does not detect names (e.g., "Jan Kowalski"). Here is my code:
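# PresidioAnonymizer and PresidioReversibleAnonymizer ship with langchain_experimental
from langchain_experimental.data_anonymizer import PresidioAnonymizer
from langchain_experimental.data_anonymizer import PresidioReversibleAnonymizer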
config = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "pl", "model_name": "pl_core_news_lg"}],
}
anonymizer = PresidioAnonymizer(
    analyzed_fields=["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS"],
    languages_config=config,
)
anonymizer_tool = PresidioReversibleAnonymizer(
    analyzed_fields=["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS"],
    languages_config=config,
)
text = "Jan Kowalski mieszka w Warszawie i ma e-mail [email protected]."
anonymized_result = anonymizer_tool.anonymize(text)
anon_result = anonymizer.anonymize(text)
deanonymized_result = anonymizer_tool.deanonymize(anonymized_result)
print("Anonymized text:", anonymized_result)
print("Deanonymized text:", deanonymized_result)
print("Map:", anonymizer_tool.deanonymizer_mapping)
print("Anonymized text:", anon_result)
Output:
Anonymized text: Jan Kowalski mieszka w Warszawie i ma e-mail [email protected].
Deanonymized text: Jan Kowalski mieszka w Warszawie i ma e-mail [email protected].
Map: {}
Anonymized text: Jan Kowalski mieszka w Warszawie i ma e-mail [email protected].
I expected the name "Jan Kowalski" and the email address to be anonymized, but the output remains unchanged. I have installed the pl_core_news_lg model using:
python -m spacy download pl_core_news_lg
Am I missing something in the configuration, or does Presidio not support Polish entity recognition properly? Any suggestions on how to make it detect names in Polish?
The interesting thing is that when I use only
anonymizer_tool = PresidioReversibleAnonymizer()
Then the output looks like this:
Anonymized text: Elizabeth Tate mieszka w Warszawie i ma e-mail [email protected].
Deanonymized text: Jan Kowalski mieszka w Warszawie i ma e-mail [email protected].
Map: {'PERSON': {'Elizabeth Tate': 'Jan Kowalski'}, 'EMAIL_ADDRESS': {'[email protected]': '[email protected]'}}
I have also tried it with the stanza package, with the same result. Here is my code for that:
import stanza
stanza.download("pl")
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider
text = "Jan Kowalski mieszka w Warszawie i ma e-mail [email protected]."
configuration = {"nlp_engine_name": "stanza", "models": [{"lang_code": "pl", "model_name": "pl"}]}
provider = NlpEngineProvider(nlp_configuration=configuration)
nlp_engine = provider.create_engine()
analyzer = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["pl"])
results = analyzer.analyze(text=text, language="pl", entities=['PHONE_NUMBER', 'PERSON', 'PER'])
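One thing worth checking, stated as an unverified assumption: Polish NER models may label people with an NKJP-style name such as persName rather than PERSON or PER, in which case a filter on PERSON would drop every hit. The sketch below is plain Python and does not call Presidio. The entity list, the label names, and the helper filter_entities are all hypothetical and only illustrate how a label mismatch would produce an empty result.

```python
# Hypothetical NER output for the sample sentence: (span text, model label).
# The labels are an ASSUMPTION about what a Polish model might emit
# (NKJP-style names); they are not confirmed anywhere in this thread.
model_entities = [
    ("Jan Kowalski", "persName"),
    ("Warszawie", "placeName"),
]

# Illustrative mapping from model labels to Presidio-style entity names.
LABEL_TO_PRESIDIO = {
    "persName": "PERSON",
    "placeName": "LOCATION",
    "geogName": "LOCATION",
    "orgName": "ORGANIZATION",
    "date": "DATE_TIME",
    "time": "DATE_TIME",
}


def filter_entities(entities, requested, mapping=None):
    """Keep (text, entity) pairs whose label, after optional mapping, was requested."""
    mapping = mapping or {}
    kept = []
    for text, label in entities:
        entity = mapping.get(label, label)  # fall back to the raw label
        if entity in requested:
            kept.append((text, entity))
    return kept


requested = {"PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS"}

# Without a mapping, "persName" never matches "PERSON", so nothing is kept.
unmapped = filter_entities(model_entities, requested)

# With the mapping, the person span is reported as PERSON.
mapped = filter_entities(model_entities, requested, LABEL_TO_PRESIDIO)

print("without mapping:", unmapped)
print("with mapping:", mapped)
```

If the labels really do differ, Presidio's NLP engine configuration appears to accept a label mapping (the ner_model_configuration section with model_to_presidio_entity_mapping); the exact keys should be checked against the installed Presidio version.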