added some medical suffixes #88
base: master
added some nurse suffixes: bn, rn, and np.
Update suffixes.py
removed king and queen from the titles as these are sometimes used as names
added a few prefixes: el, van, mc, mac
This pull request has some good additions, but it needs a few tweaks before I can merge it. (Sorry it's been so long since you opened it; it's been a while since I could focus on this.)
 'san',
 'santa',
 'st',
 'ste',
 'van',
 'vel',
+'van',
Van is sometimes a first name, so including it in prefixes would break parsing for all the Vans of the world. Skimming the US birth names database, there do appear to be people named Van, e.g. 183 people born in 1983.
% python tests.py "Van middle last"
<HumanName : [
title: ''
first: 'Van middle last'
middle: ''
last: ''
suffix: ''
nickname: ''
]>
The same concern applies to Mac. I went to school with a guy named Mac.
Mc is fine: it has no vowel, so it can't be a first name. I guess it could be a title abbreviation (Master of Ceremonies), though, and I'm not sure how that would play out.
El is an article in Spanish, so I'd kinda like to know how it is used in a name. Is it the Spanish article in a title, like el senator, or a prefix, like del?
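To make the Van problem concrete, here is a toy sketch of prefix joining. It is not nameparser's actual code: `join_prefixes`, `toy_parse`, and `BASE_PREFIXES` are hypothetical names, and the rule "a prefix absorbs everything after it" is an assumption chosen only because it reproduces the output shown above.

```python
# Toy model only; not taken from nameparser.
BASE_PREFIXES = {"de", "del", "von"}  # hypothetical starting set


def join_prefixes(pieces, prefixes):
    """Assume a prefix absorbs every piece that follows it."""
    joined = []
    for i, piece in enumerate(pieces):
        if piece.lower().rstrip(".") in prefixes:
            joined.append(" ".join(pieces[i:]))
            break
        joined.append(piece)
    return joined


def toy_parse(name, prefixes):
    """Split into first/middle/last after joining prefixes."""
    pieces = join_prefixes(name.split(), prefixes)
    first = pieces[0] if pieces else ""
    last = pieces[-1] if len(pieces) > 1 else ""
    middle = " ".join(pieces[1:-1])
    return {"first": first, "middle": middle, "last": last}


# Without 'van' as a prefix, Van is parsed as a first name.
without_van = toy_parse("Van middle last", BASE_PREFIXES)
# With 'van' as a prefix, the whole string collapses into one piece.
with_van = toy_parse("Van middle last", BASE_PREFIXES | {"van"})
# The prefix still does what it was added for in a surname.
surname = toy_parse("Martin van Buren", BASE_PREFIXES | {"van"})
```

In this toy model `with_van["first"]` is `'Van middle last'`, while `surname["last"]` is `'van Buren'`, which is the trade-off under discussion: the same entry helps surnames and hurts first names.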
@@ -7,12 +7,10 @@
 'brother',
 'dame',
 'father',
-'king',
Both king and queen are included in the set of titles that indicate a first name when placed before a single name, e.g. King David and Queen Mary, so this pull request will break some tests. In 2005 there were 148 people born in the US named King, so maybe that is a more useful case to handle than the title. I know people have used this parser on datasets that include kings and queens, but I guess we can let them customize the titles constant to pick them up.
We should update the test cases that include "king" to use one of the other titles in that set.
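A rough sketch of the trade-off, assuming a simplified parser: the sets and the `toy_parse_title` function below are hypothetical stand-ins, not nameparser's constants or logic, and middle names are ignored to keep it short.

```python
# Toy model only; names and sets are illustrative assumptions.
TITLES = {"mr", "dr", "sir", "dame", "king", "queen"}
FIRST_NAME_TITLES = {"sir", "dame", "king", "queen"}


def toy_parse_title(name, titles, first_name_titles):
    """Peel off a leading title, then assign the remaining pieces."""
    pieces = name.split()
    title = ""
    if len(pieces) > 1 and pieces[0].lower().rstrip(".") in titles:
        title = pieces.pop(0)
    result = {"title": title, "first": "", "last": ""}
    if len(pieces) == 1:
        key = title.lower().rstrip(".")
        if title and key not in first_name_titles:
            # "Mr. Johnson": an ordinary title implies a last name.
            result["last"] = pieces[0]
        else:
            # "King David": a first-name title implies a first name.
            result["first"] = pieces[0]
    elif pieces:
        result["first"] = pieces[0]
        result["last"] = pieces[-1]
    return result


# With king in the titles, "King David" is a title plus a first name.
as_title = toy_parse_title("King David", TITLES, FIRST_NAME_TITLES)

# With king and queen removed, King is treated as a first name.
trimmed = TITLES - {"king", "queen"}
as_name = toy_parse_title("King David", trimmed, FIRST_NAME_TITLES)
```

Users who work with royalty would add the two entries back to their own copy of the set, which is the customization path suggested above.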
I had a suggestion for adding a few medical acronyms: bn, np, and rn. I added them to my version because we sometimes deal with medical professionals.
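As an illustration of what the extra suffixes would buy, here is a minimal sketch of trailing-suffix detection. `split_suffixes` and `BASE_SUFFIXES` are hypothetical and only approximate the idea; the library's real matching rules may differ.

```python
# Toy model only; not nameparser's suffix handling.
BASE_SUFFIXES = {"jr", "sr", "md", "phd"}  # hypothetical starting set
MEDICAL_SUFFIXES = {"bn", "np", "rn"}      # the additions proposed here


def split_suffixes(name, suffixes):
    """Pop recognized suffixes off the end of the name."""
    pieces = name.replace(",", " ").split()
    found = []
    while len(pieces) > 1 and pieces[-1].lower().replace(".", "") in suffixes:
        found.insert(0, pieces.pop())
    return " ".join(pieces), found


before = split_suffixes("Jane Doe, RN, NP", BASE_SUFFIXES)
after = split_suffixes("Jane Doe, RN, NP", BASE_SUFFIXES | MEDICAL_SUFFIXES)
```

Without the additions the credentials stay attached to the name (`before[1]` is empty); with them, `after` separates `'Jane Doe'` from `['RN', 'NP']`.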