Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move all simple unit tests into JSON file #61

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

MilesCranmer
Copy link

This PR takes all "simple" unit tests, and puts them into a single JSON file to help clean things up.

The result is tests.py reduced to 814 lines, and an easier way to bulk-add new names (just append them to the JSON file).

simple here means a test with a single HumanName(...) call, followed by several self.m(...), with nothing else. No other tests are ported.

Let me know what you think.
Cheers,

Miles

@MilesCranmer
Copy link
Author

Extra random notes:

  • I didn't mention it, but really good job developing this package. It has saved me a tremendous amount of time and is very good with all the names I've come across. It should really be at least v1.0 now, because it feels quite stable!
  • If you are interested in bulk adding of unit tests (in case of finding corner cases which might exist), now with the JSON, you could considering running over huge name databases (after sewing together first name and last name) to make sure everything works: e.g., government lists: https://www.ssa.gov/oact/babynames/limits.html or https://www.census.gov/topics/population/genealogy/data/2000_surnames.html
  • It would be great to have a feature for fuzzy comparison of names. The equality method only works for identical names, but a lot of time, I end up comparing things "Cranmer, M" ?= "Cranmer, Miles", or similar. Maybe an extra HumanName.similarity(...) function could give a float (0-1) for how similar a name is.
  • Is there a way I can get HumanName to just print the initial of some part of the name? E.g., if someone has a joint last name, like "Perez-Backay" or something, this would be "PB". Likewise, a double middle name like "Johnson Johnson" could printed as "J. J."

Cheers,
Miles

@derek73
Copy link
Owner

derek73 commented Aug 10, 2017

  1. You are an amazing and beautiful person. Thank you! You made my day.

  2. I agree, it should be a v1.0 already. Shoulda done it with my last refactor but forgot. Just been looking for a good excuse. I think your test refactor/addition can be the excuse I needed.

  3. I like the fuzzy similarity idea. I've never implemented something like that but, like all things with this package, it'll give me a chance to add to my skills. Or, don't let me stop you from implementing it.

  4. re: printing initials for a name part, I can't think of an easy way to do it with the existing class because currently "Perez-Backay" is just a string in an array and it doesn't know anything about the hyphen. We'd have to add support for that. Probably wouldn't be too hard. How would you want to access it?

  5. I just checked out your code and tried changing a few of the strings to raise an error, but it's not raising an error for me when I run "python tests.py". Did you try that? Does it raise an error for you?

- This class name was duplicated in the next class, which was redefining
  it.
@MilesCranmer
Copy link
Author

@derek73 sorry about the test not actually running — it looks like I used a duplicate class name, so it was just redefined later in the document, nulling the first one. It runs now (but the John Doe, Msc.Ed. test fails—is this expected?).

Re: 4., okay, I suppose that might be complicated. Is there a way with the current code to set up the print(name) to print with, e.g., "{first initial} {last}", or similar? This would just be for simple one-word names.

Cheers,
Miles

@MilesCranmer
Copy link
Author

By the way, if it's still desirable to have a separate test "function" for each test in the JSON, I think this could be doable by, in __main__, generating an encapsulated function for each separate test (each entry in the JSON), and then assigning it to the BruteForce unittest class as a new method.

@derek73
Copy link
Owner

derek73 commented Aug 11, 2017

Cool, I was able to raise an error now.

re 4, it's easier if you don't need to handle hyphenated names. You could do it with python string formatting, e.g.:

>>> from nameparser import HumanName
>>> test = HumanName("Firstname Middle Last")
>>> test.string_format = '{first[0]} {middle[0]} {last[0]}'
>>> print(test)
F M L

That's kinda brittle because it will barf if the name doesn't have all the keys in the format string. You could also treat it like an iterable, e.g.:

>>> for part in test:
>>>      if part: print(part[0])
F
M
L

If you implemented something in the class that's probably how you'd do it, with some special case when it has a hyphen depending what you want it to return in that case.

@derek73 derek73 added this to the v1.0 milestone Aug 11, 2017
@derek73
Copy link
Owner

derek73 commented Aug 11, 2017

Some other thoughts as I'm using your branch as I try to look into some of the other bug reports.

  1. I have some tests for the conjunction code, but I'm have a hard time finding them now because there's no groupings in the current testnames.json. I wonder if there's anything we can do about that? One thought, maybe have separate files that could be like tests/fixtures/conjunctions.json? Or, is the json structure based on something you've found elsewhere?

  2. Also on the tests, it'd be gravy if we could get the test name to show up in the failure message. You could do that by creating a new test method for each one, but it might be easier to make a new m() or even just assertEqual() since you're only using it in that one place. I guess a new method for each name might add to the count of tests, one for each name. Not sure if that's good or bad. Sounds more complicated than it's worth but maybe you see it differently.

  3. We should also document this I guess in case someone else wants to run it against a bunch of names like you suggested. Probably show them what the format looks like and if it's based off anything, how to actually test against another file. Any interest in taking a stab at that? Maybe also some brief comment in the test method. I wonder if it should be HumanNameFixtureTests or something to let us know it's running over the things in the json file?

  4. I also thought about adding support for something like python tests.py -f myownnames.json. Otherwise we could just say to download the project to run it I guess.

@derek73 derek73 removed this from the v1.0 milestone Aug 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants