Move all simple unit tests into JSON file #61

MilesCranmer · 2017-08-09T23:24:33Z

This PR takes all "simple" unit tests, and puts them into a single JSON file to help clean things up.

The result is tests.py reduced to 814 lines, and an easier way to bulk-add new names (just append them to the JSON file).

simple here means a test with a single HumanName(...) call, followed by several self.m(...), with nothing else. No other tests are ported.

Let me know what you think.
Cheers,

Miles

MilesCranmer · 2017-08-09T23:54:06Z

Extra random notes:

I didn't mention it, but really good job developing this package. It has saved me a tremendous amount of time and is very good with all the names I've come across. It should really be at least v1.0 now, because it feels quite stable!
If you are interested in bulk adding of unit tests (in case of finding corner cases which might exist), now with the JSON, you could considering running over huge name databases (after sewing together first name and last name) to make sure everything works: e.g., government lists: https://www.ssa.gov/oact/babynames/limits.html or https://www.census.gov/topics/population/genealogy/data/2000_surnames.html
It would be great to have a feature for fuzzy comparison of names. The equality method only works for identical names, but a lot of time, I end up comparing things "Cranmer, M" ?= "Cranmer, Miles", or similar. Maybe an extra HumanName.similarity(...) function could give a float (0-1) for how similar a name is.
Is there a way I can get HumanName to just print the initial of some part of the name? E.g., if someone has a joint last name, like "Perez-Backay" or something, this would be "PB". Likewise, a double middle name like "Johnson Johnson" could printed as "J. J."

Cheers,
Miles

derek73 · 2017-08-10T02:26:03Z

You are an amazing and beautiful person. Thank you! You made my day.
I agree, it should be a v1.0 already. Shoulda done it with my last refactor but forgot. Just been looking for a good excuse. I think your test refactor/addition can be the excuse I needed.
I like the fuzzy similarity idea. I've never implemented something like that but, like all things with this package, it'll give me a chance to add to my skills. Or, don't let me stop you from implementing it.
re: printing initials for a name part, I can't think of an easy way to do it with the existing class because currently "Perez-Backay" is just a string in an array and it doesn't know anything about the hyphen. We'd have to add support for that. Probably wouldn't be too hard. How would you want to access it?
I just checked out your code and tried changing a few of the strings to raise an error, but it's not raising an error for me when I run "python tests.py". Did you try that? Does it raise an error for you?

- This class name was duplicated in the next class, which was redefining it.

MilesCranmer · 2017-08-10T14:37:50Z

@derek73 sorry about the test not actually running — it looks like I used a duplicate class name, so it was just redefined later in the document, nulling the first one. It runs now (but the John Doe, Msc.Ed. test fails—is this expected?).

Re: 4., okay, I suppose that might be complicated. Is there a way with the current code to set up the print(name) to print with, e.g., "{first initial} {last}", or similar? This would just be for simple one-word names.

Cheers,
Miles

MilesCranmer · 2017-08-10T14:50:04Z

By the way, if it's still desirable to have a separate test "function" for each test in the JSON, I think this could be doable by, in __main__, generating an encapsulated function for each separate test (each entry in the JSON), and then assigning it to the BruteForce unittest class as a new method.

derek73 · 2017-08-11T02:23:01Z

Cool, I was able to raise an error now.

re 4, it's easier if you don't need to handle hyphenated names. You could do it with python string formatting, e.g.:

>>> from nameparser import HumanName
>>> test = HumanName("Firstname Middle Last")
>>> test.string_format = '{first[0]} {middle[0]} {last[0]}'
>>> print(test)
F M L

That's kinda brittle because it will barf if the name doesn't have all the keys in the format string. You could also treat it like an iterable, e.g.:

>>> for part in test:
>>>      if part: print(part[0])
F
M
L

If you implemented something in the class that's probably how you'd do it, with some special case when it has a hyphen depending what you want it to return in that case.

derek73 · 2017-08-11T03:51:19Z

Some other thoughts as I'm using your branch as I try to look into some of the other bug reports.

I have some tests for the conjunction code, but I'm have a hard time finding them now because there's no groupings in the current testnames.json. I wonder if there's anything we can do about that? One thought, maybe have separate files that could be like tests/fixtures/conjunctions.json? Or, is the json structure based on something you've found elsewhere?
Also on the tests, it'd be gravy if we could get the test name to show up in the failure message. You could do that by creating a new test method for each one, but it might be easier to make a new m() or even just assertEqual() since you're only using it in that one place. I guess a new method for each name might add to the count of tests, one for each name. Not sure if that's good or bad. Sounds more complicated than it's worth but maybe you see it differently.
We should also document this I guess in case someone else wants to run it against a bunch of names like you suggested. Probably show them what the format looks like and if it's based off anything, how to actually test against another file. Any interest in taking a stab at that? Maybe also some brief comment in the test method. I wonder if it should be HumanNameFixtureTests or something to let us know it's running over the things in the json file?
I also thought about adding support for something like python tests.py -f myownnames.json. Otherwise we could just say to download the project to run it I guess.

MilesCranmer added 8 commits August 9, 2017 22:59

Remove all simple test and put in JSON

2356edc

Clean up residual newlines after scrape

3b35ed7

Fix JSON testnames by making everything a string

a000527

Move escaped UTF-8 back to python tests.py

cb23c31

Fix single quotes in JSON testnames file

d86b4ef

Fix blank name JSON entry

26e28cf

Fix duplicate test name in testnames.json

5cae192

Add HumanNameTestBase to test all names in JSON

9618741

Fix JSON tests not running by changing class name

5a6857e

- This class name was duplicated in the next class, which was redefining it.

derek73 added the enhancement label Aug 11, 2017

derek73 added this to the v1.0 milestone Aug 11, 2017

derek73 removed this from the v1.0 milestone Aug 31, 2018

vaneseltine mentioned this pull request Aug 1, 2019

Introduce parameterized pytest suite #91

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Move all simple unit tests into JSON file #61

Move all simple unit tests into JSON file #61

Uh oh!

MilesCranmer commented Aug 9, 2017

Uh oh!

MilesCranmer commented Aug 9, 2017

Uh oh!

derek73 commented Aug 10, 2017

Uh oh!

MilesCranmer commented Aug 10, 2017

Uh oh!

MilesCranmer commented Aug 10, 2017

Uh oh!

derek73 commented Aug 11, 2017 •

edited

Loading

Uh oh!

derek73 commented Aug 11, 2017

Uh oh!

Uh oh!

Move all simple unit tests into JSON file #61

Are you sure you want to change the base?

Move all simple unit tests into JSON file #61

Uh oh!

Conversation

MilesCranmer commented Aug 9, 2017

Uh oh!

MilesCranmer commented Aug 9, 2017

Uh oh!

derek73 commented Aug 10, 2017

Uh oh!

MilesCranmer commented Aug 10, 2017

Uh oh!

MilesCranmer commented Aug 10, 2017

Uh oh!

derek73 commented Aug 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

derek73 commented Aug 11, 2017

Uh oh!

Uh oh!

derek73 commented Aug 11, 2017 •

edited

Loading