names-matcher

Fuzzily biject people's names between two lists.

Let's define an identity as a series of names belonging to the same person. The algorithm is:

1. Parse, normalize, and split the names in each identity. The result is a set of strings per identity.
2. Define the similarity between identities as max(ratio, token_set_ratio), where ratio and token_set_ratio are inspired by the string comparison functions from rapidfuzz.
3. Construct the distance matrix between the identities in the two specified lists.
4. Solve the Linear Assignment Problem (LAP) on that matrix.
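
The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: difflib's SequenceMatcher stands in for the rapidfuzz-inspired ratio, the "token set" score is simplified to the best ratio between any two tokens, distance is taken as 1 - similarity, and a brute-force search stands in for a real LAP solver. All function names here are hypothetical.

```python
import itertools
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip accents, and split a name into alphanumeric tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return set(re.findall(r"[a-z0-9]+", no_accents.lower()))


def identity_tokens(identity):
    """Union of the tokens of every name in one identity."""
    tokens = set()
    for name in identity:
        tokens |= normalize(name)
    return tokens


def ratio(a, b):
    """Stand-in string similarity in [0, 1] (difflib, not rapidfuzz)."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(tokens_a, tokens_b):
    """max(plain ratio, simplified token-set score); an assumption, see lead-in."""
    if not tokens_a or not tokens_b:
        return 0.0
    plain = ratio(" ".join(sorted(tokens_a)), " ".join(sorted(tokens_b)))
    best_pair = max(ratio(x, y) for x in tokens_a for y in tokens_b)
    return max(plain, best_pair)


def match(first, second):
    """Assign each identity in `first` to a distinct identity in `second`.

    Brute force over all assignments: factorial cost, for tiny inputs only.
    Assumes len(first) <= len(second).
    """
    a = [identity_tokens(i) for i in first]
    b = [identity_tokens(i) for i in second]
    distance = [[1.0 - similarity(x, y) for y in b] for x in a]
    best = min(
        itertools.permutations(range(len(b)), len(a)),
        key=lambda cols: sum(distance[r][c] for r, c in enumerate(cols)),
    )
    confidences = [1.0 - distance[r][c] for r, c in enumerate(best)]
    return list(best), confidences


first = [["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]]
second = [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]]
mapping, confidences = match(first, second)
print(mapping)  # [1, 0]: same assignment as the package example
```

The confidence values from this sketch differ from those of names-matcher because the similarity function is not the same. For lists of realistic size, a polynomial-time LAP solver is needed instead of the permutation search.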

Our solution of the LAP scales up to thousands of identities.

Example:

>>> from names_matcher import NamesMatcher
>>> NamesMatcher()([["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]],
...                [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]])
(array([1, 0], dtype=int32), array([0.75      , 0.57142857]))

The first element of the returned tuple is the array of mapping indexes: it has the same length as the first list, and each value is an index into the second list. The second element is the array of corresponding confidence values, from 0 to 1.
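
To make the return value concrete, here is a small sketch that turns the two arrays into readable pairs. The helper pair_matches and its threshold parameter are hypothetical and not part of names-matcher; the mapping and confidence values are copied from the example output above as plain lists.

```python
def pair_matches(first, second, mapping, confidences, threshold=0.6):
    """Return (identity from first, identity from second, confidence) triples.

    Pairs whose confidence is below `threshold` are dropped; the cut-off
    value is an arbitrary choice for this illustration.
    """
    pairs = []
    for i, (j, conf) in enumerate(zip(mapping, confidences)):
        if conf >= threshold:
            pairs.append((first[i], second[int(j)], float(conf)))
    return pairs


first = [["Vadim Markovtsev", "vmarkovtsev"], ["Long, Waren", "warenlg"]]
second = [["Warren"], ["VMarkovtsev"], ["Eiso Kant"]]
mapping = [1, 0]                    # values from the example above
confidences = [0.75, 0.57142857]    # values from the example above

pairs = pair_matches(first, second, mapping, confidences)
for left, right, conf in pairs:
    print(left, "->", right, conf)
```

With the threshold of 0.6, only the first identity keeps its match; the second pair (confidence about 0.57) is filtered out.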

Installation

pip3 install names-matcher

Command line interface

Given one identity per line in two files, print the matches to standard output:

python3 -m names_matcher path/to/file/1 path/to/file/2

Each identity is one or more names joined with |, for example:

Vadim Markovtsev|vmarkovtsev|vadim
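
The same file format can be loaded for use with the Python API. The following is a minimal sketch; parse_identities is a hypothetical helper, and skipping blank lines and trimming whitespace are assumptions rather than documented behavior of the command line tool.

```python
def parse_identities(lines):
    """Turn 'name1|name2|...' lines into a list of identities (lists of names)."""
    identities = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no identity
        names = [name.strip() for name in line.split("|") if name.strip()]
        identities.append(names)
    return identities


sample = ["Vadim Markovtsev|vmarkovtsev|vadim\n", "\n", "Eiso Kant\n"]
identities = parse_identities(sample)
print(identities)
```

An open file object can be passed directly in place of the sample list, since iterating over it yields lines.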

Contributing

Contributions are very welcome and desired! Please follow the code of conduct (CODE_OF_CONDUCT.md) and read the contribution guidelines (CONTRIBUTING.md).

License

Apache-2.0, see LICENSE.
