Update Duplicate Detection #57
Open
Lightning11wins wants to merge 11 commits into master from dups
Update front end dataqa plugin to follow the new schema for p_dups.
Add helpful comments. Add smoother color gradient to dups UI. Ignore blank emails, phones, and addresses. Fix a bug where cos_compare() was used for phone numbers instead of lev_compare(). Abstract values into an object in update.qy. Remove unhelpful optimization attempts. Remove unhelpful comments.
Add known issues to string similarity documentation. Clean up and organize todos. Clean up testing code in several files.
Rearchitect dupe acquisition queries, inlining them in update.qy. Add comments to get/<field>.qy files to explain how the data is marshalled. Add print statements to update.qy which help with debugging. Add a comment to update.qy explaining the strategies used. Add code to compute individual-field similarities for concat dups, improving reasons displayed to the DB Admin. Rename globals.qy to cluster_params.qy.
Add last updated date. Add handling for edge case: missing dup reason. Add handling for some nondup edge cases. Organize confusing joins. Add some friendly flair when all dups are resolved.
… by the object system.
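To illustrate why the commit messages above switch phone numbers from cos_compare() to lev_compare(), here is a minimal Python sketch. It is not the project's code: `lev_similarity` and `cos_similarity` are hypothetical stand-ins, and the sketch assumes the cosine comparison works on character-frequency vectors, which may differ from how cos_compare() is actually implemented. Under that assumption, cosine similarity ignores character order, so two different phone numbers built from the same digits look identical, while an edit-distance measure still tells them apart.

```python
from collections import Counter
from math import sqrt


def lev_distance(a: str, b: str) -> int:
    """Levenshtein edit distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lev_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in: edit distance normalized to the range [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - lev_distance(a, b) / longest


def cos_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in: cosine of character-frequency vectors.

    Assumption: character order is discarded, only counts are compared.
    """
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = (sqrt(sum(n * n for n in va.values()))
            * sqrt(sum(n * n for n in vb.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Two made-up phone numbers that contain exactly the same digits.
    phone_a = "5551234567"
    phone_b = "5557654321"
    print("cosine:     ", cos_similarity(phone_a, phone_b))
    print("levenshtein:", lev_similarity(phone_a, phone_b))
```

In this sketch the frequency-based cosine score cannot separate the two numbers, whereas the normalized edit distance rates them as mostly different, which is the behavior a duplicate detector would want for digit strings.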
The duplicate detection project is ready for review, although even in the best case a couple of things still block it from being ready to merge.
I would appreciate a full review of all changes, as there's quite a lot here. That said, some areas may require special attention, so I've compiled a list of all 28 `TODO: Greg` comments below. (Note: Some of my todos assume the reader understands various pieces of nearby context or has generally read the indicated source code.)
- `TODO: Greg` in `update.qy`.
- `TODO: Greg` in `update_duplicates.sh`.
Please let me know if you have any questions, comments, or concerns about my changes and design choices.