Add check-contracts and export-contracts functionality to CLI #561

Open
2 tasks done
andrewtavis opened this issue Jan 26, 2025 · 9 comments
Assignees
Labels
feature New feature or request help wanted Extra attention is needed

Comments

@andrewtavis
Member

Terms

Description

As discussed in recent calls, we need to be able to check the data contracts against Scribe-Data exports to see whether all required fields in a contract are also present in the data. If not, the user needs to be alerted that the current contracts are invalid. The contracts can be found in src/scribe_data/wikidata/data-contracts. In general, this issue would add the following functionality:

```bash
# Check the data in the following directory to see that all needed language data is included:
scribe-data check-contract (cc) -od DIRECTORY_OF_OUTPUTS
# Errors if a contract value is not present in the data.

# Export the current contracts to an output directory so that they can be used outside of Scribe-Data:
scribe-data export-contracts (ec) -od DIRECTORY_TO_OUTPUT_CONTRACTS_TO
```
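A rough sketch of what `export-contracts` could do, assuming each contract is a standalone `.json` file in `src/scribe_data/wikidata/data-contracts`. The helper name `export_contracts` is hypothetical and not part of Scribe-Data:

```python
import shutil
from pathlib import Path


def export_contracts(contracts_dir: Path, output_dir: Path) -> list[Path]:
    """Copy every contract file from contracts_dir into output_dir.

    Hypothetical helper: assumes each contract is a standalone .json file.
    """
    # Create the target directory (and any parents) if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)
    exported = []
    # Sort so the export order is deterministic across platforms.
    for contract in sorted(contracts_dir.glob("*.json")):
        destination = output_dir / contract.name
        # copy2 also preserves file metadata such as modification times.
        shutil.copy2(contract, destination)
        exported.append(destination)
    return exported
```

Returning the list of written paths makes it easy for the CLI to print a summary of what was exported.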

The changes for this would go in src/scribe_data/cli/main.py as well as new src/scribe_data/cli/contracts/check and src/scribe_data/cli/contracts/export files 😊
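For the `main.py` side, one way to register both commands together with their short aliases is the `aliases` argument of argparse's `add_parser`. This is a hypothetical sketch; `build_parser` and the `canonical` default are illustrative names, not Scribe-Data's actual CLI code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of registering the two contract subcommands."""
    parser = argparse.ArgumentParser(prog="scribe-data")
    subparsers = parser.add_subparsers(dest="command")

    check_parser = subparsers.add_parser(
        "check-contract",
        aliases=["cc"],
        help="Check exported data against the data contracts.",
    )
    check_parser.add_argument("-od", "--output-dir", required=True)
    # Store a canonical name so dispatch does not depend on the alias used.
    check_parser.set_defaults(canonical="check-contract")

    export_parser = subparsers.add_parser(
        "export-contracts",
        aliases=["ec"],
        help="Export the data contracts to a directory.",
    )
    export_parser.add_argument("-od", "--output-dir", required=True)
    export_parser.set_defaults(canonical="export-contracts")
    return parser
```

argparse stores the name the user actually typed (e.g. `cc`) in the subparser `dest`, which is why the sketch also records a canonical name via `set_defaults`.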

Contribution

Happy to support someone who has interest in working on this!

@axif0 might pick this up for Outreachy, but could also review if someone else had interest 📶🚀

@andrewtavis andrewtavis added feature New feature or request help wanted Extra attention is needed labels Jan 26, 2025
@you-think-you-know-me
Contributor

Hey @andrewtavis, I would love to work on this issue. Please assign it to me.

@andrewtavis
Member Author

Thanks for your willingness to pick up another issue so quickly, @you-think-you-know-me! Looking forward to seeing the results here 😊

CC @axif0 for an eventual review :)

@you-think-you-know-me
Contributor

Hey @andrewtavis, for the check-contract feature, do we also need to print a message if a certain language's output file is not present in the output directory we are checking?

@andrewtavis
Member Author

Yes exactly, @you-think-you-know-me :) Generally the plan would be to raise a non-zero exit code and then list all the contract values that are missing from the data set that we have.
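A minimal sketch of the reporting logic described above, assuming the contracts and the exported data have already been loaded into per-language sets of field names. `find_contract_violations` and `report` are hypothetical helpers, and a language that is absent from the exported data is treated as a missing output file:

```python
def find_contract_violations(
    required_fields: dict[str, set[str]],
    exported_fields: dict[str, set[str]],
) -> list[str]:
    """Return one message per missing language file or missing contract value.

    Hypothetical inputs: required_fields maps a language to the field names its
    contract expects; exported_fields maps a language to the field names found
    in its exported data.
    """
    problems = []
    for language, fields in sorted(required_fields.items()):
        if language not in exported_fields:
            # No exported data at all for this language.
            problems.append(f"{language}: no output file found")
            continue
        # Every contract value that does not appear in the data is reported.
        for field in sorted(fields - exported_fields[language]):
            problems.append(f"{language}: '{field}' is missing from the data")
    return problems


def report(problems: list[str]) -> int:
    """Print every problem and return the exit code (non-zero on failure)."""
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A CLI handler could then finish with `sys.exit(report(problems))` so that CI jobs fail whenever the contracts and the data disagree, while still listing every missing value rather than stopping at the first one.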

@you-think-you-know-me
Contributor

you-think-you-know-me commented Feb 4, 2025

OK, thanks. I have a few more doubts:

  1. Are the contracts that have been added so far only for the verb data type?
  2. I am also having difficulty understanding the structure of the data contracts. It would be really helpful if you could give me an overview using an example.
    @axif0 @andrewtavis

@andrewtavis
Member Author

The contracts are basically split based on the functionality of the end Scribe commands:

  • numbers is for the plural command
  • genders is for annotating the gender of nouns
  • conjugations is for the verb conjugation tables

So when implementing a command, rather than hard-coding the exact values for a given language (which are based on the data on Wikidata and may change), we code the applications to get the correct column from the contract and then use that. You can check DATA_CONTRACTS.md for an explanation of the contracts :)
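To illustrate the idea, here is a sketch that reads a noun's plural through the contract instead of hard-coding column names. The contract layout used here (a `numbers` section mapping a singular column to a plural column) and the `plural_of` helper are assumptions for illustration; DATA_CONTRACTS.md describes the real structure:

```python
from typing import Optional


def plural_of(noun_row: dict[str, str], contract: dict) -> Optional[str]:
    """Look up a noun's plural using the column names declared in the contract.

    Hypothetical contract shape: {"numbers": {"<singular column>": "<plural column>"}}.
    """
    # The application only knows the section name ("numbers"); the
    # language-specific column names come from the contract itself.
    for singular_key, plural_key in contract["numbers"].items():
        if singular_key in noun_row:
            return noun_row.get(plural_key)
    # The row has none of the singular columns the contract declares.
    return None
```

If the Wikidata-backed column names change, only the contract needs updating; the lookup code stays the same.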

@andrewtavis
Member Author

Let us know if you need more assistance!

@you-think-you-know-me
Contributor

@andrewtavis I have successfully implemented the export-contracts feature. I still have some doubts about the check-contract feature:

  1. We need to check that the labels in the conjugations section are present in the verbs.json files in the output directory. Correct me if I am wrong.
  2. Similarly, which data-type files should the labels in numbers and genders be checked against?

@andrewtavis
Member Author

Great to hear, @you-think-you-know-me! The answer to question 1 is yes. We may eventually check against other file types as well, but don't stress about that for now. For 2, those labels need to be in the nouns table of the given language.
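Following these answers (conjugations labels in verbs, numbers and genders labels in nouns), the required labels could be grouped per data type before comparing them with the exports. This is a sketch under assumptions: `SECTION_TO_DATA_TYPE`, `collect_labels` and `required_labels_by_data_type` are hypothetical names, every non-empty string value in a section is treated as a required label, and whether dictionary keys should also count depends on the actual contract layout:

```python
from typing import Any

# Hypothetical mapping of contract sections to the data type they are checked against.
SECTION_TO_DATA_TYPE = {
    "conjugations": "verbs",
    "numbers": "nouns",
    "genders": "nouns",
}


def collect_labels(node: Any) -> set[str]:
    """Recursively gather every non-empty string value inside a contract section."""
    if isinstance(node, str):
        # Empty strings are skipped, assuming they mark forms a language lacks.
        return {node} if node else set()
    if isinstance(node, dict):
        labels: set[str] = set()
        for value in node.values():
            labels |= collect_labels(value)
        return labels
    if isinstance(node, list):
        labels = set()
        for item in node:
            labels |= collect_labels(item)
        return labels
    return set()


def required_labels_by_data_type(contract: dict[str, Any]) -> dict[str, set[str]]:
    """Group the labels of each contract section under the data type to check."""
    required: dict[str, set[str]] = {}
    for section, data_type in SECTION_TO_DATA_TYPE.items():
        if section in contract:
            required.setdefault(data_type, set()).update(collect_labels(contract[section]))
    return required
```

Each resulting set could then be compared with the keys found in that language's verbs or nouns export.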

Projects
Status: Todo
Development

No branches or pull requests

2 participants