Chore: Update missing Wikidata query forms #560

Closed
wants to merge 1 commit into from

Conversation

github-actions[bot]

Automated PR: Missing Lexeme Forms

This is an automated PR created by the Check and Update Missing Query Forms workflow.

Missing Forms Summary

Language Forms Type
arabic adjectives, adverbs, nouns, prepositions, proper_nouns, verbs
basque adjectives, adverbs, nouns, pronouns, verbs
bengali adjectives, adverbs, nouns, pronouns, proper_nouns, verbs
czech adjectives, adverbs, nouns, personal_pronouns, prepositions, pronouns, proper_nouns, verbs
dagbani adjectives, adverbs, conjunctions, nouns, personal_pronouns, prepositions, pronouns, proper_nouns, verbs
danish adjectives, adverbs, nouns, personal_pronouns, pronouns, proper_nouns, verbs
english adjectives, adverbs, nouns, personal_pronouns, pronouns, proper_nouns, verbs
esperanto adjectives, nouns, personal_pronouns, pronouns, proper_nouns, verbs
estonian adjectives, adverbs, nouns, personal_pronouns, postpositions, prepositions, pronouns, proper_nouns, verbs
finnish adjectives, nouns, personal_pronouns, pronouns, proper_nouns, verbs
french adjectives, adverbs, nouns, personal_pronouns, prepositions, proper_nouns, verbs
german adjectives, adverbs, nouns, personal_pronouns, proper_nouns, verbs
greek adjectives, nouns, proper_nouns, verbs
hausa adjectives, nouns, prepositions, proper_nouns
hebrew adjectives, nouns, pronouns, proper_nouns, verbs
igbo adjectives, adverbs, nouns, pronouns, verbs
italian adjectives, nouns, prepositions, pronouns, proper_nouns, verbs
japanese adjectives, nouns, personal_pronouns
latin adjectives, adverbs, nouns, personal_pronouns, prepositions, pronouns, proper_nouns, verbs
latvian adverbs, nouns, personal_pronouns, pronouns, proper_nouns, verbs
malayalam adjectives, nouns, personal_pronouns, pronouns, proper_nouns, verbs
persian adjectives, adverbs, conjunctions, nouns, proper_nouns, verbs
polish adjectives, adverbs, nouns, personal_pronouns, pronouns, proper_nouns, verbs
portuguese adjectives, nouns, personal_pronouns, pronouns, proper_nouns, verbs
russian adjectives, adverbs, nouns, personal_pronouns, pronouns, proper_nouns, verbs
slovak adjectives, adverbs, nouns, proper_nouns, verbs
spanish adjectives, adverbs, nouns, personal_pronouns, pronouns, proper_nouns, verbs
swahili adjectives, nouns, personal_pronouns, verbs
swedish adjectives, adverbs, nouns, personal_pronouns, pronouns, proper_nouns, verbs
tamil adjectives, adverbs, nouns, personal_pronouns, pronouns, proper_nouns, verbs

Please review the changes and provide feedback.

@andrewtavis
Member

Things that are still needed for the query generation process, @axif0:

  • The first form is being indented by two spaces
  • We want the language in the query header to be capitalized
  • We need to decide if we need the filter by language functionality in the queries
    • We need it if it's a sub-language
    • We do want a note at the top of the query, but something more simple like # Note: We need to filter for "ISO_2" because LANGUAGE_NAME is a dialect.
    • The above note would only be added for sub-languages

The last thing is that we need the label service fields to not be removed by this process. Let's discuss later! 😊
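The header points above (capitalized language name, plus a short note only for sub-languages) could be sketched roughly as follows. This is a minimal illustration, not the actual generator: `build_query_header` and its parameters are hypothetical names, and the three header lines are modeled on the generated query header quoted later in this review.

```python
def build_query_header(language, language_qid, data_type, data_type_qid, iso_code=None):
    """Build the comment header for a generated query (hypothetical helper).

    iso_code is only passed for sub-languages; when it is None, no note is added.
    """
    name = language.capitalize()  # e.g. "arabic" -> "Arabic"
    lines = [
        "# tool: scribe-data",
        f"# All {name} ({language_qid}) {data_type} ({data_type_qid}) and their forms.",
        "# Enter this query at https://query.wikidata.org/.",
    ]
    if iso_code is not None:
        # Simple note, added for sub-languages only.
        lines.append(
            f'# Note: We need to filter for "{iso_code}" because {name} is a dialect.'
        )
    return "\n".join(lines)


print(build_query_header("arabic", "Q13955", "adverbs", "Q380057"))
```

Keeping the note behind an optional argument means that queries for languages without sub-languages are unchanged.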

@andrewtavis
Member

We should check the cases where we're using the label service to return statements, as there may be certain properties that we return for certain data types:

  • For prepositions, we're getting the grammatical case (P5713)
  • For nouns, we're getting the grammatical gender (P5185)
  • Sometimes we return the label of the lemma (let's check in on this)
  • German verbs have auxiliary verbs (auxiliaryVerbFrom)
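One way to audit which statement properties an existing query already returns, so the generation process can avoid dropping them, is to scan the query text for property IDs. The sketch below is only an assumption about how such a check might look: `find_statement_properties` is a hypothetical helper, the sample query is illustrative, and only the standard Wikidata SPARQL prefixes `wdt:`, `p:` and `ps:` are matched.

```python
import re

# Matches property IDs used with the common Wikidata statement prefixes.
PROPERTY_PATTERN = re.compile(r"\b(?:wdt|p|ps):(P\d+)\b")


def find_statement_properties(query_text):
    """Return the sorted, de-duplicated property IDs referenced in a query."""
    return sorted(set(PROPERTY_PATTERN.findall(query_text)))


# Illustrative query fragment (not taken from the repository).
sample_query = """
SELECT ?lexeme ?gender WHERE {
  ?lexeme dct:language wd:Q188 ;
    wikibase:lexicalCategory wd:Q1084 .
  OPTIONAL { ?lexeme wdt:P5185 ?genderQID . }
}
"""

print(find_statement_properties(sample_query))
```

Comparing this list before and after regeneration would flag any query where a property such as P5185 or P5713 was lost.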

Member

@wkyoshida wkyoshida left a comment

Cool!

Comment on lines +1 to +3
# tool: scribe-data
# All arabic (Q13955) adverbs (Q380057) and their forms.
# Enter this query at https://query.wikidata.org/.
Member

nit - just a suggestion

Would it make sense to include a blurb in these headers that the files are being generated? With something along the lines of..

Auto-generated by the check_and_update_missing_query_forms.yaml workflow.

@@ -0,0 +1,21 @@
# tool: scribe-data
Member

nit -

As the PR description links to the workflow, would it be a good idea to perhaps link to the specific commit version of the workflow that ran as opposed to simply the one found in main?

i.e. instead of..
https://github.com/scribe-org/Scribe-Data/blob/main/.github/workflows/check_and_update_missing_query_forms.yaml

link to..
https://github.com/scribe-org/Scribe-Data/blob/79ca889/.github/workflows/check_and_update_missing_query_forms.yaml

Mostly thinking that this would let someone looking at a closed automated PR go to the exact version of the workflow that was used. The link would also keep working even if the workflow file gets renamed over time.


I believe this could be done by passing in the parent commit SHA to pr_body.py that the workflow uses to create the PR description.
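As a rough sketch of that idea, the link could be built from a commit SHA instead of `main`. How `pr_body.py` actually receives its inputs is not shown here, so `workflow_permalink` and its parameters are hypothetical; in GitHub Actions the SHA that triggered the run is available as the `GITHUB_SHA` environment variable and could be passed in.

```python
WORKFLOW_PATH = ".github/workflows/check_and_update_missing_query_forms.yaml"


def workflow_permalink(commit_sha, repo="scribe-org/Scribe-Data", path=WORKFLOW_PATH):
    """Return a link to the workflow file pinned to a specific commit.

    Falls back to the main branch when no SHA is provided.
    """
    ref = commit_sha.strip() or "main"
    return f"https://github.com/{repo}/blob/{ref}/{path}"


# In a workflow run this could be: workflow_permalink(os.environ.get("GITHUB_SHA", ""))
print(workflow_permalink("79ca889"))
```

A link pinned to a commit stays valid after later renames because it resolves against the repository tree as it existed at that commit.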

@axif0
Collaborator

axif0 commented Jan 28, 2025

I updated the logic for sub-languages in forms. As expected, the update surfaces more missing forms. I also added the sub-language filter you mentioned earlier.

The structure of a sub-language lexeme is a bit different, which is why those forms weren't being added.

Sub-language example (Hindustani):

{
{'type': 'lexeme', 'id': 'L580094', 'lemmas': {'hi': {'language': 'hi', 'value': 'पॉंच'}, 'ur': {'language': 'ur', 'value': 'پانچ'}}...
...
'forms': [{'id': 'L580094-F1', 'representations': {'hi': {'language': 'hi', 'value': 'पॉंच'}, 'ur': {'language': 'ur', 'value': 'پانچ'}}, 'grammaticalFeatures': ['Q115526488', 'Q1751855'], 'claims': {}}, {'id': 'L580094-F2', 'representations': {'hi': {'language': 'hi', 'value': '५'}, 'ur': {'language': 'ur', 'value': '۵'}}, 'grammaticalFeatures': [], 'claims': {}}, {'id': 'L580094-F3', 'representations': {'hi': {'language': 'hi', 'value': 'पॉंच'}, 'ur': {'language': 'ur', 'value': 'پانچ'}}, 'grammaticalFeatures': ['Q115526488', 'Q1233197'], 'claims': {}}, ...
...
}

Language without sub-languages (English verb "windsurf"):

{"type":"lexeme","id":"L4","lemmas":{"en":{"language":"en","value":"windsurf"}},"lexicalCategory":"Q24905","language":"Q1860"....

....
"forms":[{"id":"L4-F1","representations":{"en":{"language":"en","value":"windsurfing"}},"grammaticalFeatures":["Q10345583"],"claims":{}},{"id":"L4-F3","representations":{"en":{"language":"en","value":"windsurfs"}},"grammaticalFeatures":["Q110786","Q3910936","Q51929074"],"claims":{}}
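The difference between the two structures can be illustrated with a small sketch that groups form representations by language code. This is not the PR's actual logic: `representations_by_language` is a hypothetical helper, and the sample dictionaries are trimmed excerpts of the two lexemes shown above.

```python
def representations_by_language(lexeme):
    """Group (form_id, value, grammaticalFeatures) tuples by language code."""
    grouped = {}
    for form in lexeme.get("forms", []):
        features = form.get("grammaticalFeatures", [])
        for lang, rep in form.get("representations", {}).items():
            grouped.setdefault(lang, []).append((form["id"], rep["value"], features))
    return grouped


# Trimmed excerpt: a Hindustani form carries both "hi" and "ur" representations.
hindustani = {
    "id": "L580094",
    "forms": [
        {
            "id": "L580094-F2",
            "representations": {
                "hi": {"language": "hi", "value": "५"},
                "ur": {"language": "ur", "value": "۵"},
            },
            "grammaticalFeatures": [],
        }
    ],
}

# Trimmed excerpt: an English form carries a single "en" representation.
english = {
    "id": "L4",
    "forms": [
        {
            "id": "L4-F1",
            "representations": {"en": {"language": "en", "value": "windsurfing"}},
            "grammaticalFeatures": ["Q10345583"],
        }
    ],
}

print(sorted(representations_by_language(hindustani)))
print(sorted(representations_by_language(english)))
```

Because one Hindustani form yields entries under two language codes, a query for a single sub-language has to filter representations by that code, which is what the note in the query header is meant to explain.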

You can check the PR here.

CC: @andrewtavis, @wkyoshida

@andrewtavis
Member

Nice, @axif0! Feel free to open a PR here and we'll dive into it more from there :) So exciting that this is being finalized! 🚀

@andrewtavis
Member

Closing this as a new one will be opened up once #563 is merged :)
