-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sub-Language Filtering and Sub-Language Handling Improvements for lex… #563
base: main
Are you sure you want to change the base?
Conversation
Thank you for the pull request!The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :) If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you! Maintainer checklist |
Looks good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 🙌
src/scribe_data/cli/get.py
Outdated
except json.decoder.JSONDecodeError: | ||
rprint( | ||
"[bold red]Error: The Wikidata query service returned an invalid response. This usually happens when the query is too large or the service is temporarily unavailable. Please try again later or consider using a Wikidata dump instead.[/bold red]" | ||
) | ||
except (urllib.error.HTTPError, EndPointInternalError): | ||
rprint( | ||
"[bold red]Error: The Wikidata query service is currently experiencing issues (HTTP 500). This could be due to:[/bold red]" | ||
) | ||
rprint("[red]1. The query is too complex or large[/red]") | ||
rprint("[red]2. The Wikidata service is temporarily overloaded[/red]") | ||
rprint("[red]3. Server-side issues at Wikidata[/red]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking here - is there a way to differentiate between the scenarios where it's a client error and where it's a server error? What I mean is - are the errors or status codes that we get different, for instance, if it's due to the query being too large or if due to Wikidata being down?
More so asking, cause it be good to immediately inform the user what to do - whether they could potentially wait for Wikidata to come back up or if they really should just try the dump (as opposed to telling them it could be either one).
Maybe there is no way to tell and this has been looked at already, but was wondering 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where it's a client error and where it's a server error? What I mean is - are the errors or status codes that we get different, for instance, if it's due to the query being too large or if due to Wikidata being down?
Can we do like, if the error code is within 400 to 499 then it should show a client side error. if e.code>=500
then it should show wikidata server side error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! Yeah, that's what I was wondering if we could do. Do we know what typically happens in the scenario where the query is too large? Do we get a 400 Bad Request
error or some other 4XX
code in that case?
But yeah, as you pointed out, @axif0, could make sense to do:
4XX
for client error5XX
for server error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know what typically happens in the scenario where the query is too large?
Mostly we get errors like #549 or EndPointInternalError (error code: 500) where the query is too large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm gotcha, I see 🤔
How about then:
- if we get a
JSONDecodeError
or aEndPointInternalError
, we already simply suggest to them to try a dump instead, since we know those errors are often indicative of the query being too large - if we get any other
4XX
error, we let them know it's some client error - if we get any other
5XX
error, we let them know it's some server error
Would this make sense perhaps? Let me know if not though (CC @andrewtavis)
P.S. It could be that perhaps #156 potentially obsoletes this issue with the query being too large 🤔 , but we'll get there when we do 😊
src/scribe_data/check/check_missing_forms/check_missing_forms.py
Outdated
Show resolved
Hide resolved
af38f12 is some standard changes from what I saw on quick glance as well as putting back in the default export directory. @axif0: One thing to note is for |
Note that we need to fix |
Adjust Also, as |
After merging the pr in test-scribe I tried to solve the errors generates by But I face a situation, In dump it has in reverse format. like
|
Are we able to use the dictionary in lexeme_form_metadata.json to order the forms that we're getting back from the dumps within the check? Might be ok? I don't think that the Wikidata community has a specific naming order that they stick to, so maybe just order them with our way of doing it and then do the check? |
|
I need to be able to react to comments with more than one rocket for some of the work you're doing here, @axif0 😊🚀🚀🚀 Are we ready for a final review? |
Ya, I tried Total 4 files caused the error. Please check the Test Workflow. |
…eme dump
Contributor checklist
pytest
command as directed in the testing section of the contributing guideDescription
Updated the logic a bit for sub-language in forms. After update we get more missing forms as expected, also added the filter for sub-language you mentioned earlier.
Sub-language structure
is a bit different, thats why it didn't added.Sub_lang for
hindustani
English verb,
windsurfing
language without sub_language:Update sub-language forms as well-
You can check the PR here.
Related issue