Create Scribe-Data Swahili data process queries #214

Closed
8 tasks done
andrewtavis opened this issue Oct 3, 2024 · 39 comments

Comments

@andrewtavis
Member

andrewtavis commented Oct 3, 2024

Terms

Description

This issue would create the queries for Swahili in the src/scribe_data/language_data_extraction directory. To start we can make a nouns query and a verbs query in two separate PRs, and from there we can make new issues for other types of data. These queries can be based on the already existing queries for other languages 😊

Data types to include:

  • Nouns
  • Verbs
  • Adjectives
  • Adverbs
  • Prepositions
  • Emoji keywords

Contribution

Happy to support and answer any questions that might come up in this process! Can also review when the PRs are up :)

@andrewtavis andrewtavis added feature New feature or request help wanted Extra attention is needed hacktoberfest Included as a part of Hacktoberfest labels Oct 3, 2024
@andrewtavis
Member Author

CC @LevisNgigi who expressed interest in working on this :) Can you write in here and I'll assign? Feel free to make the directory structure as you see the other languages are structured!

@LevisNgigi
Contributor

LevisNgigi commented Oct 3, 2024

Yes I can write in here and I will be glad to be assigned this. Yes I will make the directory structure as I have seen for other languages.

@andrewtavis
Member Author

Fantastic, @LevisNgigi! Looking forward to the contribution :)

@LevisNgigi
Contributor

Question: I am currently querying data from the Wikidata Query Service, and the singular and plural columns are empty for the Swahili language. Is it possible to get clarification on how to proceed?

@andrewtavis
Member Author

Can you paste your query, @LevisNgigi? Maybe there's little data on Wikidata right now, or it's not categorized correctly 🤔 You can also try to remove everything from the query and just get Swahili words to check if there's info there :)

Nouns at the very least are usually consistent, so you'll still be able to send along your code that will work when there is data :)

@LevisNgigi
Contributor

SELECT DISTINCT
  ?lexeme
  ?lemma
  ?singular
  ?plural

WHERE {
  ?lexeme dct:language wd:Q7838 ;
    wikibase:lexicalCategory wd:Q1084 ;
    wikibase:lemma ?lemma .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?singularForm .
    ?singularForm ontolex:representation ?singular ;
      wikibase:grammaticalFeature wd:Q110786 ;
  } .

  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?pluralForm .
    ?pluralForm ontolex:representation ?plural ;
      wikibase:grammaticalFeature wd:Q146786 ;
  } .
}

LIMIT 100

Above is my query. The lemma column has data, but the singular and plural columns are currently empty. Yes, there are Swahili words; I had checked that before adding singular and plural to the query.
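
One way to confirm which columns come back empty is to count the bound variables in the JSON results. This is only a minimal sketch, assuming the standard SPARQL JSON results layout (`head.vars` plus `results.bindings`, where an unmatched OPTIONAL variable is simply absent from a row). The helper name `count_bound_columns` and the sample rows are made up for illustration and are not real Wikidata output.

```python
def count_bound_columns(results: dict) -> dict:
    """Count, per selected variable, how many result rows have a value bound."""
    variables = results["head"]["vars"]
    counts = {variable: 0 for variable in variables}
    for binding in results["results"]["bindings"]:
        for variable in variables:
            # Variables from an OPTIONAL block that did not match are missing from the row.
            if variable in binding:
                counts[variable] += 1
    return counts


# Made-up sample in the SPARQL JSON results format (placeholder IDs and lemmas).
sample = {
    "head": {"vars": ["lexeme", "lemma", "singular", "plural"]},
    "results": {
        "bindings": [
            {
                "lexeme": {"type": "uri", "value": "http://www.wikidata.org/entity/L123"},
                "lemma": {"type": "literal", "xml:lang": "sw", "value": "placeholder-a"},
            },
            {
                "lexeme": {"type": "uri", "value": "http://www.wikidata.org/entity/L456"},
                "lemma": {"type": "literal", "xml:lang": "sw", "value": "placeholder-b"},
            },
        ]
    },
}

print(count_bound_columns(sample))
```

A count of zero for `singular` and `plural` alongside non-zero `lemma` counts would point to missing form data rather than a broken query.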

@andrewtavis
Member Author

You can also use src/scribe_data/check_language_data.sparql to check the data totals :) It looks like there are 203 Swahili nouns using Q7838 :)
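
For anyone who wants to reproduce a total like this by hand, here is a rough sketch of a count query built in Python. It is not the contents of `check_language_data.sparql`; the helper name `build_count_query` is hypothetical, and the QIDs are the ones used in the query posted in this thread (Q7838 for Swahili, Q1084 for nouns).

```python
SWAHILI_QID = "Q7838"  # language QID used in the query in this thread
NOUN_QID = "Q1084"  # lexical category QID used in the query in this thread


def build_count_query(language_qid: str, category_qid: str) -> str:
    """Build a SPARQL query that counts lexemes for one language and lexical category."""
    return (
        "SELECT (COUNT(DISTINCT ?lexeme) AS ?total)\n"
        "WHERE {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        f"    wikibase:lexicalCategory wd:{category_qid} .\n"
        "}\n"
    )


# Paste the printed query into the Wikidata Query Service to see the total.
print(build_count_query(SWAHILI_QID, NOUN_QID))
```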

@andrewtavis
Member Author

By the looks of it singulars and plurals haven't been added for them yet, which is ok 😊 When I started Scribe years ago so many languages had no data. French only had two verbs with conjugations, and now there are thousands. Can you convert the lexeme over to just the LID instead of the URI, and from there I think we should be good for now :)

@andrewtavis
Member Author

You can see the conversion in other queries :)
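
For anyone following along, the conversion amounts to dropping the entity URI prefix so only the lexeme ID remains. A minimal sketch, assuming the standard Wikidata entity prefix `http://www.wikidata.org/entity/`; the helper name `lexeme_uri_to_lid`, the `?lexemeID` alias, and the ID `L123` are illustrative assumptions, so check the existing queries for the exact form used in the repo.

```python
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# A SELECT expression doing the same thing inside a query, using the standard
# SPARQL functions REPLACE and STR (the ?lexemeID alias is an assumption).
SPARQL_LEXEME_ID_EXPRESSION = (
    '(REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)'
)


def lexeme_uri_to_lid(uri: str) -> str:
    """Return the bare lexeme ID from a Wikidata entity URI, or the input unchanged."""
    if uri.startswith(WIKIDATA_ENTITY_PREFIX):
        return uri[len(WIKIDATA_ENTITY_PREFIX):]
    return uri


# "L123" is a made-up ID used only to show the shape of the input and output.
print(lexeme_uri_to_lid("http://www.wikidata.org/entity/L123"))
```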

@LevisNgigi
Contributor

Yes, I just checked and there are only 203 nouns and 20 verbs. Should I proceed, or does it need more data?

@andrewtavis
Member Author

Proceed by all means, @LevisNgigi! There will be more data eventually :)

@andrewtavis
Member Author

For now do your best, and we can revisit the queries later 😊

@LevisNgigi
Contributor

Thank you. I really appreciate your help.

@andrewtavis
Member Author

To quote you: The pleasure is mine :)

@VNW22
Contributor

VNW22 commented Oct 4, 2024

Hey, I would like to also work on this issue

@andrewtavis
Member Author

Hey @VNW22 👋 I think that @LevisNgigi has nouns covered. Would you want to make an adjectives query?

@VNW22
Contributor

VNW22 commented Oct 4, 2024

yeah, happy to work on the adjectives query.

@andrewtavis
Member Author

Ok, check the Bengali adjectives query and make something similar in a Swahili directory :)

@GicharuElvis
Contributor

Hey @andrewtavis, I would also like to work on this. Kindly let me know if there is any way I could contribute.

@andrewtavis
Member Author

I'll leave it to the other contributors to say if there's more work to do here :) We'll make more issues soon.

@GicharuElvis
Contributor

Sure, no worries. I'll be on the lookout

@LevisNgigi
Contributor

I'll leave it to the other contributors to say if there's more work to do here :) We'll make more issues soon.

I think we have the nouns, verbs and adjectives queries, so there is no more work here for now. @GicharuElvis

@GicharuElvis
Contributor

No worries. Let me have a look at the other issues. :)

@LevisNgigi
Contributor

Okay :). You can also check in Scribe-android. https://github.com/scribe-org/Scribe-Android

@andrewtavis
Member Author

Or for this we could also do an adverbs one or prepositions :) Not sure on that for Swahili, but could be something to look into 😊

@andrewtavis
Member Author

Hey all 👋 In regards to the Swahili work, I did some filtering for sw so it's Latin script as that's what a quick search showed is mostly used (sorry if this isn't the case!). Does it make sense to also make queries for the Arabic-letter style? And are there different names for these types of written Swahili?

@LevisNgigi
Contributor

Hey @andrewtavis, the filtering you used works perfectly, as written and spoken Swahili uses Latin script. The Arabic style of writing fizzled out with the coming of the missionaries, who introduced Latin script, and it is no longer in use. Also, there are no other names for Swahili; it simply borrowed a lot from the Arabic language, hence the use of the Arabic-letter style back in the day.

@andrewtavis
Member Author

Thanks for letting me know, @LevisNgigi!

@andrewtavis
Member Author

Just added a list of data types that we want to include to this issue :) Have marked those that are already done or have PRs open, and we can work on the others 😊 If the data type can't work, then we can move to the others and open up specific issues later :)

@LevisNgigi
Contributor

Sounds great, let me have a look at them now.

andrewtavis added a commit that referenced this issue Oct 12, 2024
@andrewtavis
Member Author

8b4fead was a needed fix here, @LevisNgigi :) The queries for adverbs and prepositions were still using the QID for adjectives, so because of that we were getting adjectives back for both. Check it out so you can see the difference!
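
A quick check along these lines could catch this kind of copy-paste slip. It is only a sketch: the function `check_category` and the mapping are hypothetical, not part of Scribe-Data. Only the noun QID (Q1084) appears in this thread; the other QIDs are my understanding of the Wikidata lexical category items and should be verified before relying on them.

```python
import re

# Expected lexical category QID per data type. Q1084 comes from the noun query
# in this thread; the rest are assumptions to verify on Wikidata.
EXPECTED_CATEGORY_QIDS = {
    "nouns": "Q1084",
    "verbs": "Q24905",
    "adjectives": "Q34698",
    "adverbs": "Q380057",
    "prepositions": "Q4833830",
}

CATEGORY_PATTERN = re.compile(r"wikibase:lexicalCategory\s+wd:(Q\d+)")


def check_category(data_type: str, query: str):
    """Return None if the query uses the expected category QID, else a message."""
    expected = EXPECTED_CATEGORY_QIDS[data_type]
    found = CATEGORY_PATTERN.findall(query)
    if not found:
        return f"{data_type}: no lexical category found"
    wrong = [qid for qid in found if qid != expected]
    if wrong:
        return f"{data_type}: expected {expected}, found {', '.join(wrong)}"
    return None


# An adverbs query that still carries the adjectives QID, as described above.
copied_query = "?lexeme dct:language wd:Q7838 ;\n  wikibase:lexicalCategory wd:Q34698 ;"
print(check_category("adverbs", copied_query))
```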

@LevisNgigi
Contributor

Oh my bad, seems I forgot to change that. Thank you for correcting them. I really appreciate the work that you are doing considering all the issues that have popped up lately.

@andrewtavis
Member Author

Doing the best I can, and appreciate your support, @LevisNgigi!

@LevisNgigi
Contributor

Thank you. The pleasure is mine.

@VNW22 VNW22 mentioned this issue Oct 14, 2024
1 task
@andrewtavis
Member Author

This is closed up now 😊 Thanks all so much for the hard work here!

@github-project-automation github-project-automation bot moved this from Todo to Done in Scribe Board Oct 14, 2024
@LevisNgigi
Contributor

Thank you so much for your help as well Andrew.

@andrewtavis
Member Author

Very welcome, @LevisNgigi :)

@VNW22
Contributor

VNW22 commented Oct 15, 2024

@andrewtavis I hope it's not too late. I was reviewing the adjective query that I worked on and saw that there are forms I did not include, so I was hoping to expand the query.
Can I still do that?

@andrewtavis
Member Author

Yes, by all means, @VNW22! Really appreciate you going back through and expanding these :) We can do individual issues in the future as well 😊
