Automated Generation of Dataset for Named Entity Recognition Task

This project generates a named entity recognition (NER) dataset from Wikipedia and Wikidata. It was developed especially for morphologically challenging, low-resource languages.

Tools (all of these are subject to change)

Python 3.6
pipenv

How to run

Clone the repository to your computer: git clone git@github.com:derlem/susamuru.git
Go to the susamuru/susamuru directory: cd susamuru/susamuru
Install the dependencies: pipenv install
Enter the pipenv shell: pipenv shell
Download the latest available dump of the Turkish (tr) Wikipedia pages.
Extract the dump into the susamuru/susamuru/dumps folder.
Start the execution: pipenv run python susamuru.py
When execution finishes, the output file should appear in the susamuru/susamuru/output folder.
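
The setup steps above can be checked before a long run with a small pre-flight script. This is only a sketch and is not part of the repository: the function name `preflight` and its messages are hypothetical, and it assumes the layout described above (a `susamuru.py` script and a `dumps` folder inside susamuru/susamuru). It does not validate the contents of the dump.

```python
import shutil
import sys
from pathlib import Path


def preflight(work_dir, min_python=(3, 6)):
    """Return a list of problems found; an empty list means the setup looks ready.

    `work_dir` is assumed to be the susamuru/susamuru directory.
    """
    problems = []
    work_dir = Path(work_dir)

    # The README lists Python 3.6 as the interpreter.
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)

    # pipenv must be on PATH for `pipenv install` and `pipenv run`.
    if shutil.which("pipenv") is None:
        problems.append("pipenv was not found on PATH")

    # The entry point named in the run command.
    if not (work_dir / "susamuru.py").is_file():
        problems.append("susamuru.py is missing from %s" % work_dir)

    # The extracted Wikipedia dump is expected under dumps/.
    dumps = work_dir / "dumps"
    if not dumps.is_dir() or not any(dumps.iterdir()):
        problems.append("dumps folder is missing or empty")

    return problems


if __name__ == "__main__":
    # Assumes the script is started from susamuru/susamuru.
    found = preflight(Path.cwd())
    for problem in found:
        print("problem:", problem)
    if not found:
        print("setup looks ready")
```

Run it from susamuru/susamuru. Any line it prints starting with "problem:" points at a step above that still needs attention.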

Note:

In countries where access to Wikipedia is restricted, consider using a VPN.
