This project generates a named entity recognition (NER) dataset from Wikipedia and Wikidata. It was developed especially for morphologically challenging, low-resource languages.
- Python 3.6
- pipenv
- First, clone the repository to your computer: `git clone git@github.com:derlem/susamuru.git`
- Go to the directory `susamuru/susamuru`: `cd susamuru/susamuru`
- Install the dependencies: `pipenv install`
- Activate the pipenv shell: `pipenv shell`
- Download the Turkish Wikipedia pages dump (the latest dump available). Here is the link
- Extract the dump into the `susamuru/susamuru/dumps` folder.
- You are now ready to go. Start the execution with: `pipenv run python susamuru.py`
- When the execution finishes, the output file should be in the `susamuru/susamuru/output` folder.
- In countries where access to Wikipedia is restricted, consider using a VPN.
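The dump download and extraction steps can be scripted roughly as follows. This is a minimal sketch, not part of susamuru. It assumes that the "pages dump" is the `pages-articles` file published under `https://dumps.wikimedia.org/trwiki/latest/` and that susamuru reads the extracted XML file from the `dumps` folder. The helper names `dump_url`, `extract_dump` and `fetch_dump` are hypothetical.

```bash
# Sketch only: fetch and extract a Wikipedia dump for susamuru.
# ASSUMPTIONS (not taken from the susamuru repository):
#   - the "pages dump" is the pages-articles XML file on dumps.wikimedia.org
#   - susamuru reads the extracted .xml file from the dumps folder
# All function names below are hypothetical.

# Print the URL of the latest pages-articles dump for a Wikipedia language code.
dump_url() {
  local lang_code="$1"
  printf 'https://dumps.wikimedia.org/%swiki/latest/%swiki-latest-pages-articles.xml.bz2\n' \
    "$lang_code" "$lang_code"
}

# Decompress a .bz2 archive into target_dir (the archive itself is kept)
# and print the path of the extracted file.
extract_dump() {
  local archive="$1" target_dir="$2" name
  mkdir -p "$target_dir" || return 1
  name="$(basename "$archive" .bz2)"
  bunzip2 -c "$archive" > "$target_dir/$name" || return 1
  printf '%s\n' "$target_dir/$name"
}

# Download the latest dump for a language (default: tr) into the current
# directory, then extract it into target_dir (default: dumps).
fetch_dump() {
  local lang_code="${1:-tr}" target_dir="${2:-dumps}" url archive
  url="$(dump_url "$lang_code")"
  archive="$(basename "$url")"
  curl --fail --location --output "$archive" "$url" || return 1
  extract_dump "$archive" "$target_dir"
}
```

To use it, save the functions to a file, source that file from `susamuru/susamuru`, and run `fetch_dump tr dumps`. Nothing is downloaded until `fetch_dump` is called explicitly.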