In machine learning, data labeling is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful, informative labels that provide context so that a machine learning model can learn from it. Today, most practical machine learning models use supervised learning, in which an algorithm learns to map each input to an output.
For this purpose, we aim to develop a data labeling mechanism that tags data, most commonly images, videos, audio, and text assets, with proper, meaningful labels.
The Data Labeling System was developed in three iterations. In every iteration, we used the Scrum development methodology to make the development process easier. The iterations are described below.
In the first iteration, the system itself assigns random labels to instances. In subsequent iterations, different users can add different labels to an instance by using various types of labeling mechanisms.
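As a rough illustration of the first iteration's random labeling, the following sketch picks labels at random for each instance. The function name `assign_random_labels`, its parameters, and the dictionary layout are assumptions made for this example and are not taken from the project's code.

```python
import random


def assign_random_labels(instance_ids, labels, max_labels_per_instance=1, seed=None):
    """Assign between 1 and max_labels_per_instance random labels to each instance.

    Hypothetical helper: names and data layout are illustrative only.
    """
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    upper = min(max_labels_per_instance, len(labels))
    assignments = {}
    for instance_id in instance_ids:
        count = rng.randint(1, upper)
        # sample() draws distinct labels, so an instance never gets a label twice
        assignments[instance_id] = rng.sample(labels, count)
    return assignments


if __name__ == "__main__":
    result = assign_random_labels([1, 2, 3], ["Positive", "Negative", "Neutral"],
                                  max_labels_per_instance=2, seed=42)
    for instance_id, chosen in result.items():
        print(instance_id, chosen)
```

Passing a fixed `seed` is a convenience for testing; leaving it as `None` gives a different assignment on each run.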
In the second iteration, reporting functionality is added for user performance and for the labeling operations on a particular dataset. The main idea is to collect statistics for users, compare users in the context of a particular dataset or globally, and calculate metrics for instances in the dataset that are labeled by multiple users. The resulting reports give us an idea about the quality of the data labeling and the quality of the users.
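One way to picture the per-instance metrics mentioned above is a majority-label summary over all assignments an instance has received. This is a minimal sketch under assumed data shapes (a list of `(user, label)` pairs); the function name `instance_metrics` and the chosen metrics are illustrative and may differ from what the project actually reports.

```python
from collections import Counter


def instance_metrics(assignments):
    """Summarize the labels given to one instance.

    `assignments` is a non-empty list of (user, label) pairs; this layout is an
    assumption for the example, not the project's real data model.
    """
    counts = Counter(label for _, label in assignments)
    total = sum(counts.values())
    majority_label, majority_count = counts.most_common(1)[0]
    return {
        "total_assignments": total,
        "unique_users": len({user for user, _ in assignments}),
        "majority_label": majority_label,
        # share of assignments that agree with the majority label
        "majority_share": majority_count / total,
    }


if __name__ == "__main__":
    sample = [("user1", "Positive"), ("user2", "Positive"), ("user3", "Negative")]
    print(instance_metrics(sample))
```

A low `majority_share` suggests that users disagree on the instance, which is one possible signal of labeling quality.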
In the third iteration, a user interface is added so that human users can label instances manually, one by one. Each human user has a username and a password, which are validated before the user can enter the system. Furthermore, the Relevance Bot, a new bot user, is added. It uses the Relevance Labeling Mechanism, which checks the relevance of instances using an API and labels them accordingly.
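The idea behind relevance-based labeling can be sketched as "score each candidate label against the instance and pick the best one". The source does not say which API the Relevance Bot calls, so the sketch below swaps in a simple word-overlap score as a stand-in; `overlap_score` and `relevance_label` are hypothetical names used only for this example.

```python
def overlap_score(instance_text, label_text):
    """Stand-in relevance score: fraction of label words found in the instance.

    This replaces the external relevance API, which is not specified here.
    """
    instance_words = set(instance_text.lower().split())
    label_words = set(label_text.lower().split())
    if not label_words:
        return 0.0
    return len(instance_words & label_words) / len(label_words)


def relevance_label(instance_text, labels, score=overlap_score):
    """Return the label with the highest relevance score for the instance."""
    # max() keeps the first label when several share the top score
    return max(labels, key=lambda label: score(instance_text, label))


if __name__ == "__main__":
    print(relevance_label("The movie was a positive surprise", ["Positive", "Negative"]))
```

Because the scoring function is passed in as a parameter, a real API call could replace `overlap_score` without changing `relevance_label`.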
Thanks goes to these wonderful people (emoji key):
Anıl Şenay
Bilgehan Geçici
Mehmet Ali Yüksel
Kürşat Açıkgöz
Beyza
Ahmet Önkol
Ahmet Elburuz Gürbüz
This project follows the all-contributors specification. Contributions of any kind welcome!