3rd-placed solution for the informatiCup2017: Classifying GitHub repositories

Baschdl/git_better

git_better

Demo
Documentation

🎉 We placed 3rd! Check out the official challenge results here 🎉

Solution Approach and Performance

Installation

Repository Classification

  • Make sure that you've installed Python 2.7
  • Install xgboost manually
  • Create a virtual environment and install all dependencies
virtualenv -p /usr/bin/python2.7 venv
source venv/bin/activate
pip install -r requirements.txt
  • Download the NLTK corpus
python -m nltk.downloader all
  • Create a GitHub personal access token, grant it the "Full control of private repositories" (repo) scope, and put it in your config.ini file
cp example.config.ini config.ini
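
Before running the classifier, it can help to confirm that the token is actually readable from config.ini. The following is a minimal sketch only: the section name `github` and the option name `token` are assumptions made for illustration, so check example.config.ini for the names the project really uses. `read_token` is a hypothetical helper, not part of this repository.

```python
try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import SafeConfigParser as ConfigParser  # Python 2.7


def read_token(path, section="github", option="token"):
    """Return the token stored under [section] option in an INI file.

    The default section and option names are assumptions for illustration;
    adjust them to match example.config.ini.
    """
    parser = ConfigParser()
    # read() returns the list of files it could parse; empty means not found.
    if not parser.read(path):
        raise IOError("config file not found: %s" % path)
    return parser.get(section, option).strip()
```

Calling `read_token("config.ini")` raises an error if the file, section, or option is missing, which surfaces a misconfigured token before any GitHub requests are made.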

Django Server

Manual

  • Psycopg2 needs the PostgreSQL development headers (libpq-dev on Debian/Ubuntu, postgresql-devel on RPM-based systems)
apt-get install -y libpq-dev
  • Sklearn needs Tkinter
apt-get install python-tk
  • Migrate the database
python server/manage.py migrate
  • Run the server (by default on port 8000)
python server/start_server.py

Docker

  • Install Docker
  • Build the docker image
docker build -t git_better .
  • Create a container from the image and run it in the background
docker run -d -p 8000:8000 git_better
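
Whether the server was started manually or in a container, a quick way to see if it is accepting connections is to try a TCP connection to port 8000. This is a small sketch using only the standard library; `port_is_open` is a hypothetical helper name, and it only checks that something is listening, not that the Django app responds correctly.

```python
import socket


def port_is_open(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Connection refused, host unreachable, or timed out.
        return False
    conn.close()
    return True
```

For example, `port_is_open()` should return True a few seconds after `docker run -d -p 8000:8000 git_better` has started the container.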

Usage

To predict repository labels based on your own training data or based on pre-trained models, follow the instructions of our main script:
python app/main.py --help
As an example, to classify the input data from the challenge repository using our pre-trained models, run
python app/main.py -i data/example-input.txt
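
If you prepare your own input file, a quick sanity check can catch malformed lines before classification starts. The sketch below assumes the input file lists one GitHub repository URL per line; this README does not specify the format, so compare against data/example-input.txt first. `parse_repo_urls` is a hypothetical helper and not part of app/main.py.

```python
import re

# Matches http(s)://github.com/<owner>/<repo>, optionally ending in .git or "/".
_REPO_URL = re.compile(r"^https?://github\.com/([^/\s]+)/([^/\s]+?)(?:\.git)?/?$")


def parse_repo_urls(lines):
    """Split lines into (owner, repo) pairs and a list of unparseable lines."""
    valid, invalid = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        match = _REPO_URL.match(text)
        if match:
            valid.append((match.group(1), match.group(2)))
        else:
            invalid.append(text)
    return valid, invalid
```

Passing the lines of an open input file to `parse_repo_urls` returns the recognized repositories together with any lines that need fixing.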

To visualize the data with the TensorBoard Embedding Projector, run python app/embedding_visualization.py, then start TensorBoard with tensorboard --logdir log/. TensorBoard prints the port on which its server listens; open localhost:[port] in your browser (the default port is 6006).

Testing

To test whether the app works correctly, simply run python -m unittest discover

Deployment on Heroku

docker build -t git_better .  
docker tag git_better registry.heroku.com/git-better/web  
docker push registry.heroku.com/git-better/web  
heroku open  
