Instructions are here on Google Docs
Update: New frontend to view the newly enriched dataset is here
You can view previews/summaries of all the projects here on Streamlit and find the code for that here
Other related documents can be found in this Google Drive directory
In the scripts dir, see "algorithm_v0.py" - this is a relatively minimalist implementation of a ranking algorithm. Try not to change it! Leave it there as an example.
What you can do, though, is copy it to a new file, e.g. "algorithm_v1.py", and make whatever changes you want. Each time you push a commit to the repo, GitHub should automatically run the newest file in the scripts dir whose name matches algorithm_*.py and save the results into the results directory. NB: results files are named ${timestamp}_${hash of the commit which triggered the results to be generated}_${algorithm file name which generated the results}.csv
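For illustration only, here is a rough sketch of how the "pick the newest algorithm file and name the results" step could work. It is not the repo's actual automation. It assumes "newest" means the highest number at the end of the file name (file modification times are not reliable after a fresh checkout), and that the algorithm file name goes into the results name without its .py extension. The function names `newest_algorithm` and `results_filename` are made up for this example, and the timestamp format is left to the caller.

```python
import re
from pathlib import Path


def newest_algorithm(scripts_dir):
    """Return the algorithm_*.py path with the highest numeric suffix, or None.

    Assumption: "newest" is decided by the trailing number in the file name,
    so algorithm_v10.py beats algorithm_v9.py.
    """
    candidates = []
    for path in Path(scripts_dir).glob("algorithm_*.py"):
        match = re.search(r"(\d+)$", path.stem)
        if match:  # files without a trailing number are ignored
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


def results_filename(timestamp, commit_hash, algorithm_path):
    """Build ${timestamp}_${commit hash}_${algorithm file name}.csv.

    Assumption: the algorithm file name is used without its .py extension.
    """
    return f"{timestamp}_{commit_hash}_{Path(algorithm_path).stem}.csv"
```

Comparing the numbers as integers rather than as strings avoids "v9" sorting after "v10".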
- Automate testing that the award column for each generated results csv sums to 5,000,000
- Display results in some nicer way than just csv in github
- Check in the scripts used to generate the source data csv
- Automate generating the source data (incrementally!) when the generation scripts are updated. Maybe a database would be better for this than a csv?
- If you want to inspect the csv files in jupyter/pandas/vscode, there's a helper script at scripts/jupyter_view.py
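The first item in the list above (checking that the award column sums to 5,000,000) could be automated with something like the sketch below. It assumes the column header is literally `award`, that every results file is a .csv directly inside the results directory, and that a small floating-point tolerance is acceptable. The function names and the tolerance value are hypothetical.

```python
import csv
from pathlib import Path

EXPECTED_TOTAL = 5_000_000


def award_total(csv_path, column="award"):
    """Sum the award column of one results csv (column name is an assumption)."""
    with open(csv_path, newline="") as handle:
        return sum(float(row[column]) for row in csv.DictReader(handle))


def check_results(results_dir, tolerance=0.01):
    """Return (file name, total) for every csv whose awards do not sum to 5,000,000."""
    failures = []
    for path in sorted(Path(results_dir).glob("*.csv")):
        total = award_total(path)
        if abs(total - EXPECTED_TOTAL) > tolerance:
            failures.append((path.name, total))
    return failures
```

A test runner or CI step could call `check_results("results")` and fail the build if the returned list is not empty.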