Read, collaborate and talk about scientific papers.
To get started, create a local copy of `.env_example`
and modify it according to your environment if needed.
Without making any changes, you should be able to run the web worker out of the box via Docker.
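A minimal example, assuming both files live at the repository root:

```sh
# copy the example env file and edit it to match your environment
cp .env_example .env
```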
- Install Kubernetes and Skaffold locally
- Create a `.secrets` file in `k8s/dev/.secrets` according to `.env_example` (you can skip the ones that are already hard-coded in `kustomization.yaml`; see the sketch after this list)
- Run `skaffold dev -p dev --port-forward` to create and run a new cluster with the webserver and PostgresDB
- Check out localhost:5000/health
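A minimal sketch of what `.secrets` might look like; the key names below are placeholders, copy the real ones from `.env_example`:

```sh
# k8s/dev/.secrets - KEY=VALUE pairs; these names are hypothetical examples
DATABASE_URL=postgresql://user:password@localhost:5432/scihive
SECRET_KEY=replace-me
```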
- Set local context: `kubectl config use-context minikube`
- Set namespace: `kubectl config set-context --current --namespace=scihive-backend`
- Set Google Cloud context: `cd terraform && gcloud container clusters get-credentials $(terraform output kubernetes_cluster_name) --region $(terraform output cluster_location) && cd ..`
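To verify that the context and namespace are set as expected:

```sh
# print the active context and list pods in the current namespace
kubectl config current-context
kubectl get pods
```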
Run `docker-compose up`
to get a Postgres DB and a web worker up and running as Docker containers.
The web worker will auto-reload code changes and apply them immediately. For non-code
changes (e.g. environment variables, configuration), interrupt the running process and re-run `docker-compose up`.
When changing the `requirements.txt`
file, we need to rebuild the Docker image. We can do that by
simply adding the `--build` flag on our next run: `docker-compose up --build`
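For example, to rebuild in the background and follow the worker's logs (`web` is the assumed compose service name, matching the `scihive-backend_web_1` container referenced below):

```sh
# rebuild after a requirements.txt change and run detached
docker-compose up --build -d
# follow the web worker's logs ("web" is an assumed service name)
docker-compose logs -f web
```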
Run `docker-compose down --rmi all --volumes`
to remove all the containers, images, and volumes and get
a fresh start on your next `up`
command. This is mostly useful if you want a fresh database volume.
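If you only want a fresh database volume and would rather keep the built images, a lighter variant should also work:

```sh
# remove containers and volumes but keep the images
docker-compose down --volumes
docker-compose up
```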
To grab some papers locally, we'd like to run the `fetch-arxiv`
command within our web instance.
With your Docker instances running, run `docker ps`
to find the running web instance's name.
Assuming that it's `scihive-backend_web_1`, you can then run
`docker exec -it scihive-backend_web_1 bash -c "flask fetch-arxiv"` to grab some papers. You can
just kill the process after a couple hundred papers have been scraped.
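For example (the container name can vary with your project directory and Compose version):

```sh
# list running container names and pick the web one
docker ps --format '{{.Names}}'
# run the fetch inside it; interrupt with Ctrl-C when you have enough papers
docker exec -it scihive-backend_web_1 bash -c "flask fetch-arxiv"
```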
The papers will be downloaded into a local folder on your web container, configured by `LOCAL_FILES_DIRECTORY`
(defaulting to `/tmp/scihive-papers/`). We can access them by opening a shell in the container:
`docker exec -it scihive-backend_web_1 bash`.
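For example, to peek at the downloaded files (assuming the default directory):

```sh
# list the papers folder inside the web container
docker exec -it scihive-backend_web_1 ls /tmp/scihive-papers/
```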
### Develop directly on your machine

1. Install Postgres
2. Install `pandoc` to support references extraction, from here: https://pandoc.org/installing.html
3. Install `pdftotext` to support acronyms extraction: `sudo apt-get install poppler-utils`
4. Create your Python 3.7 virtual env (see the setup sketch after this list)
5. Update your local `.env` file to match your environment
6. Run `flask fetch-arxiv` to grab some papers (you can stop the function after fetching ~200 papers)
7. Run `flask run`
8. See `SciHive.postman_collection.json` for some examples of queries
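A consolidated sketch of steps 2-6 on a Debian/Ubuntu machine; package names and the virtual-env layout are assumptions, adapt them to your system:

```sh
# system dependencies: pandoc for references, poppler-utils for pdftotext
sudo apt-get install pandoc poppler-utils

# create and activate a Python 3.7 virtual env, then install the requirements
python3.7 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# grab some papers (stop with Ctrl-C after ~200), then start the server
flask fetch-arxiv
flask run
```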
- Repeat steps 2-5 from the Develop directly on your machine section
- Run `sh restart_server.sh`
### Changelog
- May 31, 2019 - Acronym extraction and enrichment (from other papers)