An end-to-end example of how to apply software engineering practices for ML training, testing and deployment. The repo trains and serves an ML model to predict the likelihood of loan defaults.
Run the go script to install prerequisite dependencies. The go script installs Python 3 and Poetry, and creates a virtual environment on the host. This makes it easier to point your IDE at the Python interpreter for this project.
# mac users
scripts/go/go-mac.sh
# linux users
scripts/go/go-linux-ubuntu.sh
# windows
# 1. Download and install Python3 if not installed: https://www.python.org/downloads/release/python-31011/
# - During installation, when prompted, select "Add Python to PATH"
# 2. In Windows Explorer/Search, go to 'Manage App Execution Aliases' and turn off 'App Installer' for python. This resolves the issue where the `python` executable is not found in the PATH
# 3. Run the go script in the Powershell or Command Prompt Terminal
.\scripts\go\go-windows.bat
# Note: if you see an HTTPSConnectionPool read timed out error, rerun this command until poetry install succeeds
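The manual "rerun until it succeeds" advice above can be automated on macOS, Linux, or a Bash-compatible shell on Windows. The following is a minimal sketch, not part of the repo: `retry` and `RETRY_DELAY` are hypothetical names introduced here for illustration.

```shell
# retry <max_attempts> <command...>
# Reruns a command until it succeeds or the attempt limit is reached.
# RETRY_DELAY (hypothetical variable) is the pause in seconds between attempts.
retry() {
  max_attempts="$1"
  shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying..." >&2
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-2}"
  done
}

# Usage from the repository root, e.g. on macOS:
#   retry 5 scripts/go/go-mac.sh
```

A fixed attempt limit keeps a persistent failure (for example, no network access) from looping forever.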
- Install and configure Docker runtime
- Option 1: Use Docker Desktop, if you have a Docker Desktop license, or are eligible to use it for free, e.g. for personal or education purposes (see Docker Desktop license agreement)
- Follow steps here: https://docs.docker.com/desktop/
- Option 2: Use Colima (a license-free Docker runtime, an alternative to Docker Desktop). This is useful if you want to adapt the dependency management setup in this repo to a commercial setting where Docker Desktop licenses aren't available.
Note: For this exercise, we will use Option 2 (Colima) to demonstrate how to use Docker containers in cases where Docker Desktop licenses aren't available.
- Configure your IDE to use the python virtual environment created by the go scripts
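To configure the IDE, you need the path of the interpreter inside the virtual environment. Poetry can print the environment directory with `poetry env info --path`. The sketch below builds the interpreter path from it. `venv_python` is a hypothetical helper, and it assumes the standard virtual environment layout (`bin/python` on macOS and Linux, `Scripts/python.exe` on Windows).

```shell
# venv_python <venv_dir>
# Prints the expected interpreter path inside a virtual environment directory.
venv_python() {
  venv_dir="$1"
  case "$(uname -s)" in
    # Bash-compatible shells on Windows report one of these kernel names
    MINGW*|MSYS*|CYGWIN*) printf '%s\n' "${venv_dir}/Scripts/python.exe" ;;
    *) printf '%s\n' "${venv_dir}/bin/python" ;;
  esac
}

# Usage from the repository root, with Poetry on PATH:
#   venv_python "$(poetry env info --path)"
```

Paste the printed path into your IDE's Python interpreter setting.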
Build and start the local development environment.
# ensure Docker runtime is started (either via Docker Desktop or colima). If using colima:
colima start
# install dependencies in local dev image
./batect --output=all setup
# start container (i.e. local dev environment)
./batect start-dev-container
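If the batect tasks above fail immediately, a quick pre-flight check can tell you whether the Docker CLI is installed and whether a runtime is actually running. This is a minimal sketch: `require_cmd` is a hypothetical helper, not something the repo provides.

```shell
# require_cmd <name>
# Succeeds if the named command is on PATH, otherwise prints a hint and fails.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required command: $1" >&2
    return 1
  fi
}

# Usage: check the CLI first, then ask the daemon for its status.
# `docker info` exits non-zero when no Docker runtime is reachable.
#   require_cmd docker && docker info >/dev/null \
#     || echo "start Docker Desktop or run 'colima start'" >&2
```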
Here are common tasks that you can run in the dev container during development.
# run model training smoke tests
scripts/tests/smoke-test-model-training.sh
# train model
scripts/train-model.sh
# run api tests
scripts/tests/api-test.sh
# exit container
exit # or hit Ctrl + D
You can also run these as batect tasks from the host (e.g. ./batect train-model, ./batect api-test).
The following commands must be run as batect tasks, because port mappings are defined at the level of each task. For example, in batect.yml, the start-jupyter task exposes port 8888 and makes it accessible from the host.
if you don't need
# start jupyter notebook
./batect start-jupyter
# start API in development mode
./batect start-api-locally
# send requests to API locally (run this from another terminal outside of the Docker container, as it uses curl, which we haven't installed)
scripts/request-local-api.sh