GPII-3138: Move functionality of gpii-dataloader repo into universal #692
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # CouchDB Data Loader | ||
|
|
||
| (`scripts/deleteAndLoadSnapsets.sh`) | ||
|
|
||
| This script sets up the CouchDB database. It runs as a Kubernetes batch Job every time a new version of the | ||
| universal image is deployed to the cluster, including when the cluster is first created. | ||
|
|
||
| It does the following: | ||
|
|
||
| - Converts the preferences in universal into `snapset` Prefs Safes and GPII Keys, | ||
| - Optionally deletes the existing database, | ||
| - Creates a CouchDB database if none exists, | ||
|
||
| - Updates the database with respect to its `design/views` document, as required, | ||
| - Loads the latest snapsets created into the database. | ||
|
|
||
| ## Environment Variables | ||
|
|
||
| - `COUCHDB_URL`: URL of the CouchDB database. (required) | ||
|
||
| - `CLEAR_INDEX`: If set to `true`, the database at $COUCHDB_URL will be deleted and recreated. (optional) | ||
| - `STATIC_DATA_DIR`: The directory where the static data to be loaded into CouchDB resides. (optional) | ||
| - `BUILD_DATA_DIR`: The directory where the data built from the conversion step resides. (optional) | ||
|
|
||
| Setting the data directories through environment variables is useful if you want to mount the database data using a | ||
| Docker volume and point the data loader at it. | ||
|
|
||
| Note that since [Docker does not support array-valued environment | ||
| variables](https://github.com/moby/moby/issues/20169), two separate environment variables are used for the data | ||
| directories instead of one array that holds both. | ||
|
|
||
| ## Running | ||
|
|
||
| Example using containers: | ||
|
|
||
| ```bash | ||
| $ docker run -d -p 5984:5984 --name couchdb couchdb | ||
| $ docker run --rm --link couchdb -e COUCHDB_URL=http://couchdb:5984/gpii \ | ||
| -e CLEAR_INDEX=true vagrant-universal scripts/deleteAndLoadSnapsets.sh | ||
| $ docker run -d -p 8081:8081 --name preferences --link couchdb \ | ||
| -e NODE_ENV=gpii.config.preferencesServer.standalone.production \ | ||
| -e PREFERENCESSERVER_LISTEN_PORT=8081 -e DATASOURCE_HOSTNAME=http://couchdb \ | ||
| -e DATASOURCE_PORT=5984 vagrant-universal | ||
| ``` | ||
|
|
||
| Below are two ways to load CouchDB data from a different location (e.g., | ||
| `/home/vagrant/sync/universal/testData/dbData` for the static data directory and | ||
| `/home/vagrant/sync/universal/build/dbData` for the build data directory). The first sets the optional `CLEAR_INDEX` | ||
| to `true` to erase and reset the database before any other database changes: | ||
|
|
||
| ```bash | ||
| $ docker run --name dataloader --link couchdb \ | ||
| -v /home/vagrant/sync/universal/testData/dbData:/static_data -e STATIC_DATA_DIR=/static_data \ | ||
| -v /home/vagrant/sync/universal/build/dbData:/build_data -e BUILD_DATA_DIR=/build_data \ | ||
| -e COUCHDB_URL=http://couchdb:5984/gpii \ | ||
| -e CLEAR_INDEX=true vagrant-universal scripts/deleteAndLoadSnapsets.sh | ||
| ``` | ||
|
|
||
| The second version does not set `CLEAR_INDEX`, so any existing database is left intact before subsequent changes | ||
| to it (e.g., deleting the snapsets): | ||
|
|
||
| ```bash | ||
| $ docker run --name dataloader --link couchdb \ | ||
| -v /home/vagrant/sync/universal/testData/dbData:/static_data -e STATIC_DATA_DIR=/static_data \ | ||
| -v /home/vagrant/sync/universal/build/dbData:/build_data -e BUILD_DATA_DIR=/build_data \ | ||
| -e COUCHDB_URL=http://couchdb:5984/gpii \ | ||
| vagrant-universal scripts/deleteAndLoadSnapsets.sh | ||
| ``` | ||
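
As a sketch of how the optional variables behave when they are not passed with `-e`: the loader script in this PR falls back to defaults using shell parameter expansion. The example below reuses those assignments. Simulating a run where only `BUILD_DATA_DIR=/build_data` is supplied is an assumption made for illustration.

```bash
#!/bin/sh
# Simulate a container started with only "-e BUILD_DATA_DIR=/build_data"
# (an illustrative assumption, not a configuration taken from the PR).
unset APP_DIR STATIC_DATA_DIR
BUILD_DATA_DIR=/build_data

# Same fallback pattern as the loader script: keep the value if set and
# non-empty, otherwise use the default after ":-".
APP_DIR=${APP_DIR:-"/app"}
STATIC_DATA_DIR=${STATIC_DATA_DIR:-"${APP_DIR}/testData/dbData"}
BUILD_DATA_DIR=${BUILD_DATA_DIR:-'/tmp/build/dbData'}

echo "Static: ${STATIC_DATA_DIR}"
echo "Build: ${BUILD_DATA_DIR}"
```

Here the static directory resolves to `/app/testData/dbData` (the default), while the build directory keeps the supplied `/build_data`.
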
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| #!/bin/sh | ||
| APP_DIR=${APP_DIR:-"/app"} | ||
|
|
||
| STATIC_DATA_DIR=${STATIC_DATA_DIR:-"${APP_DIR}/testData/dbData"} | ||
| PREFERENCES_DATA_DIR=${PREFERENCES_DATA_DIR:-"${APP_DIR}/testData/preferences"} | ||
| BUILD_DATA_DIR=${BUILD_DATA_DIR:-'/tmp/build/dbData'} | ||
|
|
||
| DATALOADER_JS="${APP_DIR}/scripts/deleteAndLoadSnapsets.js" | ||
| CONVERT_JS="${APP_DIR}/scripts/convertPrefs.js" | ||
|
|
||
| log() { | ||
| echo "$(date +'%Y-%m-%d %H:%M:%S') - $1" | ||
| } | ||
|
|
||
| warm_indices(){ | ||
| log "Warming indices..." | ||
|
|
||
| for view in $(curl -s "${COUCHDB_URL}/_design/views/" | jq -r '.views | keys[]'); do | ||
| curl -fsS "${COUCHDB_URL}/_design/views/_view/${view}" >/dev/null | ||
| done | ||
|
|
||
| log "Finished warming indices..." | ||
| } | ||
|
|
||
| # Verify variables | ||
| if [ -z "${COUCHDB_URL}" ]; then | ||
| echo "COUCHDB_URL environment variable must be defined" | ||
| exit 1 | ||
| fi | ||
|
|
||
| COUCHDB_URL_SANITIZED=$(echo "${COUCHDB_URL}" | sed -e 's,\(://\)[^/]*\(@\),\1<SENSITIVE>\2,g') | ||
|
|
||
| log 'Starting' | ||
| log "CouchDB: ${COUCHDB_URL_SANITIZED}" | ||
| log "Clear index: ${CLEAR_INDEX}" | ||
| log "Static: ${STATIC_DATA_DIR}" | ||
| log "Build: ${BUILD_DATA_DIR}" | ||
| log "Working directory: $(pwd)" | ||
|
|
||
| # Check we can connect to CouchDB | ||
| COUCHDB_URL_ROOT=$(echo "${COUCHDB_URL}" | sed 's/[^\/]*$//g') | ||
| RET_CODE=$(curl --write-out '%{http_code}' --silent --output /dev/null "${COUCHDB_URL_ROOT}/_up") | ||
| if [ "$RET_CODE" != '200' ]; then | ||
| log "[ERROR] Failed to connect to CouchDB: ${COUCHDB_URL_SANITIZED}" | ||
| exit 1 | ||
| fi | ||
|
|
||
| # Create build dir if it does not exist | ||
| if [ ! -d "${BUILD_DATA_DIR}" ]; then | ||
| mkdir -p "${BUILD_DATA_DIR}" | ||
| fi | ||
|
|
||
| # Convert preferences json5 to GPII keys and preferences safes | ||
| if [ -d "${PREFERENCES_DATA_DIR}" ]; then | ||
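| node "${CONVERT_JS}" "${PREFERENCES_DATA_DIR}" "${BUILD_DATA_DIR}" snapset | ||
| # Capture the exit code first: $? is overwritten by the test command below | ||
| err=$? | ||
| if [ "${err}" != '0' ]; then | ||
| log "[ERROR] ${CONVERT_JS} failed (exit code: ${err})" | ||
| exit 1 | ||
| fi | ||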
| else | ||
| log "PREFERENCES_DATA_DIR ($PREFERENCES_DATA_DIR) does not exist, nothing to convert" | ||
| fi | ||
|
|
||
| # Initialize (possibly clear) database | ||
| if [ "${CLEAR_INDEX}" = 'true' ]; then | ||
| log "Deleting database at ${COUCHDB_URL_SANITIZED}" | ||
| if ! curl -fsS -X DELETE "${COUCHDB_URL}"; then | ||
| log "Error deleting database" | ||
| fi | ||
| fi | ||
|
|
||
| log "Creating database at ${COUCHDB_URL_SANITIZED}" | ||
| if ! curl -fsS -X PUT "${COUCHDB_URL}"; then | ||
| log "Database already exists at ${COUCHDB_URL_SANITIZED}" | ||
| fi | ||
|
|
||
| # Submit data | ||
| node "${DATALOADER_JS}" "${COUCHDB_URL}" "${STATIC_DATA_DIR}" "${BUILD_DATA_DIR}" | ||
| err=$? | ||
| if [ "${err}" != '0' ]; then | ||
| log "${DATALOADER_JS} failed with ${err}, exiting" | ||
| exit "${err}" | ||
| fi | ||
|
|
||
| # Warm Data | ||
| warm_indices |
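
To illustrate what the two `sed` expressions in the script do to `COUCHDB_URL`, here is a small sketch that wraps them in functions. The function names `sanitize_url` and `url_root` and the sample URLs (including the credentials) are assumptions for illustration; the `sed` expressions themselves are copied from the script.

```bash
#!/bin/sh
# Hypothetical wrappers around the two sed expressions used in the script.

# Mask any credentials between "://" and "@" so the URL is safe to log.
sanitize_url() {
    echo "$1" | sed -e 's,\(://\)[^/]*\(@\),\1<SENSITIVE>\2,g'
}

# Strip the trailing database name, leaving the server root with its final "/".
url_root() {
    echo "$1" | sed 's/[^\/]*$//g'
}

sanitize_url 'http://admin:secret@couchdb:5984/gpii'
sanitize_url 'http://couchdb:5984/gpii'
url_root 'http://couchdb:5984/gpii'
```

The first call prints `http://<SENSITIVE>@couchdb:5984/gpii`, the second leaves the credential-free URL unchanged, and the third prints `http://couchdb:5984/`. Because the root keeps its trailing slash, appending `/_up` to it produces a double slash before `_up`, which may be worth normalizing.
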
Puzzled that the option here is to delete the entire database rather than merely all of the documents of type "snapset".
There is a production test (`test:vagrantProduction`) that executes `vagrantCloudBasedContainers.sh` without its `--no-rebuild` flag. When first run locally, or when run by CI, there is no database to delete, so deleting the database is irrelevant at that point.

But if a developer runs the test a second time, they can, at their option, start from scratch with an empty database, or use the `--no-rebuild` flag and modify the database in situ, which is closer to a production environment.

The question is whether the first option, starting from scratch, is useful.
OK, I think this is just a documentation issue. If I understand correctly, steps 5 and 6 are actually performed by https://github.com/GPII/universal/blob/master/scripts/deleteAndLoadSnapsets.js, so we should clarify the wording in the README here to explain that steps 5 and 6 don't simply drop "snapsets and keys" but only those keys that are associated with snapsets. I think it would be helpful for the comment here to explicitly link to or reproduce the comment at the head of the script, https://github.com/GPII/universal/blob/master/scripts/deleteAndLoadSnapsets.js#L11, so that anyone invoking this script can do so with confidence that it will not delete user data (unless they enable GPII_CLEAR_INDEX, which should be supplied with a clear warning).
OK, I've added to the README -- take a look.
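
A minimal sketch of the kind of warning suggested in the review above, assuming a hypothetical helper named `warn_if_clearing`. Neither the helper nor the message wording is part of the PR; it only shows one way the script could make the consequence of `CLEAR_INDEX=true` explicit before deleting anything.

```bash
#!/bin/sh
# Hypothetical helper (not in the PR): print a prominent warning when the
# value passed in, normally ${CLEAR_INDEX}, is exactly "true".
warn_if_clearing() {
    # POSIX sh specifies "=" for string comparison; "==" is not portable
    # (dash, for example, rejects it).
    if [ "$1" = 'true' ]; then
        echo "[WARNING] CLEAR_INDEX=true: the whole database, including user data, will be deleted"
    fi
}

warn_if_clearing "${CLEAR_INDEX}"
```

Any value other than the literal string `true`, including an unset variable, produces no output, which matches how the script decides whether to delete the database.
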