Merged
27 commits
d5562d5
GPII-3138: Moved parts of gpii-dataloder code into universal
klown Oct 18, 2018
9e5346c
GPII-3138: Import deleteAndLoadSnapsets.sh and DataLoader.md
stepanstipl Oct 18, 2018
5c28281
GPII-3138: Add dataLoader dependencies to docker image (jq and curl)
stepanstipl Oct 19, 2018
5f3921a
GPII-3138: Modify convertPrefs.js to expect directories without the f…
stepanstipl Oct 19, 2018
044aa10
GPII-3138: Update data loader script to reflect new location in unive…
stepanstipl Oct 19, 2018
d59bf09
GPII-3138: Update directory checks to match containerised environment
stepanstipl Oct 19, 2018
54d75f7
GPII-3138: Minor formatting changes
stepanstipl Oct 19, 2018
68a961b
GPII-3138: Add DB connectivity check to dataloader
stepanstipl Oct 19, 2018
fb3d435
GPII-3138: Update vagrantCloudBasedContainers.sh to wor with omported…
stepanstipl Oct 19, 2018
26995c2
GPII-3138: Update DataLoader docs to reflect current state
stepanstipl Oct 19, 2018
d44b584
GPII-3138: Fix linting errors on DataLoader.md
stepanstipl Oct 19, 2018
07ace4c
GPII-3138: Revert unintended change in DataLoader readme
stepanstipl Oct 22, 2018
eb92c4e
GPII-3138: Improved check for proper usage of convertPrefs.js
klown Oct 22, 2018
2d8abe9
Merge pull request #3 from stepanstipl/dataloader-import-klown
klown Oct 22, 2018
f28c79a
GPII-3138: Fixed merge issues.
klown Oct 22, 2018
9c42fbe
GPII-3138: Fix typo in DataLoader.md
stepanstipl Oct 23, 2018
ee8417b
GPII-3138: Add GPII prefix to dataloader env variables
stepanstipl Oct 23, 2018
cbe889b
Merge pull request #4 from stepanstipl/dataloader-import2
klown Oct 23, 2018
28fc635
GPII-3138: Added to description of dataloader script.
klown Oct 23, 2018
6794a0f
GPII-3138: Added "GPII_" prefix to environment variables
klown Oct 23, 2018
af1660b
GPII-3138: Add GPII prefix to dataloader env variables
stepanstipl Oct 23, 2018
a8598eb
GPII-3138: Add GPII prefix to dataloader APP_DIR variable
stepanstipl Oct 24, 2018
9c2167b
Merge pull request #5 from stepanstipl/dataloader-import3
klown Oct 24, 2018
7eebe6e
GPII-3138: Merged upstream master GPII branch.
klown Oct 24, 2018
7362413
GPII-3138: Improved dataloader README
klown Oct 24, 2018
dd3ea15
GPII-3138: Add uniqie ID to pouchManager temp test dir
stepanstipl Oct 25, 2018
8847cd3
Merge pull request #6 from stepanstipl/dataloader-import3
klown Oct 25, 2018
1 change: 1 addition & 0 deletions Dockerfile
@@ -4,6 +4,7 @@ WORKDIR /app
COPY . /app

RUN apk add --no-cache --virtual build-dependencies python make git g++ && \
apk add --no-cache curl jq && \
npm install && \
chown -R node:node . && \
npm cache clean --force && \
66 changes: 66 additions & 0 deletions documentation/DataLoader.md
@@ -0,0 +1,66 @@
# CouchDB Data Loader

(`scripts/deleteAndLoadSnapsets.sh`)

This script sets up the CouchDB database. It is executed as a Kubernetes batch Job every time a new version of the
universal image is deployed to the cluster (and also when the cluster is initially created).

It does the following:

- Converts the preferences in universal into `snapset` Prefs Safes and GPII Keys,
- Optionally deletes existing database,
Member:
Puzzled that the option here is to delete the entire database rather than merely all of the documents of type "snapset"

Member Author:
There is a production test (test:vagrantProduction) that executes vagrantCloudBasedContainers.sh without its --no-rebuild flag. When first run locally, or when run by CI, there is no database to delete, so deleting the database is irrelevant at that point.

But, if a developer runs the test a second time, they can at their option start from scratch with an empty database, or use the --no-rebuild flag and modify the database in situ, which is closer to a production environment.

The question is whether the first option -- start from scratch -- is useful.

Member:
OK, I think this is just a documentation issue, since if I understand correctly, the action of steps 5 and 6 is actually the one of https://github.com/GPII/universal/blob/master/scripts/deleteAndLoadSnapsets.js - so we should just clarify the wording in the readme here to explain that steps 5 and 6 don't simply drop "snapsets and keys" but in particular only those keys which are associated with snapsets. I think it would be helpful for the comment here to explicitly link to or reproduce the comment at the head of the script https://github.com/GPII/universal/blob/master/scripts/deleteAndLoadSnapsets.js#L11 so that, for example, anyone invoking this script will do so in confidence that it will not delete user data (unless they enable GPII_CLEAR_INDEX, which should be supplied with a clear warning)

Member Author:
OK, I've added to the README -- take a look.

- Creates a CouchDB database if none exits,
Member:
Typo exists

Member Author:
and it's "exists" :-)
Fixed.

Member Author (@klown, Oct 23, 2018):
Never mind -- merging in Stepan's pull

- Updates the database with respect to its `design/views` document, as required,
- Loads the latest snapsets created into the database.

## Environment Variables

- `COUCHDB_URL`: URL of the CouchDB database. (required)
Member:
Can we add prefixes to these (probably GPII_) to reduce chances of conflicts with other uses of the environment (and also as a hint/courtesy to anyone looking in the environment wondering how they got there)

Contributor:
Yes, that sounds like a good idea (although the chance of conflict is pretty much non-existent - data loader runs in its own container and there's nothing else expected to be running).

- `CLEAR_INDEX`: If set to `true`, the database at $COUCHDB_URL will be deleted and recreated. (optional)
- `STATIC_DATA_DIR`: The directory where the static data to be loaded into CouchDB resides. (optional)
- `BUILD_DATA_DIR`: The directory where the data built from the conversion step resides. (optional)

The use of environment variables for data directories is useful if you want to mount the database data using a Docker
volume and point the data loader at it.

Note that since [Docker doesn't support array-valued environment
variables](https://github.com/moby/moby/issues/20169), two separate environment variables are used for the data
directories instead of a single array that holds them.
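As context for the "(optional)" markers above, `scripts/deleteAndLoadSnapsets.sh` gives these variables built-in defaults via POSIX `${VAR:-default}` parameter expansion. A minimal sketch of that fallback pattern, using the script's own default paths:

```shell
#!/bin/sh
# Sketch of the ${VAR:-default} fallback pattern used by
# scripts/deleteAndLoadSnapsets.sh for its data directories.

# Unset variable: the default after ":-" is substituted.
unset STATIC_DATA_DIR
STATIC_DATA_DIR=${STATIC_DATA_DIR:-"/app/testData/dbData"}
echo "$STATIC_DATA_DIR"    # the default wins

# Already-exported variable: the existing value is kept.
BUILD_DATA_DIR="/build_data"
BUILD_DATA_DIR=${BUILD_DATA_DIR:-"/tmp/build/dbData"}
echo "$BUILD_DATA_DIR"     # the caller's value wins
```

This is why setting `-e STATIC_DATA_DIR=...` on `docker run` is enough to redirect the loader: an exported value always takes precedence over the in-script default.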

## Running

Example using containers:

```bash
$ docker run -d -p 5984:5984 --name couchdb couchdb
$ docker run --rm --link couchdb -e COUCHDB_URL=http://couchdb:5984/gpii \
-e CLEAR_INDEX=true vagrant-universal scripts/deleteAndLoadSnapsets.sh
$ docker run -d -p 8081:8081 --name preferences --link couchdb \
-e NODE_ENV=gpii.config.preferencesServer.standalone.production \
-e PREFERENCESSERVER_LISTEN_PORT=8081 -e DATASOURCE_HOSTNAME=http://couchdb \
-e DATASOURCE_PORT=5984 vagrant-universal
```

Below are two versions of loading CouchDB data from a different location (e.g.
/home/vagrant/sync/universal/testData/dbData for the static data directory and /home/vagrant/sync/universal/build/dbData
for the build data directory). The first version has the optional `CLEAR_INDEX` set to `true` to erase and reset the
database prior to other database changes:

```bash
$ docker run --name dataloader --link couchdb \
-v /home/vagrant/sync/universal/testData/dbData:/static_data -e STATIC_DATA_DIR=/static_data \
-v /home/vagrant/sync/universal/build/dbData:/build_data -e BUILD_DATA_DIR=/build_data \
-e COUCHDB_URL=http://couchdb:5984/gpii \
-e CLEAR_INDEX=true vagrant-universal scripts/deleteAndLoadSnapsets.sh
```

The second version does not set `CLEAR_INDEX`, so any existing database is left intact prior to subsequent changes to it
(e.g., deleting the snapsets):

```bash
$ docker run --name dataloader --link couchdb \
-v /home/vagrant/sync/universal/testData/dbData:/static_data -e STATIC_DATA_DIR=/static_data \
-v /home/vagrant/sync/universal/build/dbData:/build_data -e BUILD_DATA_DIR=/build_data \
-e COUCHDB_URL=http://couchdb:5984/gpii \
vagrant-universal scripts/deleteAndLoadSnapsets.sh
```
1 change: 1 addition & 0 deletions documentation/README.md
@@ -9,6 +9,7 @@
* [Preferences Server](PreferencesServer.md)
* [Data Model for Preferences and OAuth Data](DataModel.md)
* [Pouch Manager](PouchManager.md)
* [Data Loader](DataLoader.md)
* [MatchMakerFramework](MatchMakerFramework.md)
* [Flat Match Maker](FlatMatchMaker.md)
* [Apptology](Apptology.md)
8 changes: 4 additions & 4 deletions scripts/convertPrefs.js
@@ -27,7 +27,7 @@ var inputDir = process.argv[2];
var targetDir = process.argv[3];
var prefsSafeType = process.argv[4] || "user";

if (prefsSafeType !== "snapset" && prefsSafeType !== "user") {
if (process.argv.length < 4 || (prefsSafeType !== "snapset" && prefsSafeType !== "user")) {
console.log("Usage: node scripts/convertPrefs.js InputFolder OutputFolder PrefsSafeType");
console.log(" where PrefsSafeType, is one of 'snapset' or 'user' (defaults to 'user')");
process.exit(1);
@@ -45,7 +45,7 @@ rimraf(targetDir, function () {
filenames.forEach(function (filename) {
if (filename.endsWith(".json5")) {
var gpiiKey = filename.substr(0, filename.length - 6);
var preferences = fs.readFileSync(inputDir + filename, "utf-8");
var preferences = fs.readFileSync(inputDir + "/" + filename, "utf-8");
var currentTime = new Date().toISOString();
var prefsSafeId = "prefsSafe-" + gpiiKey;

@@ -80,11 +80,11 @@ rimraf(targetDir, function () {
});

// Write the target files
var prefsSafesFile = targetDir + "prefsSafes.json";
var prefsSafesFile = targetDir + "/prefsSafes.json";
console.log("prefsSafesFile: " + prefsSafesFile);
fs.writeFileSync(prefsSafesFile, JSON.stringify(prefsSafes, null, 4));

var gpiiKeysFile = targetDir + "gpiiKeys.json";
var gpiiKeysFile = targetDir + "/gpiiKeys.json";
fs.writeFileSync(gpiiKeysFile, JSON.stringify(gpiiKeys, null, 4));

console.log("Finished converting preferences data in the source directory " + inputDir + " to the target directory " + targetDir);
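As a side note on the convertPrefs.js changes above: the GPII key is derived from each preferences filename by stripping the `.json5` suffix (`filename.substr(0, filename.length - 6)`), and the prefs safe ID is that key with a `prefsSafe-` prefix. A hypothetical shell analogue of that derivation (the filename is illustrative):

```shell
#!/bin/sh
# Hypothetical analogue of convertPrefs.js's key derivation:
# strip the ".json5" suffix from a preferences filename, then
# prefix the key to form the prefs safe ID.
filename="carla.json5"
gpiiKey="${filename%.json5}"          # carla
prefsSafeId="prefsSafe-${gpiiKey}"    # prefsSafe-carla
echo "$gpiiKey"
echo "$prefsSafeId"
```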
86 changes: 86 additions & 0 deletions scripts/deleteAndLoadSnapsets.sh
@@ -0,0 +1,86 @@
#!/bin/sh
APP_DIR=${APP_DIR:-"/app"}

STATIC_DATA_DIR=${STATIC_DATA_DIR:-"${APP_DIR}/testData/dbData"}
PREFERENCES_DATA_DIR=${PREFERENCES_DATA_DIR:-"${APP_DIR}/testData/preferences"}
BUILD_DATA_DIR=${BUILD_DATA_DIR:-'/tmp/build/dbData'}

DATALOADER_JS="${APP_DIR}/scripts/deleteAndLoadSnapsets.js"
CONVERT_JS="${APP_DIR}/scripts/convertPrefs.js"

log() {
echo "$(date +'%Y-%m-%d %H:%M:%S') - $1"
}

warm_indices(){
log "Warming indices..."

for view in $(curl -s "${COUCHDB_URL}/_design/views/" | jq -r '.views | keys[]'); do
curl -fsS "${COUCHDB_URL}/_design/views/_view/${view}" >/dev/null
done

log "Finished warming indices..."
}

# Verify variables
if [ -z "${COUCHDB_URL}" ]; then
echo "COUCHDB_URL environment variable must be defined"
exit 1
fi

COUCHDB_URL_SANITIZED=$(echo "${COUCHDB_URL}" | sed -e 's,\(://\)[^/]*\(@\),\1<SENSITIVE>\2,g')

log 'Starting'
log "CouchDB: ${COUCHDB_URL_SANITIZED}"
log "Clear index: ${CLEAR_INDEX}"
log "Static: ${STATIC_DATA_DIR}"
log "Build: ${BUILD_DATA_DIR}"
log "Working directory: $(pwd)"

# Check we can connect to CouchDB
COUCHDB_URL_ROOT=$(echo "${COUCHDB_URL}" | sed 's/[^\/]*$//g')
RET_CODE=$(curl --write-out '%{http_code}' --silent --output /dev/null "${COUCHDB_URL_ROOT}/_up")
if [ "$RET_CODE" != '200' ]; then
log "[ERROR] Failed to connect to CouchDB: ${COUCHDB_URL_SANITIZED}"
exit 1
fi

# Create build dir if it does not exist
if [ ! -d "${BUILD_DATA_DIR}" ]; then
mkdir -p "${BUILD_DATA_DIR}"
fi

# Convert preferences json5 to GPII keys and preferences safes
if [ -d "${PREFERENCES_DATA_DIR}" ]; then
node "${CONVERT_JS}" "${PREFERENCES_DATA_DIR}" "${BUILD_DATA_DIR}" snapset
ret=$?
if [ "${ret}" != '0' ]; then
log "[ERROR] ${CONVERT_JS} failed (exit code: ${ret})"
exit 1
fi
else
log "PREFERENCES_DATA_DIR ($PREFERENCES_DATA_DIR) does not exist, nothing to convert"
fi

# Initialize (and possibly clear) the database
if [ "${CLEAR_INDEX}" = 'true' ]; then
log "Deleting database at ${COUCHDB_URL_SANITIZED}"
if ! curl -fsS -X DELETE "${COUCHDB_URL}"; then
log "Error deleting database"
fi
fi

log "Creating database at ${COUCHDB_URL_SANITIZED}"
if ! curl -fsS -X PUT "${COUCHDB_URL}"; then
log "Database already exists at ${COUCHDB_URL_SANITIZED}"
fi

# Submit data
node "${DATALOADER_JS}" "${COUCHDB_URL}" "${STATIC_DATA_DIR}" "${BUILD_DATA_DIR}"
err=$?
if [ "${err}" != '0' ]; then
log "${DATALOADER_JS} failed with ${err}, exiting"
exit "${err}"
fi

# Warm Data
warm_indices
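The two `sed` expressions in the script above are easy to misread, so here is a small sketch of what each produces, using a made-up URL with embedded credentials (the host and password are illustrative only):

```shell
#!/bin/sh
# Sketch of the URL handling in deleteAndLoadSnapsets.sh,
# run against an illustrative URL with embedded credentials.
COUCHDB_URL='http://admin:secret@couchdb:5984/gpii'

# Mask the "user:password" portion between "://" and "@" before logging.
COUCHDB_URL_SANITIZED=$(echo "${COUCHDB_URL}" | sed -e 's,\(://\)[^/]*\(@\),\1<SENSITIVE>\2,g')
echo "${COUCHDB_URL_SANITIZED}"    # http://<SENSITIVE>@couchdb:5984/gpii

# Strip the trailing database name to get the server root for the /_up check.
COUCHDB_URL_ROOT=$(echo "${COUCHDB_URL}" | sed 's/[^\/]*$//g')
echo "${COUCHDB_URL_ROOT}"         # http://admin:secret@couchdb:5984/
```

Note that only the logged copy is sanitized; the root URL used for the connectivity check still carries the credentials, which is why the error path logs `COUCHDB_URL_SANITIZED` rather than the root URL.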
6 changes: 3 additions & 3 deletions scripts/vagrantCloudBasedContainers.sh
@@ -35,16 +35,16 @@ COUCHDB_HEALTHCHECK_TIMEOUT=30
if [ "$NO_REBUILD" == "true" ] ; then
CLEAR_INDEX=
else
CLEAR_INDEX=1
CLEAR_INDEX='true'
fi

UNIVERSAL_DIR="/home/vagrant/sync/universal"
STATIC_DATA_DIR="$UNIVERSAL_DIR/testData/dbData"
BUILD_DATA_DIR="$UNIVERSAL_DIR/build/dbData/snapset"

DATALOADER_IMAGE="herrclown/gpii-dataloader"
DATALOADER_COUCHDB_URL="http://couchdb:${COUCHDB_PORT}/gpii"
DATASOURCE_HOSTNAME="http://couchdb"
DATALOADER_CMD="/app/scripts/deleteAndLoadSnapsets.sh"

GPII_PREFERENCES_CONFIG="gpii.config.preferencesServer.standalone.production"
GPII_PREFERENCES_PORT=9081
@@ -82,7 +82,7 @@ docker run -d -p $COUCHDB_PORT:$COUCHDB_PORT --name couchdb $COUCHDB_IMAGE
wget -O /dev/null --retry-connrefused --waitretry=$COUCHDB_HEALTHCHECK_DELAY --read-timeout=20 --timeout=1 --tries=$COUCHDB_HEALTHCHECK_TIMEOUT http://localhost:$COUCHDB_PORT

# Load the CouchDB data
docker run --rm --link couchdb -v $STATIC_DATA_DIR:/static_data -e STATIC_DATA_DIR=/static_data -v $BUILD_DATA_DIR:/build_data -e BUILD_DATA_DIR=/build_data -e COUCHDB_URL=$DATALOADER_COUCHDB_URL -e CLEAR_INDEX=$CLEAR_INDEX $DATALOADER_IMAGE
docker run --rm --link couchdb -v $STATIC_DATA_DIR:/static_data -e STATIC_DATA_DIR=/static_data -v $BUILD_DATA_DIR:/build_data -e BUILD_DATA_DIR=/build_data -e COUCHDB_URL=$DATALOADER_COUCHDB_URL -e CLEAR_INDEX=$CLEAR_INDEX $UNIVERSAL_IMAGE $DATALOADER_CMD

# Wait for the CouchDB views to become accessible. Accessing the view URL forces the view index to build, which takes time.
# The URL returns 500 while the index is not ready, so use the "--retry-on-http-error" option to continue retrying on a 500 response code.