Skip to content
This repository was archived by the owner on Sep 21, 2020. It is now read-only.

Conversation

@stepanstipl
Copy link
Contributor

@stepanstipl stepanstipl commented Oct 5, 2018

I was able to reliably reproduce the problem described in GPII-3391, it happens when the version of dataloader is updated. The reason being that batch/v1.Job's container image is not mutable.

Solution is to use helm's --force upgrade which :

--force force resource update through delete/recreate if needed
--
(see https://docs.helm.sh/helm/helm_upgrade/ for details)

As the dataloader is expected to be idempotent, this should work just fine.

This depends on gpii-ops/exekube#19

Update::
As this is connected with new dataloader release, it will require a manual cleanup of gpiiKey and correcponding prefsSafe records in a DB. And as we're dealing with prod db, I documented all the steps to be taken and expected number of deleted docs:

# Get credentials
export COUCHDB_AUTH=$(kubectl exec -it -n gpii couchdb-couchdb-0 -- /bin/bash -c 'echo -n $COUCHDB_USER:$COUCHDB_PASSWORD')

# Forward CouchDB port
kubectl port-forward couchdb-couchdb-0 -n gpii 5984:5984

# Create filter
echo '{"selector": {"type": "prefsSafe", "prefsSafeType": "user"}, "limit": 99999}' > filter-prefs.json

# Get all the prefs
curl -v http://${COUCHDB_AUTH}@localhost:5984/gpii/_find -X POST -d @filter-prefs.json -H "Content-Type: application/json" > tmp-prefs.json

# Filter out UUID based ones
jq -r '.docs[] | ._id + "?rev=" + ._rev' tmp-prefs.json | grep -v -E 'prefsSafe-[a-z0-9-]{36}' > tmp-prefs.list

# Check IDs (expect 105 records)
cat tmp-prefs.list

# Delete
cat tmp-prefs.list | xargs -I{} curl  -X DELETE "http://${COUCHDB_AUTH}@localhost:5984/gpii/{}"

# Filter for Keys
echo '{"selector": {"type": "gpiiKey"}, "limit": 99999}' > filter-keys.json

# Get all the keys
curl -v http://${COUCHDB_AUTH}@localhost:5984/gpii/_find -X POST -d @filter-keys.json -H "Content-Type: application/json" > tmp-keys.json

# Filter out UUID based ones
jq -r '.docs[] | ._id + "?rev=" + ._rev' tmp-keys.json | grep -v -E '^[a-z0-9-]{36}\?' > tmp-keys.list

# Check IDs (expect 105 records)
cat tmp-keys.list

# Delete
cat tmp-keys.list | xargs -I{} curl  -X DELETE "http://${COUCHDB_AUTH}@localhost:5984/gpii/{}"

For details see GPII/universal#626.

Update 2 - merge/release steps::
When merging and releaseing the order should be following:

(All the cleanup steps apply only to GCP, as behaviour for AWS will not change - the new versions of dataloader are not applied there)

@stepanstipl stepanstipl changed the title Add force_update to gpii-dataloader GPII-3391: Add force_update to gpii-dataloader Oct 5, 2018
@amatas
Copy link
Contributor

amatas commented Oct 5, 2018

yes, the dataloader is idempotent, but it will delete all the data in the couchdb before uploading again the data.

https://github.com/gpii-ops/gpii-dataloader/blob/master/loadData.sh#L45-L50
https://github.com/gpii-ops/gpii-infra/blob/master/shared/charts/gpii-dataloader/templates/job.yaml#L18-L19

So I guess that it will destroy all the empty preferences set that Javi will be creating eventually.

@stepanstipl
Copy link
Contributor Author

stepanstipl commented Oct 5, 2018

Ok, after discussion at standup I'll leave this open and we'll bring this up on Wednesday APCP meeting. Originally the dataloader was meant to be run every time there's a new version, but because of its current implementation, when all data are lost, we might not really want to fix this :).

@amatas
Copy link
Contributor

amatas commented Oct 10, 2018

This PR gpii-ops/gpii-dataloader#6 and GPII/universal#626 should be merged first.

@stepanstipl
Copy link
Contributor Author

This PR gpii-ops/gpii-dataloader#6 and GPII/universal#626 should be merged first.

Thanks Alf, I'm going through those tickets fo review them. There's also #83 in this repo that should be merged I assume.

@stepanstipl
Copy link
Contributor Author

I've updated the PR description with DB cleanup steps required for new dataloader and at the moment this is still waiting for the move of data loader from it's own image to the universal.

@klown
Copy link
Contributor

klown commented Oct 18, 2018

@stepanstipl You mean "Delete":

# Detete

@stepanstipl
Copy link
Contributor Author

@stepanstipl You mean "Delete":

# Detete

Yea, I do :), thanks, fixed speling.

@stepanstipl stepanstipl changed the title GPII-3391: Add force_update to gpii-dataloader GPII-3391: New dataloader (orig: Add force_update to gpii-dataloader) Oct 22, 2018
@stepanstipl
Copy link
Contributor Author

stepanstipl commented Oct 22, 2018

This now depends on GPII/universal#692 and gpii-ops/gpii-version-updater#10, replaces #83. See description above for details.

@natarajaya
Copy link
Contributor

LGTM

1 similar comment
@amatas
Copy link
Contributor

amatas commented Oct 25, 2018

LGTM

@klown
Copy link
Contributor

klown commented Oct 25, 2018

@stepanstipl FYI, I've got a copy of this pull and #10 (although there I'm using my docker account for components.conf) and trying it out in a dev version of AWS. I'll let you know how it goes.

@mrtyler
Copy link
Contributor

mrtyler commented Oct 25, 2018

LGTM

@mrtyler
Copy link
Contributor

mrtyler commented Oct 26, 2018

LGTM

@klown
Copy link
Contributor

klown commented Oct 26, 2018

@stepanstipl Progress!

I wrote:

I've got a copy of this pull and #10 (although there I'm using my docker account for components.conf) and trying it out in a dev version of AWS. I'll let you know how it goes.

With @mrtyler's help, my AWS dev cluster is back up and running. The dataloader did its thing and properly added the views and snapsets. Since this run involved creating a new database from scratch, that's all the dataloader did. I'll try adding some new views and 'user' PrefsSafes. But, since it's a weekend, I'll rake destroy the cluster before I leave.

@klown
Copy link
Contributor

klown commented Oct 26, 2018

I killed the dataloader, and re-deployed it. It correctly erased and replaced the snapsets (I checked by looking at the creation dates).

@klown
Copy link
Contributor

klown commented Oct 26, 2018

Added one 'user' PrefsSafe/GPII key by hand to the database, and re-deployed the data loader. Only the snapsets were updated (again, by checking creation dates).

@stepanstipl
Copy link
Contributor Author

@klown thanks for testing further

@stepanstipl stepanstipl merged commit 27cb956 into gpii-ops:master Oct 29, 2018
@klown
Copy link
Contributor

klown commented Oct 29, 2018

@klown thanks for testing further

One more successful test this morning:

  1. added a 'user' PrefsSafe/GPII-key to the database,
  2. modified the _design/views in the database,
  3. re-deployed the dataloader,
  4. inspection of database and/or the dataloader log shows:
    4a. 'user' PrefsSafe/GPII-key left intact,
    4b. _design/views updated,
    4c. snapset PrefsSafes/GPII-keys updated.

@klown
Copy link
Contributor

klown commented Oct 29, 2018

I found one odditity, @stepanstipl.

The dataloader log shows that the "build" directory is: /tmp/build/dbData and not /app/build/dbData. The reason this is odd is:

  • the Dockerfile has an npm install command,
  • package.json has a postinstall script that creates the build directory tree.
  • running npm install locally causes the posinstall script to run.

So: why does the dataloader running inside the universal image not have access to the build directory tree? Is it not built in the docker image? At the moment, I'm using my universal docker image and perhaps that is the problem. I'll look into it.

@klown
Copy link
Contributor

klown commented Oct 29, 2018

Actually, I think I see part of the problem. Line 6 of deleteAndLoadSnapsets.sh doesn't default to /app/build:

GPII_BUILD_DATA_DIR=${GPII_BUILD_DATA_DIR:-'/tmp/build/dbData'}

@stepanstipl
Copy link
Contributor Author

Yes, that's on purpose - the files in the build directory gets created when the convert script is run (node "${CONVERT_JS}" "${GPII_PREFERENCES_DATA_DIR}" "${GPII_BUILD_DATA_DIR}" snapset). i wanted to make clear disctinction between data that comes from the image build time (/app) and data that are generated when the dataloader runs and will not be preserved (/tmp).

You're right the build directory tree is created as part of postinstall, but it does not contain any files, just two empty dirs - snapset and user. The build directory can always be changed via the GPII_BUILD_DATA_DIR variable if needed, but I believe it behaves correctly atm. Or do you see any problem with the current behaviour?

@klown
Copy link
Contributor

klown commented Oct 29, 2018

You're right the build directory tree is created as part of postinstall, but it does not contain any files, just two empty dirs - snapset and user.

Starting with no build directory in existence, when I run npm install locally, the build directory tree is fully populated. There are a number of calls to node convertPrefs.js in the postinstall script that ought to be populating the snapset, user, and build/test/dbData/user directories.

I wanted to say that when I create the universal docker image, I see the same result. But, actually an error occurs with the postinstall script where it fails. I suspect this is a bug:

npm WARN lifecycle [email protected]~postinstall: cannot run in wd %s %s (wd=%s) [email protected] node scripts/browserifyTestDependency.js && node scripts/convertPrefs.js testData/preferences/ build/dbData/snapset/ snapset && node scripts/convertPrefs.js testData/preferences/ build/dbData/user/ user && node scripts/convertPrefs.js tests/data/preferences/ build/tests/dbData/ user /app

Double checking CI and the last test run on Fri shows the same error message -- look for WARN lifecycle... (Talk about picking at a thread and seeing unexpected stuff).

Still, you have a point about the distinction between preferences data from the docker build time vs. data generated when the dataloader runs. The latter has the latest versions of snapset preferences data. I think the question then becomes: should the build directory tree contain the latest snapset data after dataloader time? Does anything in universal expect the latest data to be in the build directory tree? Or is everything run off the database by this point, which has the latest because the dataloader updated it properly?

Any insights @cindyli?

@klown
Copy link
Contributor

klown commented Oct 29, 2018

@stepanstipl
Copy link
Contributor Author

I think that there's actually no need to do the conversion during the dataloader runtime as the Docker image is immutable, therefore the result should always be the same - both during build time and runtime (I guess it was necessary to do that when the dataloader was cloning the repo during runtime).

So if we fix the postinstall step, we should just be able to load the data directly from the build/dbData/snapset location and remove the snapset conversion step from dataloader.

I imagine the only difference would be in case you run the vagrantCloudBasedContainers.sh for local development without rebuilding the image, but you would modify the snapsets - in that case the changes made after the image is built would not be refleted.

@klown
Copy link
Contributor

klown commented Oct 29, 2018

There's another difference. A developer can, similarly, run deleteAndLoadSnapsets.sh locally, without any docker image. They can set their environment variables to point at the relevant folders in their copy of universal where they perhaps have added new "test" preferences since the last npm install. Or even reference other directories of their choosing. In these cases, they would want to run the conversion step.

Beyond fixing the postinstall step (still trying to figure that out) I think all that needs doing is set GPII_BUILD_DATA_DIR to default to /app/build.

@stepanstipl
Copy link
Contributor Author

I think all that needs doing is set GPII_BUILD_DATA_DIR to default to /app/build

@klown you mean change the default value of that variable in the dataloader (deleteAndLoadSnapsets.sh) script? In that case it would also load the user preferences generated by node scripts/convertPrefs.js testData/preferences/ build/dbData/user/ user in addition to what's loaded currently, is that desired? And would you still keep the conversion step in the dataloader script?

@klown
Copy link
Contributor

klown commented Oct 30, 2018

@klown you mean change the default value of that variable in the dataloader (deleteAndLoadSnapsets.sh) script?

No, my mistake. I meant that it should default value to /app/build/dbdata/snapset. Formally, change the line in the script to:
GPII_BUILD_DATA_DIR=${GPII_BUILD_DATA_DIR:-"${GPII_APP_DIR}/build/dbData/snapset"}

And would you still keep the conversion step in the dataloader script?

Yes, for development. Suppose a developer wants to make a tweak to one of the preferences in testData/preferences by making a copy of alice.json5, naming the copy alice2.json5, and modifying some of the preferences therein. Their intent is to test a new snapset locally. After making the copy, all they need to do is:
$ (GPII_COUCHDB_URL=$MY_COUCHDB_URL GPII_APP_DIR=. ./scripts/deleteAndLoadSnapsets.sh)

The developer quickly gets their new test preferences converted to a snapset and gpii-key, and added to their local copy of the database. They can test further from that point.

@stepanstipl stepanstipl deleted the GPII-3391-parametrise-force-update branch March 21, 2019 14:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants