forked from datacommonsorg/website
run_cdc_dev_docker.sh · executable file · 554 lines (496 loc) · 19.1 KB
#!/bin/bash
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# See documentation in the help function below.
help() {
cat << EOF
Usage:
./run_cdc_dev_docker.sh [--env_file|-e <env.list file path>]
[--actions|-a run|build_run|build|build_upload|upload] [--container|-c all|service]
[--release|-r latest|stable] [--image|-i <custom image name:tag>]
[--package|-p <package name:tag>] [--schema_update|-s]
If no options are set, the default is '--env_file $PWD/custom_dc/env.list --actions run --container all --release stable'
All containers are run, using the Data Commons-provided 'stable' image.
Some of these options are mutually exclusive and some are required depending on
the setting of the '--actions' option. Here is a high-level summary of valid
combinations:
-a build_run [-c all|service] [-r latest|stable] -i <custom image name:tag>
-a build -i <custom image name:tag>
-a build_upload|upload -i <custom image name:tag> [-p <package name:tag>]
-a run|build_run [-c all] [-r latest|stable] [-i <custom image name:tag>] -s
Note: If you are running in "hybrid" mode, the only valid options are:
./run_cdc_dev_docker.sh [--env_file|-e <env.list file path>] [--actions|-a run|build_run]
[--image|-i <custom image name:tag>]
./run_cdc_dev_docker.sh [--env_file|-e <env.list file path>] [--actions|-a run]
[--schema_update|-s]
All others will be ignored. The script will infer the correct container based on the
env.list directory settings and/or the 'build_run' setting.
Options:
--env_file|-e <path to env.list file>
Optional: The path and file name of the environment file env.list.
Default: custom_dc/env.list
Use this option to maintain multiple alternate environment files with different
settings and directories (helpful for testing).
--actions|-a
Optional: The different Docker commands to run.
Default: run: Runs containers, using a prebuilt release or a custom build.
Other options are:
* build: Only build a custom service image, don't run any containers.
* build_run: Build a custom service image and run containers.
* build_upload: Build a custom service image and upload it to Google Cloud
(no containers run).
* upload: Upload a previously built image (no containers run).
With all these options, you must also specify '--image' with the (source)
image name and tag.
--container|-c all|service|data
Optional: The containers to run.
Default with 'run' and 'build_run': all: Run all containers. Other options are:
* service: Only run the service container. You can use this if you have not made
any changes to your data, or you are only running the service container
locally (with the data container in the cloud). Exclusive with '--schema_update'.
* data: Only run the data container. This is only valid if you are running the
data container locally (with the service container in the cloud).
Only valid with 'run'. Ignored otherwise.
For "hybrid" setups, the script will infer the correct container to run from the
env.list file; this setting will be ignored.
--release|-r stable|latest
Optional with 'run' and 'build_run'.
Default: stable: run the prebuilt 'stable' image provided by Data Commons team.
Other options:
* latest: Run the 'latest' release provided by Data Commons team.
If you specify this with an additional '--image' option, the option applies
only to the data container. Otherwise, it applies to both containers.
Only valid with 'run' and 'build_run'. Ignored otherwise.
--image|-i <custom image name:tag>
Optional with 'run': the name and tag of the custom image to run in the service
container.
Required for all other actions: the name and tag of the custom image to build/run/upload.
--package|-p <target package name:tag>
Optional: The target image to be created and uploaded to Google Cloud.
Default: same as the name and tag provided in the '--image' option.
Only valid with 'build_upload' and 'upload'. Ignored otherwise.
--schema_update|-s
Optional. In the rare case that you get a 'SQL check failed' error in
your running service, you can set this to run the data container in
schema update mode, which skips embeddings generation and completes much faster.
Only valid with 'run' and 'build_run' actions and 'all' or 'data' containers.
Ignored otherwise.
Examples:
./run_cdc_dev_docker.sh
Start all containers, using the prebuilt 'stable' release from Data Commons team.
./run_cdc_dev_docker.sh --container service --release latest
Start only the service container, using the prebuilt latest release.
Use this if you haven't made any changes to your data but just want to pick
up the latest code.
./run_cdc_dev_docker.sh --image my-datacommons:dev
Start all containers, using a custom-built image for the service container.
./run_cdc_dev_docker.sh --actions build --image my-datacommons:dev
Build a custom image only, and don't start any containers. Use this if you are
building a custom image that you will upload and test later in Google Cloud.
./run_cdc_dev_docker.sh --actions build_run --image my-datacommons:dev --container service
Build a custom image and only start the service. Use this if you haven't made any
changes to your data but are developing your custom site.
./run_cdc_dev_docker.sh --actions build_upload --image my-datacommons:dev
Build a custom image, create a package with the same name and tag, and upload
it to the Cloud Artifact Registry. Does not start any local containers.
For more details, see https://docs.datacommons.org/custom_dc/
EOF
}
cd "$(dirname "$0")"
source scripts/utils.sh
set -e
# Build custom image
build() {
log_notice "Starting Docker build of '$IMAGE'. This will take several minutes..."
docker build --tag "$IMAGE" \
-f build/cdc_services/Dockerfile .
}
# Package and push custom image to GCP
upload() {
check_app_credentials
# Need Docker credentials for running docker tag to create an Artifact Registry package
get_docker_credentials
log_notice "Creating package '$GOOGLE_CLOUD_REGION-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/$GOOGLE_CLOUD_PROJECT-artifacts/${PACKAGE}'..."
docker tag "${IMAGE}" "${GOOGLE_CLOUD_REGION}-docker.pkg.dev/${GOOGLE_CLOUD_PROJECT}/${GOOGLE_CLOUD_PROJECT}-artifacts/${PACKAGE}"
# Need principal account credentials to run docker push.
check_gcloud_credentials
log_notice "Uploading package to Google Artifact Registry. This will take several minutes..."
docker push "${GOOGLE_CLOUD_REGION}-docker.pkg.dev/${GOOGLE_CLOUD_PROJECT}/${GOOGLE_CLOUD_PROJECT}-artifacts/${PACKAGE}"
}
# Run data container
run_data() {
if [ "$RELEASE" == "latest" ]; then
docker pull gcr.io/datcom-ci/datacommons-data:latest
fi
# Collect optional 'docker run' arguments in an array: an empty array
# expands to zero words, avoiding a stray empty argument.
schema_update_args=()
schema_update_text=""
if [ "$SCHEMA_UPDATE" == true ]; then
schema_update_args=(-e DATA_UPDATE_MODE=schemaupdate)
schema_update_text=" in schema update mode"
fi
if [ "$data_hybrid" == true ]; then
check_app_credentials
log_notice "Starting Docker data container with '$RELEASE' release${schema_update_text} and writing output to Google Cloud..."
docker run -it \
--env-file "$ENV_FILE" \
"${schema_update_args[@]}" \
-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
-v "$HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro" \
-v "$INPUT_DIR:$INPUT_DIR" \
gcr.io/datcom-ci/datacommons-data:${RELEASE}
else
log_notice "Starting Docker data container with '$RELEASE' release${schema_update_text}..."
docker run -it \
--env-file "$ENV_FILE" \
"${schema_update_args[@]}" \
-v "$INPUT_DIR:$INPUT_DIR" \
-v "$OUTPUT_DIR:$OUTPUT_DIR" \
gcr.io/datcom-ci/datacommons-data:${RELEASE}
fi
}
# Run service container
run_service() {
if [ "$service_hybrid" == true ]; then
check_app_credentials
# Custom-built image
if [ -n "$IMAGE" ]; then
log_notice "Starting Docker services container with custom image '${IMAGE}', reading data in Google Cloud..."
docker run -it \
--env-file "$ENV_FILE" \
-p 8080:8080 \
-e DEBUG=true \
-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
-v "$HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro" \
-v "$PWD/server/templates/custom_dc/$CUSTOM_DIR:/workspace/server/templates/custom_dc/$CUSTOM_DIR" \
-v "$PWD/static/custom_dc/$CUSTOM_DIR:/workspace/static/custom_dc/$CUSTOM_DIR" \
"$IMAGE"
# Data Commons-released images
else
if [ "$RELEASE" == "latest" ]; then
docker pull gcr.io/datcom-ci/datacommons-services:latest
fi
log_notice "Starting Docker services container with '${RELEASE}' release, reading data in Google Cloud..."
docker run -it \
--env-file "$ENV_FILE" \
-p 8080:8080 \
-e DEBUG=true \
-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
-v "$HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro" \
-v "$PWD/server/templates/custom_dc/$CUSTOM_DIR:/workspace/server/templates/custom_dc/$CUSTOM_DIR" \
gcr.io/datcom-ci/datacommons-services:${RELEASE}
fi
# Regular mode
else
# Custom-built image
if [ -n "$IMAGE" ]; then
log_notice "Starting Docker services container with custom image '${IMAGE}'..."
docker run -it \
--env-file "$ENV_FILE" \
-p 8080:8080 \
-e DEBUG=true \
-v "$INPUT_DIR:$INPUT_DIR" \
-v "$OUTPUT_DIR:$OUTPUT_DIR" \
-v "$PWD/server/templates/custom_dc/$CUSTOM_DIR:/workspace/server/templates/custom_dc/$CUSTOM_DIR" \
-v "$PWD/static/custom_dc/$CUSTOM_DIR:/workspace/static/custom_dc/$CUSTOM_DIR" \
"$IMAGE"
# Data Commons-released images
else
if [ "$RELEASE" == "latest" ]; then
docker pull gcr.io/datcom-ci/datacommons-services:latest
fi
log_notice "Starting Docker services container with '${RELEASE}' release..."
docker run -it \
--env-file "$ENV_FILE" \
-p 8080:8080 \
-e DEBUG=true \
-v "$INPUT_DIR:$INPUT_DIR" \
-v "$OUTPUT_DIR:$OUTPUT_DIR" \
-v "$PWD/server/templates/custom_dc/$CUSTOM_DIR:/workspace/server/templates/custom_dc/$CUSTOM_DIR" \
gcr.io/datcom-ci/datacommons-services:${RELEASE}
fi
fi
}
# Functions for checking GCP credentials
############################################################
# Check application default credentials. Needed for hybrid setups and docker tag/push.
check_app_credentials() {
log_notice "Checking for valid Cloud application default credentials..."
# Test in an 'if' so 'set -e' does not abort the script on failure.
# If the credentials are invalid, fall back to an interactive login.
if gcloud auth application-default print-access-token > /dev/null 2>&1; then
log_notice "GCP application default credentials are valid."
else
gcloud auth application-default login
fi
}
# Get credentials to authenticate Docker to GCP. Needed for docker tag
get_docker_credentials() {
log_notice "Getting credentials for Cloud Docker package..."
# With 'set -e' in effect, a failure here aborts the script.
gcloud auth configure-docker "${GOOGLE_CLOUD_REGION}-docker.pkg.dev"
}
# Check the user's/service account's credentials to authorize
# gcloud to access GCP. Needed for docker push.
check_gcloud_credentials() {
log_notice "Checking for valid gcloud credentials..."
# Test in an 'if' so 'set -e' does not abort the script on failure.
# If the credentials are invalid, fall back to an interactive login.
if gcloud auth print-identity-token > /dev/null 2>&1; then
log_notice "gcloud credentials are valid."
else
gcloud auth login
fi
}
# Begin execution
#######################################################
# Initialize variables for optional settings
ENV_FILE="$PWD/custom_dc/env.list"
ACTIONS="run"
CONTAINER="all"
RELEASE="stable"
SCHEMA_UPDATE=false
IMAGE=""
PACKAGE=""
# Helper to parse arguments (handles both --opt=val and --opt val)
# Echoes "value|shift_count" to stdout
#
# IMPORTANT: This function is ONLY for options that REQUIRE a value.
# Do NOT use this for boolean flags (like -d or -h). If you do, it will
# incorrectly attempt to consume the next argument as a value.
#
# Arguments:
# $1: current_arg (the flag being parsed, e.g. "-e" or "--env_file=foo")
# $2: next_arg (the argument following the flag in the command line)
# $3: remaining_count (total number of arguments remaining in "$@")
#
# Logic:
# - If current_arg contains '=', value is extracted from it. shift_count is 1.
# - If not, it checks if next_arg exists and is NOT a flag (doesn't start with -).
# If valid, value is next_arg and shift_count is 2.
# - If next_arg is missing or is another flag, exits with error.
parse_arg() {
local current_arg="$1"
local next_arg="$2"
local remaining_count="$3"
local val
local shift_count
if [[ "$current_arg" == *"="* ]]; then
val="${current_arg#*=}"
shift_count=1
else
if [[ $remaining_count -lt 2 || "$next_arg" == -* ]]; then
log_error "Option $current_arg requires an argument."
exit 1
fi
val="$next_arg"
shift_count=2
fi
echo "$val|$shift_count"
}
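# Illustration of parse_arg's "value|shift_count" contract (comments only;
# the option values below are hypothetical):
#   parse_arg "--env_file=custom/env.list" "-a" 3   # echoes "custom/env.list|1"
#   parse_arg "-e" "custom/env.list" 2              # echoes "custom/env.list|2"
#   parse_arg "-e" "--actions" 2                    # error: "-e" requires an argument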
# Parse command-line options
while [[ $# -gt 0 ]]; do
case "$1" in
-e | --env_file | --env_file=*)
parsed=$(parse_arg "$1" "$2" "$#")
val="${parsed%|*}"
shift_count="${parsed#*|}"
shift $shift_count
if [ -f "$val" ]; then
ENV_FILE="$val"
else
log_error "Error parsing --env_file: File '$val' does not exist.\nPlease specify a valid path and file name."
exit 1
fi
;;
-a | --actions | --actions=*)
parsed=$(parse_arg "$1" "$2" "$#")
val="${parsed%|*}"
shift_count="${parsed#*|}"
shift $shift_count
if [[ "$val" =~ ^(run|build|build_run|build_upload|upload)$ ]]; then
ACTIONS="$val"
else
log_error "That is not a valid action. Valid options are:\nrun\nbuild\nbuild_run\nbuild_upload\nupload\n"
exit 1
fi
;;
-c | --container | --container=*)
parsed=$(parse_arg "$1" "$2" "$#")
val="${parsed%|*}"
shift_count="${parsed#*|}"
shift $shift_count
if [[ "$val" =~ ^(all|service|data)$ ]]; then
CONTAINER="$val"
else
log_error "That is not a valid container option. Valid options are 'all' or 'service' or 'data'\n"
exit 1
fi
;;
-r | --release | --release=*)
parsed=$(parse_arg "$1" "$2" "$#")
val="${parsed%|*}"
shift_count="${parsed#*|}"
shift $shift_count
if [[ "$val" =~ ^(latest|stable)$ ]]; then
RELEASE="$val"
else
log_error "That is not a valid release option. Valid options are 'stable' or 'latest'\n"
exit 1
fi
;;
-i | --image | --image=*)
parsed=$(parse_arg "$1" "$2" "$#")
val="${parsed%|*}"
shift_count="${parsed#*|}"
shift $shift_count
if [[ "$val" =~ ^(latest|stable)$ ]]; then
log_error "That is not a valid custom image name. Did you mean to use the '--release' option?\n"
exit 1
else
IMAGE="$val"
fi
;;
-s | --schema_update)
SCHEMA_UPDATE=true
shift
;;
-p | --package | --package=*)
parsed=$(parse_arg "$1" "$2" "$#")
val="${parsed%|*}"
shift_count="${parsed#*|}"
shift $shift_count
PACKAGE="$val"
;;
-h | --help)
help
exit 0
;;
-d | --debug)
set -x
shift
;;
--)
shift
break
;;
*)
log_error "Invalid input: $1"
log_error "Please try again. See '--help' for correct usage."
exit 1
;;
esac
done
# Get options from the selected env.list file
source "$ENV_FILE"
# Set variables for hybrid mode
#----------------------------------------------------
# Determine hybrid mode and set a variable to true for use throughout the script
if [[ "$INPUT_DIR" == *"gs://"* ]] && [[ "$OUTPUT_DIR" == *"gs://"* ]]; then
service_hybrid=true
elif [[ "$INPUT_DIR" != *"gs://"* ]] && [[ "$OUTPUT_DIR" == *"gs://"* ]]; then
data_hybrid=true
elif [[ "$INPUT_DIR" == *"gs://"* ]] && [[ "$OUTPUT_DIR" != *"gs://"* ]]; then
log_error "Invalid data directory settings in env.list file. Please set your OUTPUT_DIR to a Cloud Path or your INPUT_DIR to a local path.\n"
exit 1
fi
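# Illustration of the mode detection above (hypothetical env.list values):
#   INPUT_DIR=gs://my-bucket/in   OUTPUT_DIR=gs://my-bucket/out  -> service_hybrid=true
#   INPUT_DIR=/home/me/in         OUTPUT_DIR=gs://my-bucket/out  -> data_hybrid=true
#   INPUT_DIR=gs://my-bucket/in   OUTPUT_DIR=/home/me/out        -> error: invalid combination
#   INPUT_DIR=/home/me/in         OUTPUT_DIR=/home/me/out        -> regular (all-local) mode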
# Handle various error conditions
#######################################################
# Missing variables needed to construct Docker commands
#============================================================
# Missing variables in env.list file
#-------------------------------------------------------------
# Needed for docker run -v option
if [ -z "$INPUT_DIR" ] || [ -z "$OUTPUT_DIR" ]; then
log_error "Missing input or output data directories.\nPlease set 'INPUT_DIR' and 'OUTPUT_DIR' in your env.list file.\n"
exit 1
fi
# Needed for docker tag and push
if [[ ( "$ACTIONS" == *"upload"* ) && ( -z "$GOOGLE_CLOUD_PROJECT" || -z "$GOOGLE_CLOUD_REGION" ) ]]; then
log_error "Missing GCP project and region settings.\nPlease set 'GOOGLE_CLOUD_PROJECT' and/or 'GOOGLE_CLOUD_REGION' in your env.list file.\n"
exit 1
fi
# Missing variables from input
#-------------------------------------------------------------
# Missing required custom image for build and upload
if [ "$ACTIONS" != "run" ] && [ -z "$IMAGE" ]; then
log_error "Missing an image name and tag for build and/or upload.\nPlease use the '--image' or '-i' option with the name and tag of the custom image you are building or have already built.\n"
exit 1
fi
# Missing package for upload; not an error, just info
if [[ "$ACTIONS" == *"upload"* ]] && [ -z "$PACKAGE" ]; then
log_notice "No '--package' option specified."
log_notice "The target image will use the same name and tag as the source image '$IMAGE'.\n"
sleep 3
# Assign image name
PACKAGE=$IMAGE
fi
# Handle invalid option combinations and reset to valid (most are silently
# ignored and handled by the case statement)
#--------------------------------------------------------------------
if [ "$data_hybrid" == true ]; then
ACTIONS="run"
CONTAINER="data"
elif [ "$SCHEMA_UPDATE" == true ]; then
CONTAINER="all"
fi
if [ "$service_hybrid" == true ]; then
if [ "$ACTIONS" != "run" ] && [ "$ACTIONS" != "build_run" ]; then
log_error "Invalid action for running in 'hybrid' service mode.\nValid options are 'run' or 'build_run'.\n"
exit 1
fi
if [ -n "$IMAGE" ]; then
RELEASE=''
fi
CONTAINER="service"
fi
# Call Docker commands
######################################
case "$ACTIONS" in
"build")
build
;;
"build_run")
build
if [ "$CONTAINER" == "service" ]; then
run_service
else
run_data
run_service
fi
;;
"upload")
upload
;;
"build_upload")
build
upload
;;
"run")
if [ "$CONTAINER" == "service" ]; then
run_service
elif [ "$CONTAINER" == "data" ]; then
run_data
else
run_data
run_service
fi
;;
esac
exit 0