
Commit bc15998

modified wandb related files
1 parent 64540a7 commit bc15998

File tree: 3 files changed (+182 -12 lines)

- README.md (+32 -12)
- pipeline/train_wandb.py (+99)
- scripts/jl_exp_wandb.sh (+51)

Diff for: README.md

+32 -12
````diff
@@ -21,11 +21,12 @@ This project shows how to realize MLOps in Git/GitHub. In order to achieve this
 4. Run `dvc add [ADDED FILE OR DIRECTORY]` to track your data with DVC
 5. Run `dvc remote add -d gdrive_storage gdrive://[ID of specific folder in gdrive]` to add Google Drive as the remote data storage
 6. Run `dvc push`, then a URL for authentication is provided. Copy and paste it into the browser, and authenticate
-7. Copy the content of `.dvc/tmp/gdrive-user-credentials.json` and add it as a [GitHub Secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) with the name `GDRIVE_CREDENTIALS`
+7. Copy the content of `.dvc/tmp/gdrive-user-credentials.json` and add it as a [GitHub Secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) with the name `GDRIVE_CREDENTIAL`
 8. Run `git add . && git commit -m "initial commit" && git push origin main` to keep the initial setup
 9. Write your own pipeline under the `pipeline` directory. Code for basic image classification in TensorFlow is provided initially.
 10. Run the following `dvc stage add` for the training stage
 ```bash
+# if you want to use Iterative Studio / DVCLive for tracking training progress
 $ dvc stage add -n train \
     -p train.train_size,train.batch_size,train.epoch,train.lr \
     -d pipeline/modeling.py -d pipeline/train.py -d data \
@@ -35,25 +36,44 @@ $ dvc stage add -n train \
     --plots-no-cache dvclive/scalars/eval/sparse_categorical_accuracy.tsv \
     -o outputs/model \
     python pipeline/train.py outputs/model
+
+# if you want to use W&B for tracking training progress
+$ dvc stage add -n train \
+    -p train.train_size,train.batch_size,train.epoch,train.lr \
+    -d pipeline/modeling.py -d pipeline/train.py -d data \
+    -o outputs/model \
+    python pipeline/train.py outputs/model
 ```
-10. Run the following `dvc stage add` for the evaluate stage
+11. Run the following `dvc stage add` for the evaluate stage
 ```bash
+# if you want to use Iterative Studio / DVCLive for tracking training progress
 $ dvc stage add -n evaluate \
     -p evaluate.test,evaluate.batch_size \
     -d pipeline/evaluate.py -d data/test -d outputs/model \
     -M outputs/metrics.json \
     python pipeline/evaluate.py outputs/model
+
+# if you want to use W&B for tracking training progress
+$ dvc stage add -n evaluate \
+    -p evaluate.test,evaluate.batch_size \
+    -d pipeline/evaluate.py -d data/test -d outputs/model \
+    python pipeline/evaluate.py outputs/model
 ```
-11. Update `params.yaml` as you need.
-12. Run `git add . && git commit -m "add initial pipeline setup" && git push origin main`
-13. Run `dvc repro` to run the pipeline initially
-14. Run `dvc add outputs/model.tar.gz` to add a compressed version of the model
-15. Run `dvc push outputs/model.tar.gz`
-16. Run `echo "/pipeline/__pycache__" >> .gitignore` to ignore the unnecessary directory
-17. Run `git add . && git commit -m "add initial pipeline run" && git push origin main`
-18. Add the access token and user email of [JarvisLabs.ai](https://jarvislabs.ai/) to GitHub Secrets as `JARVISLABS_ACCESS_TOKEN` and `JARVISLABS_USER_EMAIL`
-19. Add a GitHub access token to GitHub Secrets as `GH_ACCESS_TOKEN`
-20. Create a PR and write `#train` as a comment (you have to be the owner of the repo)
+12. Update `params.yaml` as you need.
+13. Run `git add . && git commit -m "add initial pipeline setup" && git push origin main`
+14. Run `dvc repro` to run the pipeline initially
+15. Run `dvc add outputs/model.tar.gz` to add a compressed version of the model
+16. Run `dvc push outputs/model.tar.gz`
+17. Run `echo "/pipeline/__pycache__" >> .gitignore` to ignore the unnecessary directory
+18. Run `git add . && git commit -m "add initial pipeline run" && git push origin main`
+19. Add the access token and user email of [JarvisLabs.ai](https://jarvislabs.ai/) to GitHub Secrets as `JARVISLABS_ACCESS_TOKEN` and `JARVISLABS_USER_EMAIL`
+20. Add a GitHub access token to GitHub Secrets as `GH_ACCESS_TOKEN`
+21. Create a PR and write `#train` as a comment (you have to be the owner of the repo)
+
+### W&B Integration Setup
+
+1. Add the W&B project name to GitHub Secrets as `WANDB_PROJECT`
+2. Add the W&B API key to GitHub Secrets as `WANDB_API_KEY`
 
 ### HuggingFace Integration Setup
 
````

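The `-p` flags in the `dvc stage add` commands above and the lookups in `pipeline/train_wandb.py` below together define the keys that `params.yaml` (step 12) must provide: `train.train`, `train.test`, `train.train_size`, `train.batch_size`, `train.epoch`, `train.lr`, `evaluate.test`, and `evaluate.batch_size`. A minimal sketch that writes such a file is shown here; the key names come from this commit, while the values are placeholders to replace with your own settings:

```python
# Sketch only: generate a starter params.yaml with the keys referenced above.
# The values below are illustrative placeholders, not the project's real settings.
import yaml

params = {
    "train": {
        "train": "train",      # subdirectory of data/ holding the training *.tfrecord files
        "test": "test",        # subdirectory of data/ holding the test *.tfrecord files
        "train_size": 1000,    # number of training examples (placeholder)
        "batch_size": 32,
        "epoch": 5,
        "lr": 0.001,
    },
    "evaluate": {
        "test": "test",
        "batch_size": 32,
    },
}

with open("params.yaml", "w") as f:
    yaml.safe_dump(params, f, sort_keys=False)
```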
Diff for: pipeline/train_wandb.py

+99
```python
import os
import sys
import glob
import yaml
import json
import random
import tarfile
from pathlib import Path

import tensorflow as tf
from tensorflow.keras.applications import resnet50

import modeling

import wandb
from wandb.keras import WandbCallback

if len(sys.argv) != 2:
    sys.stderr.write("Arguments error. Usage:\n")
    sys.stderr.write("\tpython train_wandb.py output-dir\n")
    sys.exit(1)

params = yaml.safe_load(open("params.yaml"))["train"]
print(params)

train = 'data'/Path(params['train'])
test = 'data'/Path(params['test'])
output = Path(sys.argv[1])

_image_feature_description = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64),
}

def _parse_image_function(example_proto):
    features = tf.io.parse_single_example(example_proto, _image_feature_description)
    image = tf.io.decode_png(features['image'], channels=3)  # tf.io.decode_raw(features['image'], tf.uint8)
    image = tf.image.resize(image, [224, 224])
    image = resnet50.preprocess_input(image)

    label = tf.cast(features['label'], tf.int32)

    return image, label

def _read_dataset(epochs, batch_size, channel):
    # build a tf.data pipeline from the TFRecord shards under the given directory
    filenames = glob.glob(str(channel/'*.tfrecord'))
    dataset = tf.data.TFRecordDataset(filenames)

    dataset = dataset.map(_parse_image_function, num_parallel_calls=4)
    dataset = dataset.prefetch(tf.data.AUTOTUNE)
    dataset = dataset.repeat(epochs)
    dataset = dataset.shuffle(buffer_size=10 * batch_size)
    dataset = dataset.batch(batch_size, drop_remainder=True)

    return dataset

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

def run_train():
    # W&B settings are injected through environment variables (GitHub Secrets / CI)
    project_name = os.environ["WANDB_PROJECT"]
    wandb_key = os.environ["WANDB_API_KEY"]
    wandb_run_name = os.environ["WANDB_RUN_NAME"]

    wandb.login(anonymous="never", key=wandb_key)
    _ = wandb.init(project=project_name,
                   config=params,
                   name=wandb_run_name)

    train_size = params['train_size']
    train_step_size = train_size // params['batch_size']

    train_ds = _read_dataset(params['epoch'], params['batch_size'], train)
    test_ds = _read_dataset(params['epoch'], params['batch_size'], test)

    wandbCallback = WandbCallback(training_data=train_ds,
                                  log_weights=True, log_gradients=True)

    m = modeling._build_keras_model()
    m = modeling._compile(m, float(params['lr']))

    m.fit(train_ds,
          epochs=params['epoch'],
          steps_per_epoch=train_step_size,
          validation_data=test_ds,
          callbacks=[wandbCallback])

    m.save(output,
           save_format='tf',
           signatures=modeling._get_signature(m))

    make_tarfile(f'{output}.tar.gz', output)

run_train()
```

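`train_wandb.py` imports a local `modeling` module that is not part of this diff and relies on three helpers: `_build_keras_model`, `_compile(model, lr)`, and `_get_signature(model)`. Below is a hypothetical sketch of what that interface could look like, assuming a ResNet50 backbone to match the 224×224 RGB inputs and `resnet50.preprocess_input` used above, and sparse categorical loss/metrics to match the integer labels (and the `sparse_categorical_accuracy` plot named in the README). The number of classes, head layers, and signature details are assumptions, not the repository's actual implementation:

```python
# Hypothetical sketch of the modeling interface assumed by train_wandb.py.
import tensorflow as tf
from tensorflow.keras.applications import ResNet50

NUM_CLASSES = 3  # assumption: set to the real number of labels


def _build_keras_model():
    # ResNet50 backbone matching the 224x224 RGB inputs prepared in _parse_image_function
    base = ResNet50(include_top=False, weights="imagenet",
                    input_shape=(224, 224, 3), pooling="avg")
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    return tf.keras.Model(inputs=base.input, outputs=outputs)


def _compile(model, lr):
    # sparse loss/metric: labels arrive as plain int32 class ids
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["sparse_categorical_accuracy"])
    return model


def _get_signature(model):
    # a single serving signature over the model's forward pass
    @tf.function(input_signature=[tf.TensorSpec([None, 224, 224, 3], tf.float32, name="image")])
    def serving_fn(image):
        return {"probabilities": model(image)}
    return {"serving_default": serving_fn}
```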
Diff for: scripts/jl_exp_wandb.sh

+51
```sh
#!/bin/sh

# install gh cli
apt install sudo
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
sudo apt update
sudo apt install gh

# grant gh access
export GH_TOKEN='$GH_ACCESS_TOKEN'
git config --global user.name "chansung"
git config --global user.email "[email protected]"

# set W&B specific keys
export WANDB_PROJECT='$WANDB_PROJECT'
export WANDB_API_KEY='$WANDB_API_KEY'

# move to the repo
git clone https://github.com/codingpot/git-mlops.git

# install dependencies
cd git-mlops
gh auth setup-git
git checkout $CUR_BRANCH
pip install -r requirements.txt
pip install git+https://github.com/jarvislabsai/jlclient.git

# set Gdrive credential
mkdir .dvc/tmp
echo '$GDRIVE_CREDENTIAL' > .dvc/tmp/gdrive-user-credentials.json

# pull data
dvc pull

export WANDB_RUN_NAME=$CUR_BRANCH
dvc repro

exp_result=$(dvc exp show --only-changed --md)
wandb_url="https://wandb.ai/codingpot/git-mlops"
gh pr comment $CUR_PR_ID --body "[Visit W&B Log Page for this Pull Request]($wandb_url)"

git reset --hard

echo ${exp_ids[$idx]}
echo ${exp_names[$idx]}
dvc add outputs/model.tar.gz
dvc push outputs/model.tar.gz

VM_ID=$(tail -n 2 /home/.jarviscloud/jarvisconfig | head -n 1)
python clouds/jarvislabs.py vm destroy $CLOUD_AT $CLOUD_ID $VM_ID
```
