Adds comet-ml plugin example #1702

Open
wants to merge 16 commits into
base: master
Changes from 5 commits
36 changes: 36 additions & 0 deletions examples/comet_ml_plugin/README.md
@@ -0,0 +1,36 @@
(comet_ml)=

# Comet ML

```{eval-rst}
.. tags:: Integration, Data, Metrics, Intermediate
```

Comet’s machine learning platform integrates with your existing infrastructure and tools so you can manage, visualize, and optimize models—from training runs to production monitoring. This plugin integrates Flyte with Comet by configuring links between the two platforms.
Contributor

Suggested change
Comet’s machine learning platform integrates with your existing infrastructure and tools so you can manage, visualize, and optimize models—from training runs to production monitoring. This plugin integrates Flyte with Comet by configuring links between the two platforms.
Comet’s machine learning platform integrates with your existing infrastructure and tools so you can manage, visualize, and optimize models from training runs to production monitoring. This plugin integrates Flyte with Comet by configuring links between the two platforms.


To install the plugin, run:

```bash
pip install flytekitplugins-comet-ml
```

Comet requires an API key to authenticate with its platform. In the accompanying example, the secret is created using
[Flyte's Secrets manager](https://docs.flyte.org/en/latest/user_guide/productionizing/secrets.html).

To enable linking from the Flyte side panel to Comet.ml, add the following to Flyte's configuration:

```yaml
plugins:
  logs:
    dynamic-log-links:
      - comet-ml-execution-id:
          displayName: Comet
          templateUris: "{{ .taskConfig.host }}/{{ .taskConfig.workspace }}/{{ .taskConfig.project_name }}/{{ .executionName }}{{ .nodeId }}{{ .taskRetryAttempt }}{{ .taskConfig.link_suffix }}"
      - comet-ml-custom-id:
          displayName: Comet
          templateUris: "{{ .taskConfig.host }}/{{ .taskConfig.workspace }}/{{ .taskConfig.project_name }}/{{ .taskConfig.experiment_key }}"
```
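
To make the `templateUris` entries above more concrete, here is a minimal sketch of how a `{{ .name }}` template could expand into a link. It is an illustration only: `render_template` is a hypothetical helper, not Flyte's implementation, and the host, workspace, project, and experiment key values are made up.

```python
import re

# Hypothetical stand-in for the substitution Flyte applies to templateUris.
# The real logic lives in Flyte's backend and may differ.
PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z_][\w.]*)\s*\}\}")


def render_template(template: str, values: dict) -> str:
    """Replace each {{ .name }} placeholder with values[name]."""

    def substitute(match: re.Match) -> str:
        return str(values[match.group(1)])

    return PLACEHOLDER.sub(substitute, template)


# Template copied from the comet-ml-custom-id entry.
template = (
    "{{ .taskConfig.host }}/{{ .taskConfig.workspace }}/"
    "{{ .taskConfig.project_name }}/{{ .taskConfig.experiment_key }}"
)

# Illustrative values only; substitute the ones for your own account.
values = {
    "taskConfig.host": "https://www.comet.com",
    "taskConfig.workspace": "my-workspace",
    "taskConfig.project_name": "my-project",
    "taskConfig.experiment_key": "0123456789abcdef0123456789abcdef",
}

url = render_template(template, values)
print(url)
```

Because each value starts with `{{`, a YAML parser would read an unquoted template as the start of a flow mapping, which is why the template strings are usually wrapped in quotes.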

```{auto-examples-toc}
comet_ml_example
```
Empty file.
156 changes: 156 additions & 0 deletions examples/comet_ml_plugin/comet_ml_plugin/comet_ml_example.py
@@ -0,0 +1,156 @@
# %% [markdown]
# (comet_ml_example)=
#
# # Comet Example
# Comet’s machine learning platform integrates with your existing infrastructure and
# tools so you can manage, visualize, and optimize models—from training runs to
Contributor

Suggested change
# tools so you can manage, visualize, and optimize models—from training runs to
# tools so you can manage, visualize, and optimize models from training runs to

# production monitoring. This plugin integrates Flyte with Comet by configuring
# links between the two platforms.
import os
import os.path

from flytekit import (
    ImageSpec,
    Secret,
    current_context,
    task,
    workflow,
)
from flytekit.types.directory import FlyteDirectory
from flytekitplugins.comet_ml import comet_ml_login

# %% [markdown]
# First, we specify the project and workspace that we will use with Comet's platform.
# Please update `PROJECT_NAME` and `WORKSPACE` to the value associated with your account.
Contributor

Suggested change
# Please update `PROJECT_NAME` and `WORKSPACE` to the value associated with your account.
# Please update `PROJECT_NAME` and `WORKSPACE` to the values associated with your account.

# %%
PROJECT_NAME = "flytekit-comet-ml-v1"
WORKSPACE = "thomas-unionai"

# %% [markdown]
# Comet requires an API key to authenticate with its platform. In the example below,
# the secret is created using
# [Flyte's Secrets manager](https://docs.flyte.org/en/latest/user_guide/productionizing/secrets.html).
# %%
secret = Secret(key="comet-ml-key", group="comet-ml-group")

# %% [markdown]
# Next, we use `ImageSpec` to construct a container that contains the dependencies for this
# task:
# %%

REGISTRY = os.getenv("REGISTRY", "localhost:30000")
image = ImageSpec(
    name="unionai",
Member

same as below, we should use ghcr.io/flyteorg

Member Author

For ImageSpec, most users will not have push access to ghcr.io/flyteorg, so they will need to change the registry to run it.

Using localhost:30000 means I am optimizing for the sandbox use case. An alternative is to use os.getenv("REGISTRY", "localhost:30000"), and let it be an environment variable.

Member

No problem, I like your idea

    packages=[
        "torch==2.3.1",
        "comet-ml==3.43.2",
        "lightning==2.3.0",
        "flytekitplugins-comet-ml",
        "torchvision",
    ],
    registry=REGISTRY,
)


# %% [markdown]
# Here, we use a Flyte task to download the dataset and cache it:
# %%
@task(cache=True, cache_version="2", container_image=image)
def get_dataset() -> FlyteDirectory:
    from torchvision.datasets import MNIST

    ctx = current_context()
    dataset_dir = os.path.join(ctx.working_directory, "dataset")
    os.makedirs(dataset_dir, exist_ok=True)

    # Download the training and evaluation datasets
    MNIST(dataset_dir, train=True, download=True)
    MNIST(dataset_dir, train=False, download=True)

    return dataset_dir


# %% [markdown]
# The `comet_ml_login` decorator calls `comet_ml.init` and configures it to use Flyte's
# execution id as Comet's experiment key. The body of the task is PyTorch Lightning
# training code, where we pass `CometLogger` into the `Trainer`'s `logger`.
# %%
@task(
    secret_requests=[secret],
    container_image=image,
)
@comet_ml_login(
    project_name=PROJECT_NAME,
    workspace=WORKSPACE,
    secret=secret,
)
def train_lightning(dataset: FlyteDirectory, hidden_layer_size: int):
    import pytorch_lightning as pl
    import torch
    import torch.nn.functional as F
    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import CometLogger
    from torch.utils.data import DataLoader
    from torchvision import transforms
    from torchvision.datasets import MNIST

    class Model(pl.LightningModule):
        def __init__(self, layer_size=784, hidden_layer_size=256):
            super().__init__()
            self.save_hyperparameters()
            self.layers = torch.nn.Sequential(
                torch.nn.Linear(layer_size, hidden_layer_size),
                torch.nn.Linear(hidden_layer_size, 10),
            )

        def forward(self, x):
            return torch.relu(self.layers(x.view(x.size(0), -1)))

        def training_step(self, batch, batch_nb):
            x, y = batch
            loss = F.cross_entropy(self(x), y)
            self.logger.log_metrics({"train_loss": loss}, step=batch_nb)
            return loss

        def validation_step(self, batch, batch_nb):
            x, y = batch
            y_hat = self.forward(x)
            loss = F.cross_entropy(y_hat, y)
            self.logger.log_metrics({"val_loss": loss}, step=batch_nb)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=0.02)

    dataset.download()
    train_ds = MNIST(dataset, train=True, download=False, transform=transforms.ToTensor())
    eval_ds = MNIST(dataset, train=False, download=False, transform=transforms.ToTensor())
    train_loader = DataLoader(train_ds, batch_size=32)
    eval_loader = DataLoader(eval_ds, batch_size=32)

    comet_logger = CometLogger()
    comet_logger.log_hyperparams({"batch_size": 32})

    model = Model(hidden_layer_size=hidden_layer_size)
    trainer = Trainer(max_epochs=1, fast_dev_run=True, logger=comet_logger)
    trainer.fit(model, train_loader, eval_loader)


@workflow
def main(hidden_layer_size: int = 32):
    dataset = get_dataset()
    train_lightning(dataset=dataset, hidden_layer_size=hidden_layer_size)


# %% [markdown]
# To enable dynamic log links, add the following plugin settings to Flyte's configuration file:
# ```yaml
# plugins:
#   logs:
#     dynamic-log-links:
#       - comet-ml-execution-id:
#           displayName: Comet
#           templateUris: {{ .taskConfig.host }}/{{ .taskConfig.workspace }}/{{ .taskConfig.project_name }}/{{ .executionName }}{{ .nodeId }}{{ .taskRetryAttempt }}{{ .taskConfig.link_suffix }}
Member

We should add double quote here.

#       - comet-ml-custom-id:
#           displayName: Comet
#           templateUris: {{ .taskConfig.host }}/{{ .taskConfig.workspace }}/{{ .taskConfig.project_name }}/{{ .taskConfig.experiment_key }}
Member

same thing

# ```
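
The `comet-ml-execution-id` template above concatenates the execution name, node id, and retry attempt, then appends a `link_suffix`. The sketch below shows one way such a key could be composed. It rests on two assumptions that the diff does not confirm: that Comet accepts experiment keys of 32 to 50 alphanumeric characters, and that the suffix exists to pad short identifiers up to the minimum length. `compose_experiment_key` and its zero padding are hypothetical and are not the plugin's actual code.

```python
from typing import Tuple

# Assumption: Comet accepts experiment keys of 32 to 50 alphanumeric characters.
MIN_KEY_LENGTH = 32
MAX_KEY_LENGTH = 50


def compose_experiment_key(execution_name: str, node_id: str, retry_attempt: int) -> Tuple[str, str]:
    """Hypothetical helper: join Flyte identifiers and pad them to a valid length.

    Returns the full key and the padding suffix, mirroring the
    {{ .executionName }}{{ .nodeId }}{{ .taskRetryAttempt }}{{ .taskConfig.link_suffix }}
    layout of the link template.
    """
    base = f"{execution_name}{node_id}{retry_attempt}"
    if not base.isalnum():
        raise ValueError("identifiers must be alphanumeric")
    if len(base) > MAX_KEY_LENGTH:
        raise ValueError("identifiers exceed the maximum key length")
    # Zero padding is an arbitrary choice made for this illustration.
    link_suffix = "0" * max(0, MIN_KEY_LENGTH - len(base))
    return base + link_suffix, link_suffix


# Made-up identifiers for illustration.
key, link_suffix = compose_experiment_key("f1a2b3c4d5e6f7a8b9c0", "n1", 0)
print(key, link_suffix)
```

Keeping the suffix as a separate value lets the link template rebuild the same key from the execution metadata plus one stored field.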
4 changes: 2 additions & 2 deletions examples/wandb_plugin/wandb_plugin/wandb_example.py
@@ -82,8 +82,8 @@ def wf() -> float:
#     dynamic-log-links:
#       - wandb-execution-id:
#           displayName: Weights & Biases
#           templateUris: '{{ .taskConfig.host }}/{{ .taskConfig.entity }}/{{ .taskConfig.project }}/runs/{{ .executionName }}-{{ .nodeId }}-{{ .taskRetryAttempt }}'
#           templateUris: "{{ .taskConfig.host }}/{{ .taskConfig.entity }}/{{ .taskConfig.project }}/runs/{{ .executionName }}-{{ .nodeId }}-{{ .taskRetryAttempt }}"
#       - wandb-custom-id:
#           displayName: Weights & Biases
#           templateUris: '{{ .taskConfig.host }}/{{ .taskConfig.entity }}/{{ .taskConfig.project }}/runs/{{ .taskConfig.id }}'
#           templateUris: "{{ .taskConfig.host }}/{{ .taskConfig.entity }}/{{ .taskConfig.project }}/runs/{{ .taskConfig.id }}"
# ```