Epic: Launch cloud experiments from CLI #9461

dberenbaum · 2023-05-16T18:39:33Z

Summary / Background

Cloud experiments in Studio will allow for running experiments on cloud runners without CI. This could give DVC a way to launch remote experiments (similar to dvc machine). Thanks to @daavoo for raising the idea.

Scope

- CLI interface to launch a Studio cloud experiment
- Specify the machine config from a file or CLI
- Save the machine config for reuse
Out of scope:
- Monitoring, collecting, and reporting results of cloud experiments (dvc exp show does not need to track these)
- Queuing cloud experiments or running a grid search (this could be a follow-up task)
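
As a rough illustration of how the in-scope items could fit together, the sketch below reads a machine config from a file, applies CLI-style `key=value` overrides, and writes the result back for reuse. Everything here is an assumption made for illustration: the JSON file format, the field names (`instance_type`, `region`), and the helper functions are hypothetical and are not part of DVC or Studio.

```python
import json
from pathlib import Path


def parse_overrides(pairs):
    """Turn ["key=value", ...] into a dict (values are kept as strings)."""
    overrides = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        overrides[key] = value
    return overrides


def load_machine_config(path, overrides=()):
    """Read a JSON machine config and apply CLI overrides on top of it."""
    config = json.loads(Path(path).read_text())
    config.update(parse_overrides(overrides))
    return config


def save_machine_config(config, path):
    """Persist the merged config so a later run can reuse it."""
    Path(path).write_text(json.dumps(config, indent=2, sort_keys=True))
```

With this shape, a saved file supplies the defaults and each launch can override individual fields without editing it.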

Assumptions

Some Studio API will exist to which DVC can send the exp and machine info

Open Questions

How should everything be shared (dvc data cache, git stash with the dirty changes, machine info)?

Blockers / Dependencies

https://github.com/iterative/studio/issues/4954

General Approach

Possible syntax:

$ dvc exp run --cloud mycloud

Alternatively, it could immediately start with the queue and be something like:

$ dvc queue start --cloud mycloud
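
To compare the two candidate syntaxes side by side, here is a minimal `argparse` sketch that accepts both. It is not how DVC's CLI is implemented; the `--cloud` flag is only the option proposed in this issue, and the `dvc-sketch` program name is hypothetical.

```python
import argparse


def build_parser():
    """Parser accepting both proposed forms of the --cloud option."""
    parser = argparse.ArgumentParser(prog="dvc-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    # Option A: dvc exp run --cloud mycloud
    exp = sub.add_parser("exp")
    exp_sub = exp.add_subparsers(dest="subcommand", required=True)
    run = exp_sub.add_parser("run")
    run.add_argument("--cloud", metavar="NAME", default=None)

    # Option B: dvc queue start --cloud mycloud
    queue = sub.add_parser("queue")
    queue_sub = queue.add_subparsers(dest="subcommand", required=True)
    start = queue_sub.add_parser("start")
    start.add_argument("--cloud", metavar="NAME", default=None)

    return parser
```

In both forms the value after `--cloud` is a name (`mycloud`) that would have to be resolved to a saved machine config.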

Steps

Must have (p1)

- Specify cloud machine spec
- Studio API to send experiment info
- Collect and post/push experiment info (params and code changes, machine spec, dvc data cache)
- UI to start cloud experiment
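
The "collect and post/push experiment info" step could bundle its pieces into a single request body, roughly as sketched below. The Studio API does not exist yet according to the assumptions above, so the class and every field name are hypothetical placeholders, not a real schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CloudExperimentRequest:
    """Hypothetical payload; none of these fields are a real Studio schema."""

    baseline_rev: str  # Git commit the experiment is based on
    code_patch: str  # diff of uncommitted (dirty) workspace changes
    params: dict = field(default_factory=dict)  # parameter overrides
    machine: dict = field(default_factory=dict)  # cloud machine spec
    cache_remote: str = ""  # where the DVC-tracked data was pushed

    def to_json(self):
        """Serialize the request deterministically for posting."""
        return json.dumps(asdict(self), sort_keys=True)
```

How the code changes and the dvc data cache actually get shared is still an open question in this issue; the `code_patch` and `cache_remote` fields only mark where that information would go.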

Timelines

Need to discuss priority first


dberenbaum · 2023-06-29T20:12:08Z

Another use for this would be that it could replace the current CI workflow. The CI script could use default runners and use dvc to launch the cloud experiment. The actual runner and experiment execution would be managed by Studio.

I'm working on the CI setup for GitLab now, and it feels almost impossible to get everything right: configuring a self-hosted runner, executing the dvc experiment on it, and setting up the right cml commands. Even in its alpha state, I would much rather work with cloud experiments, even from within CI.

daavoo · 2023-06-30T07:47:01Z

> Another use for this would be that it could replace the current CI workflow. The CI script could use default runners and use dvc to launch the cloud experiment. The actual runner and experiment execution would be managed by Studio.
>
> I'm working on the CI setup for GitLab now, and it feels almost impossible to get everything right: configuring a self-hosted runner, executing the dvc experiment on it, and setting up the right cml commands. Even in its alpha state, I would much rather work with cloud experiments, even from within CI.

Yes, this was one of the main points I was also trying to make.

On top of being simpler to set up, I think it would also be:

- More flexible
  - In the current CML runner, you hardcode the full machine spec as part of the workflow file.
  - Changing anything (e.g. a larger instance type) between experiments requires editing the file or setting up a complicated parametrized workflow.
  - Whichever option we choose for customizing the launch from the CLI, it would be easier to change between experiments (e.g. pass CLI args, or even serialize the config).
- Consistent environment / easier debugging
  - Today, replicating the CI environment (e.g. to debug) is tedious: you need to set up CML locally, ssh into the machine, and run the steps that run in the local runner. The alternative is to keep editing the workflow file and triggering workflows, which is not great UX.
  - If you could run cloud experiments from the CLI, you could use and debug the same environment locally and in CI more easily.

dberenbaum · 2023-08-10T13:11:47Z

@daavoo Another advantage of this approach for CI is related to #9612. You don't need to worry about setting up permissions to push to the repo from within the CI job if you rely on cloud experiments to run dvc exp push.

dberenbaum · 2024-01-09T15:34:10Z

Closing in favor of https://github.com/iterative/studio/issues/8429.

dberenbaum added the discussion and epic labels on May 16, 2023

dberenbaum mentioned this issue Jun 30, 2023

example-get-started-experiments: add gitlab workflow iterative/example-repos-dev#211

Closed

dberenbaum closed this as not planned on Jan 9, 2024
