Epic: Launch cloud experiments from CLI #9461

Closed
4 tasks
dberenbaum opened this issue May 16, 2023 · 4 comments
Labels
discussion requires active participation to reach a conclusion

Comments

@dberenbaum
Contributor

dberenbaum commented May 16, 2023

Summary / Background

Cloud experiments in Studio will allow for running experiments on cloud runners without CI. This could give DVC a way to launch remote experiments (similar to dvc machine). Thanks to @daavoo for raising the idea.

Scope

  • CLI interface to launch a Studio cloud experiment
  • Specify the machine config from a file or CLI
  • Save the machine config for reuse
  • Out of scope:
    • Monitoring, collecting, and reporting results of cloud experiments (dvc exp show does not need to track these)
    • Queuing cloud experiments or running a grid search (this could be a follow-up task)
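The scope items about specifying a machine config from a file or the CLI and saving it for reuse could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not DVC code: the MachineConfig fields and defaults, the JSON file format, the flag names, and the one-file-per-name save location are all hypothetical, since the issue does not define a machine spec. CLI flags override values from the file, and the resolved spec is saved under a name so it can be reused later.

```python
import argparse
import json
from dataclasses import asdict, dataclass, fields, replace
from pathlib import Path


@dataclass(frozen=True)
class MachineConfig:
    # Hypothetical fields and defaults: the issue does not define a machine spec.
    provider: str = "aws"
    region: str = "us-west"
    instance_type: str = "m5.large"
    disk_size_gb: int = 35
    spot: bool = False


FIELD_NAMES = {f.name for f in fields(MachineConfig)}


def load_config(path: Path) -> MachineConfig:
    """Read a machine spec from JSON; unknown keys raise TypeError."""
    return MachineConfig(**json.loads(path.read_text()))


def apply_overrides(base: MachineConfig, args: argparse.Namespace) -> MachineConfig:
    """Let CLI flags win over file values; unset flags (None) are ignored."""
    overrides = {
        key: value
        for key, value in vars(args).items()
        if key in FIELD_NAMES and value is not None
    }
    return replace(base, **overrides)


def save_config(config: MachineConfig, name: str, store: Path) -> Path:
    """Persist the resolved spec as <store>/<name>.json so it can be reused."""
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{name}.json"
    target.write_text(json.dumps(asdict(config), indent=2, sort_keys=True))
    return target


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; none of these exist in DVC.
    parser = argparse.ArgumentParser(prog="launch-cloud-exp")
    parser.add_argument("--config", type=Path, help="JSON file with a machine spec")
    parser.add_argument("--provider")
    parser.add_argument("--region")
    parser.add_argument("--instance-type")  # dest becomes instance_type
    parser.add_argument("--disk-size-gb", type=int)  # dest becomes disk_size_gb
    # default=None (not False) so an omitted --spot does not override the file.
    parser.add_argument("--spot", action="store_true", default=None)
    return parser


def resolve(argv: list[str]) -> MachineConfig:
    """Start from the file (or defaults), then apply CLI overrides on top."""
    args = build_parser().parse_args(argv)
    base = load_config(args.config) if args.config else MachineConfig()
    return apply_overrides(base, args)
```

For example, resolve(["--config", "machine.json", "--instance-type", "g4dn.xlarge"]) followed by save_config(cfg, "mycloud", store) would yield a reusable spec that a name like mycloud in the proposed syntax could refer to. Treating None as "unset" for every flag is the design choice that lets file values survive when a flag is omitted.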

Assumptions

  • Some Studio API will exist to which DVC can send the experiment and machine info

Open Questions

  • How should everything be shared (dvc data cache, git stash with the dirty changes, machine info)?

Blockers / Dependencies

General Approach

Possible syntax:

$ dvc exp run --cloud mycloud

Alternatively, it could build on the queue from the start, with something like:

$ dvc queue start --cloud mycloud

Steps

Must have (p1)

  • Specify cloud machine spec
  • Studio API to send experiment info
  • Collect and post/push experiment info (params and code changes, machine spec, dvc data cache)
  • UI to start cloud experiment
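The "collect and post/push experiment info" step could look roughly like the Python sketch below. It picks one possible answer to the open question of how everything gets shared, and that answer is an assumption: uncommitted code changes travel as a git diff patch, the data cache has already been pushed to a DVC remote the runner can pull from, and the Studio endpoint accepts JSON. The build_payload and serialize helpers and every field name are hypothetical; no real Studio API is called.

```python
import hashlib
import json
from typing import Any, Dict


def build_payload(
    repo_url: str,
    baseline_rev: str,
    params: Dict[str, Any],
    patch: str,
    machine: Dict[str, Any],
    dvc_remote: str,
) -> Dict[str, Any]:
    """Bundle what a cloud runner would need to reproduce a local experiment.

    Hypothetical schema: the issue only assumes that some Studio API will
    accept experiment and machine info.
    """
    return {
        "repo": repo_url,
        "baseline_rev": baseline_rev,  # commit the runner checks out first
        "params": params,  # e.g. values parsed from params.yaml
        "patch": patch,  # uncommitted code changes, e.g. output of `git diff HEAD`
        # Checksum so the receiver can detect a truncated or altered patch.
        "patch_sha256": hashlib.sha256(patch.encode("utf-8")).hexdigest(),
        "machine": machine,  # resolved machine spec
        "dvc_remote": dvc_remote,  # data cache is assumed to be pushed here already
    }


def serialize(payload: Dict[str, Any]) -> str:
    """Stable JSON encoding (sorted keys) for the hypothetical endpoint."""
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    payload = build_payload(
        repo_url="https://example.com/repo.git",
        baseline_rev="abc123",
        params={"lr": 0.01},
        patch="diff --git a/train.py b/train.py\n",
        machine={"instance_type": "g4dn.xlarge"},
        dvc_remote="s3://bucket/dvc-cache",
    )
    print(serialize(payload))
```

Sending a patch rather than stashing and pushing a git ref keeps the runner from needing push access to the repo, but it is only one option among those the open question leaves on the table.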

Timelines

Need to discuss priority first

@dberenbaum dberenbaum added discussion and epic labels May 16, 2023
@dberenbaum
Contributor Author

Another use for this would be that it could replace the current CI workflow. The CI script could use default runners and use dvc to launch the cloud experiment. The actual runner and experiment execution would be managed by Studio.

I'm working on the CI setup for GitLab now, and it feels almost impossible to get everything right: configuring a self-hosted runner, executing the DVC experiment on it, and setting up the right CML commands. Even in its alpha state, I would much rather use cloud experiments, even from within CI.

@daavoo
Contributor

daavoo commented Jun 30, 2023

Another use for this would be that it could replace the current CI workflow. The CI script could use default runners and use dvc to launch the cloud experiment. The actual runner and experiment execution would be managed by Studio.

I'm working on the CI setup for GitLab now, and it feels almost impossible to get everything right: configuring a self-hosted runner, executing the DVC experiment on it, and setting up the right CML commands. Even in its alpha state, I would much rather use cloud experiments, even from within CI.

Yes, this was one of the main points I was also trying to make.

On top of being simpler to set up, I think it would also be:

  • More flexible
    With the current CML runner, you hardcode the full machine spec in the workflow file.
    Changing anything between experiments (e.g., a larger instance type) requires editing the file or setting up a complicated parametrized workflow.
    Whichever customization option we choose for launching from the CLI, it would be easier to change settings between experiments (e.g., passing CLI args or even a serialized config).

  • Consistent environment / easier debugging
    Today, replicating the CI environment (e.g., to debug) is tedious: you need to set up CML locally, SSH into the machine, and manually run the steps that the runner executes. The alternative is to keep editing the workflow file and triggering workflows, which is not great UX.
    If you could run cloud experiments from the CLI, you could use and debug the same environment locally and in CI more easily.

@dberenbaum
Contributor Author

@daavoo Another advantage of this approach for CI is related to #9612: you don't need to set up permissions to push to the repo from within the CI job if you rely on cloud experiments to run dvc exp push.

@dberenbaum
Contributor Author

Closing in favor of https://github.com/iterative/studio/issues/8429.

@dberenbaum dberenbaum closed this as not planned Jan 9, 2024

2 participants