autogluon-bench POC #1
Conversation
src/autogluon/bench/cloud/aws/batch_stack/lambdas/lambda_function.py
setup_build_env
black --check --diff src/
Need to add the test folder as well.
Will update after adding unit tests.
Dockerfile
@@ -0,0 +1,11 @@
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu116-ubuntu20.04-ec2
Wondering why we need a Dockerfile for this repo? If needed, we should have Dockerfiles for both GPU and CPU. Also, the latest pytorch-training image is 1.13.
By design, we want to run this repo in "local" mode on AWS batch instances. Is this the desired way to do it?
## Run benchmarks on AWS
Shall we codify the permission template? Not every user has the dev-user role defined in their AWS account with the correct permission scope set up.
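A hedged sketch of what a codified permission template might look like as an IAM policy document. The action list here is purely illustrative of the kinds of services the stack touches (Batch, Lambda, S3), not the actual scope this project requires, and `Resource` should be narrowed in a real template:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "batch:SubmitJob",
        "batch:DescribeJobs",
        "lambda:InvokeFunction",
        "s3:PutObject"
      ],
      "Resource": "*"
    }
  ]
}
```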
LAMBDA_FUNCTION_NAME: ag-bench-test-job-function
These two configs are very specific to the underlying implementation. Ideally, we should hide the internal complexity from end users.
The design of providing a custom VPC_NAME is because we ideally wanted to reuse VPCs. I can make it optional in a follow-up PR, and also auto-generate LAMBDA_FUNCTION_NAME. Created an issue.
+1. Make most of the config optional and provide default values. Power users should still be able to customize them.
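One common way to realize this suggestion is to merge user-supplied config over a dict of defaults. This is a minimal sketch, not the repo's actual config schema: the key names, the `resolve_configs` helper, and the auto-generated function-name format are all assumptions for illustration.

```python
# Sketch: make most config keys optional by merging over defaults.
# Key names and the generated-name format are hypothetical.
DEFAULT_CONFIGS = {
    "metrics_bucket": None,          # optional; metrics upload skipped when unset
    "lambda_function_name": None,    # auto-generated when left unset
}

def resolve_configs(user_configs: dict) -> dict:
    # User-supplied values override the defaults.
    configs = {**DEFAULT_CONFIGS, **user_configs}
    if configs["lambda_function_name"] is None:
        # Hypothetical auto-generated name, derived from the module name,
        # so end users never need to know this implementation detail.
        configs["lambda_function_name"] = f"ag-bench-{configs['module']}-job-function"
    return configs

print(resolve_configs({"module": "tabular"})["lambda_function_name"])
# → ag-bench-tabular-job-function
```

Power users can still override any key explicitly, which the merge order preserves.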
runbenchmarks.py
if configs["module"] == "multimodal":
    benchmark = MultiModalBenchmark(benchmark_name=configs["benchmark_name"])
    git_uri, git_branch = configs["git_uri#branch"].split("#")
    benchmark.setup(git_uri=git_uri, git_branch=git_branch)
    benchmark.run(data_path=configs["data_path"])
    if configs.get("metrics_bucket", None):
        benchmark.upload_metrics(s3_bucket=configs["metrics_bucket"], s3_dir=f'{configs["module"]}/{benchmark.benchmark_name}')
elif configs["module"] == "tabular":
    benchmark = TabularBenchmark(
        benchmark_name=configs["benchmark_name"],
    )
    benchmark.setup()
    benchmark.run(
        framework=f'{configs["framework"]}:{configs["label"]}',
        benchmark=configs["amlb_benchmark"],
        constraint=configs["amlb_constraint"],
        task=configs["amlb_task"]
    )
    if configs["metrics_bucket"] is not None:
        benchmark.upload_metrics(s3_bucket=configs["metrics_bucket"], s3_dir=f'{configs["module"]}/{benchmark.benchmark_name}')
The steps involved for AutoMM and Tabular seem to be identical; the only difference is the class to instantiate. If so, a cleaner way would be to load the corresponding class from somewhere (e.g. a dict mapping from module name to class).
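The registry pattern being suggested could be sketched as below. The class names are taken from the snippet above, but their bodies here are stand-ins, and the `BENCHMARK_REGISTRY` / `create_benchmark` names are hypothetical, not the repo's actual implementation:

```python
# Stand-in classes mirroring the names used in the snippet above;
# the real classes live in the repo and take more parameters.
class MultiModalBenchmark:
    def __init__(self, benchmark_name):
        self.benchmark_name = benchmark_name

class TabularBenchmark:
    def __init__(self, benchmark_name):
        self.benchmark_name = benchmark_name

# Hypothetical dict mapping module name -> benchmark class,
# replacing the if/elif branching on configs["module"].
BENCHMARK_REGISTRY = {
    "multimodal": MultiModalBenchmark,
    "tabular": TabularBenchmark,
}

def create_benchmark(configs: dict):
    # Look up the class by module name and instantiate it;
    # adding a new module only requires a new registry entry.
    cls = BENCHMARK_REGISTRY[configs["module"]]
    return cls(benchmark_name=configs["benchmark_name"])

benchmark = create_benchmark({"module": "tabular", "benchmark_name": "poc"})
print(type(benchmark).__name__)
# → TabularBenchmark
```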
The main difference is the parameters to setup and run, but I extracted them out to a helper class.
Approving to unblock this PR assuming issues will be addressed in follow-up PRs. Great work!
- name: Lint Check
  run: |
    chmod +x ./.github/workflow_scripts/lint_check.sh && ./.github/workflow_scripts/lint_check.sh
nit: trailing newlines are still missing in multiple files
The benchmark uses MultimodalPredictor, and currently only supports one dataset (MNIST) and default hyperparameters.