
autogluon-bench POC #1

Merged
merged 11 commits into from
Mar 29, 2023
Conversation

@suzhoum (Collaborator) commented Mar 23, 2023

  1. Supports local runs of a single benchmarking task.
  2. Supports AWS runs of multiple benchmarking tasks across multiple instances.
  3. Tabular benchmarking runs on the AMLB backend.
  4. Multimodal benchmarking runs on AutoGluon's MultimodalPredictor; it currently supports only one dataset (MNIST) with default hyperparameters.
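The two run modes described above are driven by a run configuration whose keys can be inferred from what runbenchmarks.py reads later in this thread. A minimal sketch as plain Python dicts follows; the exact schema and the example values (benchmark names, git URI) are assumptions, only the key names come from the PR's code:

```python
# Hypothetical run configurations, inferred from the keys that
# runbenchmarks.py reads in this PR; exact values are illustrative.

multimodal_configs = {
    "module": "multimodal",
    "benchmark_name": "mnist-poc",          # assumed example name
    "git_uri#branch": "https://github.com/autogluon/autogluon.git#master",  # assumed
    "data_path": "MNIST",                   # only dataset supported so far
    "metrics_bucket": None,                 # set to an S3 bucket name to upload metrics
}

tabular_configs = {
    "module": "tabular",
    "benchmark_name": "tabular-poc",        # assumed example name
    "framework": "AutoGluon",               # assumed AMLB framework name
    "label": "stable",                      # assumed framework label
    "amlb_benchmark": "test",               # assumed AMLB benchmark definition
    "amlb_constraint": "test",              # assumed AMLB constraint
    "amlb_task": None,
    "metrics_bucket": None,
}

# Per the snippet later in the thread, the framework string passed to AMLB
# is built as "framework:label":
framework_spec = f'{tabular_configs["framework"]}:{tabular_configs["label"]}'
```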

__pycache__/__init__.cpython-38.pyc (outdated, resolved)
__pycache__/benchmark.cpython-38.pyc (outdated, resolved)
.dockerignore (outdated, resolved)
src/autogluon/bench/benchmark.py (resolved)
src/autogluon/bench/frameworks/multimodal/setup.sh (outdated, resolved)
.dockerignore (outdated, resolved)
.github/workflow_scripts/lint_check.sh (outdated, resolved)
.github/workflows/continuous_integration.yml (outdated, resolved)
.github/workflows/continuous_integration.yml (resolved)
.github/workflows/continuous_integration.yml (resolved)
setup.py (outdated, resolved)
src/autogluon/bench/benchmark.py (outdated, resolved)

setup_build_env

black --check --diff src/
Collaborator:
Need to add the test folder as well.

Collaborator (author):
Will update after adding unit tests.

Dockerfile (outdated)
@@ -0,0 +1,11 @@
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu116-ubuntu20.04-ec2
Collaborator:

Wondering why we need a Dockerfile for this repo? If it is needed, we should have Dockerfiles for both GPU and CPU. Also, the latest pytorch-training image is 1.13.

Collaborator (author):

By design, we want to run this repo in "local" mode on AWS batch instances. Is this the desired way to do it?

README.md (resolved)


## Run benchmarkings on AWS
Collaborator:

Shall we codify the permission template? Not every user has the dev-user role defined in their AWS account with the correct permission scope set up.

Comment on lines +63 to +64
LAMBDA_FUNCTION_NAME: ag-bench-test-job-function
Collaborator:
These two configs are very specific to the underlying implementation. Ideally, we should hide the internal complexity from end users.

@suzhoum (Collaborator, author) commented Mar 29, 2023:

We provide a custom VPC_NAME because ideally we wanted to reuse a VPC. I can make it optional in a follow-up PR, and also auto-generate LAMBDA_FUNCTION_NAME. Created an issue.


+1. Make most of the config optional and provide default values. Power users should still be able to customize themselves
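The idea discussed here (optional config with defaults, auto-generated LAMBDA_FUNCTION_NAME, power users still able to override) could look roughly like the sketch below. Only the key names mirror the ones in the diff; the default-merging helper and the generated-name format are hypothetical:

```python
import uuid

# Hypothetical sketch of "optional config with defaults": user-supplied
# values win, and LAMBDA_FUNCTION_NAME is auto-generated when omitted.
DEFAULT_CONFIG = {
    "VPC_NAME": None,              # None -> reuse/create a VPC automatically
    "LAMBDA_FUNCTION_NAME": None,  # None -> auto-generate a name below
}

def resolve_config(user_config):
    # Later dict wins on key collisions, so user values override defaults.
    config = {**DEFAULT_CONFIG, **user_config}
    if config["LAMBDA_FUNCTION_NAME"] is None:
        # Assumed naming scheme, loosely based on the example value in the diff.
        config["LAMBDA_FUNCTION_NAME"] = f"ag-bench-{uuid.uuid4().hex[:8]}-job-function"
    return config
```

Power users would simply pass LAMBDA_FUNCTION_NAME explicitly to keep full control.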

pyproject.toml (outdated, resolved)
runbenchmarks.py (resolved)
runbenchmarks.py (outdated)
Comment on lines 24 to 43
    if configs["module"] == "multimodal":
        benchmark = MultiModalBenchmark(benchmark_name=configs["benchmark_name"])
        git_uri, git_branch = configs["git_uri#branch"].split("#")
        benchmark.setup(git_uri=git_uri, git_branch=git_branch)
        benchmark.run(data_path=configs["data_path"])
        if configs.get("metrics_bucket", None):
            benchmark.upload_metrics(s3_bucket=configs["metrics_bucket"], s3_dir=f'{configs["module"]}/{benchmark.benchmark_name}')
    elif configs["module"] == "tabular":
        benchmark = TabularBenchmark(
            benchmark_name=configs["benchmark_name"],
        )
        benchmark.setup()
        benchmark.run(
            framework=f'{configs["framework"]}:{configs["label"]}',
            benchmark=configs["amlb_benchmark"],
            constraint=configs["amlb_constraint"],
            task=configs["amlb_task"],
        )
        if configs["metrics_bucket"] is not None:
            benchmark.upload_metrics(s3_bucket=configs["metrics_bucket"], s3_dir=f'{configs["module"]}/{benchmark.benchmark_name}')
Collaborator:
The steps involved for AutoMM and Tabular seem to be identical; the only difference is the class to instantiate. If so, a cleaner way would be to just load the corresponding class from somewhere (e.g., a dict mapping from module name to class).
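The dict-mapping idea suggested here could be sketched as follows. The two classes below are minimal stand-ins for the real MultiModalBenchmark/TabularBenchmark (whose actual interfaces live in this repo), just to make the dispatch pattern concrete:

```python
# Minimal stand-ins for the real benchmark classes, only to make the
# dispatch pattern runnable; the real classes live in this repo.
class MultiModalBenchmark:
    def __init__(self, benchmark_name):
        self.benchmark_name = benchmark_name

class TabularBenchmark:
    def __init__(self, benchmark_name):
        self.benchmark_name = benchmark_name

# The reviewer's suggestion: look the class up by module name
# instead of branching with if/elif.
BENCHMARK_CLASSES = {
    "multimodal": MultiModalBenchmark,
    "tabular": TabularBenchmark,
}

def create_benchmark(configs):
    cls = BENCHMARK_CLASSES[configs["module"]]
    return cls(benchmark_name=configs["benchmark_name"])
```

Module-specific setup/run arguments would still differ, which is what the author addresses below by extracting them into a helper class.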

Collaborator (author):

The main difference is the parameters to setup and run, but I extracted them out to a helper class.

runbenchmarks.py (resolved)
@yinweisu (Collaborator) left a review:
Approving to unblock this PR assuming issues will be addressed in follow-up PRs. Great work!


- name: Lint Check
run: |
chmod +x ./.github/workflow_scripts/lint_check.sh && ./.github/workflow_scripts/lint_check.sh


nit: newlines are still missing in multiple files.

runbenchmarks.py (resolved)
runbenchmarks.py (resolved)
@suzhoum suzhoum merged commit ac3e721 into master Mar 29, 2023
@suzhoum suzhoum deleted the poc_0.0.1 branch April 20, 2023 19:11
4 participants