autogluon-bench POC #1
Conversation
src/autogluon/bench/cloud/aws/batch_stack/lambdas/lambda_function.py
setup_build_env
black --check --diff src/
Need to add the test folder as well.
Will update after adding unit tests.
Dockerfile
@@ -0,0 +1,11 @@
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu116-ubuntu20.04-ec2
Wondering why we need a Dockerfile for this repo? If needed, we should have Dockerfiles for both GPU and CPU. Also, the latest pytorch-training image is 1.13.
By design, we want to run this repo in "local" mode on AWS batch instances. Is this the desired way to do it?
## Run benchmarks on AWS
Shall we codify the permission template? Not every user has the dev-user role defined in their AWS account with the correct permission scope set up.
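A hedged sketch of what a codified permission template might look like as an IAM policy document. The action list here is purely illustrative of the kinds of services the stack touches (Batch, Lambda, S3), not the actual scope this project requires, and `Resource` should be narrowed in a real template:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "batch:SubmitJob",
        "batch:DescribeJobs",
        "lambda:InvokeFunction",
        "s3:PutObject"
      ],
      "Resource": "*"
    }
  ]
}
```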
LAMBDA_FUNCTION_NAME: ag-bench-test-job-function
These two configs are very specific to the underlying implementation. Ideally, we should hide the internal complexity from end users.
The design of providing a custom VPC_NAME is because we ideally wanted to reuse VPCs. I can make it optional in a follow-up PR, and also auto-generate LAMBDA_FUNCTION_NAME. Created an issue.
+1. Make most of the config optional and provide default values. Power users should still be able to customize them.
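One common way to realize this suggestion is to merge user-supplied config over a dict of defaults. This is a minimal sketch, not the repo's actual config schema: the key names, the `resolve_configs` helper, and the auto-generated function-name format are all assumptions for illustration.

```python
# Sketch: make most config keys optional by merging over defaults.
# Key names and the generated-name format are hypothetical.
DEFAULT_CONFIGS = {
    "metrics_bucket": None,          # optional; metrics upload skipped when unset
    "lambda_function_name": None,    # auto-generated when left unset
}

def resolve_configs(user_configs: dict) -> dict:
    # User-supplied values override the defaults.
    configs = {**DEFAULT_CONFIGS, **user_configs}
    if configs["lambda_function_name"] is None:
        # Hypothetical auto-generated name, derived from the module name,
        # so end users never need to know this implementation detail.
        configs["lambda_function_name"] = f"ag-bench-{configs['module']}-job-function"
    return configs

print(resolve_configs({"module": "tabular"})["lambda_function_name"])
# → ag-bench-tabular-job-function
```

Power users can still override any key explicitly, which the merge order preserves.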
runbenchmarks.py
if configs["module"] == "multimodal":
    benchmark = MultiModalBenchmark(benchmark_name=configs["benchmark_name"])
    git_uri, git_branch = configs["git_uri#branch"].split("#")
    benchmark.setup(git_uri=git_uri, git_branch=git_branch)
    benchmark.run(data_path=configs["data_path"])
    if configs.get("metrics_bucket", None):
        benchmark.upload_metrics(s3_bucket=configs["metrics_bucket"], s3_dir=f'{configs["module"]}/{benchmark.benchmark_name}')
elif configs["module"] == "tabular":
    benchmark = TabularBenchmark(
        benchmark_name=configs["benchmark_name"],
    )
    benchmark.setup()
    benchmark.run(
        framework=f'{configs["framework"]}:{configs["label"]}',
        benchmark=configs["amlb_benchmark"],
        constraint=configs["amlb_constraint"],
        task=configs["amlb_task"]
    )
    if configs["metrics_bucket"] is not None:
        benchmark.upload_metrics(s3_bucket=configs["metrics_bucket"], s3_dir=f'{configs["module"]}/{benchmark.benchmark_name}')
The steps involved for AutoMM and Tabular seem to be identical; the only difference is the class to instantiate. If so, a cleaner way would be to load the corresponding class from somewhere (e.g. a dict mapping from module name to class).
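The registry pattern being suggested could be sketched as below. The class names are taken from the snippet above, but their bodies here are stand-ins, and the `BENCHMARK_REGISTRY` / `create_benchmark` names are hypothetical, not the repo's actual implementation:

```python
# Stand-in classes mirroring the names used in the snippet above;
# the real classes live in the repo and take more parameters.
class MultiModalBenchmark:
    def __init__(self, benchmark_name):
        self.benchmark_name = benchmark_name

class TabularBenchmark:
    def __init__(self, benchmark_name):
        self.benchmark_name = benchmark_name

# Hypothetical dict mapping module name -> benchmark class,
# replacing the if/elif branching on configs["module"].
BENCHMARK_REGISTRY = {
    "multimodal": MultiModalBenchmark,
    "tabular": TabularBenchmark,
}

def create_benchmark(configs: dict):
    # Look up the class by module name and instantiate it;
    # adding a new module only requires a new registry entry.
    cls = BENCHMARK_REGISTRY[configs["module"]]
    return cls(benchmark_name=configs["benchmark_name"])

benchmark = create_benchmark({"module": "tabular", "benchmark_name": "poc"})
print(type(benchmark).__name__)
# → TabularBenchmark
```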
The main difference is the parameters to setup and run, but I extracted them out to a helper class.
Approving to unblock this PR assuming issues will be addressed in follow-up PRs. Great work!
- name: Lint Check
  run: |
    chmod +x ./.github/workflow_scripts/lint_check.sh && ./.github/workflow_scripts/lint_check.sh
nit: trailing newlines are still missing in multiple files
The benchmark uses MultimodalPredictor, and currently only supports one dataset (MNIST) and default hyperparameters.