Issue #, if available:
Description of changes:
The current version has warmpool implemented, hence if `max_concurrent_jobs` is 30 and 90 jobs are launched, the remaining 60 will be queued and the SageMaker instances will be reused. This has been tested on `D244_F3_C1530_30` over all available folds [0, 1, 2], for the `LightGBM_c1_BAG_L1_Reproduced_AWS` config. `max_concurrent_jobs` should be less than the account limit, which is 34 for now.

To set up and run:
- Place the `tabflow` folder into a parent directory - NOTE: this parent directory must also contain the `tabrepo`, `autogluon-benchmark` and `autogluon-bench` folders; make sure all 3 are installed before installing `tabflow`
- If you change `autogluon` or `tabrepo`, you will need to re-build the image: navigate to the parent folder, then to `tabflow/docker`, and run `./build_docker.sh {ecr_repo_name} {tag} {source_account} {target_account} {region}` - AWS credentials required
- Make any needed changes in `launch_jobs.py`, like entering your docker image URI which you just pushed to ECR; make the change here: `DOCKER_IMAGE_ALIASES` (I plan to make these args in future edits)
- `pip install tabflow`
- `evaluate.py`
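Since `DOCKER_IMAGE_ALIASES` is planned to become a CLI argument, here is a minimal sketch of how that could look; the flag name, alias, and URI below are placeholders for illustration, not actual tabflow values:

```python
# Hypothetical sketch: resolving a docker image alias from a CLI arg.
# The alias key and ECR URI below are placeholders, not real values.
import argparse

DOCKER_IMAGE_ALIASES = {
    "tabflow-latest": "123456789012.dkr.ecr.us-east-1.amazonaws.com/tabflow:latest",
}

parser = argparse.ArgumentParser()
parser.add_argument(
    "--docker-image-uri",
    help="Full ECR image URI, or an alias defined in DOCKER_IMAGE_ALIASES",
)
args = parser.parse_args(["--docker-image-uri", "tabflow-latest"])

# An alias resolves to its full URI; a full URI passes through unchanged.
image_uri = DOCKER_IMAGE_ALIASES.get(args.docker_image_uri, args.docker_image_uri)
print(image_uri)
```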
Example:

To run one or several datasets over certain folds (datasets and folds are space-separated):

```
tabflow --datasets Australian --folds 0 1 --methods_file ~/method_configs.yaml --s3_bucket test-bucket --experiment_name test-experiment --max-concurrent-jobs 30 --wait
```

To run all datasets in a context over all folds for that context:

```
tabflow --datasets run_all --folds -1 --methods_file ~/method_configs.yaml --s3_bucket test-bucket --experiment_name test-experiment --max-concurrent-jobs 30 --wait
```
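The queueing behaviour described earlier (limit 30, 90 jobs launched, 60 queued) can be sketched roughly as below; `schedule` and `MAX_CONCURRENT_JOBS` are illustrative names, not tabflow's actual implementation, and the real launcher submits a new job whenever a running one finishes rather than in fixed waves:

```python
from collections import deque

MAX_CONCURRENT_JOBS = 30  # keep below the account limit (34 for now)

def schedule(jobs, max_concurrent):
    """Split jobs into waves of at most max_concurrent jobs each.

    In the real launcher, a queued job is submitted as soon as a running
    one finishes, and warm-pooled SageMaker instances are reused by the
    next job instead of being re-provisioned.
    """
    queue = deque(jobs)
    waves = []
    while queue:
        waves.append([queue.popleft() for _ in range(min(max_concurrent, len(queue)))])
    return waves

waves = schedule(list(range(90)), MAX_CONCURRENT_JOBS)
print(len(waves), [len(w) for w in waves])  # 3 waves of 30; 60 jobs queue behind the first wave
```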
Note: with different `experiment_names`, caching won't come into play.

To Do (mostly prioritized order):
- `requirements.txt` or `pyproject.toml`
- [x] Dockerfile name and build etc., add docker building step to pipeline

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.