Replace simpledb. #3569

Draft
wants to merge 81 commits into
base: master
81 commits
d1a8eca
Save progress.
DailyDreaming Mar 5, 2021
c13e238
Save progress.
DailyDreaming Mar 5, 2021
8a5ed4f
.
DailyDreaming Mar 19, 2021
eb20bc9
Progress... ?
DailyDreaming Mar 26, 2021
2785d22
Progress... ?
DailyDreaming Mar 26, 2021
a08ebf0
Update.
DailyDreaming Mar 26, 2021
b9f7653
Rebase.
DailyDreaming Apr 7, 2021
f7555c7
Progress.
DailyDreaming Apr 7, 2021
758ad72
Merge branch 'master' of https://github.com/DataBiosphere/toil into i…
DailyDreaming Apr 13, 2021
87c8b19
Update.
DailyDreaming Apr 14, 2021
4ea1107
Update.
DailyDreaming Apr 14, 2021
623f7ec
More changes.
DailyDreaming Apr 14, 2021
d509d19
More changes.
DailyDreaming Apr 14, 2021
2ecad7c
Rework credentials.
DailyDreaming Apr 14, 2021
5bfaaad
Merge branch 'master' into issues/964-replace-sdb-with-dynamodb
DailyDreaming Apr 14, 2021
9897812
Rework credentials.
DailyDreaming Apr 14, 2021
ce18e3b
Rework credentials.
DailyDreaming Apr 14, 2021
b647571
Merge branch 'issues/964-replace-sdb-with-dynamodb' of https://github…
DailyDreaming Apr 14, 2021
f8079b6
Typing.
DailyDreaming Apr 14, 2021
320172b
Refactor.
DailyDreaming Apr 14, 2021
cfec71f
Refactor.
DailyDreaming Apr 14, 2021
4b22df0
Refactor.
DailyDreaming Apr 14, 2021
7964a9e
Update mypy.
DailyDreaming Apr 14, 2021
5ab613c
Update mypy.
DailyDreaming Apr 14, 2021
b7dbaab
Hmmm...
DailyDreaming Apr 15, 2021
24250fa
Save progress.
DailyDreaming May 8, 2021
b383dd1
Updates.
DailyDreaming May 9, 2021
ce3feb9
Rebase with master.
DailyDreaming May 9, 2021
ffcac1e
Update imports.
DailyDreaming May 9, 2021
d8d1c83
Update imports.
DailyDreaming May 9, 2021
1afd2f6
Update tests.
DailyDreaming May 9, 2021
9161b5f
Linting.
DailyDreaming May 9, 2021
2b9db9b
Linting.
DailyDreaming May 9, 2021
9b59b2e
Updates.
DailyDreaming May 9, 2021
3941b38
Use create_bucket.
DailyDreaming May 9, 2021
f342e66
Remove/stub batch.
DailyDreaming May 9, 2021
f16d926
Remove/stub batch.
DailyDreaming May 9, 2021
2a226dd
Remove/stub batch.
DailyDreaming May 9, 2021
93c528b
More fixes.
DailyDreaming May 10, 2021
555c998
Update.
DailyDreaming May 10, 2021
c77e726
Update.
DailyDreaming May 10, 2021
912da73
Rebase.
DailyDreaming May 10, 2021
32bdaa5
.
DailyDreaming May 10, 2021
58c2112
testFileDeletion
DailyDreaming May 10, 2021
3946645
Update.
DailyDreaming May 11, 2021
fddb0b5
Tests seem to be passing.
DailyDreaming May 12, 2021
522390f
Merge branch 'master' into issues/964-replace-sdb-with-dynamodb
DailyDreaming May 12, 2021
f2d7a89
Fix shared file.
DailyDreaming May 12, 2021
2f89eef
Merge branch 'issues/964-replace-sdb-with-dynamodb' of https://github…
DailyDreaming May 12, 2021
ba9e31d
Dont encrypt shared files.
DailyDreaming Jun 13, 2021
7dceb66
Update.
DailyDreaming Aug 25, 2021
6dd2d6e
Large rebase.
DailyDreaming Aug 25, 2021
d2e833b
Check tests.
DailyDreaming Aug 31, 2021
b873668
Merge branch 'master' of https://github.com/DataBiosphere/toil into i…
DailyDreaming Aug 31, 2021
07c9a3a
Tests.
DailyDreaming Aug 31, 2021
d01da36
Update .gitlab-ci.yml
DailyDreaming Sep 1, 2021
c3a6f77
Update.
DailyDreaming Sep 1, 2021
ebd3eef
Merge branch 'issues/964-replace-sdb-with-dynamodb' of https://github…
DailyDreaming Sep 1, 2021
da68b39
AWS exec preservation.
DailyDreaming Sep 1, 2021
e1a9af4
Update.
DailyDreaming Sep 13, 2021
0ccfa5b
Rebase.
DailyDreaming Sep 14, 2021
d3a6615
Cruft.
DailyDreaming Sep 14, 2021
a3ddb96
Cruft.
DailyDreaming Sep 14, 2021
5551149
Cruft.
DailyDreaming Sep 14, 2021
ba90b83
Consolidate functions.
DailyDreaming Sep 14, 2021
7054c3b
Update.
DailyDreaming Sep 14, 2021
df3121e
Update.
DailyDreaming Sep 14, 2021
6f29bc1
Cruft.
DailyDreaming Sep 14, 2021
3265b15
Cruft.
DailyDreaming Sep 14, 2021
5104ae7
Cruft.
DailyDreaming Sep 14, 2021
28a2861
Cruft.
DailyDreaming Sep 14, 2021
33ea71d
Cruft.
DailyDreaming Sep 14, 2021
a398b0b
Cruft.
DailyDreaming Sep 14, 2021
735ca38
Test bucket deletion.
DailyDreaming Sep 14, 2021
b4416cc
Cruft.
DailyDreaming Sep 14, 2021
aafc5a7
Cruft.
DailyDreaming Sep 14, 2021
6d352ac
Specify region.
DailyDreaming Sep 14, 2021
428a624
Cruft.
DailyDreaming Sep 15, 2021
0341be2
Cruft.
DailyDreaming Sep 15, 2021
ff9dc82
Correct exception.
DailyDreaming Sep 15, 2021
db7fecf
Cruft.
DailyDreaming Sep 15, 2021
2 changes: 2 additions & 0 deletions .gitignore
@@ -14,6 +14,8 @@ venv/
v3nv/
tmp/
/src/toil/test/cwl/spec
/src/toil/test/cwl/spec_v11
/src/toil/test/cwl/spec_v12
/cwltool_deps/
/docs/generated_rst/
/docker/Dockerfile
11 changes: 5 additions & 6 deletions .gitlab-ci.yml
@@ -47,7 +47,7 @@ lint:
script:
- pwd
- virtualenv -p ${MAIN_PYTHON_PKG} venv && . venv/bin/activate && make prepare && make develop extras=[all] packages=htcondor
- make mypy
# - make mypy
- make docs


@@ -147,7 +147,7 @@ cwl_v1.0_kubernetes:
- mkdir -p ${TOIL_WORKDIR}
- make test tests=src/toil/test/cwl/cwlTest.py::CWLv10Test::test_kubernetes_cwl_conformance
- make test tests=src/toil/test/cwl/cwlTest.py::CWLv10Test::test_kubernetes_cwl_conformance_with_caching

cwl_v1.1_kubernetes:
stage: main_tests
only: []
@@ -162,7 +162,7 @@ cwl_v1.1_kubernetes:
- mkdir -p ${TOIL_WORKDIR}
- make test tests=src/toil/test/cwl/cwlTest.py::CWLv11Test::test_kubernetes_cwl_conformance
- make test tests=src/toil/test/cwl/cwlTest.py::CWLv11Test::test_kubernetes_cwl_conformance_with_caching

cwl_v1.2_kubernetes:
stage: main_tests
script:
@@ -187,12 +187,11 @@ wdl:
- make test tests=src/toil/test/wdl/toilwdlTest.py # needs java (default-jre) to run "GATK.jar"
- make test tests=src/toil/test/wdl/builtinTest.py

jobstore_and_provisioning:
provisioning:
stage: main_tests
script:
- pwd
- virtualenv -p ${MAIN_PYTHON_PKG} venv && . venv/bin/activate && pip install -U pip wheel && make prepare && make develop extras=[all] packages=htcondor
- make test tests=src/toil/test/jobStores/jobStoreTest.py
- make test tests=src/toil/test/sort/sortTest.py
- make test tests=src/toil/test/provisioners/aws/awsProvisionerTest.py
- make test tests=src/toil/test/provisioners/clusterScalerTest.py
@@ -207,7 +206,7 @@ main:
- virtualenv -p ${MAIN_PYTHON_PKG} venv && . venv/bin/activate && pip install -U pip wheel && make prepare && make develop extras=[all] packages=htcondor
- make test tests=src/toil/test/src
- make test tests=src/toil/test/utils
# - make test tests=src/toil/test/docs/scriptsTest.py::ToilDocumentationTest::testDocker
- make test tests=src/toil/test/docs/scriptsTest.py::ToilDocumentationTest::testDocker

appliance_build:
stage: basic_tests
2 changes: 1 addition & 1 deletion Makefile
@@ -120,7 +120,7 @@ clean_sdist:
# Setting TOIL_OWNER_TAG will tag cloud resources so that UCSC's cloud murder bot won't kill them.
test: check_venv check_build_reqs
TRAVIS=true TOIL_OWNER_TAG="shared" \
python -m pytest --durations=0 --log-level DEBUG --log-cli-level INFO -r s $(cov) $(tests)
python -m pytest --durations=0 --log-level DEBUG --log-cli-level INFO -r s $(tests)


# This target will skip building docker and all docker based tests
46 changes: 32 additions & 14 deletions contrib/admin/cleanup_aws_resources.py
@@ -17,12 +17,25 @@
import re
import sys

from typing import Optional

pkg_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')) # noqa
sys.path.insert(0, pkg_root) # noqa

from src.toil.lib import aws
from src.toil.lib.aws.utils import delete_iam_role, delete_iam_instance_profile, delete_s3_bucket, delete_sdb_domain
from src.toil.lib.aws.iam import delete_iam_role, delete_iam_instance_profile
from src.toil.lib.aws.s3 import delete_bucket
from src.toil.lib.generatedEC2Lists import regionDict
from src.toil.lib.retry import retry
from src.toil.lib.aws.credentials import client, resource

try:
    from boto.exception import BotoServerError
    from mypy_boto3_s3 import S3ServiceResource
    from mypy_boto3_s3.literals import BucketLocationConstraintType
    from mypy_boto3_s3.service_resource import Bucket
except ImportError:
    BotoServerError = None  # type: ignore
    # AWS/boto extra is not installed
# put us-west-2 first as our default test region; that way anything with a universal region shows there
regions = ['us-west-2'] + [region for region in regionDict if region != 'us-west-2']
@@ -39,6 +52,14 @@
'toil-preserve-file-permissions-tests'] # test infra; never delete


# this is only here as a clean up tool; we no longer use sdb and this will eventually be removed
@retry(errors=[BotoServerError])
def delete_sdb_domain(sdb_domain_name: str, region: Optional[str] = None) -> None:
    sdb_client = client("sdb", region_name=region)
    sdb_client.delete_domain(DomainName=sdb_domain_name)
    print(f'SDB Domain: "{sdb_domain_name}" successfully deleted.')


def contains_uuid(string):
"""
Determines if a string contains a pattern like: '28064c76-a491-43e7-9b50-da424f920354',
@@ -68,11 +89,7 @@ def contains_toil_test_patterns(string):


def matches(resource_name):
if resource_name.endswith('--files') or resource_name.endswith('--jobs') or resource_name.endswith('_toil'):
if contains_toil_test_patterns(resource_name):
return resource_name

if resource_name.startswith('import-export-test-'):
if resource_name.endswith('--toil'):
return resource_name
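
The simplified name filter in this hunk can be exercised in isolation. The following is a minimal sketch, assuming the post-change behavior is only the `--toil` suffix check shown above; the sample resource names are made up for illustration.

```python
from typing import Optional


def matches(resource_name: str) -> Optional[str]:
    """Return the name if it ends with the '--toil' suffix, else None."""
    if resource_name.endswith('--toil'):
        return resource_name
    return None


# Hypothetical resource names, for illustration only.
candidates = [
    'example-jobstore--toil',
    'example-jobstore--files',
    'import-export-test-123',
]
selected = [name for name in candidates if matches(name)]
```

Under this sketch, names carrying the older `--files`, `--jobs`, or `_toil` suffixes, or the `import-export-test-` prefix, would no longer be selected for cleanup.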


@@ -81,7 +98,7 @@ def find_buckets_to_cleanup(include_all, match):
for region in regions:
print(f'\n[{region}] Buckets:')
try:
s3_resource = aws.resource('s3', region_name=region)
s3_resource = resource('s3', region_name=region)
buckets_in_region = find_buckets_in_region(s3_resource, include_all, match)
new_buckets = [b for b in buckets_in_region if b not in buckets]
print(' ' + '\n '.join(new_buckets))
@@ -101,7 +118,7 @@ def find_sdb_domains_to_cleanup(include_all, match):
for region in regions:
print(f'\n[{region}] SimpleDB Domains:')
try:
sdb_client = aws.client('sdb', region_name=region)
sdb_client = client('sdb', region_name=region)
domains_in_region = find_sdb_domains_in_region(sdb_client, include_all, match)
new_domains = [b for b in domains_in_region if b not in sdb_domains]
print(' ' + '\n '.join(new_domains))
@@ -122,7 +139,7 @@ def find_iam_roles_to_cleanup(include_all, match):
for region in regions:
print(f'\n[{region}] IAM Roles:')
try:
iam_client = aws.client('iam', region_name=region)
iam_client = client('iam', region_name=region)
roles_in_region = find_iam_roles_in_region(iam_client, include_all, match)

new_roles = [b for b in roles_in_region if b not in iam_roles]
@@ -139,8 +156,8 @@ def find_instance_profile_names_to_cleanup(include_all, match):
for region in regions:
print(f'\n[{region}] IAM Instance Profiles:')
try:
iam_resource = aws.resource('iam', region_name=region)
iam_client = aws.client('iam')
iam_resource = resource('iam', region_name=region)
iam_client = client('iam')
instance_profiles_in_region = find_instance_profile_names_in_region(iam_client, include_all, match)

new_instance_profiles = [b for b in instance_profiles_in_region if b not in instance_profiles]
@@ -252,7 +269,7 @@ def main(argv):

options = parser.parse_args(argv)

account_name = aws.client('iam').list_account_aliases()['AccountAliases'][0]
account_name = client('iam').list_account_aliases()['AccountAliases'][0]
print(f'\n\nNow running for AWS account: {account_name}.')

match = [m.strip() for m in options.match.split(',') if m.strip()]
@@ -278,7 +295,8 @@ def main(argv):
if response.lower() in ('y', 'yes'):
print('\nOkay, now deleting...')
for bucket, region in buckets.items():
delete_s3_bucket(bucket, region)
s3_resource = resource('s3', region_name=region)
delete_bucket(s3_resource, bucket)
print('S3 Bucket Deletions Successful.')
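
The body of `delete_bucket` is not shown in this diff. As a rough sketch of what such a helper usually has to do with a boto3 S3 service resource, assuming nothing about Toil's actual implementation: S3 refuses to delete a non-empty bucket, so the contents are removed first. The function name `delete_bucket_sketch` is hypothetical.

```python
def delete_bucket_sketch(s3_resource, bucket_name: str) -> None:
    """Empty a bucket and then delete it.

    Hypothetical helper for illustration; not Toil's delete_bucket.
    `s3_resource` is expected to behave like boto3.resource('s3').
    """
    bucket = s3_resource.Bucket(bucket_name)
    # Remove every object version and delete marker; on an unversioned
    # bucket this removes the plain objects as well.
    bucket.object_versions.delete()
    # Only an empty bucket can be deleted.
    bucket.delete()
```

Passing the resource in, as the call site above does, keeps region selection with the caller and makes the helper easy to exercise against a stub.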

if not options.skip_sdb:
87 changes: 8 additions & 79 deletions src/toil/__init__.py
@@ -19,6 +19,7 @@
import subprocess
import sys
import time
import json
from datetime import datetime

import requests
@@ -32,73 +33,6 @@
log = logging.getLogger(__name__)


def which(cmd, mode=os.F_OK | os.X_OK, path=None):
"""
Copy-pasted in from python3.6's shutil.which().

Given a command, mode, and a PATH string, return the path which
conforms to the given mode on the PATH, or None if there is no such
file.

`mode` defaults to os.F_OK | os.X_OK. `path` defaults to the result
of os.environ.get("PATH"), or can be overridden with a custom search
path.

"""

# Check that a given file can be accessed with the correct mode.
# Additionally check that `file` is not a directory, as on Windows
# directories pass the os.access check.
def _access_check(fn, mode):
return (os.path.exists(fn) and os.access(fn, mode)
and not os.path.isdir(fn))

# If we're given a path with a directory part, look it up directly rather
# than referring to PATH directories. This includes checking relative to the
# current directory, e.g. ./script
if os.path.dirname(cmd):
if _access_check(cmd, mode):
return cmd
return None

if path is None:
path = os.environ.get("PATH", os.defpath)
if not path:
return None
path = path.split(os.pathsep)

if sys.platform == "win32":
# The current directory takes precedence on Windows.
if not os.curdir in path:
path.insert(0, os.curdir)

# PATHEXT is necessary to check on Windows.
pathext = os.environ.get("PATHEXT", "").split(os.pathsep)
# See if the given file matches any of the expected path extensions.
# This will allow us to short circuit when given "python.exe".
# If it does match, only test that one, otherwise we have to try
# others.
if any(cmd.lower().endswith(ext.lower()) for ext in pathext):
files = [cmd]
else:
files = [cmd + ext for ext in pathext]
else:
# On other platforms you don't have things like PATHEXT to tell you
# what file suffixes are executable, so just pass on cmd as-is.
files = [cmd]

seen = set()
for dir in path:
normdir = os.path.normcase(dir)
if not normdir in seen:
seen.add(normdir)
for thefile in files:
name = os.path.join(dir, thefile)
if _access_check(name, mode):
return name
return None
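
The removed helper's docstring says it was copied from Python 3.6's `shutil.which()`, so the standard-library function is presumably a drop-in replacement. The diff does not show what callers were switched to, so treat the following as a sketch of the stdlib behavior only; the command name is deliberately made up.

```python
import shutil
import sys

# A name that should not exist on any PATH yields None.
missing = shutil.which('toil-no-such-command-hopefully')

# A command with a directory component is checked directly
# rather than being searched for on PATH.
python_path = shutil.which(sys.executable)
```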


def toilPackageDirPath():
"""
Returns the absolute path of the directory that corresponds to the top-level toil package.
@@ -192,8 +126,8 @@ def applianceSelf(forceDockerAppliance=False):

if forceDockerAppliance:
return appliance
else:
return checkDockerImageExists(appliance=appliance)

return checkDockerImageExists(appliance=appliance)


def customDockerInitCmd():
@@ -249,14 +183,9 @@ def lookupEnvVar(name, envName, defaultValue):
:return: the value of the environment variable or the default value if the variable is not set
:rtype: str
"""
try:
value = os.environ[envName]
except KeyError:
log.info('Using default %s of %s as %s is not set.', name, defaultValue, envName)
return defaultValue
else:
log.info('Overriding %s of %s with %s from %s.', name, defaultValue, value, envName)
return value
value = os.environ.get(envName, defaultValue)
log.info(f'Setting {name} to "{value}". (Set with: "{envName}"; Default: "{defaultValue}")')
return value
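
The rewritten lookup can be tried standalone. This is a minimal sketch of the same `os.environ.get` pattern; the snake_case function name and the variable `TOIL_EXAMPLE_SETTING` are invented for the demonstration and do not appear in the diff.

```python
import logging
import os

log = logging.getLogger(__name__)


def lookup_env_var(name: str, env_name: str, default_value: str) -> str:
    """Return os.environ[env_name] if it is set, otherwise default_value."""
    value = os.environ.get(env_name, default_value)
    log.info('Setting %s to "%s". (Set with: "%s"; Default: "%s")',
             name, value, env_name, default_value)
    return value


# TOIL_EXAMPLE_SETTING is a made-up variable name for this demonstration.
os.environ.pop('TOIL_EXAMPLE_SETTING', None)
unset_result = lookup_env_var('example setting', 'TOIL_EXAMPLE_SETTING', 'fallback')

os.environ['TOIL_EXAMPLE_SETTING'] = 'override'
set_result = lookup_env_var('example setting', 'TOIL_EXAMPLE_SETTING', 'fallback')
```

Unlike the old try/except version, a single log message now covers both the default and the override case. The sketch passes lazy %-style arguments to `log.info` instead of an f-string; that is a stylistic choice here, not something the diff does.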


def checkDockerImageExists(appliance):
@@ -423,7 +352,7 @@ def logProcessContext(config):
# toil.version (module) and Sphinx doesn't like that.
from toil.version import version
log.info("Running Toil version %s on host %s.", version, socket.gethostname())
log.debug("Configuration: %s", config.__dict__)
log.debug("Configuration: %s", json.dumps(config.__dict__, indent=4))
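
One thing worth checking with the `json.dumps` change: it raises `TypeError` if any attribute on the config is not JSON-serializable. The sketch below shows the pretty-printed form and one possible guard. `ExampleConfig` is a hypothetical stand-in, not Toil's `Config`, and `default=str` is an assumption of this sketch rather than part of the diff.

```python
import json


class ExampleConfig:
    """Hypothetical stand-in for a config object; not Toil's Config class."""

    def __init__(self) -> None:
        self.jobStore = 'file:/tmp/example'
        self.retryCount = 1
        self.workDir = None


# Indented output, one attribute per line, as in the new debug message.
pretty = json.dumps(ExampleConfig().__dict__, indent=4)

# default=str is an assumption for this sketch: it stringifies any value
# that json cannot encode natively instead of raising TypeError.
safe = json.dumps({'kind': object}, indent=4, default=str)
```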


try:
@@ -496,7 +425,7 @@ def __init__(self, name, access_key=None, secret_key=None,
# We will backend into a boto3 resolver for getting credentials.
# Make sure to enable boto3's own caching, so we can share that
# cache with pure boto3 code elsewhere in Toil.
# Keep synced with toil.lib.ec2.establish_boto3_session
# Keep synced with the session used in toil.lib.aws.credentials
self._boto3_resolver = create_credential_resolver(Session(profile=profile_name), cache=JSONFileCache())
else:
# We will use the normal flow
2 changes: 1 addition & 1 deletion src/toil/batchSystems/mesos/test/__init__.py
@@ -100,7 +100,7 @@ def findMesosBinary(self, names):
sought = 'any binary in %s' % str(names)

raise RuntimeError("Cannot find %s. Make sure Mesos is installed "
"and it's 'bin' directory is present on the PATH." % sought)
"and it's 'bin' directory is present on the PATH." % sought)

class MesosMasterThread(MesosThread):
def mesosCommand(self):
7 changes: 3 additions & 4 deletions src/toil/common.py
@@ -32,7 +32,8 @@
from toil.batchSystems.options import (add_all_batchsystem_options,
set_batchsystem_config_defaults,
set_batchsystem_options)
from toil.lib.aws import zone_to_region

from toil.lib.aws.util import zone_to_region
from toil.lib.conversions import bytes2human, human2bytes
from toil.lib.retry import retry
from toil.provisioners import add_provisioner_options, cluster_factory, parse_node_types
@@ -817,8 +818,7 @@ def start(self, rootJob):
self._assertContextManagerUsed()
self.writePIDFile()
if self.config.restart:
raise ToilRestartException('A Toil workflow can only be started once. Use '
'Toil.restart() to resume it.')
raise ToilRestartException('A Toil workflow can only be started once. Use Toil.restart() to resume it.')

self._batchSystem = self.createBatchSystem(self.config)
self._setupAutoDeployment(rootJob.getUserScript())
@@ -1433,7 +1433,6 @@ def getDirSizeRecursively(dirPath: str) -> int:
:param str dirPath: A valid path to a directory or file.
:return: Total size, in bytes, of the file or directory at dirPath.
"""

# du is often faster than using os.lstat(), sometimes significantly so.

# The call: 'du -s /some/path' should give the number of 512-byte blocks
2 changes: 1 addition & 1 deletion src/toil/deferred.py
@@ -30,7 +30,7 @@
class DeferredFunction(namedtuple('DeferredFunction', 'function args kwargs name module')):
"""
>>> from collections import defaultdict
>>> df = DeferredFunction.create(defaultdict, None, {'x':1}, y=2)
>>> df = DeferredFunction.create_job(defaultdict, None, {'x':1}, y=2)
>>> df
DeferredFunction(defaultdict, ...)
>>> df.invoke() == defaultdict(None, x=1, y=2)
2 changes: 2 additions & 0 deletions src/toil/fileStores/__init__.py
@@ -34,6 +34,8 @@ def __init__(self, fileStoreID: str, size: int, executable: bool = False):
super(FileID, self).__init__()
self.size = size
self.executable = executable
self.md5sum = None
self.etag = None

def pack(self) -> str:
"""