This repository has been archived by the owner on Jun 4, 2024. It is now read-only.

[autoscaler] Using ami-0f92e9d2b63bc61a2 fails with error "ERROR: ray-1.2.0.dev0-cp36-cp36m-manylinux2014_x86_64.whl is not a supported wheel on this platform." #7

Open
jennakwon06 opened this issue Feb 5, 2021 · 2 comments

Comments

@jennakwon06

Problem

I am using ami-0f92e9d2b63bc61a2, which is supposed to be the AMI for Linux - Python 3.7 - Ray 1.2.0.

I am using the YAML file below, where my Docker image 048211272910.dkr.ecr.us-west-2.amazonaws.com/jkkwon-batscli:zarr is a custom image based on 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:2.3.1-cpu-py37-ubuntu18.04.

cluster_name: jkkwon_ray_test

min_workers: 10
max_workers: 100
upscaling_speed: 1.0

docker:
    image: "048211272910.dkr.ecr.us-west-2.amazonaws.com/jkkwon-batscli:zarr"
    container_name: "miamiml_container"
    pull_before_run: True

idle_timeout_minutes: 5

provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a,us-west-2b,us-west-2c,us-west-2d
    cache_stopped_nodes: False

auth:
    ssh_user: ubuntu
    ssh_private_key: miami_dev_dask_emr_key_pair.pem

head_node:
    InstanceType: r5n.24xlarge
    ImageId: ami-0f92e9d2b63bc61a2 # https://github.com/amzn/amazon-ray
    SecurityGroupIds:
        - "sg-08ed97f6d08d451f6"
    SubnetIds: [
        "subnet-02876545b671b57b0"
    ]
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 100
    KeyName: "miami_dev_dask_emr_key_pair"

worker_nodes:
    InstanceType: r5n.24xlarge
    ImageId: ami-0f92e9d2b63bc61a2 # https://github.com/amzn/amazon-ray
    SecurityGroupIds:
        - "sg-08ed97f6d08d451f6"
    SubnetIds: [
        "subnet-0180e9267b994bf97",  # us-west-2a, 8187 IP addresses. 10.0.32.0/19
        "subnet-073e6e0338bf209cb",  # us-west-2b, 8187 IP addresses. 10.0.64.0/19
        "subnet-03caa10b59288efae",  # us-west-2c, 8187 IP addresses. 10.0.96.0/19
        "subnet-06dd6dbb8caf5c310",  # us-west-2d, 8187 IP addresses. 10.0.128.0/19
    ]
    InstanceMarketOptions:
        MarketType: spot
    KeyName: "miami_dev_dask_emr_key_pair"

    
file_mounts_sync_continuously: False
rsync_exclude:
    - "**/.git"
    - "**/.git/**"
rsync_filter:
    - ".gitignore"

initialization_commands: []

head_setup_commands: []

worker_setup_commands: []

head_start_ray_commands:
    - ray stop
    - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076

Running ray up fails with the following message:

  [6/7] Running setup commands
    (0/2) echo 'export PATH="$HOME/anaco...
Shared connection to 10.0.0.34 closed.
    (1/2) pip install -U https://s3-us-w...
ERROR: ray-1.2.0.dev0-cp36-cp36m-manylinux2014_x86_64.whl is not a supported wheel on this platform.
WARNING: You are using pip version 20.3.3; however, version 21.0.1 is available.
You should consider upgrading via the '/usr/local/bin/python3.7 -m pip install --upgrade pip' command.
Shared connection to 10.0.0.34 closed.
  New status: update-failed
  !!!
  SSH command failed.
  !!!
  
  Failed to setup head node.
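
For context on the error above: pip prints "is not a supported wheel on this platform" when the tags in a wheel's filename do not match the interpreter running pip. Here the wheel is tagged cp36, while the upgrade hint in the log points at /usr/local/bin/python3.7. The sketch below is a simplified illustration of that check, not pip's actual logic; `parse_wheel_tags` and `matches_cpython` are hypothetical helper names, and only the CPython version tag is compared.

```python
def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename.

    Wheel filenames follow name-version[-build]-python-abi-platform.whl,
    so the last three dash-separated fields are the compatibility tags.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    stem = filename[: -len(".whl")]
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def matches_cpython(filename, major, minor):
    """Simplified check: does the wheel declare a cpXY tag for this CPython?

    Assumption: this ignores generic tags such as py3 and does not look at
    the ABI or platform tags, which pip also takes into account.
    """
    python_tag, _, _ = parse_wheel_tags(filename)
    return "cp{}{}".format(major, minor) in python_tag.split(".")


wheel = "ray-1.2.0.dev0-cp36-cp36m-manylinux2014_x86_64.whl"
print(parse_wheel_tags(wheel))
print(matches_cpython(wheel, 3, 7))  # the interpreter named in the pip log
print(matches_cpython(wheel, 3, 6))
```

Under these assumptions, the wheel from the log matches a Python 3.6 interpreter but not the Python 3.7 one that ran pip.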

When NOT using the Docker image, I am able to get the Ray cluster up and running. But when I log onto it with ray attach and open a Python console, I get the following:

Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
[1]+  Stopped                 python
ubuntu@ip-10-0-0-108:~$ python3
Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

I am wondering whether the Ray wheel was mistakenly uploaded for Python 3.6 instead of Python 3.7?

Thanks!

@jennakwon06 jennakwon06 added the bug Something isn't working label Feb 5, 2021
@pdames pdames self-assigned this Feb 6, 2021
@pdames

pdames commented Feb 6, 2021

@jennakwon06 - make sure to add the following line to your autoscaler config. It prevents the default setup_commands from default.yaml (which may differ depending on the version of Ray installed on the host running ray up) from being applied automatically and trying to install a Python 3.6 Ray wheel:

setup_commands: []
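
To illustrate why the explicit empty list matters, here is a minimal sketch of a defaults fill-in, assuming a key left out of the user's config takes its value from the defaults while a key that is present (even as an empty list) is kept as written. This is not Ray's actual merging code; `DEFAULTS`, its placeholder command, and `effective_config` are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for the relevant part of default.yaml. The real
# default commands depend on the Ray version on the host running `ray up`.
DEFAULTS = {
    "setup_commands": ["pip install -U <default-ray-wheel-url>"],
    "head_setup_commands": [],
    "worker_setup_commands": [],
}


def effective_config(user_config, defaults=DEFAULTS):
    """Return the config with missing keys filled from the defaults.

    Keys present in user_config win, including ones set to an empty list.
    """
    merged = dict(defaults)
    merged.update(user_config)
    return merged


# The config from the report sets the head/worker variants only, so the
# default setup_commands would still be applied.
without_override = effective_config(
    {"head_setup_commands": [], "worker_setup_commands": []}
)
print(without_override["setup_commands"])

# With the explicit empty list, nothing is inherited.
with_override = effective_config({"setup_commands": []})
print(with_override["setup_commands"])
```

In this sketch the first call still carries the default install command, while the second yields an empty setup_commands list.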

For example, I launched a cluster via ray up us-west-2-cp37-ray120-test.yaml from the same AMI using the following autoscaler config, and verified that the final result matched my expectations:

cluster_name: us-west-2-cp37-ray120-test 

max_workers: 1

provider:
  type: aws
  region: us-west-2
  availability_zone: us-west-2a

auth:
  ssh_user: ubuntu

head_node:
  InstanceType: r5n.xlarge
  ImageId: ami-0f92e9d2b63bc61a2
  SecurityGroupIds: 
    - sg-07f4b3353e442a2ce

worker_nodes:
  InstanceType: r5n.xlarge
  ImageId: ami-0f92e9d2b63bc61a2
  SecurityGroupIds: 
    - sg-07f4b3353e442a2ce
    
setup_commands: []

pdames$ ray attach us-west-2-cp37-ray120-test.yaml
ubuntu@ip-XXX-XX-XX-XXX:~$ pip show amzn-ray
Name: amzn-ray
Version: 1.2.0
Summary: Staging area for ongoing enhancements to Ray focused on improving its integration with AWS and other Amazon technologies.
Home-page: https://github.com/amzn/amazon-ray
Author: Amazon Ray Team
Author-email: [email protected]
License: Apache 2.0
Location: /home/ubuntu/anaconda3/lib/python3.7/site-packages
Requires: numpy, jsonschema, aiohttp-cors, colorama, msgpack, redis, colorful, filelock, aiohttp, pyyaml, click, py-spy, grpcio, requests, opencensus, aioredis, prometheus-client, protobuf, gpustat
Required-by: 
ubuntu@ip-XXX-XX-XX-XXX:~$ python
Python 3.7.7 (default, Mar 26 2020, 15:48:22) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

@jennakwon06

jennakwon06 commented Feb 8, 2021

I see. Sounds good. Thanks! It sounds like this could be a documentation improvement about the behavior of empty fields. I will leave this open until we improve that documentation.

@pdames pdames added documentation Improvements or additions to documentation enhancement New feature or request and removed bug Something isn't working labels Feb 23, 2021