HyperPod Default Enroot Path is Root Volume #427

Closed
nghtm opened this issue Sep 12, 2024 · 5 comments
@nghtm
Collaborator

nghtm commented Sep 12, 2024

The HyperPod default enroot path uses /opt/sagemaker because of the first if statement in the lifecycle script (install_enroot_pyxis.sh). This is usually approx. 500 GB of root volume, depending on user configuration.

For larger models, including Nemotron 340b, a larger volume is required to avoid running out of enroot space, as seen in the error log below:

slurmstepd: error: pyxis: child 1528235 failed with error code: 1
slurmstepd: error: pyxis: failed to create container filesystem
slurmstepd: error: pyxis: printing enroot log file:
slurmstepd: error: pyxis:     [INFO] Extracting squashfs filesystem...
slurmstepd: error: pyxis:     Write on output file failed because No space left on device
slurmstepd: error: pyxis:     FATAL ERROR:writer: failed to write file /opt/sagemaker/tmp/enroot/data/user-1000/pyxis_167.0/usr/local/tensorrt/targets/x86_64-linux-gnu/lib/libnvinfer_static.a
slurmstepd: error: pyxis:     Parallel unsquashfs: Using 96 processors
slurmstepd: error: pyxis:     433207 inodes (820890 blocks) to write
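Before importing a large image, it can help to compare free space on the candidate volumes. The following is a minimal sketch, not part of the lifecycle scripts: the function names `avail_gib` and `has_space` are hypothetical, and the threshold is whatever the caller considers sufficient for the image.

```shell
#!/bin/bash
# Print the available space, in whole GiB, on the filesystem holding a path.
# Uses POSIX "df -Pk" (1 KiB blocks); column 4 is the available space.
avail_gib() {
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeed only if the path has at least the requested number of GiB free.
# Fails when the path does not exist or df produces no usable output.
has_space() {
    local path="$1" need_gib="$2" have_gib
    have_gib="$(avail_gib "$path")"
    [ -n "$have_gib" ] && [ "$have_gib" -ge "$need_gib" ]
}
```

For example, `has_space /opt/sagemaker 1000 || echo "root volume too small"` would flag the situation in the log above before unsquashfs starts writing.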

This can be fixed by changing the order of the if/elif statement so that it defaults to /opt/dlami/nvme (28TB on p5s) instead, which makes enroot use NVMe rather than root volume space. A PR is required to change the order of the if/elif in the lifecycle script.
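The reordering idea can be sketched as a "first existing directory wins" lookup. This is an illustration only and does not reproduce the actual contents of the lifecycle script; the function names `first_existing_dir` and `pick_enroot_base` are hypothetical.

```shell
#!/bin/bash
# Print the first candidate directory that exists; fail if none does.
first_existing_dir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: choose the enroot base directory, trying NVMe
# before the root-volume location.
pick_enroot_base() {
    first_existing_dir /opt/dlami/nvme /opt/sagemaker
}
```

Note that a directory existing does not by itself prove the NVMe volume is mounted there, so a check like this is probably best combined with waiting for the mount to be ready.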

@nghtm
Collaborator Author

nghtm commented Sep 12, 2024

When run on a newly deployed HyperPod cluster (4x p5s), I see the following, which indicates that the enroot path is indeed being set to /opt/dlami/nvme ...

srun -N 4 cat /etc/enroot/enroot.conf | grep -E "ENROOT_CONFIG_PATH|ENROOT_CACHE_PATH|ENROOT_RUNTIME_PATH|ENROOT_DATA_PATH|ENROOT_TEMP_PATH"
ENROOT_RUNTIME_PATH        /opt/dlami/nvme/tmp/enroot/user-$(id -u)
ENROOT_CONFIG_PATH         ${HOME}/enroot
ENROOT_CACHE_PATH          /fsx/enroot
ENROOT_DATA_PATH           /opt/dlami/nvme/tmp/enroot/data/user-$(id -u)
ENROOT_TEMP_PATH           /opt/dlami/nvme/tmp
ENROOT_RUNTIME_PATH        /opt/dlami/nvme/tmp/enroot/user-$(id -u)
ENROOT_CONFIG_PATH         ${HOME}/enroot
ENROOT_CACHE_PATH          /fsx/enroot
ENROOT_DATA_PATH           /opt/dlami/nvme/tmp/enroot/data/user-$(id -u)
ENROOT_TEMP_PATH           /opt/dlami/nvme/tmp
ENROOT_RUNTIME_PATH        /opt/dlami/nvme/tmp/enroot/user-$(id -u)
ENROOT_CONFIG_PATH         ${HOME}/enroot
ENROOT_CACHE_PATH          /fsx/enroot
ENROOT_DATA_PATH           /opt/dlami/nvme/tmp/enroot/data/user-$(id -u)
ENROOT_TEMP_PATH           /opt/dlami/nvme/tmp
ENROOT_RUNTIME_PATH        /opt/dlami/nvme/tmp/enroot/user-$(id -u)
ENROOT_CONFIG_PATH         ${HOME}/enroot
ENROOT_CACHE_PATH          /fsx/enroot
ENROOT_DATA_PATH           /opt/dlami/nvme/tmp/enroot/data/user-$(id -u)
ENROOT_TEMP_PATH           /opt/dlami/nvme/tmp
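Reading the grep output by eye gets tedious on larger clusters. A minimal sketch of an automated check follows, assuming the whitespace-separated `KEY VALUE` layout shown in the output above; the function name `check_enroot_paths` is hypothetical. It only inspects the runtime, data, and temp paths, since the config and cache paths live under `${HOME}` and /fsx in this setup.

```shell
#!/bin/bash
# Print every ENROOT_{RUNTIME,DATA,TEMP}_PATH line in a config file whose
# value does not start with the expected prefix.
# Exit status: 0 if all matching lines are fine, 1 if any line is printed.
check_enroot_paths() {
    local conf="$1" prefix="$2"
    awk -v prefix="$prefix" '
        $1 ~ /^ENROOT_(RUNTIME|DATA|TEMP)_PATH$/ {
            if (index($2, prefix) != 1) { print; bad = 1 }
        }
        END { exit bad + 0 }
    ' "$conf"
}
```

A per-node run might look like `check_enroot_paths /etc/enroot/enroot.conf /opt/dlami/nvme`, which prints nothing and succeeds when the node is configured as expected.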

@nghtm
Collaborator Author

nghtm commented Sep 12, 2024

As a temporary workaround, consider two options:

1. Exporting env variables:

According to the enroot docs, environment variables override the configuration file. If we go this route, note that the paths must first be created on the nodes separately:

mkdir -p /opt/dlami/nvme/tmp/enroot/
chmod 1777 /opt/dlami/nvme/tmp
chmod 1777 /opt/dlami/nvme/tmp/enroot/


mkdir -p /opt/dlami/nvme/tmp/enroot/data/
chmod 1777 /opt/dlami/nvme/tmp/enroot/data/


mkdir -p /opt/dlami/nvme/enroot
chmod 1777 /opt/dlami/nvme/enroot

Then export env vars:

export ENROOT_RUNTIME_PATH=/opt/dlami/nvme/tmp/enroot/user-$(id -u)
export ENROOT_CONFIG_PATH=${HOME}/enroot
export ENROOT_CACHE_PATH=/fsx/enroot
export ENROOT_DATA_PATH=/opt/dlami/nvme/tmp/enroot/data/user-$(id -u)
export ENROOT_TEMP_PATH=/opt/dlami/nvme/tmp
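Because the exports only help when the directories already exist, the two steps can be tied together with a guard. This is a sketch under that assumption: the variable names and layout mirror the exports above, while the function name `set_enroot_env` and its optional base-directory argument are illustrative.

```shell
#!/bin/bash
# Export the enroot path variables for a base directory (default
# /opt/dlami/nvme), but only if the shared directories were created first.
set_enroot_env() {
    local base="${1:-/opt/dlami/nvme}"
    if [ ! -d "$base/tmp/enroot/data" ]; then
        echo "missing $base/tmp/enroot/data; create the directories first" >&2
        return 1
    fi
    export ENROOT_RUNTIME_PATH="$base/tmp/enroot/user-$(id -u)"
    export ENROOT_CONFIG_PATH="${HOME}/enroot"
    export ENROOT_CACHE_PATH=/fsx/enroot
    export ENROOT_DATA_PATH="$base/tmp/enroot/data/user-$(id -u)"
    export ENROOT_TEMP_PATH="$base/tmp"
}
```

Calling `set_enroot_env` in a job script before `srun` would then fail loudly on a node that was not prepared, instead of silently falling back to the configured paths.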

Note the lifecycle script must be changed afterwards.

2. Use a script to modify enroot.conf in place:

The script creates new directories and points the enroot paths under /opt/dlami/nvme, where we have 28TB of SSD compared to 500GB at /opt/sagemaker/.

On the head/controller node, create a file called update-enroot.sh:

cat > update-enroot.sh << 'EOL'
#!/bin/bash


# Create directories for Enroot and set appropriate permissions
echo "Creating directories for Enroot in /opt/dlami/nvme/tmp/enroot/..."
mkdir -p /opt/dlami/nvme/tmp/enroot/
chmod 1777 /opt/dlami/nvme/tmp
chmod 1777 /opt/dlami/nvme/tmp/enroot/
echo "Directory /opt/dlami/nvme/tmp/enroot/ created and permissions set."

echo "Creating directory /opt/dlami/nvme/tmp/enroot/data/..."
mkdir -p /opt/dlami/nvme/tmp/enroot/data/
chmod 1777 /opt/dlami/nvme/tmp/enroot/data/
echo "Directory /opt/dlami/nvme/tmp/enroot/data/ created and permissions set."

echo "Creating directory /opt/dlami/nvme/enroot..."
mkdir -p /opt/dlami/nvme/enroot
chmod 1777 /opt/dlami/nvme/enroot
echo "Directory /opt/dlami/nvme/enroot created and permissions set."

# Modify paths in enroot.conf to point to /opt/dlami/nvme instead of /opt/sagemaker
echo "Modifying paths in /etc/enroot/enroot.conf to use /opt/dlami/nvme instead of /opt/sagemaker..."

# Modify values in enroot.conf

sed -i 's|/opt/sagemaker/tmp|/opt/dlami/nvme/tmp|g' /etc/enroot/enroot.conf

echo "Paths in enroot.conf modified."
EOL
chmod +x update-enroot.sh

Apply the script via ansible or srun (requires sudo):

srun -N 8 sudo ./update-enroot.sh
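Since the script edits /etc/enroot/enroot.conf in place, a variant that keeps a backup and is safe to re-run may be preferable. The following is a sketch, not the script from this thread: the function name `update_enroot_conf` and the `.bak` suffix are assumptions, and it relies on GNU sed's `-i` option as the original does.

```shell
#!/bin/bash
# Rewrite /opt/sagemaker/tmp to /opt/dlami/nvme/tmp in the given config
# file, keeping a one-time backup next to it. Safe to run repeatedly.
update_enroot_conf() {
    local conf="$1"
    if [ ! -f "$conf" ]; then
        echo "missing: $conf" >&2
        return 1
    fi
    if ! grep -q '/opt/sagemaker/tmp' "$conf"; then
        echo "no /opt/sagemaker/tmp entries in $conf; nothing to do"
        return 0
    fi
    # Keep the first backup only, so a re-run never overwrites the original.
    [ -f "$conf.bak" ] || cp -p "$conf" "$conf.bak"
    sed -i 's|/opt/sagemaker/tmp|/opt/dlami/nvme/tmp|g' "$conf"
}
```

On a node this would be invoked as `update_enroot_conf /etc/enroot/enroot.conf` (with sudo), and the original file remains available as `/etc/enroot/enroot.conf.bak`.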

Run a sanity check to confirm that enroot.conf is updated:

srun -N 8 cat /etc/enroot/enroot.conf | grep -E "ENROOT_CONFIG_PATH|ENROOT_CACHE_PATH|ENROOT_RUNTIME_PATH|ENROOT_DATA_PATH|ENROOT_TEMP_PATH"

@nghtm
Collaborator Author

nghtm commented Sep 19, 2024

Another report of the enroot path being /opt/sagemaker/tmp on compute nodes by default (p5 in this case).

The script above was provided as a workaround, but a long-term fix is needed so that install_enroot_pyxis.sh defaults to /opt/dlami/nvme instead.

@nghtm
Collaborator Author

nghtm commented Sep 19, 2024

The simplest fix would be to switch the order of this if statement so that it tries /opt/dlami/nvme first:

https://github.com/aws-samples/awsome-distributed-training/blob/main/1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/utils/install_enroot_pyxis.sh

@mhuguesaws
Contributor

Will prioritize.

@mhuguesaws mhuguesaws added the bug Something isn't working label Sep 20, 2024
nghtm added a commit that referenced this issue Sep 20, 2024
…e and execStart messages. This provides assurance that /opt/dlami/nvme is mounted to node prior to executing enroot configuration which will use /opt/dlami/nvme. This commit also updates the order of if /elif statement to first try /opt/dlami/nvme before /opt/sagemaker. For more, see issue #427

Signed-off-by: nghtm <[email protected]>
nghtm added a commit that referenced this issue Sep 21, 2024
…e and execStart messages. This provides assurance that /opt/dlami/nvme is mounted to node prior to executing enroot configuration which will use /opt/dlami/nvme. This commit also updates the order of if /elif statement to first try /opt/dlami/nvme before /opt/sagemaker. For more, see issue #427

+correction from mhugueaws comment on original
Signed-off-by: nghtm <[email protected]>
nghtm added a commit that referenced this issue Sep 23, 2024
* fix incorrect config param for update_neuron_sdk LCS

* move Docker/Enroot/Pyxis installation after Observability (if enabled) opportunistically, allowing more time for nvme to mount on clusters which enable observability. note docker is installed independently if observability is enabled, and install_docker script is idempotent (line 6-9 install_docker.sh)

* add while loop that will poll (max 120s) dlami-nvme.service for active and execStart messages. This provides assurance that /opt/dlami/nvme is mounted to node prior to executing enroot configuration which will use /opt/dlami/nvme. This commit also updates the order of if /elif statement to first try /opt/dlami/nvme before /opt/sagemaker. For more, see issue #427

+correction from mhugueaws comment on original
Signed-off-by: nghtm <[email protected]>
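The polling described in the commit message can be sketched as a generic wait-with-timeout helper. This is not the code from the commit: the function name `wait_for` is hypothetical, and the sketch only checks whether a command succeeds, whereas the commit also looks at execStart messages.

```shell
#!/bin/bash
# Poll a command once per second until it succeeds or the timeout expires.
# Usage: wait_for <timeout_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached.
wait_for() {
    local timeout="$1" waited=0
    shift
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

With the 120 second limit mentioned above, a call might look like `wait_for 120 systemctl is-active --quiet dlami-nvme.service || echo "nvme service not active, falling back" >&2`.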