Documentation Issue
The Aurora cluster parameter group in `1.architectures/8.accounting-database/cf_database-accounting.yaml` enforces TLS:
```yaml
# cf_database-accounting.yaml L89–94
AccountingClusterParameterGroup:
  Type: 'AWS::RDS::DBClusterParameterGroup'
  Properties:
    ...
    Parameters:
      require_secure_transport: 'ON'
```
…but the `## Amazon SageMaker HyperPod Orchestrated by Slurm` section of `README.md` (L78–95) builds `slurmdbd.conf` without any TLS configuration:
```bash
cat > /opt/slurm/etc/slurmdbd.conf << EOF
AuthType=auth/munge
DbdHost=$(hostname)
DbdPort=6819
SlurmUser=slurm
LogFile=/var/log/slurmdbd.log
StorageType=accounting_storage/mysql
StorageUser=${DATABASE_ADMIN}
StoragePass=$(aws secretsmanager get-secret-value --secret-id ${DATABASE_SECRET_ARN} --query SecretString --output text)
StorageHost=${DATABASE_URI}
StoragePort=3306
EOF
```
`slurmdbd` uses `libmysqlclient` under the hood. With `require_secure_transport: ON`, the server requires the client to negotiate TLS, and the RDS public CA is not in the default trust store on Amazon Linux or Ubuntu base images. Following the README verbatim on a HyperPod controller, `slurmdbd` fails to connect (an SSL handshake error is visible in `/var/log/slurmdbd.log`).
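To check the trust-store part of this diagnosis without going through `slurmdbd`, one option is a TLS handshake against the Aurora endpoint with OpenSSL. The sketch below is a hypothetical helper (`check_rds_tls`), not something in the repository; it assumes OpenSSL 1.1.1 or later (for `-starttls mysql`), that `DATABASE_URI` holds the cluster endpoint as in the README, and port 3306. It only checks the certificate chain, not the hostname.

```bash
#!/usr/bin/env bash
# Hypothetical helper: does the MySQL/Aurora endpoint's certificate chain verify?
# Assumes OpenSSL >= 1.1.1 (needed for "-starttls mysql") and port 3306.
# With no CA file, OpenSSL falls back to the system's default trust store.
check_rds_tls() {
  local host="${1:-}" ca_file="${2:-}"
  if [[ -z "$host" ]]; then
    echo "usage: check_rds_tls HOST [CA_FILE]" >&2
    return 2
  fi
  local -a ca_args=()
  [[ -n "$ca_file" ]] && ca_args=(-CAfile "$ca_file")
  # -verify_return_error aborts the handshake (non-zero exit) if chain verification fails.
  openssl s_client -starttls mysql -connect "${host}:3306" -servername "$host" \
    -verify_return_error "${ca_args[@]}" </dev/null >/dev/null 2>&1
}

# Example usage on the controller:
#   check_rds_tls "$DATABASE_URI" || echo "default trust store: verification failed"
#   check_rds_tls "$DATABASE_URI" /etc/ssl/certs/rds-global-bundle.pem && echo "RDS bundle: OK"
```

If the analysis above holds, the first call (default trust store) should fail and the second (explicit RDS bundle) should succeed.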
Scope note: This is HyperPod-only. The ParallelCluster section above it (L60–73) defers connection setup to ParallelCluster's `SlurmSettings.Database`, which configures the RDS CA automatically; this was verified on a live ParallelCluster 3.15 cluster against this template.
Suggested Fix
Add a step to download the RDS global CA bundle before building `slurmdbd.conf`, and add a `StorageParameters` line:
```bash
# Download the global RDS public CA bundle
# (-f makes curl fail on HTTP errors instead of saving an error page as the bundle)
sudo curl -fsSL -o /etc/ssl/certs/rds-global-bundle.pem \
  https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
sudo chmod 644 /etc/ssl/certs/rds-global-bundle.pem

# Same slurmdbd.conf as in the README, plus the StorageParameters line at the end
cat > /opt/slurm/etc/slurmdbd.conf << EOF
AuthType=auth/munge
DbdHost=$(hostname)
DbdPort=6819
SlurmUser=slurm
LogFile=/var/log/slurmdbd.log
StorageType=accounting_storage/mysql
StorageUser=${DATABASE_ADMIN}
StoragePass=$(aws secretsmanager get-secret-value --secret-id ${DATABASE_SECRET_ARN} --query SecretString --output text)
StorageHost=${DATABASE_URI}
StoragePort=3306
StorageParameters=SSL_CA=/etc/ssl/certs/rds-global-bundle.pem
EOF
```
References: Slurm `slurmdbd.conf` documentation (`StorageParameters`); Using SSL/TLS with Aurora.
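Before restarting `slurmdbd`, a small preflight check could confirm that the generated config actually points at a readable PEM bundle. This is a minimal sketch with a hypothetical helper name (`check_slurmdbd_ssl_ca`); it assumes the key is spelled exactly `StorageParameters` as in the snippet above and treats its value as a comma-separated `KEY=VALUE` list. It needs permission to read `slurmdbd.conf`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight: print the SSL_CA path from slurmdbd.conf, or fail with a message.
# Assumes the exact key spelling "StorageParameters" used in the fix above.
check_slurmdbd_ssl_ca() {
  local conf="${1:-/opt/slurm/etc/slurmdbd.conf}"
  local params ca
  # Value after "StorageParameters="; if the line repeats, this sketch uses the last one.
  params=$(sed -n 's/^StorageParameters=//p' "$conf" 2>/dev/null | tail -n 1)
  # Split the comma-separated KEY=VALUE list and pick out SSL_CA.
  ca=$(tr ',' '\n' <<<"$params" | sed -n 's/^SSL_CA=//p' | head -n 1)
  if [[ -z "$ca" ]]; then
    echo "no SSL_CA in StorageParameters of $conf" >&2
    return 1
  fi
  # The bundle must be readable and contain at least one PEM certificate.
  if [[ ! -r "$ca" ]] || ! grep -q -- '-----BEGIN CERTIFICATE-----' "$ca"; then
    echo "SSL_CA file $ca is missing or has no PEM certificates" >&2
    return 1
  fi
  echo "$ca"
}

# Example: check_slurmdbd_ssl_ca /opt/slurm/etc/slurmdbd.conf
```

This only validates the local files; whether the handshake succeeds still shows up in `/var/log/slurmdbd.log`.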
Happy to submit a PR.