
Conversation

@thomasbeaudry thomasbeaudry (Collaborator) commented Jul 3, 2025

Summary by CodeRabbit

  • New Features

    • Added automated setup and configuration for user servers, including NIS, NFS, Slurm workload manager, and FRP client services.
    • Introduced systemd service management for Slurm and FRP clients.
    • Expanded host inventory with new compute nodes and user servers.
    • Enhanced system configuration with unattended upgrades, APT repositories, sudoers, and kernel parameters.
    • Automated deployment of static hosts file, bash and environment settings, and watchdog service configuration.
  • Bug Fixes

    • Updated NIS server configuration to use a new server.
  • Chores

    • Added encrypted variable for Slurm database password management.

coderabbitai bot commented Jul 3, 2025

Walkthrough

New Ansible playbooks, tasks, handlers, templates, and configuration files were introduced to automate the setup of user servers, including FRP and Slurm services, NIS, NFS, system configurations, and package management. Inventory and playbooks were expanded to include new hosts and groups, with supporting service unit files and templates added for deployment.

Changes

File(s) / Path(s) Change Summary
files/frpc_cloud1_bk.service, files/frpc_cloud2.service, files/slurmctld.service, files/slurmdbd.service Added systemd service unit files for FRP clients and Slurm daemons (slurmctld, slurmdbd).
files/frpc_cloud1_bk.toml, files/frpc_cloud2.toml Added FRP client configuration files specifying connection, proxy, and logging settings.
files/users.exports Added NFS export configuration for /tank with multiple client access rules.
files/yp.conf Changed NIS server entry from cicus04 to dnpus01.
initial-setup.yml Included "userserver" in playbook host groups.
inventory Added hosts ciccs01–ciccs05 to [computenodes]; introduced [userservers] group with roles.
roles/userserver/handlers/main.yml Added Ansible handlers for restarting and managing system services.
roles/userserver/tasks/file-config.yml Added Ansible tasks for system configuration (APT, smartmontools, msmtp, watchdog, time, grub, etc.).
roles/userserver/tasks/frpc.yml Added playbook for FRP client deployment: user, directories, download, configs, services.
roles/userserver/tasks/hosts.yml Added task to generate /etc/hosts from template.
roles/userserver/tasks/main.yml Added main task file importing sub-tasks, setting locale, and configuring AppArmor.
roles/userserver/tasks/nfs.yml Added playbook for NFS and autofs setup, sysctl tuning, and directory management.
roles/userserver/tasks/nis.yml Added playbook for NIS setup: package install, config files, service management, ypinit, permissions.
roles/userserver/tasks/packages.yml Added task to install a comprehensive list of common APT packages.
roles/userserver/tasks/quarantine.yml Added tasks for bash config, Lmod symlink, and scratch directory setup.
roles/userserver/tasks/repos.yml Added tasks to manage APT repositories and related settings.
roles/userserver/tasks/slurm.yml Added playbook for Slurm installation, configuration, and service management.
roles/userserver/tasks/system-config.yml Added tasks for sudoers, root GECOS, and bash history configuration.
roles/userserver/templates/99-backports.j2 Added APT pinning template for backports.
roles/userserver/templates/hosts.j2 Added Jinja2 template for /etc/hosts file generation.
roles/userserver/templates/slurm.conf.j2 Added SLURM cluster configuration template.
roles/userserver/templates/slurmdbd.conf.j2 Added Slurm database daemon configuration template.
roles/userserver/templates/watchdog.j2 Added watchdog service configuration template.
site.yml Added play targeting userservers with vars and userserver role.
vars/slurm.yml Added encrypted variable for Slurm MySQL database password.

Sequence Diagram(s)

sequenceDiagram
    participant Ansible
    participant UserServer
    participant Systemd
    participant Services

    Ansible->>UserServer: Deploys configuration files, templates, and scripts
    Ansible->>Systemd: Installs and reloads service unit files (FRP, Slurm, NFS, NIS)
    Ansible->>UserServer: Installs packages, sets up directories and permissions
    Systemd->>Services: Starts/Restarts FRP, Slurm, NIS, NFS, and other services
    Services->>UserServer: Services become active and configured

Poem

🛠️
New servers join the cluster’s song,
With configs, scripts, and tasks so strong.
FRP and Slurm now spin with pride,
NIS and NFS stand side by side.
Handlers ready, services start—
Automation’s art, a sysadmin’s heart!



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 23

♻️ Duplicate comments (2)
files/frpc_cloud2.service (1)

5-11: Repeat hardening & whitespace fixes as in frpc_cloud1_bk.service.

Apply the exact same sandboxing directives and whitespace cleanup to keep both unit files consistent and secure.

roles/userserver/tasks/frpc.yml (1)

58-68: Copy destination depends on the previous bug

Because of the nesting issue above, these copies will point to a non-existent path and fail. Fixing the unarchive step will resolve this.

🧹 Nitpick comments (15)
files/frpc_cloud1_bk.service (2)

1-4: Trim trailing spaces in section headers.

Extra spaces after [Unit] & [Service] aren’t fatal but clutter systemctl cat output and break copy-pasting.

-[Unit] 
+[Unit]
...
-[Service] 
+[Service]

5-11: Harden the service sandbox.

FRP runs as a root-capable binary; add minimal hardening:

 [Service]
 Type=simple
 User=frpclient
+CapabilityBoundingSet=CAP_NET_BIND_SERVICE
+NoNewPrivileges=true
+PrivateTmp=true
 Restart=always
 RestartSec=5s
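
A hedged sketch of shipping these directives as a drop-in unit from this role (the drop-in path is an assumption; the reload systemd handler name matches the one used elsewhere in this PR):

- name: Ensure a drop-in directory for the frpc unit
  ansible.builtin.file:
    path: /etc/systemd/system/frpc_cloud1_bk.service.d
    state: directory
    mode: '0755'

- name: Apply sandbox hardening as a drop-in (sketch)
  ansible.builtin.copy:
    dest: /etc/systemd/system/frpc_cloud1_bk.service.d/hardening.conf
    mode: '0644'
    content: |
      [Service]
      CapabilityBoundingSet=CAP_NET_BIND_SERVICE
      NoNewPrivileges=true
      PrivateTmp=true
  notify: reload systemd
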
roles/userserver/tasks/packages.yml (1)

80-85: Consider pinning Docker package versions.

Mixing upstream Docker packages with distro ones often causes dependency drift after upgrades. Pin versions, or add update_cache: yes and cache_valid_time: 3600 so nightly runs don’t unexpectedly upgrade the Docker engine.
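
A sketch of what the pinning could look like (the package names are the usual upstream Docker ones; the version pin is illustrative):

- name: Install Docker from the upstream repo at a pinned version (sketch)
  ansible.builtin.apt:
    name:
      - docker-ce=5:24.0.*       # illustrative pin; match your tested release
      - docker-ce-cli=5:24.0.*
    state: present
    update_cache: yes
    cache_valid_time: 3600       # skip the cache refresh if it ran within the hour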

roles/userserver/tasks/quarantine.yml (1)

1-2: Fix typo and align module syntax.

‘quarnantine’ → ‘quarantine’. Also prefer full mapping form for clarity:

-- name: Configure quarantine in /etc/bash.bashrc
-  copy: src=files/bash.bashrc dest=/etc/bash.bashrc
+- name: Configure quarantine bashrc
+  copy:
+    src: files/bash.bashrc
+    dest: /etc/bash.bashrc
roles/userserver/templates/watchdog.j2 (1)

32-33: Consider bumping priority

realtime=yes without an explicit sched_rt_runtime_us adjustment can starve other kernel threads on heavily loaded boxes. Validate that priority=1 is low enough for your workload or document the rationale.

roles/userserver/tasks/main.yml (1)

1-6: Duplicate task names hinder --list-tasks diagnostics

Both tasks share the identical name “enable ssh keys”; this makes troubleshooting harder.

-- name: enable ssh keys
+- name: enable ssh key for beatom
...
-- name: enable ssh keys
+- name: enable ssh key for devgab
roles/userserver/handlers/main.yml (2)

1-22: Inconsistent module usage
Mixing the legacy service: shortcut with the systemd: module complicates handler maintenance and feature parity (e.g., enabled= not honoured by service on all distros). Stick to ansible.builtin.systemd for systemd hosts.

Also applies to: 30-32, 51-52
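
For example, a handler written once against ansible.builtin.systemd covers restart, enablement, and daemon reload in one place (the service name is illustrative):

- name: restart frpc_cloud2
  ansible.builtin.systemd:
    name: frpc_cloud2
    state: restarted
    enabled: yes
    daemon_reload: yes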


33-35: Use command for idempotent grub recreation
update-grub needs no shell features. Switching to ansible.builtin.command avoids /bin/sh -c overhead and lint warnings.
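
A minimal sketch of the handler:

- name: update grub
  # update-grub needs no shell features, so the command module suffices
  ansible.builtin.command: update-grub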

roles/userserver/tasks/repos.yml (1)

1-3: pro config likely always reports “changed”
The command has no side-effects after the first run, yet Ansible will mark the task as changed. Add changed_when: false to keep reports clean.
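
For example (the exact pro subcommand below is illustrative, not necessarily the one in this PR):

- name: pro config
  ansible.builtin.command: pro config set apt_news=false   # illustrative subcommand
  changed_when: false   # idempotent after the first run; keep change reports clean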

files/slurmctld.service (1)

6-18: Service runs as root without PIDFile or ExecReload
Running slurmctld as root increases blast radius; the upstream unit typically uses the slurm user. Also consider uncommenting PIDFile= and adding an ExecReload so systemctl reload works cleanly.
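
One hedged way to express that as a drop-in override (the User, PIDFile, and reload values are assumptions; upstream units differ between Slurm releases):

- name: Run slurmctld as the slurm user with clean reload support (sketch)
  ansible.builtin.copy:
    # assumes the .d directory exists (create it with a file task first)
    dest: /etc/systemd/system/slurmctld.service.d/override.conf
    mode: '0644'
    content: |
      [Service]
      User=slurm
      PIDFile=/run/slurmctld.pid
      ExecReload=/bin/kill -HUP $MAINPID
  notify: reload systemd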

roles/userserver/templates/slurm.conf.j2 (2)

57-59: LLN=yes on both partitions may starve interactive jobs
LLN=yes (schedule jobs to the least-loaded node) is powerful but can mis-prioritise if all nodes belong to both partitions. Ensure this aligns with your scheduling policy.


64-67: Hard-coded path to task_prolog.sh
If the script is updated via Ansible, remember to add a handler to restart slurmd so changes propagate.

roles/userserver/tasks/file-config.yml (1)

36-38: Task label vs. value inconsistency

The task title says “enable softdog driver” but you set watchdog_module="iTCO_wdt". Either rename the task or use softdog to avoid confusion for the next operator.

roles/userserver/tasks/nfs.yml (1)

7-10: Misleading task name

Two tasks are both labelled “autofs installed”; the second actually removes cachefilesd.
Consider renaming to avoid confusion in play output.

roles/userserver/tasks/slurm.yml (1)

76-96: Compile step not idempotent

configure is re-run on every play, marking slurm_configure as changed each time, which triggers a full rebuild.
Add creates: /opt/slurm/24.05.1/bin/srun (or similar) to the configure task to make it idempotent.
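
A sketch of the guard (the install prefix matches this PR; the source directory is an assumption):

- name: Configure Slurm build (skipped once the prefix is populated)
  ansible.builtin.command:
    cmd: ./configure --prefix=/opt/slurm/24.05.1
    chdir: /opt/src/slurm-24.05.1          # hypothetical unpacked source dir
    creates: /opt/slurm/24.05.1/bin/srun   # skip when the build already exists
  register: slurm_configure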

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42d96ee and 155fd0a.

📒 Files selected for processing (29)
  • files/frpc_cloud1_bk.service (1 hunks)
  • files/frpc_cloud1_bk.toml (1 hunks)
  • files/frpc_cloud2.service (1 hunks)
  • files/frpc_cloud2.toml (1 hunks)
  • files/slurmctld.service (1 hunks)
  • files/slurmdbd.service (1 hunks)
  • files/users.exports (1 hunks)
  • files/yp.conf (1 hunks)
  • initial-setup.yml (1 hunks)
  • inventory (1 hunks)
  • roles/userserver/handlers/main.yml (1 hunks)
  • roles/userserver/tasks/file-config.yml (1 hunks)
  • roles/userserver/tasks/frpc.yml (1 hunks)
  • roles/userserver/tasks/hosts.yml (1 hunks)
  • roles/userserver/tasks/main.yml (1 hunks)
  • roles/userserver/tasks/nfs.yml (1 hunks)
  • roles/userserver/tasks/nis.yml (1 hunks)
  • roles/userserver/tasks/packages.yml (1 hunks)
  • roles/userserver/tasks/quarantine.yml (1 hunks)
  • roles/userserver/tasks/repos.yml (1 hunks)
  • roles/userserver/tasks/slurm.yml (1 hunks)
  • roles/userserver/tasks/system-config.yml (1 hunks)
  • roles/userserver/templates/99-backports.j2 (1 hunks)
  • roles/userserver/templates/hosts.j2 (1 hunks)
  • roles/userserver/templates/slurm.conf.j2 (1 hunks)
  • roles/userserver/templates/slurmdbd.conf.j2 (1 hunks)
  • roles/userserver/templates/watchdog.j2 (1 hunks)
  • site.yml (1 hunks)
  • vars/slurm.yml (1 hunks)
🔇 Additional comments (16)
vars/slurm.yml (1)

8-14: LGTM! Proper use of Ansible Vault for sensitive data.

The encrypted password variable follows security best practices for protecting database credentials.

inventory (2)

71-75: LGTM! Proper addition of compute nodes.

The new compute nodes follow the existing naming convention and are correctly placed in the computenodes group.


82-85: LGTM! Well-structured userservers group.

The role assignments (nis_role, slurmctld) are clearly defined and align with the infrastructure requirements.

roles/userserver/templates/hosts.j2 (1)

43-44: Fix duplicate IPMI entries.

Lines 43-44 contain identical entries for the backup server IPMI interface.

-172.16.69.64 ipmi-bs   ipmi-cicbs
-172.16.69.64 ipmi-bs   ipmi-cicbs
+172.16.69.64 ipmi-bs   ipmi-cicbs

Likely an incorrect or invalid review comment.

files/frpc_cloud2.toml (1)

17-24: LGTM! Proper FRP proxy configuration.

The SSH proxy configuration with health checks is correctly configured.

files/yp.conf (1)

15-15: LGTM! NIS server update aligns with inventory.

The change from cicus04 to dnpus01 is consistent with the new userservers group configuration.

site.yml (1)

20-25: Confirm inventory group consistency.

The play targets userservers, which is correct, but be sure the inventory file contains that exact group name and not userserver to avoid silent skips. Align naming across all playbooks (see issue flagged in initial-setup.yml).

roles/userserver/templates/99-backports.j2 (1)

1-3: Looks good.

Template is minimal and correct; no issues spotted.

roles/userserver/templates/slurmdbd.conf.j2 (1)

6-9: Expose StoragePort explicitly.

If MySQL is not listening on the default port (3306) the daemon will silently fail to connect. Add an explicit port parameter (StoragePort=) or make it templated:

StoragePort={{ mysql_slurmdb_port | default(3306) }}
roles/userserver/tasks/quarantine.yml (1)

11-12: Ensure /scratch survives reboots with correct perms.

If /scratch is on a separate mount or tmpfs it may vanish. Add a mount: task or place the directory creation in a handler triggered after filesystem mounts.
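
A hedged sketch using ansible.posix.mount (the backing device is hypothetical):

- name: Mount /scratch persistently (sketch)
  ansible.posix.mount:
    path: /scratch
    src: /dev/mapper/vg0-scratch   # hypothetical device
    fstype: ext4
    opts: defaults
    state: mounted                 # mounts now and records it in /etc/fstab

- name: Re-assert /scratch permissions after the mount
  ansible.builtin.file:
    path: /scratch
    state: directory
    mode: '1777'                   # sticky, world-writable; typical for scratch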

roles/userserver/tasks/system-config.yml (1)

1-5: Modifying root’s GECOS can break automated tooling

Some LDAP/PAM setups rely on the default comment. Double-check downstream expectations before merging.

roles/userserver/tasks/repos.yml (1)

10-14: Blindly deleting trusted.gpg may break other repos
That file can contain keys used by Ubuntu itself. Consider pruning specific keys with apt_key instead of wholesale removal.
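
For instance, removing one known-stale key instead of the whole keyring (the key ID below is a placeholder):

- name: Remove a single stale key from the legacy keyring (sketch)
  ansible.builtin.apt_key:
    id: 0123456789ABCDEF0123456789ABCDEF01234567   # placeholder key ID
    state: absent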

roles/userserver/tasks/nis.yml (1)

46-55: Conflicting YPPWDDIR values
/etc/default/nis sets YPPWDDIR=/etc, but later you overwrite the Makefile to /var/lib/extrausers. Keep them consistent to avoid map build failures.
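
A minimal sketch of keeping the two aligned (the handler name is an assumption):

- name: Keep YPPWDDIR consistent with the NIS Makefile
  ansible.builtin.lineinfile:
    path: /etc/default/nis
    regexp: '^YPPWDDIR='
    line: 'YPPWDDIR=/var/lib/extrausers'
  notify: restart nis   # assumed handler name; match whatever this role defines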

roles/userserver/templates/slurm.conf.j2 (1)

50-58: NodeName=DEFAULT lacks Sockets/Cores/Threads/RealMemory
Many Slurm sub-commands break when these fields are absent. Add conservative defaults or override per-node.

roles/userserver/tasks/file-config.yml (1)

31-35: Handler name mismatch

You notify restart timesyncd, but most distros ship a systemd-timesyncd unit. Ensure the handler name matches the actual service or the restart will silently skip.
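
For example, keep the notified name but point the handler at the real unit:

- name: restart timesyncd
  ansible.builtin.systemd:
    name: systemd-timesyncd   # the actual unit name on most distros
    state: restarted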

roles/userserver/tasks/slurm.yml (1)

114-118: set_fact loop overwrites itself

Looping over every host means the last matching host wins, which may be non-deterministic.
Break after the first assignment or use selectattr | first to pick a single master.
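
A sketch of the selectattr | first approach (the group and host-variable names are assumptions based on this PR’s inventory):

- name: Pick a single Slurm master deterministically (sketch)
  ansible.builtin.set_fact:
    slurm_master: >-
      {{ groups['userservers']
         | map('extract', hostvars)
         | selectattr('slurmctld', 'defined')
         | map(attribute='inventory_hostname')
         | first }}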

#Login/User Servers IPMI
172.16.69.50 ipmi-us01 ipmi-cicus01
172.16.69.51 ipmi-us02 ipmi-cicus02
172.16.69.65 cicus03-mgmt.douglasrtss.qc.ca cicus03-mgmt ipmi-us03

⚠️ Potential issue

Fix typo in domain name.

The domain name "douglasrtss.qc.ca" is missing a dot; it should read "douglas.rtss.qc.ca".

-172.16.69.65 cicus03-mgmt.douglasrtss.qc.ca cicus03-mgmt ipmi-us03
+172.16.69.65 cicus03-mgmt.douglas.rtss.qc.ca cicus03-mgmt ipmi-us03

tls_enable = true
#transport.protocol = "kcp"
#transport.protocol = "quic"
auth.token = "raiCeeshie6ahz1ahre2Eegha5eeng8Shahx1Ma8vae0xeepheiShieshaeN5eic"

⚠️ Potential issue

Security concern: Hardcoded authentication token.

The authentication token is stored in plaintext. Consider using Ansible variables to inject this from an encrypted source.

-auth.token = "raiCeeshie6ahz1ahre2Eegha5eeng8Shahx1Ma8vae0xeepheiShieshaeN5eic"
+auth.token = "{{ frp_auth_token }}"
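
A hedged sketch of the deployment side, assuming a vault-encrypted vars file defines frp_auth_token and this file is converted to a template (template name and paths are illustrative):

- name: Deploy frpc config with the vaulted token (sketch)
  ansible.builtin.template:
    src: frpc_cloud2.toml.j2          # hypothetical template derived from this file
    dest: /opt/frp/frp_0.53.2_linux_amd64/frpc_cloud2.toml
    owner: root
    group: frpclient
    mode: '0640'                      # keep the secret unreadable to others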

Comment on lines +1 to +2
- name: generate hostsfile
  template: src=hosts.j2 dest=/etc/hosts

🛠️ Refactor suggestion

Specify owner, group and mode for /etc/hosts.

Relying on the template module defaults can leave file permissions inconsistent across distros or after prior manual edits. Add explicit settings to guarantee idempotence.

-  template: src=hosts.j2 dest=/etc/hosts
+  template:
+    src: hosts.j2
+    dest: /etc/hosts
+    owner: root
+    group: root
+    mode: '0644'

Comment on lines 1 to +4
- hosts:
  - computenodes
  - workstations
  - userserver

⚠️ Potential issue

Group name typo breaks targeting.

userserver here does not match the new inventory group userservers (plural) used elsewhere, so this play never runs on the intended hosts.

-  - userserver
+  - userservers

🛠️ Refactor suggestion

hosts: expects a scalar, not a YAML list.

Although some YAML parsers tolerate it, Ansible expects hosts to be a single string pattern. Join the groups with a comma for portability.

- hosts:
-  - computenodes
-  - workstations
-  - userservers
+ hosts: computenodes,workstations,userservers

Committable suggestion skipped: line range outside the PR's diff.


Comment on lines +1 to +10
/tank 172.16.67.0/24(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      127.0.0.1(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      172.16.67.231(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      10.116.134.0/24(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      10.116.130.0/24(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      cichm*(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      cicws*(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      ciccs*(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      dnpws*(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
      cicss*(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash)

⚠️ Potential issue

Duplicate fsid=0 options can confuse exportfs.

fsid=0 should appear only once per export path. Repeating it for every client spec may trigger warnings or undefined behaviour. Keep it on the first entry and drop from the continuations.

 /tank 172.16.67.0/24(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
-      127.0.0.1(rw,async,fsid=0,crossmnt,no_subtree_check,root_squash) \
+      127.0.0.1(rw,async,crossmnt,no_subtree_check,root_squash) \
       172.16.67.231(rw,async,crossmnt,no_subtree_check,root_squash) \
       10.116.134.0/24(rw,async,crossmnt,no_subtree_check,root_squash) \
       10.116.130.0/24(rw,async,crossmnt,no_subtree_check,root_squash) \
       cichm*(rw,async,crossmnt,no_subtree_check,root_squash) \
       cicws*(rw,async,crossmnt,no_subtree_check,root_squash) \
       ciccs*(rw,async,crossmnt,no_subtree_check,root_squash) \
       dnpws*(rw,async,crossmnt,no_subtree_check,root_squash) \
       cicss*(rw,async,crossmnt,no_subtree_check,root_squash)

Committable suggestion skipped: line range outside the PR's diff.


Comment on lines +26 to +30
- name: watchdog enable
  lineinfile: dest=/etc/default/watchdog regexp=^run_watchdog line=run_watchdog=1
#  notify:
#  - restart watchdog


⚠️ Potential issue

Handlers are commented out – configuration changes never take effect

The two watchdog-related tasks add/replace config but the notify: block is commented. Unless another task restarts watchdog, the service keeps the old config.

Uncomment the notifications or add a single final task that restarts watchdog if any of these files changed.

Also applies to: 36-45
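
The fix is as small as restoring the notification on each task, e.g.:

- name: watchdog enable
  lineinfile: dest=/etc/default/watchdog regexp=^run_watchdog line=run_watchdog=1
  notify:
    - restart watchdog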


Comment on lines +1 to +9
- name: enable unattended upgrades
  copy: src=files/50unattended-upgrades dest=/etc/apt/apt.conf.d/

- name: enable periodic upgrades
  copy: src=files/10periodic dest=/etc/apt/apt.conf.d/

- name: enable apt proxy
  copy: src=files/99proxy dest=/etc/apt/apt.conf.d/


🛠️ Refactor suggestion

Drop the files/ prefix for role-local copies

Inside a role, copy: automatically looks in the role’s files directory. Prefixing the path with files/ makes Ansible look for roles/userserver/files/files/… at runtime, which does not exist, so the task fails.

-  copy: src=files/50unattended-upgrades dest=/etc/apt/apt.conf.d/
+  copy: src=50unattended-upgrades dest=/etc/apt/apt.conf.d/

Repeat for every other copy using the same pattern in this task file.


Comment on lines +32 to +48
- name: Ensure FRP version directory exists
  file:
    path: /opt/frp/frp_0.53.2_linux_amd64
    state: directory
    owner: root
    group: frpclient
    mode: '0770'

- name: Extract FRP
  unarchive:
    src: /opt/frp/frp_0.53.2_linux_amd64.tar.gz
    dest: /opt/frp/frp_0.53.2_linux_amd64
    remote_src: yes
    creates: /opt/frp/frp_0.53.2_linux_amd64/frpc
    owner: root
    group: frpclient
    mode: '0770'

⚠️ Potential issue

Extract path will nest the archive one level too deep

The tarball already contains a top-level directory frp_0.53.2_linux_amd64.
Extracting it into the directory of the same name yields:

/opt/frp/frp_0.53.2_linux_amd64/frp_0.53.2_linux_amd64/...

Subsequent tasks expect /opt/frp/frp_0.53.2_linux_amd64/frpc, so the creates: guard is never satisfied; the extraction re-runs on every play and the dependent tasks fail.

   dest: /opt/frp/frp_0.53.2_linux_amd64
+  extra_opts: ["--strip-components=1"]

Add --strip-components=1 or extract to /opt/frp instead.


Comment on lines +51 to +63
- name: Verify that slurm directories exist
  file:
    path: "{{ item }}"
    state: directory
    owner: slurm
    group: slurm
    mode: '0755'
  loop:
    - /opt/slurm
    - /opt/slurm/logs
    - /opt/slurm/etc
    - /var/spool/slurmctld


⚠️ Potential issue

slurm user/group are referenced before creation

The file: loop sets ownership to slurm:slurm, but no user/group is created earlier.
Add:

- name: Ensure slurm user and group exist
  ansible.builtin.user:
    name: slurm
    system: yes
    shell: /usr/sbin/nologin

before this block.


Comment on lines +104 to +111
- name: Copy slurmctld and slurmdbd service files
  copy:
    src: "{{ item }}"
    dest: /etc/systemd/system/
  loop:
    - files/slurmctld.service
    - files/slurmdbd.service
  notify: reload systemd

🛠️ Refactor suggestion

Use role-relative paths for service units

src: files/slurmctld.service assumes the file is under the playbook root.
When these tasks live in a role, drop the prefix:

-    src: "{{ item }}"
+    src: "{{ item | basename }}"

and set the loop list to just the filenames.


@gdevenyi gdevenyi (Contributor) left a comment

This can be pared way back.

  • remove all the repo additions
  • remove all package installation except the packages mandatory for the required services
  • update the FRP version to latest (all other servers need this too)
  • vault the FRP keys (already rotated; see servers)
  • no UPS attached to these
  • no Apptainer
  • don’t modify the GRUB config
  • update NTP to ca.pool.ntp.org
  • Slurm comes from the new deb packages, so no need for build scripts
  • password required for sudoers
  • drop adding the hosts file
  • drop the watchdog
