A comprehensive SLURM command emulator with time manipulation capabilities for testing periodic limits and decay calculations.
- 🎮 Interactive CLI - Full command-line interface with time travel
- ⏰ Time Manipulation - Advance time by days, months, or quarters
- 💾 Usage Injection - Add specific node-hour usage at any time point
- 🔄 Decay Calculations - 15-day half-life fairshare decay simulation
- 🎯 QoS Management - Threshold-based QoS switching (normal → slowdown → blocked)
- 📊 Periodic Limits - Quarterly allocation with carryover logic
- 🎬 Scenario Runner - Complete SLURM_PERIODIC_LIMITS_SEQUENCE.md validation
- 🔌 API Integration - REST API for waldur-site-agent integration
- 💾 State Management - Checkpoint/restore functionality for testing
# Clone the repository
git clone https://github.com/waldur/slurm-emulator.git
cd slurm-emulator
# Install dependencies using uv
uv sync
# Run with default configuration
uv run slurm-emulator
# Run with SLURM configuration file
uv run slurm-emulator --config examples/slurm.conf
# Advanced features work the same way
uv run slurm-emulator --config examples/custom_slurm.conf
# Validate configuration only
uv run slurm-emulator --validate-only --config /etc/slurm/slurm.conf
🎮 SLURM Emulator - Time Travel Edition (CMD Interface)
Type 'help' or '?' for commands. TAB for auto-completion.
Type 'help <command>' for detailed help on specific commands.
slurm-emulator> help
# Shows all available commands
slurm-emulator> help time_advance
# Shows detailed help for specific command
slurm-emulator> time_advance 2 months
⏭️ Advanced 2 months
slurm-emulator> account_create test "Test Account" 1000
✅ Created account test with 1000Nh allocation
slurm-emulator> account create test-account "Test Account" 1000
✅ Created account test-account with 1000Nh allocation
slurm-emulator> usage inject user1 200 test-account
💾 Injected 200.0Nh usage for user1 in test-account at 2024-01-01 00:00:00
slurm-emulator> time advance 2 months
⏭️ Advanced 2 months
⏰ New time: 2024-03-01 00:00:00
slurm-emulator> usage inject user1 400 test-account
💾 Injected 400.0Nh usage for user1 in test-account at 2024-03-01 00:00:00
slurm-emulator> limits calculate test-account
📊 Periodic Limits for test-account:
Period: 2024-Q1
Base allocation: 1000Nh
Total allocation: 1000.0Nh
Fairshare: 333
QoS threshold: 1200.0Nh
Grace limit: 1200.0Nh
Billing minutes: 60000
Run the full scenario from SLURM_PERIODIC_LIMITS_SEQUENCE.md:
slurm-emulator> scenario run sequence --interactive
🎬 Starting SLURM Periodic Limits Sequence Scenario
============================================================
⏸️ Press Enter to execute Step 1: Initial Q1 setup...
📍 Step 1: Initial Q1 2024 Setup
Setting up 1000Nh quarterly allocation with 20% grace period
⚖️ Set fairshare to 333
🚫 Set GrpTRESMins to 72000 billing-minutes
🎯 QoS threshold set to 1200.0Nh
💾 Checkpoint 'initial_setup' created
# ... continues through all 9 steps of the sequence
The emulator intercepts and emulates real SLURM commands:
slurm-emulator> sacctmgr add account test-account description="Test"
Adding Account(s)
test-account
Settings
Parent = root
Description = Test
slurm-emulator> sacctmgr modify account test-account set fairshare=333
Modified account...
test-account
Settings
fairshare=333
slurm-emulator> sacctmgr modify account test-account set GrpTRESMins=billing=72000
Modified account...
test-account
Settings
GrpTRESMins=billing=72000
slurm-emulator> sacct --accounts=test-account --starttime=2024-01-01 --endtime=2024-12-31
test-account|cpu=12800,mem=102400,gres/gpu=800|08:00:00|user1
Start the API server for waldur-site-agent integration:
# From the slurm-emulator directory
uv run uvicorn emulator.api.emulator_server:app --host 0.0.0.0 --port 8080
- POST /api/apply-periodic-settings - Apply periodic limits settings
- POST /api/downscale-resource - Set QoS to slowdown
- POST /api/restore-resource - Restore QoS to normal
- POST /api/submit-report - Submit usage reports
- GET /api/status - Get emulator status
- POST /api/time/advance - Advance emulator time
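The endpoints listed above can also be reached from Python. The following is a minimal sketch using only the standard library; it assumes the server is running on `http://localhost:8080` as started above and that responses are JSON, which this README does not confirm. The helper names `build_request` and `call` are hypothetical and not part of the emulator.

```python
import json
import urllib.request

# Assumption: the API server was started on this host and port.
BASE_URL = "http://localhost:8080"


def build_request(path, payload=None, method="POST"):
    """Build (but do not send) a request for an emulator endpoint."""
    # Encode the payload as JSON only when one is given (GET has no body).
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def call(path, payload=None, method="POST"):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(build_request(path, payload, method)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `call("/api/status", method="GET")` would query the status endpoint. Keeping request construction separate from sending makes the payload easy to inspect in tests without a live server.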
# Apply periodic settings (from Waldur Mastermind)
curl -X POST http://localhost:8080/api/apply-periodic-settings \
-H "Content-Type: application/json" \
-d '{
"resource_id": "slurm_account_123",
"fairshare": 333,
"grp_tres_mins": {"billing": 72000},
"qos_threshold": {"billing": 1000}
}'
# Submit usage report (from site agent)
curl -X POST http://localhost:8080/api/submit-report \
-H "Content-Type: application/json" \
-d '{
"resource_id": "slurm_account_123",
"usage": {"billing": 167},
"billing_period": "2024-01-01",
"date": "2024-01-31T23:59:59Z",
"users": {
"user1": {"billing": 100},
"user2": {"billing": 67}
}
}'
# Advance time for testing
curl -X POST "http://localhost:8080/api/time/advance?months=3"
Configure waldur-site-agent to use the emulator:
# waldur-site-agent-config.yaml
offerings:
- name: "SLURM HPC Cluster - Emulator"
backend_type: "slurm"
backend_settings:
# Enable emulator mode
emulator_mode: true
emulator_base_url: "http://localhost:8080"
# Override SLURM commands to use emulator
command_prefix: ["python", "/path/to/slurm-emulator/emulator/commands/dispatcher.py"]
# Periodic limits configuration
periodic_limits:
enabled: true
limit_type: "GrpTRESMins"
tres_billing_enabled: true
tres_billing_weights:
CPU: 0.015625
Mem: 0.001953125G
"GRES/gpu": 0.25
fairshare_decay_half_life: 15
api_endpoints:
apply_periodic_settings: "http://localhost:8080/api/apply-periodic-settings"
downscale_resource: "http://localhost:8080/api/downscale-resource"
restore_resource: "http://localhost:8080/api/restore-resource"
The emulator now supports real SLURM configuration files to match actual deployment behavior:
# Use system SLURM configuration
uv run slurm-emulator --config /etc/slurm/slurm.conf
# Use custom configuration
uv run slurm-emulator --config examples/slurm.conf
# Validate configuration
uv run slurm-emulator --validate-only --config slurm.conf
The emulator parses and applies these SLURM configuration parameters:
Priority and Decay Settings:
- PriorityDecayHalfLife - Fairshare decay half-life (e.g., "15-00:00:00")
- PriorityUsageResetPeriod - Usage reset period ("None" for manual reset)
- PriorityWeightFairShare - Fairshare weight for priority calculations
- PriorityWeightQOS - QoS weight for priority calculations
- FairShareDampeningFactor - Dampening factor for fairshare
TRES Billing:
- TRESBillingWeights - Billing weights (e.g., "CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25")
Priority Flags:
- PriorityFlags - Priority calculation flags (e.g., "MAX_TRES,NO_NORMAL_ASSOC")
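To make the value formats above concrete, here is a minimal parsing sketch. It handles only the `days-HH:MM:SS` form and the comma-separated `name=weight` form shown in the examples; the function names are hypothetical and this is not the code in `slurm_config.py`. Dropping the unit suffix on the `Mem` weight is an assumption of this sketch.

```python
def parse_half_life_days(value):
    """Convert a 'days-HH:MM:SS' string such as '15-00:00:00' to days."""
    days_part, _, clock = value.partition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days_part) + (hours * 3600 + minutes * 60 + seconds) / 86400


def parse_tres_weights(value):
    """Parse 'CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25' into a dict.

    Assumption: a trailing unit letter (as on the Mem weight) is simply
    dropped, so usage must later be supplied in that same unit.
    """
    weights = {}
    for item in value.split(","):
        name, _, raw = item.partition("=")
        weights[name.strip()] = float(raw.strip().rstrip("KMGT"))
    return weights


half_life = parse_half_life_days("15-00:00:00")
weights = parse_tres_weights("CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25")
print(half_life, weights)
```

Other time formats that SLURM accepts are deliberately left out to keep the example short.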
# SLURM Configuration
PriorityDecayHalfLife = 15-00:00:00
PriorityUsageResetPeriod = None # manual reset via sacctmgr RawUsage=0
PriorityWeightFairShare = 259200
PriorityWeightQOS = 500000
FairShareDampeningFactor = 3
TRESBillingWeights="CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25"
PriorityFlags=MAX_TRES,NO_NORMAL_ASSOC
The emulator implements SLURM's fairshare decay using the configured half-life:
# Decay formula matches SLURM's implementation
decay_factor = 2 ** (-days_elapsed / half_life_days)
# With default 15-day half-life, after 90 days (1 quarter):
decay_factor = 2 ** (-90 / 15)  # 0.0156 (1.56%)
# With 7-day half-life, after 90 days:
decay_factor = 2 ** (-90 / 7)  # 0.000135 (0.0135%)
Example with 15-day half-life: a user consumes 2000 hours in Q1. After Q1 ends (90 days later):
- Original impact: 2000 hours
- Decayed impact: 2000 × 0.0156 = 31 hours equivalent
- Q2 allocation: 1000 + (1000 - 31) = 1969 hours available
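The worked example above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, with hypothetical function names; it is not the code in `calculator.py`, and it does not address what happens when decayed usage exceeds the base allocation.

```python
def decay_factor(days_elapsed, half_life_days=15.0):
    """Fraction of past usage that still counts after days_elapsed."""
    return 2 ** (-days_elapsed / half_life_days)


def decayed_usage(usage_nh, days_elapsed, half_life_days=15.0):
    """Past usage weighted by the decay factor."""
    return usage_nh * decay_factor(days_elapsed, half_life_days)


def next_period_allocation(base_nh, previous_usage_nh, days_elapsed,
                           half_life_days=15.0):
    """Carryover as in the example: base + (base - decayed previous usage)."""
    carried = base_nh - decayed_usage(previous_usage_nh, days_elapsed,
                                      half_life_days)
    return base_nh + carried


print(decay_factor(90))                        # 0.015625
print(decayed_usage(2000, 90))                 # 31.25
print(next_period_allocation(1000, 2000, 90))  # 1968.75
```

The unrounded result is 1968.75 hours; the list above shows 1969 because it rounds the decayed usage to 31 first.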
time_show # Show current time and period
time_advance <amount> <unit> # Advance time (units: days, months, quarters)
time_set YYYY-MM-DD [HH:MM:SS] # Set specific date/time
# Examples:
time_advance 2 months
time_advance 30 days
time_set 2024-05-20
usage_inject <user> <amount> [account] # Inject node-hour usage
usage_show [account] [period] # Show usage summary with user breakdown
# Examples:
usage_inject user1 200 test-account
usage_show test-account
usage_show test-account 2024-Q1
account_create <name> [description] [allocation] # Create account
account_list # List all accounts with status
account_show <name> # Show detailed account info
account_delete <name> # Delete account
# Examples:
account_create test "Test Account" 1000
account_show test
account_list
qos_show [account] # Show QoS status and details
qos_set <account> <qos> # Set QoS level (normal/slowdown/blocked)
qos_check [account] # Check thresholds and auto-update QoS
# Examples:
qos_check test-account
qos_set test-account slowdown
qos_show test-account
limits_calculate [account] # Calculate and display periodic limits
# Example:
limits_calculate test-account
scenario_list [type] # List scenarios (optionally filter by type)
scenario_describe <name> # Show detailed description and learning objectives
scenario_steps <name> # Show step-by-step command breakdown
scenario_run <name> # Run scenario automatically
scenario_run <name> --interactive # Run with confirmation prompts
scenario_run <name> --step-by-step # Run with detailed step output
scenario_search <query> # Search scenarios by keyword
# Examples:
scenario_list qos_management
scenario_describe qos_thresholds
scenario_run qos_thresholds --step-by-step
scenario_search decay
config_show # Show current SLURM configuration
config_reload <path> # Hot-reload configuration file
# Examples:
config_show
config_reload examples/slurm.conf
cleanup_all # Clean all accounts and reset to fresh state
cleanup_scenario <name> # Clean specific scenario accounts
cleanup_account <name> # Clean specific account completely
# Examples:
cleanup_all
cleanup_scenario qos_thresholds
cleanup_account test-account
sacctmgr <args> # Run sacctmgr command
sacct <args> # Run sacct command
sinfo <args> # Run sinfo command
# Examples:
sacctmgr list accounts
sacctmgr modify account test set fairshare=333
sacct --accounts=test --format=Account,User,Elapsed
# Setup with specific configuration
uv run slurm-emulator --config examples/slurm.conf
# In emulator CLI:
time set 2024-01-01
account create test-account "Test" 1000
# Month 1: Light usage
usage inject user1 100 test-account
time advance 1 months
# Month 2: Heavy usage
usage inject user1 600 test-account
limits calculate test-account
qos check test-account
# Quarter transition
time advance 1 months
limits apply test-account
# Test different decay rates
uv run slurm-emulator --config examples/custom_slurm.conf
# Compare configurations
uv run slurm-emulator --validate-only --config examples/slurm.conf
uv run slurm-emulator --validate-only --config examples/custom_slurm.conf
# Q1: Heavy usage
time set 2024-01-01
account create test-account "Test" 1000
usage inject user1 1500 test-account
# Q2: Check decay impact
time set 2024-04-01
limits calculate test-account
# Should show ~23Nh effective previous usage (1500 * 0.0156)
# Setup with 1000Nh allocation (1200Nh threshold with 20% grace)
account create test-account "Test" 1000
qos show test-account # Should show "normal"
usage inject user1 1100 test-account
qos check test-account # Should show approaching threshold
usage inject user1 200 test-account # Total: 1300Nh
qos check test-account # Should trigger slowdown QoS
slurm-emulator/
├── emulator/
│ ├── core/
│ │ ├── time_engine.py # Time manipulation
│ │ ├── database.py # In-memory state
│ │ ├── slurm_config.py # SLURM config parsing
│ │ └── usage_simulator.py # Usage injection
│ ├── commands/
│ │ ├── sacctmgr.py # sacctmgr emulator
│ │ ├── sacct.py # sacct emulator
│ │ └── dispatcher.py # Command routing
│ ├── periodic_limits/
│ │ ├── calculator.py # Decay & carryover
│ │ └── qos_manager.py # QoS management
│ ├── scenarios/
│ │ ├── sequence_scenario.py # Complete scenario
│ │ ├── scenario_registry.py # Scenario discovery & running
│ │ └── limits_configuration_scenarios.py
│ ├── cli/
│ │ ├── main.py # Interactive CLI
│ │ └── cmd_cli.py # CMD-based CLI
│ └── api/
│ └── emulator_server.py # REST API
├── scripts/
│ ├── release.py # Release management
│ ├── changelog.sh # Changelog generation
│ ├── generate_changelog_data.py # Commit data collection
│ └── prompts/
│ └── changelog-prompt.md # Changelog prompt template
└── tests/ # Test suites
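To illustrate what the `periodic_limits/` modules deal with, the numbers used in this README's examples (1000Nh allocation, 20% grace, 1200Nh threshold, 60000 and 72000 billing minutes) can be reproduced with the arithmetic below. This is a sketch with hypothetical names, not the contents of `calculator.py` or `qos_manager.py`; the conversion of 60 billing minutes per node-hour is inferred from the example output rather than documented.

```python
from dataclasses import dataclass

# Assumption inferred from the examples: 1 node-hour = 60 billing minutes.
MINUTES_PER_NODE_HOUR = 60


@dataclass(frozen=True)
class PeriodLimits:
    allocation_nh: float   # base allocation for the period
    threshold_nh: float    # allocation plus grace margin
    billing_minutes: int   # base allocation in billing minutes
    grp_tres_mins: int     # threshold in billing minutes


def compute_limits(allocation_nh, grace_ratio=0.2):
    """Derive the threshold and billing-minute values for one period."""
    threshold_nh = allocation_nh * (1 + grace_ratio)
    return PeriodLimits(
        allocation_nh=allocation_nh,
        threshold_nh=threshold_nh,
        billing_minutes=round(allocation_nh * MINUTES_PER_NODE_HOUR),
        grp_tres_mins=round(threshold_nh * MINUTES_PER_NODE_HOUR),
    )


def exceeds_threshold(used_nh, limits):
    """True once usage passes the grace threshold."""
    return used_nh > limits.threshold_nh


limits = compute_limits(1000)
print(limits.billing_minutes, limits.grp_tres_mins)  # 60000 72000
print(exceeds_threshold(1100, limits))               # False
print(exceeds_threshold(1300, limits))               # True
```

The two `exceeds_threshold` calls mirror the QoS example: 1100Nh stays under the 1200Nh threshold, while 1300Nh passes it.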
uv run pytest
# Full release: update version, generate changelog, tag, push
uv run scripts/release.py release X.Y.Z
# Skip changelog generation
uv run scripts/release.py release X.Y.Z --skip-changelog
Pushing the tag triggers GitHub Actions for testing and PyPI publishing.
# Create new scenario class
class CustomScenario:
    def __init__(self, time_engine, database):
        # Keep references to the shared time engine and in-memory database
        self.time_engine = time_engine
        self.database = database
    def run_scenario(self):
        # Implement scenario steps
        pass
# Add new SLURM command support
class NewCommandEmulator:
    def handle_command(self, args):
        # Implement command logic
        return "command output"
Emulator state is saved to:
- /tmp/slurm_emulator_time.json - Current time
- /tmp/slurm_emulator_db.json - Database state
"Account not found": Create account first with account create
"No usage records": Inject usage with usage inject
"Time not advancing": Check time with time command
"API connection failed": Ensure server is running on port 8080
rm /tmp/slurm_emulator_*.json
slurm-emulator
# Start fresh
MIT License - See LICENSE file for details.