Implement XRootD restart/reconfig with thread-safe process management #2920
XRootD Restart/Reconfig Implementation Plan
- Add Restart method to interface (Removed - abandoned approach)
- Implement in OriginServer (Removed - abandoned approach)
- Implement in CacheServer (Removed - abandoned approach)

Summary
This PR implements XRootD restart/reconfig functionality as requested in the issue. The implementation includes:
- Thread-Safe Restart Mechanism: Added `RestartXrootd()` function with mutex protection to prevent concurrent restart attempts.
- Graceful Shutdown: Processes are first sent SIGTERM for graceful shutdown, with a configurable timeout (via `Xrootd.ShutdownTimeout`). Any processes that don't respond are force-killed with SIGKILL.
- Runtime Reconfiguration: On restart, the XRootD runtime directory is reconfigured by calling `ConfigXrootd()`, ensuring any configuration changes are applied.
- Process Tracking: PIDs are properly updated in both OriginServer and CacheServer via the `RestartServer()` helper function, avoiding circular dependency issues.
- Monitoring Continuity: Existing monitoring goroutines continue to function after restart without modification, as they monitor server health independently of PIDs.
- Comprehensive Testing: Includes unit tests for mutex protection and basic restart flow, plus e2e federation tests that verify file access works before and after restart.
- Windows Compatibility: Stub implementations for Windows indicate restart is not supported on that platform.
The implementation follows the existing patterns in the codebase and makes minimal changes to achieve the goal.
Usage
To restart XRootD from code:
The restart function is thread-safe and will return an error if another restart is already in progress.