Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 21, 2025

XRootD Restart/Reconfig Implementation Plan

  • Understand current XRootD lifecycle management
    • Review launch.go for daemon startup
    • Review daemon/launch_unix.go for process management
    • Review launchers/origin_serve.go and cache_serve.go
    • Review commsocket and communication mechanisms
    • Identify PIDs storage in OriginServer and CacheServer
  • Design restart/reconfig mechanism
    • Add thread-safe restart mutex to prevent concurrent restarts
    • Create restart functions in xrootd package
    • Handle SIGTERM properly to distinguish from shutdown
    • Implement reconfig of xrootd runtime directory on restart
    • Update monitoring goroutines to handle process restart
  • Implement core restart functionality
    • Add RestartXrootd function with mutex protection
    • Implement graceful shutdown of existing processes
    • Reconfigure runtime directory (call ConfigXrootd)
    • Relaunch daemons with updated configuration
    • Update PIDs in server instances via RestartServer helper
  • Update XRootDServer interface
    • Add Restart method to interface (Removed - abandoned approach)
    • Implement in OriginServer (Removed - abandoned approach)
    • Implement in CacheServer (Removed - abandoned approach)
  • Handle monitoring goroutines
    • Ensure monitoring continues after restart (existing code handles this)
    • Handle SIGTERM from restart vs actual shutdown (graceful shutdown implemented)
  • Add tests
    • Create unit test for restart mechanism
    • Test concurrent restart protection
    • Verify in federation tests (e2e_fed_tests)
    • Test monitoring continuation after restart (implicit through file access tests)
  • Documentation
    • Add code comments explaining restart flow
    • Document thread-safety guarantees

Summary

This PR implements XRootD restart/reconfig functionality as requested in the issue. The implementation includes:

  1. Thread-Safe Restart Mechanism: Added RestartXrootd() function with mutex protection to prevent concurrent restart attempts.

  2. Graceful Shutdown: Processes are first sent SIGTERM for graceful shutdown, with a configurable timeout (via Xrootd.ShutdownTimeout). Any processes that don't respond are force-killed with SIGKILL.

  3. Runtime Reconfiguration: On restart, the XRootD runtime directory is reconfigured by calling ConfigXrootd(), ensuring any configuration changes are applied.

  4. Process Tracking: PIDs are properly updated in both OriginServer and CacheServer via the RestartServer() helper function, avoiding circular dependency issues.

  5. Monitoring Continuity: Existing monitoring goroutines continue to function after restart without modification, as they monitor server health independently of PIDs.

  6. Comprehensive Testing: Includes unit tests for mutex protection and basic restart flow, plus e2e federation tests that verify file access works before and after restart.

  7. Windows Compatibility: Stub implementations for Windows indicate restart is not supported on that platform.

The implementation follows the existing patterns in the codebase and makes minimal changes to achieve the goal.

Usage

To restart XRootD from code:

// Direct restart with PID tracking (from xrootd package or higher-level code)
newPids, err := xrootd.RestartXrootd(ctx, oldPids)

// Or using the helper to update server PIDs automatically
err := xrootd.RestartServer(ctx, originOrCacheServer)

The restart function is thread-safe and will return an error if another restart is already in progress.

Original prompt

This section details on the original issue you should resolve

<issue_title>Allow XRootD restart/reconfig</issue_title>
<issue_description>In #2895, @copilot cowardly refused to implement a restart (and implicitly, a reconfig) of XRootD.

To reduce the complexity of that PR, let's carve out the restart as a separate line of work.

Make it so the XRootD server can be shutdown and restarted (this must be thread-safe to prevent concurrent restart requests). The xrootd monitoring goroutines should understand this has occurred (you'll need to SIGTERM the processes which could cause them to think that pelican itself should shutdown. On restart, we should go through and reconfig the xrootd runtime directory as well.

Make sure to use the built-in federation tests to ensure xrootd restarting works.</issue_description>

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI changed the title [WIP] Allow thread-safe restart and reconfig of XRootD server Implement XRootD restart/reconfig with thread-safe process management Dec 21, 2025
Copilot AI requested a review from bbockelm December 21, 2025 03:52
Copilot AI requested a review from bbockelm December 21, 2025 05:29
@bbockelm bbockelm added enhancement New feature or request cache Issue relating to the cache component origin Issue relating to the origin component labels Dec 21, 2025
@bbockelm bbockelm force-pushed the copilot/allow-xrootd-restart-reconfig branch 2 times, most recently from 1b1efa1 to 90cb9cc Compare December 21, 2025 14:17
Copilot AI and others added 2 commits December 21, 2025 12:05
- Add restart.go with RestartXrootd() and RestartServer() functions
- Add thread-safe restart mechanism with mutex
- Implement graceful shutdown with SIGTERM followed by SIGKILL
- Reconfigure XRootD runtime directory on restart
- Add Restart() method to XRootDServer interface
- Implement Restart() in OriginServer and CacheServer
- Update launchers to store restart info via StoreRestartInfo()
- Add Windows stub for restart functionality
- Update mockServer in tests to implement Restart() method
@bbockelm bbockelm force-pushed the copilot/allow-xrootd-restart-reconfig branch from 5a1ccd6 to 53d0729 Compare December 21, 2025 18:05
Copy link
Collaborator

@bbockelm bbockelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@bbockelm bbockelm marked this pull request as ready for review December 21, 2025 18:21
@bbockelm
Copy link
Collaborator

Last piece before the really fun one - dynamic log levels!

@bbockelm bbockelm merged commit e9aec41 into main Dec 21, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cache Issue relating to the cache component enhancement New feature or request origin Issue relating to the origin component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow XRootD restart/reconfig

2 participants