feat: add heartbeat feature to auto-terminate program on failure#425
feat: add heartbeat feature to auto-terminate program on failure#425wen2so wants to merge 3 commits intonam20485:developmentfrom
Conversation
There was a problem hiding this comment.
Summary of Changes
Hello @wen2so, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a critical reliability feature to the OdbDesignServer by implementing a heartbeat mechanism. It ensures the server's responsiveness by adding a new API endpoint that updates an internal timestamp, coupled with a background monitor that automatically terminates the application if it becomes unresponsive. This proactive self-termination helps maintain system stability and resource integrity.
Highlights
- Heartbeat Endpoint: I've added a new
/healthz/heartbeatendpoint to theHealthCheckController. Calling this endpoint updates an internal timestamp, signaling that the server is active and responsive. - Heartbeat Monitoring: A new background thread has been introduced in
OdbDesignServerAppthat continuously monitors thelastHeartbeat_timestamp. This thread checks every second if the timestamp has been updated recently. - Application Self-Termination: If the
lastHeartbeat_timestamp is not updated for more than 5 seconds, themonitorHeartbeatthread will automatically terminate the application by callingexit(0). This ensures that unresponsive server instances are automatically shut down. - Health Check Response Simplification: The response messages for existing
/healthz/live,/healthz/ready, and/healthz/startedendpoints have been simplified, removing the 'healthy: ' prefix for cleaner output. - API Documentation Update: The
swagger/odbdesign-server-0.9-swagger.yamlfile has been updated to include the definition for the new/healthz/heartbeatGET endpoint.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Detect heartbeat status every 5 seconds. If no heartbeat is received within 5 seconds, the program will exit automatically.
There was a problem hiding this comment.
Code Review
This pull request introduces a heartbeat mechanism to monitor the server's health and terminate it if it becomes unresponsive. The implementation involves a new heartbeat endpoint and a background thread to check the heartbeat status. The review focuses on improving the design by ensuring a graceful shutdown instead of an abrupt exit, and improving code clarity by replacing magic numbers with named constants. These changes will make the code more robust and maintainable.
| if (diff > std::chrono::seconds(5)) | ||
| { | ||
| std::cerr << "Heartbeat timeout, exiting..." << std::endl; | ||
| exit(0); |
There was a problem hiding this comment.
Calling exit(0) will cause an immediate, forceful termination of the process. This prevents proper cleanup. A more graceful shutdown can be achieved by stopping the Crow application1. This allows the run() method to return and the application to perform any necessary cleanup before exiting.
Style Guide References
| exit(0); | |
| OdbDesignServerApp::inst_->crow_app().stop(); | |
| return; |
Footnotes
| static OdbDesignServerApp* inst_; | ||
| // store last heartbeat time | ||
| std::atomic<std::chrono::steady_clock::time_point> lastHeartbeat_; |
There was a problem hiding this comment.
These members (inst_ and lastHeartbeat_) are public, which breaks encapsulation. The inst_ static pointer, in particular, implements a singleton pattern via a public member, which introduces global state and makes code difficult to test and reason about1. These should be made private.
Style Guide References
Footnotes
| crow::response HealthCheckController::health_check_heartbeat(const crow::request& req) | ||
| { | ||
| // update last heartbeat | ||
| OdbDesignServerApp::inst_->updateLastHeartbeat(); |
There was a problem hiding this comment.
This line uses the global singleton inst_ to access the application object. This creates a tight coupling and dependency on a global variable1. The RouteController base class already provides access to the server application via the m_serverApp member. You can use dynamic_cast to safely cast it to your specific application class and call the method. This approach uses dependency injection and avoids global state, which is a much cleaner design.
Style Guide References
| OdbDesignServerApp::inst_->updateLastHeartbeat(); | |
| dynamic_cast<OdbDesignServerApp&>(m_serverApp).updateLastHeartbeat(); |
Footnotes
There was a problem hiding this comment.
As Gemini states, you already have access to m_serverApp. You can use that and get rid of the singleton object.
| auto lastTime = OdbDesignServerApp::inst_->lastHeartbeat_.load(std::memory_order_relaxed); | ||
| while (true) | ||
| { | ||
| std::this_thread::sleep_for(std::chrono::seconds(1)); | ||
| auto now = std::chrono::steady_clock::now(); | ||
| auto diff = now - lastTime; | ||
| // check heartbeat | ||
| if (diff > std::chrono::seconds(5)) | ||
| { | ||
| std::cerr << "Heartbeat timeout, exiting..." << std::endl; | ||
| exit(0); | ||
| } | ||
| lastTime = OdbDesignServerApp::inst_->lastHeartbeat_.load(std::memory_order_relaxed); | ||
| } |
There was a problem hiding this comment.
This function uses magic numbers for time intervals (1 for the sleep duration on line 59 and 5 for the timeout on line 63). It's a best practice to define these as named constants to improve readability and maintainability1.
Style Guide References
Footnotes
|
consider add an args param to enable or disable this feature |
- Add heartbeatInterval() method to OdbDesignArgs class - Implement heartbeat interval configuration in OdbDesignServerApp - Update usage string to include new heartbeat interval option
|
|
nam20485
left a comment
There was a problem hiding this comment.
I don't understand why you need this heartbeat.
Usually it is the orchestration SW's (e.g. Docker or Kubernetes) responsibility to check the health check endpoints (which already exist) and kill the process if they don't respond.
What is your use case for checking and terminating from inside of the process?
| crow::response HealthCheckController::health_check_heartbeat(const crow::request& req) | ||
| { | ||
| // update last heartbeat | ||
| OdbDesignServerApp::inst_->updateLastHeartbeat(); |
There was a problem hiding this comment.
As Gemini states, you already have access to m_serverApp. You can use that and get rid of the singleton object.
|
Considering that this C++ program is intended to be used as a library and invoked by programs written in other languages, and given that in a standalone deployment (without container management tools like Docker), the C++ program must terminate synchronously if the calling program crashes or exits unexpectedly — to avoid running as an orphaned process — a heartbeat mechanism is therefore required. |



Detect heartbeat status every 5 seconds. If no heartbeat is received within 5 seconds, the program will exit automatically.
Implements: #424