Skip to content

feat: add heartbeat feature to auto-terminate program on failure#425

Open
wen2so wants to merge 3 commits intonam20485:developmentfrom
wen2so:dev/heartbeat
Open

feat: add heartbeat feature to auto-terminate program on failure#425
wen2so wants to merge 3 commits intonam20485:developmentfrom
wen2so:dev/heartbeat

Conversation

@wen2so
Copy link
Collaborator

@wen2so wen2so commented Jul 17, 2025

Detect heartbeat status every 5 seconds. If no heartbeat is received within 5 seconds, the program will exit automatically.

Implements: #424

@wen2so wen2so requested a review from nam20485 as a code owner July 17, 2025 03:08
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @wen2so, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a critical reliability feature to the OdbDesignServer by implementing a heartbeat mechanism. It ensures the server's responsiveness by adding a new API endpoint that updates an internal timestamp, coupled with a background monitor that automatically terminates the application if it becomes unresponsive. This proactive self-termination helps maintain system stability and resource integrity.

Highlights

  • Heartbeat Endpoint: I've added a new /healthz/heartbeat endpoint to the HealthCheckController. Calling this endpoint updates an internal timestamp, signaling that the server is active and responsive.
  • Heartbeat Monitoring: A new background thread has been introduced in OdbDesignServerApp that continuously monitors the lastHeartbeat_ timestamp. This thread checks every second if the timestamp has been updated recently.
  • Application Self-Termination: If the lastHeartbeat_ timestamp is not updated for more than 5 seconds, the monitorHeartbeat thread will automatically terminate the application by calling exit(0). This ensures that unresponsive server instances are automatically shut down.
  • Health Check Response Simplification: The response messages for existing /healthz/live, /healthz/ready, and /healthz/started endpoints have been simplified, removing the 'healthy: ' prefix for cleaner output.
  • API Documentation Update: The swagger/odbdesign-server-0.9-swagger.yaml file has been updated to include the definition for the new /healthz/heartbeat GET endpoint.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Detect heartbeat status every 5 seconds. If no heartbeat is received within 5 seconds, the program will exit automatically.
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a heartbeat mechanism to monitor the server's health and terminate it if it becomes unresponsive. The implementation involves a new heartbeat endpoint and a background thread to check the heartbeat status. The review focuses on improving the design by ensuring a graceful shutdown instead of an abrupt exit, and improving code clarity by replacing magic numbers with named constants. These changes will make the code more robust and maintainable.

if (diff > std::chrono::seconds(5))
{
std::cerr << "Heartbeat timeout, exiting..." << std::endl;
exit(0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Calling exit(0) will cause an immediate, forceful termination of the process. This prevents proper cleanup. A more graceful shutdown can be achieved by stopping the Crow application1. This allows the run() method to return and the application to perform any necessary cleanup before exiting.

Style Guide References

Suggested change
exit(0);
OdbDesignServerApp::inst_->crow_app().stop();
return;

Footnotes

  1. Use crow_app().stop() for a graceful shutdown. (link)

Comment on lines +17 to +19
static OdbDesignServerApp* inst_;
// store last heartbeat time
std::atomic<std::chrono::steady_clock::time_point> lastHeartbeat_;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

These members (inst_ and lastHeartbeat_) are public, which breaks encapsulation. The inst_ static pointer, in particular, implements a singleton pattern via a public member, which introduces global state and makes code difficult to test and reason about1. These should be made private.

Style Guide References

Footnotes

  1. Avoid public members for internal state. (link)

crow::response HealthCheckController::health_check_heartbeat(const crow::request& req)
{
// update last heartbeat
OdbDesignServerApp::inst_->updateLastHeartbeat();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This line uses the global singleton inst_ to access the application object. This creates a tight coupling and dependency on a global variable1. The RouteController base class already provides access to the server application via the m_serverApp member. You can use dynamic_cast to safely cast it to your specific application class and call the method. This approach uses dependency injection and avoids global state, which is a much cleaner design.

Style Guide References

Suggested change
OdbDesignServerApp::inst_->updateLastHeartbeat();
dynamic_cast<OdbDesignServerApp&>(m_serverApp).updateLastHeartbeat();

Footnotes

  1. Avoid global singletons. (link)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Gemini states, you already have access to m_serverApp. You can use that and get rid of the singleton object.

Comment on lines +56 to +69
auto lastTime = OdbDesignServerApp::inst_->lastHeartbeat_.load(std::memory_order_relaxed);
while (true)
{
std::this_thread::sleep_for(std::chrono::seconds(1));
auto now = std::chrono::steady_clock::now();
auto diff = now - lastTime;
// check heartbeat
if (diff > std::chrono::seconds(5))
{
std::cerr << "Heartbeat timeout, exiting..." << std::endl;
exit(0);
}
lastTime = OdbDesignServerApp::inst_->lastHeartbeat_.load(std::memory_order_relaxed);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function uses magic numbers for time intervals (1 for the sleep duration on line 59 and 5 for the timeout on line 63). It's a best practice to define these as named constants to improve readability and maintainability1.

Style Guide References

Footnotes

  1. Define time intervals as named constants. (link)

@wen2so
Copy link
Collaborator Author

wen2so commented Jul 17, 2025

consider add an args param to enable or disable this feature

- Add heartbeatInterval() method to OdbDesignArgs class
- Implement heartbeat interval configuration in OdbDesignServerApp
- Update usage string to include new heartbeat interval option
@sonarqubecloud
Copy link

@sonarqubecloud
Copy link

sonarqubecloud bot commented Sep 2, 2025

Copy link
Owner

@nam20485 nam20485 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you need this heartbeat.

Usually it is the orchestration SW's (e.g. Docker or Kubernetes) responsibility to check the health check endpoints (which already exist) and kill the process if they don't respond.

What is your use case for checking and terminating from inside of the process?

crow::response HealthCheckController::health_check_heartbeat(const crow::request& req)
{
// update last heartbeat
OdbDesignServerApp::inst_->updateLastHeartbeat();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Gemini states, you already have access to m_serverApp. You can use that and get rid of the singleton object.

@wen2so
Copy link
Collaborator Author

wen2so commented Sep 9, 2025

Considering that this C++ program is intended to be used as a library and invoked by programs written in other languages, and given that in a standalone deployment (without container management tools like Docker), the C++ program must terminate synchronously if the calling program crashes or exits unexpectedly — to avoid running as an orphaned process — a heartbeat mechanism is therefore required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants