Feature Request: Cron monitoring #1805

MasterPuffin · 2025-02-24T11:29:23Z

Hi,
Unfortunately, I couldn't find any mention of cron monitoring on the website, in the README, or in the issues, so I would like to ask whether cron monitoring or request monitoring (meaning Checkmate provides a REST endpoint that other tools can call) is possible or planned.

gorkem-bwl · 2025-02-24T15:02:51Z

@MasterPuffin could you please provide more information about how this would work for you and others by giving a very simple workflow together with a few use cases? Many thanks.

ajhollid · 2025-02-24T15:11:20Z

@MasterPuffin indeed, this is an interesting request. Please let us know what you're thinking!

MasterPuffin · 2025-02-24T15:26:45Z

Here is an example workflow that would fit my use case:
I have a cron script that starts running every day at 4:00. In Checkmate I would configure a new monitor, which creates a new endpoint, and I would also configure an expected check-in time. In return I would get a URL like https://mycheckmateinstance.com/endpoint/endpointid/

Then I would configure my cronjob as follows:
When the cronjob starts, I call https://mycheckmateinstance.com/endpoint/endpointid/start/. This returns an ID, which I store in a variable for later use.
During the cronjob, I call https://mycheckmateinstance.com/endpoint/endpointid/cronid/update/ (where cronid is the ID returned by the start call), for example when a section of my cronjob finishes.
Once the cronjob finishes, I call https://mycheckmateinstance.com/endpoint/endpointid/cronid/end/ to let Checkmate know it finished successfully. Alternatively, my cronjob's error handler can call https://mycheckmateinstance.com/endpoint/endpointid/cronid/error/ to report an issue.
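To make the proposed flow concrete, here is a minimal client-side sketch of how a job could report to these endpoints. It assumes the URL layout from the example workflow, that the start call returns the run ID as plain text, and that plain GET requests are enough. CronReporter, run_job and the transport parameter are hypothetical names, not an existing Checkmate API.

```python
import urllib.request
from typing import Callable, List, Optional

# Hypothetical URL layout taken from the example workflow; not an existing Checkmate API.
BASE_URL = "https://mycheckmateinstance.com/endpoint/endpointid"


def http_get(url: str) -> str:
    """Send a GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


class CronReporter:
    """Reports the lifecycle of one cron run to the proposed endpoints."""

    def __init__(self, base_url: str, transport: Callable[[str], str] = http_get) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport
        self.run_id: Optional[str] = None

    def start(self) -> str:
        # Assumption: the start endpoint returns the run ID ("cronid") as plain text.
        self.run_id = self.transport(f"{self.base_url}/start/").strip()
        return self.run_id

    def _call(self, action: str) -> None:
        if self.run_id is None:
            raise RuntimeError("start() must be called before reporting progress")
        self.transport(f"{self.base_url}/{self.run_id}/{action}/")

    def update(self) -> None:
        self._call("update")

    def end(self) -> None:
        self._call("end")

    def error(self) -> None:
        self._call("error")


def run_job(reporter: CronReporter, sections: List[Callable[[], None]]) -> None:
    """Run each section of a job, checking in after each one."""
    reporter.start()
    try:
        for section in sections:
            section()
            reporter.update()  # one check-in per finished section
    except Exception:
        reporter.error()  # report the failure, then re-raise so cron logs it too
        raise
    reporter.end()
```

In a real job this would be called as run_job(CronReporter(BASE_URL), [...]). Passing a fake transport function instead of http_get makes the reporting logic testable without a live Checkmate instance.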

In Checkmate there would be different states for the cronjob:

Cronjob started on time and completed successfully
Cronjob started but timed out after the set timeout
Cronjob started but reported an error
Cronjob didn't start in the configured timeframe

The last three states would be considered an outage and would trigger a notification for example by email.
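One way these four states could be derived from a run's check-ins is sketched below. It is only an illustration, assuming a configured start window after the scheduled time and a per-run timeout; RunState, Run and evaluate are hypothetical names. A start call that arrives after the window is treated like no start at all, and PENDING/RUNNING are extra, non-alerting states for runs that are still within their limits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class RunState(Enum):
    PENDING = "waiting to start"          # not an outage yet
    RUNNING = "running within timeout"    # not an outage yet
    OK = "started on time and completed successfully"
    TIMED_OUT = "started but timed out"
    ERRORED = "started but reported an error"
    MISSED = "did not start in the configured timeframe"


# The last three states from the list above count as outages.
OUTAGE_STATES = {RunState.TIMED_OUT, RunState.ERRORED, RunState.MISSED}


@dataclass
class Run:
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None
    errored: bool = False


def evaluate(run: Run, scheduled: datetime, start_window: timedelta,
             timeout: timedelta, now: datetime) -> RunState:
    deadline = scheduled + start_window
    if run.started_at is None:
        return RunState.MISSED if now > deadline else RunState.PENDING
    if run.started_at > deadline:
        return RunState.MISSED  # a start call outside the window counts as missed
    if run.errored:
        return RunState.ERRORED
    if run.ended_at is not None:
        # An end call that arrives after the timeout still counts as a timeout.
        return RunState.TIMED_OUT if run.ended_at - run.started_at > timeout else RunState.OK
    return RunState.TIMED_OUT if now - run.started_at > timeout else RunState.RUNNING
```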

Also interesting would be some kind of statistics, for example past check-ins with timestamps, so I could examine whether one check-in stage takes an abnormal amount of time.
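For the statistics idea, per-stage durations follow directly from consecutive check-in timestamps. The sketch below assumes each run is stored as an ordered list of check-in times (start, updates, end) and flags a stage when it takes more than a chosen factor times its historical median; the default factor of 2 is an arbitrary assumption, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List


def stage_durations(checkins: List[datetime]) -> List[timedelta]:
    """Durations between consecutive check-ins (start -> update -> ... -> end)."""
    return [later - earlier for earlier, later in zip(checkins, checkins[1:])]


def slow_stages(history: List[List[datetime]], latest: List[datetime],
                factor: float = 2.0) -> List[int]:
    """Indexes of stages in `latest` that took more than `factor` x their historical median."""
    past = [stage_durations(run) for run in history]
    flagged = []
    for i, duration in enumerate(stage_durations(latest)):
        # Only compare against past runs that actually reached this stage.
        samples = [durations[i] for durations in past if len(durations) > i]
        if samples and duration > median(samples) * factor:
            flagged.append(i)
    return flagged
```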

MasterPuffin · 2025-02-24T15:31:41Z

As for use cases, here are a few examples:

Clear logs every 2 hours. It would be important to know if this cron job fails, e.g. because of missing permissions, as otherwise the storage could run full.
Aggregate data from a database and send it to a user via email, e.g. for unread notifications. The job might fail at times because the email provider's send limit has been reached.
Query an endpoint, e.g. Stripe, for missed payments. The job might fail because API credentials are no longer valid or rate limits have been reached.

gorkem-bwl · 2025-02-24T15:55:34Z

A few questions to understand the functionality better:

The last three states would be considered an outage and would trigger a notification for example by email.

In the "cronjob started and sent a start call, but didn't send a finished call" case, there would be a configuration that states how long the system should wait before sending out a notification, right?
If a job doesn't send a heartbeat within the expected timeframe, the system should alert you. This should also be configurable, right?
The system has to provide logs of all job executions, including timestamps, durations, and exit codes, to help with debugging and auditing, right?

MasterPuffin · 2025-02-24T16:06:36Z

In the "cronjob started and sent a start call, but didn't send a finished call" case, there would be a configuration that states how long the system should wait before sending out a notification, right?

Correct. However, I'm not sure whether there should be a difference between 'the cronjob has made an initial call but no subsequent calls' and 'the cronjob has made no call at all'.

If a job doesn't send a heartbeat within the expected timeframe, the system should alert you. This should also be configurable, right?

Correct. It would be nice to configure this per monitor, e.g. for one monitor notify only after 7 calls have been missed, and for another notify immediately.
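A per-monitor threshold like this could be a simple counter of consecutive missed check-ins. This is a sketch assuming each expected check-in slot is recorded as either received or missed; MissedCheckinPolicy is a hypothetical name, and notifying once when the threshold is first reached (rather than on every further miss) is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class MissedCheckinPolicy:
    """Hypothetical per-monitor setting: alert only after N consecutive misses."""
    threshold: int = 1  # 1 means notify immediately
    consecutive_misses: int = 0

    def record(self, received: bool) -> bool:
        """Record one expected check-in; return True if a notification should be sent now."""
        if received:
            self.consecutive_misses = 0  # any successful check-in resets the counter
            return False
        self.consecutive_misses += 1
        # Notify exactly once, when the threshold is first reached.
        return self.consecutive_misses == self.threshold
```

With threshold=7 the first six misses stay silent and the seventh triggers a notification; the default threshold=1 notifies on the first miss.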

The system has to provide logs of all job executions, including timestamps, durations, and exit codes, to help with debugging and auditing, right?

Ideally, yes. However, simple monitoring without a full history would be a very good first step.

gorkem-bwl · 2025-02-24T18:49:48Z

Ideally, yes. However, simple monitoring without a full history would be a very good first step.

In fact, keeping a history of everything is much easier than creating settings for each cron monitor, such as a configuration for what to do if a particular cron job's ping is expected but not received, or received but late.

For example, to provide a good solution for this, we need to classify each check as follows. For any state other than New or Up, the system should be able to send a notification to the admin:

New: A check that has been created but hasn't received any pings yet.
Up: The check is considered active as long as the time since the last ping remains within the defined "Period".
Late: The time since the last ping has exceeded Period, but it is still within the additional "Grace".
Down: The time since the last ping has surpassed both Period and Grace, marking the check as failed. When this happens, Healthchecks.io sends a notification.
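The Period/Grace classification above reduces to comparing the time since the last ping against two durations. A minimal sketch, assuming both are configured per check and that the boundaries are inclusive (exactly Period elapsed still counts as Up); CheckStatus, classify and should_notify are hypothetical names.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class CheckStatus(Enum):
    NEW = "new"
    UP = "up"
    LATE = "late"
    DOWN = "down"


def classify(last_ping: Optional[datetime], period: timedelta,
             grace: timedelta, now: datetime) -> CheckStatus:
    if last_ping is None:
        return CheckStatus.NEW  # created but never pinged
    elapsed = now - last_ping
    if elapsed <= period:
        return CheckStatus.UP
    if elapsed <= period + grace:
        return CheckStatus.LATE
    return CheckStatus.DOWN


def should_notify(status: CheckStatus) -> bool:
    # Every state other than New/Up can trigger a notification to the admin.
    return status not in (CheckStatus.NEW, CheckStatus.UP)
```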

Let's think about this and keep this issue open for now. It may require some changes in the backend, beyond the cron-monitoring-specific configuration and API endpoint creation.

By the way, I'm starting to feel that calling it "cron monitoring" is too Linux crontab-specific. It can actually be configured to retrieve data from any source and check for errors or inconsistencies.
