care partner alerts #715

Open · ewollesen wants to merge 54 commits into master from eric-cpa-alerts

Conversation

@ewollesen (Contributor) commented May 6, 2024

This used to be a series of PRs, but that didn't really work out. They're all collapsed into this one.

This shouldn't be merged until tidepool-org/go-common#71 is merged; then this PR should have its go-common dependency bumped.

@ewollesen requested a review from toddkazakov May 6, 2024 15:55
@ewollesen removed the request for review from toddkazakov May 8, 2024 19:59
@ewollesen force-pushed the eric-cpa-alerts branch 2 times, most recently from 8549c33 to 8367902 on June 24, 2024 19:09
@ewollesen changed the title from "adds List and Get methods to alerts client" to "minimal implementation of care partner alerts" on Jul 9, 2024
@ewollesen requested a review from toddkazakov July 9, 2024 22:50
@ewollesen commented Jul 11, 2024

To use this in QA, it must be paired with tidepool-org/hydrophone#145 and tidepool-org/go-common#71

@ewollesen force-pushed the eric-cpa-alerts branch 2 times, most recently from c50d589 to 986106b on July 11, 2024 22:55
@toddkazakov (Contributor) left a review:

Looks good overall, but the retry mechanism implemented here doesn't satisfy the latency requirements. The current implementation is OK for internal usage, but it's not production-ready. This could be handled in a separate PR if that makes the development and QA process easier.

    handler := asyncevents.NewSaramaConsumerGroupHandler(&asyncevents.NTimesRetryingConsumer{
        Consumer: r.Config.MessageConsumer,
        Delay:    CappedExponentialBinaryDelay(AlertsEventRetryDelayMaximum),
        // ... (remainder of the excerpt elided)

toddkazakov commented:

I don't think this is a suitable retry strategy given the latency requirements for this service. Kafka's consumer group concurrency is limited to the number of partitions of the topic. This number can't be very high, because Kafka's memory consumption grows linearly with the number of partitions. It follows that the number of partitions is much lower than the number of users we will have, so the data of multiple users will end up in the same partition. A failure to evaluate a single user's alerts for one minute (as currently set by CappedExponentialBinaryDelay) will delay all of the users sharing that partition by at least a minute, because messages in a single partition are processed serially.

Alert notifications should be near real-time - up to 10 seconds latency is acceptable. I think the solution proposed in this design document is how this should be handled. Other solutions which satisfy the requirements are welcome.
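
For reference, a capped exponential binary delay typically doubles the wait on each retry up to a maximum; here is a minimal sketch of what CappedExponentialBinaryDelay could look like (the func(tries int) time.Duration signature is an assumption inferred from the Delay field above, not the repository's actual code):

    package events

    import "time"

    // CappedExponentialBinaryDelay returns a delay function that doubles the
    // delay with each retry (1s, 2s, 4s, ...) but never exceeds maximum.
    // The signature is assumed from the Delay field in the excerpt above.
    func CappedExponentialBinaryDelay(maximum time.Duration) func(tries int) time.Duration {
        return func(tries int) time.Duration {
            delay := time.Second << tries // 2^tries seconds
            if delay <= 0 || delay > maximum { // delay <= 0 guards against shift overflow
                return maximum
            }
            return delay
        }
    }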

@ewollesen replied:

This will require some more in-depth thought on my part... Will do.

@ewollesen replied:

Yeah, I think you're right, let's get this review merged, and I'll work on getting a multiple topic solution set up. Given the flexibility we have now, it shouldn't be too bad.

@ewollesen replied:

I have the multi-tier retry in the eric-alerts-multi-topic-retry branch.

@ewollesen replied:

This should be implemented in this branch now.
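
The multi-topic retry code itself isn't shown in this thread, but the general pattern is to republish a failed message to progressively slower retry topics (each with its own consumer group) instead of blocking the original partition, with a dead-letter topic as the last stop. A hypothetical sketch of the routing decision, with invented topic names:

    package main

    import "fmt"

    // Retry topics, each consumed by its own consumer group with a
    // progressively longer delay before processing. Names are hypothetical.
    var retryTopics = []string{"alerts.retry.1s", "alerts.retry.10s", "alerts.retry.60s"}

    const deadLetterTopic = "alerts.dead-letter"

    // nextTopic picks where a failed message is republished, based on how
    // many times it has already been attempted, so the original partition
    // keeps flowing for other users.
    func nextTopic(attempts int) string {
        if attempts < len(retryTopics) {
            return retryTopics[attempts]
        }
        return deadLetterTopic
    }

    func main() {
        for attempts := 0; attempts <= 3; attempts++ {
            fmt.Printf("attempt %d -> republish to %s\n", attempts, nextTopic(attempts))
        }
    }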

@ewollesen force-pushed the eric-cpa-alerts branch 2 times, most recently from c08e1fc to 967c617 on July 19, 2024 16:02
@ewollesen requested a review from toddkazakov July 29, 2024 22:41
@ewollesen force-pushed the eric-cpa-alerts branch 2 times, most recently from 9432468 to eaa652e on September 17, 2024 20:07
toddkazakov previously approved these changes Sep 18, 2024
toddkazakov previously approved these changes Oct 5, 2024
@ewollesen commented Oct 7, 2024

@toddkazakov I just removed two config env vars. I believe we talked about that before, but it slipped my mind until I was reviewing the helm chart changes today, where they came up again.

So the re-review here is just around the config parsing in the most recent commit of the PR; nothing else has changed. [UPDATE] This comment is outdated.

ewollesen added 13 commits March 4, 2025 08:52
Previously, when completing a task, an available time of nil would cause the task to be marked as failed.

Now, when a task completes with an available time of nil, time.Now() is substituted, which causes the task to be run again ASAP.

In addition, if the available time is in the past, it is also replaced with time.Now(), so that the task runs again ASAP.

This supports the care partner no-communication check, which wants to run once per second. The task service can't schedule that directly (its smallest interval is 5 seconds), but setting a 1-second interval runs the task on every task service iteration.

BACK-2559
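
A minimal sketch of the completion behavior described in the commit above; the function name and the task service's actual types are assumptions:

    package task

    import "time"

    // normalizeAvailableTime replaces a nil or past available time with
    // time.Now(), so the task is run again on the next task service
    // iteration instead of being marked as failed.
    func normalizeAvailableTime(available *time.Time) time.Time {
        now := time.Now()
        if available == nil || available.Before(now) {
            return now
        }
        return *available
    }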
The need for this struct went away when retry delays were removed.

BACK-2559
The minimum (and default) value is 5m. Validate that configs specify either 0 or a value in the range 5m to 6h.

BACK-2559
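
A sketch of the validation rule described above; the function name and error wording are assumptions:

    package alerts

    import (
        "fmt"
        "time"
    )

    // validateInterval accepts either 0 or a duration between 5 minutes
    // and 6 hours, per the commit message above.
    func validateInterval(d time.Duration) error {
        if d == 0 {
            return nil
        }
        if d < 5*time.Minute || d > 6*time.Hour {
            return fmt.Errorf("interval %s must be 0 or between 5m and 6h", d)
        }
        return nil
    }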
This minimizes the Kafka load for non-production environments.

BACK-2559
As Todd pointed out in code review, there's no need for this to be
separate from the existing DataRepository, as it only queries the
deviceData collection.

BACK-2559
The tasks service and data service can both push alerts for care
partners. This change consolidates that configuration into one set of
environment variables loaded via the alerts package.

I wish that alerts didn't need to know about envconfig, but at least for the moment it's the only way to consolidate the configuration into a single reusable struct.

The pusher in both services is named with an "alerts" prefix to communicate that it is configured for care partner alerts.

BACK-2559
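
A sketch of what such a consolidated struct could look like using the kelseyhightower/envconfig package the commit alludes to; the field and environment variable names are invented for illustration:

    package alerts

    import "github.com/kelseyhightower/envconfig"

    // Config is shared by the data and tasks services so both load the
    // same care partner alerts settings. Env var names are hypothetical.
    type Config struct {
        APNsSigningKey string `envconfig:"ALERTS_APNS_SIGNING_KEY"`
        APNsTopic      string `envconfig:"ALERTS_APNS_TOPIC"`
    }

    // LoadConfig reads the shared settings from the environment once, so
    // both services parse the same variables the same way.
    func LoadConfig() (*Config, error) {
        cfg := &Config{}
        if err := envconfig.Process("", cfg); err != nil {
            return nil, err
        }
        return cfg, nil
    }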
@ewollesen commented:

Rebased on latest master.

toddkazakov previously approved these changes Mar 5, 2025
This matches what's done with existing consumer groups.

BACK-2559
@ewollesen force-pushed the eric-cpa-alerts branch 3 times, most recently from dd6cbaf to 35c7de7 on March 18, 2025 20:07
Kafka/Sarama won't log it (at least not by default), and this isn't
something we expect to see, so log it.

BACK-2559
Previously a glucose.Glucose was used, which was fine in that we only
look at the UserID and UploadID fields, but there are other
incompatible fields that can "spoil" the deserialization.

BACK-2559
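
The fix presumably decodes into a narrow struct carrying only the fields that are actually read; a sketch, with hypothetical names and JSON tags:

    package alerts

    import "encoding/json"

    // Datum carries only the two fields the consumer reads, so other,
    // incompatible glucose.Glucose fields can't spoil deserialization.
    type Datum struct {
        UserID   string `json:"userId"`
        UploadID string `json:"uploadId"`
    }

    // decodeDatum unmarshals a Kafka message payload into the narrow struct.
    func decodeDatum(b []byte) (*Datum, error) {
        d := &Datum{}
        if err := json.Unmarshal(b, d); err != nil {
            return nil, err
        }
        return d, nil
    }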
This logger has some fields in it that can be useful for debugging.

BACK-2559
Fields named "message" are quietly dropped by the platform log package. Also, logging []byte isn't very useful.

BACK-2559
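
In other words: use a field key other than "message", and convert the payload to a string before logging. A sketch against a stand-in logger interface (the platform log package's real API may differ):

    package events

    // fieldLogger is a stand-in for the platform log package's interface.
    type fieldLogger interface {
        WithField(key string, value interface{}) fieldLogger
        Info(message string)
    }

    // logEvent avoids the reserved "message" field name and logs the raw
    // []byte payload as a string so it's human-readable.
    func logEvent(lgr fieldLogger, payload []byte) {
        lgr.WithField("kafkaMessage", string(payload)).Info("consumed alerts event")
    }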
@ewollesen requested a review from toddkazakov March 21, 2025 16:05