Alerts feature bugs and feedback #1186

R4PH1 · 2025-01-21T16:16:14Z

Hi, thanks for the alerts feature in 3.17.

First issue:

We use an internal mail relay (mail.domain.local) on port 25.
When I try to send a test email, nothing arrives. I tried 3 different configurations.
Could it be that anonymous authentication doesn't work with DBA Dash?

Second issue:

I configured a rule for DriveSpace. Somehow I ended up with an alert counter of 8, but only 4 alerts were visible.
I created the rule with the default values and then changed the threshold to 5000 (5GB, I assumed) without the percent checkbox.
I then tried to delete the rule, got a foreign key reference conflict, and saw there were 4 entries in Alert.ActiveAlerts that I didn't see in the GUI.

Third issue:

When I edited the rule, the "Apply To (Tag)" value {ALL} was gone/empty.

I hope this is reproducible on your side; otherwise, I can try to provide more details.

DavidWiseman · 2025-01-21T16:57:09Z

Hi, thanks for providing feedback.

Issue 1:

I believe a fix is required for anonymous authentication to work. I think it's just a case of removing the call to AuthenticateAsync when no username/password is supplied. I'll get this fixed.

The call in question is `await client.AuthenticateAsync(UserName, Password);`.
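
DBA Dash itself is C#, and the line above is the call to skip. The sketch below is a language-neutral illustration only, not the actual fix. It shows the same pattern with Python's standard `smtplib`: authenticate only when both a username and a password are configured, otherwise send anonymously. The function `send_test_email` and its parameters are hypothetical.

```python
import smtplib
from email.message import EmailMessage
from typing import Callable, Optional


def send_test_email(
    host: str,
    port: int,
    sender: str,
    recipient: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    smtp_factory: Callable[..., smtplib.SMTP] = smtplib.SMTP,
) -> None:
    """Send a test notification, authenticating only if credentials are configured."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Test notification"
    msg.set_content("This is a test notification.")

    with smtp_factory(host, port) as client:
        # Anonymous relay: skip the AUTH step entirely when no username/password
        # is supplied. Calling login() against a server that doesn't advertise
        # AUTH raises smtplib.SMTPNotSupportedError.
        if username and password:
            client.login(username, password)
        client.send_message(msg)
```

For an anonymous relay such as mail.domain.local on port 25, you would call `send_test_email("mail.domain.local", 25, sender, recipient)` with no credentials. The `smtp_factory` parameter exists only so the function can be exercised without a real server.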

Note: If you click the notification count link you should be able to see the error message associated with the failed notifications.

Issue 2:

Where did you see the alert counter of 8 - on the menu bar on the top right? This calls Alert.AlertCount_Get which is just doing a COUNT(*) from Alert.ActiveAlerts.
The alerts should display in the GUI if they are in the ActiveAlerts table. The tab does filter the list of instances, though, so if you are not at the root level you might not see all the alerts. The tab might also need a refresh.
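
This minimal `sqlite3` sketch shows why the two numbers can differ. It uses a hypothetical, stripped-down ActiveAlerts table: the menu counter counts every row, while the tab only counts alerts for the instances in the current view. The `instances_in_view` tuple stands in for whatever the tree selection is.

```python
import sqlite3

# Simplified, hypothetical schema; the real Alert.ActiveAlerts table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ActiveAlerts (AlertID INTEGER PRIMARY KEY, InstanceID INTEGER NOT NULL)")
# Eight active alerts spread across four instances.
conn.executemany(
    "INSERT INTO ActiveAlerts (InstanceID) VALUES (?)",
    [(1,), (1,), (2,), (2,), (3,), (3,), (4,), (4,)],
)

# Menu-bar counter: a plain COUNT(*) with no instance filter (like Alert.AlertCount_Get).
menu_count = conn.execute("SELECT COUNT(*) FROM ActiveAlerts").fetchone()[0]

# Alerts tab: only alerts for the instances in the current tree selection.
instances_in_view = (1, 2)
placeholders = ",".join("?" * len(instances_in_view))
tab_count = conn.execute(
    f"SELECT COUNT(*) FROM ActiveAlerts WHERE InstanceID IN ({placeholders})",
    instances_in_view,
).fetchone()[0]

print(menu_count, tab_count)  # 8 4
```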

For deleting the rule - you will get the FK error if there are any active alerts referencing the rule. I think this can be improved by automatically closing any associated alerts.
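
The improvement described here (close the referencing alerts, then delete the rule) can be sketched with `sqlite3`. The Rules/ActiveAlerts/ClosedAlerts tables below are simplified stand-ins for the real Alert schema. I have also assumed that closed alerts keep the RuleID without an FK back to the rule. Running all three statements in one transaction means a failure leaves nothing half-deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Rules (RuleID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ActiveAlerts (
    AlertID INTEGER PRIMARY KEY,
    RuleID  INTEGER NOT NULL REFERENCES Rules(RuleID),
    Title   TEXT NOT NULL
);
-- Assumption: closed alerts keep the RuleID for history but have no FK to Rules.
CREATE TABLE ClosedAlerts (
    AlertID INTEGER PRIMARY KEY,
    RuleID  INTEGER,
    Title   TEXT NOT NULL
);
""")


def delete_rule(conn: sqlite3.Connection, rule_id: int) -> None:
    """Close any alerts that reference the rule, then delete the rule, atomically."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "INSERT INTO ClosedAlerts (AlertID, RuleID, Title) "
            "SELECT AlertID, RuleID, Title FROM ActiveAlerts WHERE RuleID = ?",
            (rule_id,),
        )
        conn.execute("DELETE FROM ActiveAlerts WHERE RuleID = ?", (rule_id,))
        conn.execute("DELETE FROM Rules WHERE RuleID = ?", (rule_id,))
```

Deleting the rule directly while alerts still reference it fails with a foreign key error, which is the situation reported above. `delete_rule` avoids that by moving the alerts first.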

Issue 3:

This is just a UI issue - the rule will still apply to all tags. I'll see if I can fix this in the next build though.

Thanks!

The Apply To (Tag) is set to {All} when creating a rule but when editing a rule it's blank. Fixed display issue so it shows as {All} when editing. trimble-oss#1186

Close any existing alerts that reference a rule before deleting it. trimble-oss#1186

DavidWiseman · 2025-01-21T21:58:19Z

Hi, an updated build is available here.

This should fix the email issue. If you can, please test and let me know if it fixes the issue.

The tag issue and the FK error are also fixed. However, the FK fix won't be deployed unless you run Deploy/Update Database from the service config tool; the deploy only runs automatically when there is a version change.

R4PH1 · 2025-01-22T09:49:15Z

Thanks a lot for the quick fixes.

The email is now working as expected.

I now see why the counter on the top right shows a different value from the alerts menu. It also counts alerts for instances hidden via the "show hidden" option, which the GUI (as expected) doesn't display. I don't know what the perfect solution would be. For me, hidden instances should not report alerts, but it might matter for other people. Honestly, I'd forgotten about them a bit, as the hidden ones are mostly test servers/instances. I could also work around this with tags. A more reliable solution could be a separate checkbox in the rule configuration to enable or disable alerting on hidden instances.

The foreign key conflict is now also resolved.

The UI issue is also resolved.

I noticed the history didn't populate once an alert resolved by itself, but I guess I didn't give it enough time when I tested.

I need to play around a bit more to fully understand the feature and work out which alerts make sense for me.

DavidWiseman · 2025-01-22T11:01:39Z

Hi, thanks for confirming that the email is working. 🎉

I think in most cases it would make sense to exclude hidden instances from generating alerts. I usually use this feature when I'm in the process of provisioning a new instance or when decommissioning an old one (where I want quick access to the old instance for a period of time but no longer want it to appear on the summary page).

An alert will only show up in the history after it's closed rather than resolved. Closing an alert moves it from the Alert.ActiveAlerts table to Alert.ClosedAlerts.

Keeping a resolved alert in the Alert.ActiveAlerts table allows you to see recent issues that have resolved but might require investigation. Also, if an alert keeps going from resolved to active, this prevents the alert from sending too many notifications. Alerts close automatically after 24 hours by default (configurable in Options\Repository Settings). If an alert is closed automatically (or manually), it should be visible in the history.
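
The lifecycle described above can be modelled in a few lines: an alert is active, then resolved, then closed once it has been resolved for longer than the auto-close period. This is a hypothetical sketch using the 24-hour default described above. It isn't how DBA Dash implements it, and it only covers the auto-close step (`Alert` and `auto_close` are made-up names).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Alert:
    alert_id: int
    resolved_at: Optional[datetime] = None  # None while the condition is still active


def auto_close(
    active: List[Alert],
    now: datetime,
    close_after: timedelta = timedelta(hours=24),
) -> Tuple[List[Alert], List[Alert]]:
    """Split alerts into (still_active, newly_closed).

    Only alerts that were resolved at least close_after ago are closed; active
    and recently resolved alerts stay in the active list.
    """
    still_active: List[Alert] = []
    closed: List[Alert] = []
    for alert in active:
        if alert.resolved_at is not None and now - alert.resolved_at >= close_after:
            closed.append(alert)  # would move to Alert.ClosedAlerts (and the history)
        else:
            still_active.append(alert)
    return still_active, closed
```

This is why a freshly resolved alert doesn't appear in the history yet: it is still in the active list until the close period has passed or it is closed manually.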

Add Apply To Hidden option to alert rules. By default hidden instances will no longer generate alerts. trimble-oss#1186

R4PH1 · 2025-01-22T14:13:35Z

Thanks for the explanation.

I found another issue, which I think is somewhere in [Alert].[DriveSpaceAlert_Upd].
When I set up a DriveSpace alert with "Threshold Is Percentage?" set to False and a value like 5000, no alerts are generated. (Sadly, I definitely have drives with less than 5GB free in the Storage/Drives tab.)

Once I enable the percentage option and set a value of 20, new alerts come in.

Currently I'm not sure whether it's wise to edit existing rules at all, or whether to delete them and create new ones instead. I haven't had enough time to check the source code yet.

I think I also messed around a bit too much. What would be the correct way to reset everything configured in alerts? Truncate all the tables in the Alert schema (while the collector is stopped)?

DavidWiseman · 2025-01-22T14:26:22Z

I see the bug in the drive free space alert - I'll get it fixed. It's looking for drives with more than 5GB free instead of less. If the use critical status option is checked, the drive's status might also prevent alerts for drives with >5GB free.
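
Here is a simplified illustration of the reversed comparison. It is not the actual [Alert].[DriveSpaceAlert_Upd] code, and both function names are hypothetical. It also shows that a 5000 threshold is interpreted as MB, which is slightly under 5GB.

```python
def drive_alert_buggy(free_gb: float, threshold_mb: float) -> bool:
    # Bug: comparison reversed, so drives with MORE free space than the threshold alert.
    return free_gb >= threshold_mb / 1024.0


def drive_alert_fixed(free_gb: float, threshold_mb: float) -> bool:
    # Alert when free space is at or below the threshold (MB converted to GB).
    return free_gb <= threshold_mb / 1024.0


# 5000 MB is ~4.88 GB (5000 / 1024), slightly under the 5 GB one might assume.
print(5000 / 1024.0)  # 4.8828125
```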

I have a fix that will exclude hidden instances by default. Rules will have an option to apply to hidden instances but I don't expect many people will need this.
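
This minimal sketch shows how an "apply to hidden" flag that defaults to off could scope a rule. The `Instance` and `AlertRule` classes and `instances_in_scope` are hypothetical, not DBA Dash types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    is_hidden: bool = False


@dataclass
class AlertRule:
    name: str
    apply_to_hidden: bool = False  # default: hidden instances don't generate alerts


def instances_in_scope(rule: AlertRule, instances: List[Instance]) -> List[Instance]:
    """Return the instances a rule should be evaluated against."""
    return [i for i in instances if rule.apply_to_hidden or not i.is_hidden]


servers = [Instance("PROD01"), Instance("TEST01", is_hidden=True)]
print([i.name for i in instances_in_scope(AlertRule("DriveSpace"), servers)])  # ['PROD01']
```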

If you want to reset everything, just delete the rules created in the GUI, or delete or edit only the rules you no longer want. Deleting a rule will now also clear its active alerts if you have deployed the updated DB. If you just want to close alerts that are resolved, use the option in the Actions menu.

Fix issue alerting on drive space when using MB threshold instead of %. trimble-oss#1186

DavidWiseman · 2025-01-22T18:38:16Z

3.17.1 is now available which should fix the reported issues. As there is a version bump, the DB will be upgraded automatically when the service starts.

R4PH1 · 2025-01-23T14:29:33Z

Thank you David! All those fixes now seem to be working really well.

Another issue I encountered, though I'm not entirely sure about it:

If a limit is specified for the Drive Space alert (e.g. 20000 with percentage set to false) and Use critical status is enabled, previously shown alerts somehow get resolved if Storage/Configure Root Thresholds is configured with percentages.
Initially I had percentage values in the drive root thresholds, not in the alert rule. Once I switched them to GB (10 alert/20 warning), the alerts became active. I haven't tried configuring those thresholds individually per drive, which would also be possible. It might be a logic error in the code.

I hope you are still interested in my feedback as I can see more potential for the alerts feature.

The alert history can fill up pretty quickly.
Some things that would be nice to have for it:

- Automatic cleanup via your retention logic (sorry if this is already in place; I didn't see it in the data retention tab)
- Pagination for the alert history window

I haven't played around with blackouts yet, but we have weekly maintenance windows, mostly on weekends. Is it possible to configure a blackout without an end date so it repeats indefinitely? As a workaround, I can set the end date to the year 2099.
It would still be nice to be able to copy blackouts like rules (copying would also be nice for notification channels).

Alerting on blocking isn't ideal, or I haven't found the proper way to do it yet.
What my workaround alert looks like at the moment:

What I would like to use is the blocked queries counter from the running queries summary, but over a timespan of about 2-5 minutes.

Another useful alert could check the longest running query. It would be nice to see if someone runs queries longer than half an hour, especially on production-critical instances. I didn't find anything fitting in the counters or waits.

DavidWiseman · 2025-01-24T15:07:58Z

If use critical status is true, an alert will only be generated if the status of the drive is critical based on the drive threshold configuration on the Drives tab. In addition to that critical status, the threshold on the alert must also be met. If use critical status is false, only the threshold on the alert is considered. e.g.:

	WHERE (DS.Status = 1 OR T.UseCriticalStatus=0)
	AND (DS.PctFreeSpace <= (T.Threshold/100.0) OR T.Threshold IS NULL OR T.IsThresholdPercentage=0)
	AND (DS.FreeGB <= T.Threshold/1024.0 OR T.IsThresholdPercentage=1 OR T.Threshold IS NULL)

The use critical status option might be useful if you have a 5% threshold set on the alert. You might have a 16TB drive with 800GB free space that falls just under the 5%. This drive might no longer be growing, so you adjusted its critical drive threshold to 500GB. In this case, the drive won't alert if use critical status is true, even though it's under 5%.
If a rule changes or you clear space on a drive so it's no longer in an alert status, the alert will automatically be resolved.
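
To check how the three conditions combine, here is a Python mirror of the WHERE clause quoted above. It assumes DS.Status = 1 means critical and that PctFreeSpace is a fraction between 0 and 1, as the division by 100.0 implies. `should_alert` is a hypothetical name. The sketch reproduces the 16TB example: the same drive alerts or not depending only on use critical status.

```python
from typing import Optional


def should_alert(
    status_is_critical: bool,
    pct_free: float,          # fraction 0..1, like DS.PctFreeSpace
    free_gb: float,
    threshold: Optional[float],
    is_threshold_percentage: bool,
    use_critical_status: bool,
) -> bool:
    """Python mirror of the three WHERE-clause conditions quoted above."""
    # (DS.Status = 1 OR T.UseCriticalStatus = 0)
    status_ok = status_is_critical or not use_critical_status
    # (DS.PctFreeSpace <= T.Threshold/100.0 OR T.Threshold IS NULL OR T.IsThresholdPercentage = 0)
    pct_ok = threshold is None or not is_threshold_percentage or pct_free <= threshold / 100.0
    # (DS.FreeGB <= T.Threshold/1024.0 OR T.IsThresholdPercentage = 1 OR T.Threshold IS NULL)
    mb_ok = threshold is None or is_threshold_percentage or free_gb <= threshold / 1024.0
    return status_ok and pct_ok and mb_ok


# 16TB drive (16384GB) with 800GB free: 800/16384 = 4.88% free, under a 5% rule threshold.
# Its drive-level critical threshold was lowered to 500GB, so its status is not critical.
pct = 800 / 16384
print(should_alert(False, pct, 800, 5, True, use_critical_status=True))   # False
print(should_alert(False, pct, 800, 5, True, use_critical_status=False))  # True
```

The same logic would likely explain the behaviour reported earlier. With a 20000MB threshold (about 19.5GB) and use critical status enabled, a drive with less than 19.5GB free still won't alert unless the root thresholds also mark it critical.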

Ideally, alerts should only be generated when an urgent issue requires your attention. The alert history can fill up though - particularly as you are figuring out which alerts/thresholds work for your environment.

The alert history display is currently limited to 1000 rows in the GUI. The alerts tab at instance level will be filtered for that instance, so you will get a more complete history for an instance at this level. I might improve this with a configurable row limit & maybe some paging.
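
If paging were added, keyset pagination (`WHERE AlertID < last seen id`) would avoid re-scanning skipped rows the way OFFSET does. Below is a hedged `sqlite3` sketch with a hypothetical ClosedAlerts table. It assumes AlertID increases over time, so ordering by it approximates newest-first; `history_page` is a made-up helper.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical, simplified ClosedAlerts table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ClosedAlerts (AlertID INTEGER PRIMARY KEY, Title TEXT)")
conn.executemany("INSERT INTO ClosedAlerts (Title) VALUES (?)", [(f"alert {n}",) for n in range(1, 26)])


def history_page(
    conn: sqlite3.Connection, page_size: int, before_id: Optional[int] = None
) -> List[Tuple[int, str]]:
    """Return one page of history, newest first (keyset pagination on AlertID)."""
    if before_id is None:
        rows = conn.execute(
            "SELECT AlertID, Title FROM ClosedAlerts ORDER BY AlertID DESC LIMIT ?",
            (page_size,),
        )
    else:
        rows = conn.execute(
            "SELECT AlertID, Title FROM ClosedAlerts WHERE AlertID < ? "
            "ORDER BY AlertID DESC LIMIT ?",
            (before_id, page_size),
        )
    return rows.fetchall()


page1 = history_page(conn, 10)
page2 = history_page(conn, 10, before_id=page1[-1][0])  # continue after the last row shown
```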

The alert history currently has no retention options. I added a notes feature to alerts which can be useful. If an alert comes up, you could check the history to see if the alert has occurred previously and what the RCA was. This is potentially valuable data that you might not want to purge. I'll probably add some retention settings at some point. It might default to never or some large number of days. It might have an option to exclude alerts with notes.
Note: Retention works for most things by truncating old partitions which is efficient but doesn't provide any fine grained control. The ClosedAlerts table currently isn't partitioned.
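
Because ClosedAlerts isn't partitioned, retention for it would have to delete rows rather than truncate partitions. That row-level approach is also what makes an "exclude alerts with notes" option possible. Here is a `sqlite3` sketch under those assumptions. The table shape, `purge_closed_alerts` and the ISO-8601 text dates are illustrative only.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ClosedAlerts (AlertID INTEGER PRIMARY KEY, ClosedDate TEXT NOT NULL, Notes TEXT)"
)


def purge_closed_alerts(
    conn: sqlite3.Connection, now: datetime, retention_days: int, exclude_with_notes: bool = True
) -> int:
    """Delete closed alerts older than the retention period; return rows deleted."""
    # Dates are stored as ISO-8601 text in one format, so string comparison is chronological.
    cutoff = (now - timedelta(days=retention_days)).isoformat(sep=" ")
    sql = "DELETE FROM ClosedAlerts WHERE ClosedDate < ?"
    if exclude_with_notes:
        # Keep alerts someone annotated (e.g. with an RCA), however old they are.
        sql += " AND (Notes IS NULL OR Notes = '')"
    with conn:
        cur = conn.execute(sql, (cutoff,))
    return cur.rowcount
```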

For blackout periods with infinite repeats, just set the date to something in the distant future as you suggested. I might consider extending the copy feature.
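
Allowing NULL start/end dates means a recurring weekly window can be open-ended rather than ending in 2099. Below is a hypothetical sketch of evaluating such a window. The class and field names are made up, and windows that cross midnight aren't handled.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class BlackoutPeriod:
    weekdays: FrozenSet[int]           # datetime.weekday(): 0 = Monday ... 6 = Sunday
    time_from: time
    time_to: time                      # exclusive; windows crossing midnight not handled
    start_date: Optional[date] = None  # None = no lower bound
    end_date: Optional[date] = None    # None = repeats indefinitely

    def is_active(self, now: datetime) -> bool:
        if self.start_date is not None and now.date() < self.start_date:
            return False
        if self.end_date is not None and now.date() > self.end_date:
            return False
        return now.weekday() in self.weekdays and self.time_from <= now.time() < self.time_to


# Saturday and Sunday, 02:00-06:00, with no start or end date.
weekend_maintenance = BlackoutPeriod(frozenset({5, 6}), time(2, 0), time(6, 0))
```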

For blocking, you could also consider an alert based on wait type, using LCK% as the wait type to alert on. I did consider adding a blocking alert based on Running Queries. It would make sense to look at the most recent snapshot: if the blocking is not in the most recent snapshot, the issue is resolved. (For lots of small blocking events, Waits would be more suitable.) Long running queries could also be useful to alert on.
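
Here is a sketch of the "most recent snapshot" approach, covering both blocking and long running queries. `RunningQuery` and `evaluate_snapshot` are hypothetical, and the snapshot is a plain list. In DBA Dash the data would come from the Running Queries collection, whose schema isn't shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class RunningQuery:
    session_id: int
    start_time: datetime
    blocking_session_id: Optional[int] = None  # None when the session isn't blocked


def evaluate_snapshot(
    snapshot: List[RunningQuery],
    snapshot_time: datetime,
    blocked_threshold: int = 1,
    long_running: timedelta = timedelta(minutes=30),
) -> Dict[str, object]:
    """Evaluate blocking and long-running conditions on the latest snapshot only.

    If the next snapshot shows no blocking, the condition is simply no longer met,
    so an alert based on it would resolve.
    """
    blocked = [q for q in snapshot if q.blocking_session_id]
    longest = max((snapshot_time - q.start_time for q in snapshot), default=timedelta(0))
    return {
        "blocked_count": len(blocked),
        "blocking_alert": len(blocked) >= blocked_threshold,
        "longest_duration": longest,
        "long_running_alert": longest >= long_running,
    }
```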

Thanks

Allow blackout period start/end dates to be NULL. This makes it easier to configure recurring blackout periods. trimble-oss#1186

Add options for data retention to Alert.ClosedAlert table. Option to exclude alerts with notes from data retention. trimble-oss#1186

DavidWiseman · 2025-02-04T09:33:54Z

Some of the suggestions have been implemented in 3.17.2:

- Null blackout period start/end dates
- Data retention for the ClosedAlerts table

DavidWiseman added a commit to DavidWiseman/dba-dash that referenced this issue Jan 21, 2025

Fix issue deleting a rule with active alerts

6278523

Close any existing alerts that reference a rule before deleting it. trimble-oss#1186

DavidWiseman added a commit that referenced this issue Jan 22, 2025

Alert rule Apply To (Tag) display fix

574210b

The Apply To (Tag) is set to {All} when creating a rule but when editing a rule it's blank. Fixed display issue so it shows as {All} when editing. #1186

DavidWiseman added a commit that referenced this issue Jan 22, 2025

Fix issue deleting a rule with active alerts

fe9fc00

Close any existing alerts that reference a rule before deleting it. #1186

DavidWiseman added a commit to DavidWiseman/dba-dash that referenced this issue Jan 22, 2025

Alert Rule - Apply To Hidden

54690de

Add Apply To Hidden option to alert rules. By default hidden instances will no longer generate alerts. trimble-oss#1186

DavidWiseman mentioned this issue Jan 22, 2025

Alert Rule - Apply To Hidden #1191

Merged

DavidWiseman added a commit that referenced this issue Jan 22, 2025

Alert Rule - Apply To Hidden

0ad73c3

Add Apply To Hidden option to alert rules. By default hidden instances will no longer generate alerts. #1186

DavidWiseman added a commit to DavidWiseman/dba-dash that referenced this issue Jan 22, 2025

Drive Space Alert fix

701917b

Fix issue alerting on drive space when using MB threshold instead of %. trimble-oss#1186

DavidWiseman mentioned this issue Jan 22, 2025

Drive Space Alert fix #1192

Merged

DavidWiseman added a commit that referenced this issue Jan 22, 2025

Drive Space Alert fix

04f5b2a

Fix issue alerting on drive space when using MB threshold instead of %. #1186

R4PH1 changed the title ~~Alerts feature bugs~~ Alerts feature bugs and feedback Jan 24, 2025

DavidWiseman added a commit to DavidWiseman/dba-dash that referenced this issue Jan 28, 2025

Alert Config - Allow NULL Blackout Period Start/End Date

6c74791

Allow blackout period start/end dates to be NULL. This makes it easier to configure recurring blackout periods. trimble-oss#1186

DavidWiseman mentioned this issue Jan 28, 2025

Alert config improvements #1198

Merged

DavidWiseman added a commit that referenced this issue Jan 28, 2025

Alert Config - Allow NULL Blackout Period Start/End Date

8baf11c

Allow blackout period start/end dates to be NULL. This makes it easier to configure recurring blackout periods. #1186

DavidWiseman added a commit to DavidWiseman/dba-dash that referenced this issue Jan 28, 2025

Data Retention - ClosedAlert table

5bd8000

Add options for data retention to Alert.ClosedAlert table. Option to exclude alerts with notes from data retention. trimble-oss#1186

DavidWiseman added a commit that referenced this issue Jan 29, 2025

Data Retention - ClosedAlert table

985ff72

Add options for data retention to Alert.ClosedAlert table. Option to exclude alerts with notes from data retention. #1186
