Alert - System UpTime #1212
Comments
For now, the best option is to use the CollectionDates alert. Base the alert on the Instance collection reference, or on one of the performance collections that run every minute. You can set the threshold to something like 5 minutes, so if the service isn't able to obtain data from the instance for 5 minutes you get a notification. The AGHealth alert might also detect a node going offline. We collect the instance start time in DBA Dash, but we don't get this until the instance comes back online. It doesn't make sense to alert on this, as we want the notification when the instance goes offline, not when it comes back. The service needs better detection of the instance being unavailable and needs to report this back to the repository DB; then we can alert on it. For now, the collection dates alert is a good proxy. |
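The proxy described above comes down to comparing the age of the latest snapshot against a threshold in minutes. The following is only a sketch of that comparison, not DBA Dash's actual implementation; `stale_instances`, the `last_snapshot` mapping, and the instance names are hypothetical and exist purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_instances(last_snapshot, threshold_minutes, now=None):
    """Return the instances whose latest snapshot is older than the threshold.

    last_snapshot: hypothetical mapping of instance name -> time of the most
    recent successful collection (timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    threshold = timedelta(minutes=threshold_minutes)
    # An instance that has gone offline stops producing snapshots, so its
    # snapshot age keeps growing until it crosses the threshold.
    return sorted(name for name, ts in last_snapshot.items() if now - ts > threshold)


# Fixed "now" so the example is reproducible.
now = datetime(2025, 2, 10, 8, 0, tzinfo=timezone.utc)
last_snapshot = {
    "SQL01": now - timedelta(minutes=1),   # collected recently
    "SQL02": now - timedelta(minutes=12),  # no data for 12 minutes
}
print(stale_instances(last_snapshot, threshold_minutes=5, now=now))  # ['SQL02']
```

With a collection that runs every minute, a 5 minute threshold tolerates a few missed runs before an instance is reported.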
For collection dates I see the threshold, but I don't see any reference to "greater than" minutes? |
It generated millions of alarms.
On Mon, Feb 10, 2025, 7:57 AM David Wiseman wrote:
The threshold is in minutes. Your configuration of the rule should look something like this.
image.png (view on web)
<https://github.com/user-attachments/assets/76939a57-4544-431b-8a21-ee0262bd4504>
|
What did you set as the Collection Reference? You need to use a collection that runs frequently, e.g. Instance or CPU. You could also use this in combination with the critical status. On the checks node in the tree, select the Collection Dates tab. This should highlight any collections with a critical or warning status. If you are not using the critical status, open the Critical/Warning dropdown and select check all. Find the reference you want to use in the grid, right click, and choose filter by value. Then you can sort by snapshot age to see if any are over the threshold that you are trying to set for the alert. You will get 1 alert generated per instance that is over the threshold. |
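The manual check above (filter the grid by reference, sort by snapshot age, compare against the threshold) can be pictured with a small sketch. It does not read from the DBA Dash repository; the `CollectionRow` type, its fields, and the sample rows are hypothetical stand-ins for what the Collection Dates grid shows.

```python
from dataclasses import dataclass


@dataclass
class CollectionRow:
    # Hypothetical stand-in for one row of the Collection Dates grid.
    instance: str
    reference: str
    snapshot_age_minutes: float


def preview_alerts(rows, reference, threshold_minutes):
    """List the instances that would alert for one collection reference."""
    # "Filter by value" on the chosen reference.
    matching = [r for r in rows if r.reference == reference]
    # Sort by snapshot age, oldest first.
    matching.sort(key=lambda r: r.snapshot_age_minutes, reverse=True)
    # One entry per instance whose snapshot age is over the threshold.
    return [r.instance for r in matching if r.snapshot_age_minutes > threshold_minutes]


rows = [
    CollectionRow("SQL01", "Instance", 0.8),
    CollectionRow("SQL02", "Instance", 42.0),
    CollectionRow("SQL03", "Instance", 7.5),
    CollectionRow("SQL02", "CPU", 41.0),
]
print(preview_alerts(rows, "Instance", threshold_minutes=5))  # ['SQL02', 'SQL03']
```

If a preview like this lists every instance, the collections themselves have probably stalled, and the threshold is not the problem.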
@DavidWiseman Then I realized that it's not working (mentioned here: #1240). After a minute I realized that I was getting alerts for all instances; it looks like collections had stopped responding in some way. I restarted the DBA Dash service, and it started to work without any changes to the alert definition. |
How do I set up an alert on the most important thing that we all care about, which is SQL Server uptime/reboot?
Thanks for adding the alarm feature. Game changer.