Alert - System UpTime #1212

aachaemenes · 2025-02-06T20:24:19Z

How do I set up an alert on the most important thing we all care about: SQL Server uptime/reboot?

Thanks for adding the alarm feature. Game changer.

DavidWiseman · 2025-02-08T09:09:28Z

For now, the best option is to use the CollectionDates alert. Base the alert on the Instance collection reference, or on one of the performance collections that run every minute. You can set the threshold to something like 5 minutes, so if the service isn't able to obtain data from the instance for 5 minutes, you get a notification. The AGHealth alert might also detect a node going offline.

We collect the instance start time in DBA Dash, but we don't get this until the instance comes back online. It doesn't make sense to alert on this, as we want the notification when the instance goes offline, not when it comes back. The service needs better detection of an unavailable instance, and it needs to report this back to the repository DB; then we can alert on it. For now, collection dates are a good proxy.
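
To make the proxy concrete, here is a minimal Python sketch of the kind of check such an alert performs: compare the age of the latest snapshot with a threshold in minutes. It is not DBA Dash code; the function name `is_stale` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_snapshot: datetime, now: datetime, threshold_minutes: int = 5) -> bool:
    """Return True when the latest snapshot is older than the threshold."""
    return now - last_snapshot > timedelta(minutes=threshold_minutes)


# Hypothetical sample data: one instance reporting normally, one silent.
now = datetime(2025, 2, 8, 9, 0, tzinfo=timezone.utc)
healthy = now - timedelta(minutes=1)   # data collected a minute ago
offline = now - timedelta(minutes=12)  # nothing collected for 12 minutes

print(is_stale(healthy, now))  # False
print(is_stale(offline, now))  # True
```

A check like this fires while the instance is unreachable, which is the moment the notification is wanted. The instance start time, by contrast, only becomes known once the instance is back.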

aachaemenes · 2025-02-10T15:50:47Z

For collection dates, I see the threshold, but I don't see any reference to "greater than" minutes?

DavidWiseman · 2025-02-10T15:57:15Z

The threshold is in minutes. Your configuration of the rule should look something like this.

aachaemenes · 2025-02-10T16:07:03Z

It generated millions of alarms.

…


DavidWiseman · 2025-02-10T16:19:28Z

What did you set as the Collection Reference? You need to use a collection that runs frequently, e.g. Instance or CPU. You could also use this in combination with the critical status.

On the checks node in the tree, select the Collection Dates tab. This should highlight any collections with a critical or warning status. If you are not using the critical status, open the Critical/Warning dropdown and select check all. Find the reference you want to use in the grid, right-click it, and select filter by value. Then you can sort by snapshot age to see if any are over the threshold that you are trying to set for the alert. You will get 1 alert generated per instance that is over the threshold.
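
The manual check above can be sketched in Python as follows. This is only an illustration with made-up rows: the field names (`instance`, `reference`, `snapshot_age_min`), the reference name `HourlyThing`, and the helper `instances_over_threshold` are hypothetical and are not the DBA Dash schema. It filters to one reference, sorts by snapshot age, and returns one entry per instance that is over the threshold.

```python
from typing import List, NamedTuple


class CollectionRow(NamedTuple):
    instance: str
    reference: str
    snapshot_age_min: float  # minutes since the last successful collection


def instances_over_threshold(
    rows: List[CollectionRow], reference: str, threshold_min: float
) -> List[str]:
    """Filter to one reference, sort oldest first, keep rows over the threshold."""
    matching = [r for r in rows if r.reference == reference]
    matching.sort(key=lambda r: r.snapshot_age_min, reverse=True)
    return [r.instance for r in matching if r.snapshot_age_min > threshold_min]


# Hypothetical grid contents; "HourlyThing" stands in for an infrequent collection.
rows = [
    CollectionRow("SQL01", "Instance", 0.8),
    CollectionRow("SQL02", "Instance", 14.0),
    CollectionRow("SQL01", "HourlyThing", 42.0),
    CollectionRow("SQL02", "HourlyThing", 37.0),
]

print(instances_over_threshold(rows, "Instance", 5))     # ['SQL02']
print(instances_over_threshold(rows, "HourlyThing", 5))  # ['SQL01', 'SQL02']
```

In this made-up data, a 5 minute threshold against the infrequent reference puts every instance over the threshold, while the frequently collected `Instance` reference flags only the instance that has actually gone quiet.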

FuriousDBA · 2025-02-21T13:13:08Z

It generated millions of alarms .
…

@DavidWiseman
I replicated the issue mentioned above. At first, I set the alarm like this:

Then I realized that it's not working (mentioned here: #1240)
So I changed it to this:

After a minute, I realized that I was getting alerts for all instances; it looks like collections stopped responding in some way. I restarted the DBA Dash service, and it started to work without any changes to the alert definition.

DavidWiseman mentioned this issue Feb 8, 2025

Alert - System UpTime #1215

Closed
