Production - [Alerting] DotNetEng Status Failed Requests/Hour alert #4920

Open
dotnet-eng-status bot opened this issue Feb 5, 2025 · 6 comments
Labels: Critical, Grafana Alert (Issues opened by Grafana), Inactive Alert (Issues from Grafana alerts that are now "OK"), Ops - First Responder, Production (Tied to the Production environment (as opposed to Staging))

dotnet-eng-status bot commented Feb 5, 2025

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod; to investigate staging, run the same query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 191.2

Go to rule
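
If the alert fires again, a quick way to narrow down which class of requests is failing is to group the same union by problemId and operation_Name. This is only a sketch: the one-hour lookback window is an assumption, not part of the alert definition.

union exceptions, traces
| where timestamp > ago(1h)
// group failures by exception problem and operation to find the failing class of requests
| summarize count() by problemId, operation_Name
| order by count_ desc

Rows coming from traces will have an empty problemId; the exceptions rows point at the operations that are actually failing.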

@dotnet/dnceng, @dotnet/prodconsvcs, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-d2dd705a6c724ed68fcf6955561c06dd

dotnet-eng-status bot added the Active Alert (Issues from Grafana alerts that are now active), Critical, Grafana Alert (Issues opened by Grafana), Ops - First Responder, and Production (Tied to the Production environment (as opposed to Staging)) labels on Feb 5, 2025
dotnet-eng-status bot (Author)

💔 Metric state changed to alerting

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod; to investigate staging, run the same query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc
  • failuresCount 191.2

Go to rule

dotnet-eng-status bot added the Inactive Alert (Issues from Grafana alerts that are now "OK") label and removed the Active Alert (Issues from Grafana alerts that are now active) label on Feb 5, 2025
dotnet-eng-status bot (Author)

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod; to investigate staging, run the same query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Go to rule

dotnet-eng-status bot (Author)

💚 Metric state changed to ok

The number of failed DotNetEng Status requests per hour is above 20. This may indicate a systemic problem that needs to be investigated.
To initially investigate prod, run the following query in DotNetEng-Status-Prod; to investigate staging, run the same query in DotNetEng-Status-Staging:

union exceptions, traces
| project timestamp, operation_Name, customDimensions, message, problemId, details
| order by timestamp asc

Go to rule

haruna99 self-assigned this on Feb 11, 2025
haruna99 (Contributor) commented Feb 11, 2025

The number of failed DotNetEng Status requests per hour is below 20. Closing issue.

garath (Member) commented Feb 11, 2025

> The number of failed DotNetEng Status requests per hour is below 20. Closing issue.

Did the failed requests drop because the overall request count also dropped? Or is there a class of requests that spiked in that window and continues to fail? It would be good to dig in a bit to understand what was failing.

haruna99 (Contributor) commented Feb 11, 2025

> The number of failed DotNetEng Status requests per hour is below 20. Closing issue.

> Did the failed requests drop because the overall request count also dropped? Or is there a class of requests that spiked in that window and continues to fail? It would be good to dig in a bit to understand what was failing.

I will conduct further investigation to better understand the cause of the failure.
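
A sketch of one such follow-up, assuming the classic Application Insights requests schema (where success is a string column; adjust the comparison if the workspace-based AppRequests table is used instead) and an arbitrary seven-day window, that separates "fewer failures" from "fewer requests overall" by charting failed versus total requests per hour:

requests
| where timestamp > ago(7d)
// total and failed request counts per hour; failureRate distinguishes a drop in failures from a drop in traffic
| summarize total = count(), failed = countif(success == "False") by bin(timestamp, 1h)
| extend failureRate = todouble(failed) / total
| order by timestamp asc

If failureRate stays elevated while total drops, a specific class of requests is still failing and is worth breaking down further (for example by operation_Name or resultCode); if failureRate falls along with total, the alert was likely tracking overall load.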

haruna99 reopened this on Feb 11, 2025