@Hardikl Hardikl commented Dec 10, 2025

No description provided.

@Hardikl Hardikl marked this pull request as draft December 10, 2025 13:31
@Hardikl Hardikl linked an issue Dec 10, 2025 that may be closed by this pull request
Hardikl commented Dec 11, 2025

While using the dynamic threshold via "Config from query results", I ran into two limitations:

  • The transformation consumes the whole config query rather than a single field. The query we pass as the config query (Query A) and its fields can no longer be used in later transformations, which is a problem when we also populate a few fields from Query A.
  • Because Query A can't be reused, the same query can't supply threshold data for two metrics via "Config from query results".

Hence, I made these changes:

  • Generated send % and receive % from the port speed in the Go plugin for nic, applied thresholds of 50, 75, and 90 to those percent values, and removed the gauge from send and receive bytes.
  • Generated send % and receive % from the link speed in the Go plugin for ifgrp, applied thresholds of 50, 75, and 90 to those percent values, and removed the gauge from send and receive bytes.
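
As a rough illustration of the percent calculation described above, here is a minimal Go sketch. It is not the plugin code from this PR: the function name `utilizationPercent` and the choice of units (bytes per second for throughput, megabits per second for speed) are assumptions made for the example.

```go
package main

// utilizationPercent converts a throughput in bytes per second into a
// percentage of the link speed.
// Assumption: speedMbps is the link speed in megabits per second; the real
// plugin may store speed in a different unit.
// Returns 0 when the speed is unknown or non-positive, so that the caller
// never divides by zero.
func utilizationPercent(bytesPerSec, speedMbps float64) float64 {
	if speedMbps <= 0 {
		return 0
	}
	// Convert megabits per second to bytes per second.
	speedBytesPerSec := speedMbps * 1e6 / 8
	return bytesPerSec / speedBytesPerSec * 100
}
```

With these assumed units, a port sending 62,500,000 bytes per second on a 1000 Mbps link works out to 50%, which is where the first threshold would apply.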

After I manually increased the send bytes value in the NIC table, the table looked like this:
image

After I manually increased the receive bytes value in the LAG table, the table looked like this:
image

@Hardikl Hardikl marked this pull request as ready for review December 11, 2025 14:05
@cgrinds cgrinds requested a review from Copilot December 11, 2025 16:21
Copilot AI left a comment

Pull request overview

This PR updates network monitoring dashboards to use dynamic percentage-based thresholds instead of fixed absolute byte values for link speed monitoring. The change affects two network tables (NIC ports and Link Aggregation Groups) in the Grafana dashboard by introducing new percentage metrics (nic_ifgrp_rx_perc, nic_ifgrp_tx_perc, nic_rx_percent, nic_tx_percent) and updating the visualization thresholds to use percentage mode with configurable warning levels (50%, 75%, 90%).

Key changes:

  • Added new percentage-based metrics for network interface group receive/transmit bandwidth utilization
  • Updated Grafana dashboard to display bandwidth usage as percentages with dynamic thresholds
  • Modified backend collectors to calculate and populate percentage metrics for port aggregation groups
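
The 50%, 75%, and 90% levels can be read as a simple step function. The Go sketch below is only an illustration of that mapping, assuming each threshold is an inclusive lower bound; `thresholdStep` is a hypothetical helper and is not part of the dashboard or of Grafana.

```go
package main

// thresholdStep returns which threshold band a utilization percentage
// falls into: 0 for below 50, 1 for 50 up to 75, 2 for 75 up to 90,
// and 3 for 90 and above.
// Assumption: each threshold value is an inclusive lower bound.
func thresholdStep(percent float64) int {
	steps := []float64{50, 75, 90}
	band := 0
	for _, s := range steps {
		if percent >= s {
			band++
		}
	}
	return band
}
```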

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| mcp/metadata/ontap_metrics.json | Added metric definitions for nic_ifgrp_rx_perc and nic_ifgrp_tx_perc |
| grafana/dashboards/cmode/network.json | Updated dashboard configuration to include percentage-based visualizations with dynamic thresholds, added new query references for percentage metrics, and reorganized column ordering |
| docs/ontap-metrics.md | Added documentation for the new percentage metrics including API endpoints and Grafana dashboard references |
| cmd/tools/generate/counter.yaml | Added counter definitions for nic_ifgrp_rx_perc and nic_ifgrp_tx_perc metrics |
| cmd/collectors/zapiperf/plugins/nic/nic.go | Extended PortData population to include percentage values for read/write operations |
| cmd/collectors/restperf/plugins/nic/nic.go | Extended PortData population to include percentage values for read/write operations |
| cmd/collectors/commonutils.go | Added ReadPerc and WritePerc fields to PortData struct and updated aggregation logic to sum percentage values |

Comment on lines +524 to +530
rxp := nData.GetMetric("rx_perc")
rxpv, _ := rxp.GetValueFloat64(ifgroupInstance)
rxp.SetValueFloat64(ifgroupInstance, readPerc+rxpv)

txp := nData.GetMetric("tx_perc")
txpv, _ := txp.GetValueFloat64(ifgroupInstance)
txp.SetValueFloat64(ifgroupInstance, writePerc+txpv)

Copilot AI Dec 11, 2025

Summing percentage values across ports in an aggregation group produces incorrect results. Percentages should be averaged or the maximum taken, not summed. For example, if two ports are each at 50% utilization, summing them would incorrectly report 100% instead of the actual 50% average or 50% max utilization.
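
To make the two-port example concrete, the following Go sketch compares summing, averaging, and taking the maximum of per-port percentages. The helper names are hypothetical and do not come from the PR; the sketch only illustrates the arithmetic and is not a proposed patch.

```go
package main

// sumPerc adds per-port percentages, which is what the reviewed lines do.
func sumPerc(ports []float64) float64 {
	total := 0.0
	for _, p := range ports {
		total += p
	}
	return total
}

// avgPerc returns the mean per-port percentage, or 0 for an empty group.
func avgPerc(ports []float64) float64 {
	if len(ports) == 0 {
		return 0
	}
	return sumPerc(ports) / float64(len(ports))
}

// maxPerc returns the busiest port's percentage, or 0 for an empty group.
func maxPerc(ports []float64) float64 {
	highest := 0.0
	for _, p := range ports {
		if p > highest {
			highest = p
		}
	}
	return highest
}
```

For two ports at 50% each, `sumPerc` yields 100 while `avgPerc` and `maxPerc` both yield 50.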


Development

Successfully merging this pull request may close these issues.

Network dashboard thresholds

2 participants