Skip to content

Add knob to disable slow io notifications #17477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oshogbo
Copy link
Contributor

@oshogbo oshogbo commented Jun 20, 2025

Motivation and Context

Adding a knob allows disabling the slow_io check on a single vdev. Some users have reported that it breaks their enterprise configuration when one or more vdevs are using fiber channels with redundancy. Another reason to disable the check might be when a vdev is damaged, but we still want to force some resilvering from it.

Description

Add a knob to disable slow I/O event generation for a single vdev.

How Has This Been Tested?

New test has been added.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Jun 24, 2025
@amotin
Copy link
Member

amotin commented Jun 24, 2025

I wonder if this feature partially duplicates slow_io_n/slow_io_t? Though I may not understand their meaning.

} else if (vd->vdev_leaf_zap != 0) {
objid = vd->vdev_leaf_zap;
} else {
if (vdev_prop_get_objid(vd, &objid) != 0)
panic("unexpected vdev type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move the panic() into vdev_prop_get_objid() to save having to test it everywhere it is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it in vdev_prop_get_int, and then it is used for example here to generate an error.

So I would leave it as is for a consumer to decide if it should panic or not.

@oshogbo oshogbo force-pushed the oshogbo/n_knob branch 2 times, most recently from 3cdaf1b to a5a9411 Compare July 9, 2025 09:57
@oshogbo
Copy link
Contributor Author

oshogbo commented Jul 9, 2025

I wonder if this feature partially duplicates slow_io_n/slow_io_t? Though I may not understand their meaning.

I see this new feature as a supplement to slow_io_n/slow_io_t. Those settings serve two purposes: they let zpool status report how many slow I/Os have occurred, and they generate events for ZED/zfsd so it can decide whether to downgrade the pool. However, sometimes you don’t want the pool downgraded, but you still might want some accounting on slow I/O.

For example, in multipath environments (as has been pointed out multiple times on X/Twitter), pools have unexpectedly been downgraded. From my point of view, this feature is useful when you know a device is failing but still want to extract as much throughput as possible during recovery.

Simply raising slow_io_n/slow_io_t to very high values isn’t a great solution, and seems like an bypass not solution — how high is “high enough”? A single knob to control this behavior makes more sense in these scenarios.

Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that
allows users to disable notifications for slow devices.
This prevents ZED and/or ZFSD from degrading the pool due to slow
I/O.

Signed-off-by: Mariusz Zaborski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants