Commit 3ec7906

Meir Elisha authored and peluse committed
md/raid5: fix parity corruption on journal failure
When operating in write-through journal mode, a journal device failure can lead to parity corruption and silent data loss. This occurs because the current implementation continues to update parity even when journal writes fail, violating the write-through consistency guarantee.

Signed-off-by: Meir Elisha <[email protected]>
1 parent c321d71 commit 3ec7906

1 file changed

+19
-0
lines changed

drivers/md/raid5.c

Lines changed: 19 additions & 0 deletions
@@ -1146,9 +1146,21 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 
 	might_sleep();
 
+	/* Successfully logged to journal */
 	if (log_stripe(sh, s) == 0)
 		return;
 
+	/*
+	 * Journal device failed. Only abort writes if we have
+	 * too many failed devices to maintain consistency.
+	 */
+	if (conf->log && r5l_log_disk_error(conf) &&
+	    s->failed > conf->max_degraded &&
+	    (s->to_write || s->written)) {
+		set_bit(STRIPE_HANDLE, &sh->state);
+		return;
+	}
+
 	should_defer = conf->batch_bio_dispatch && conf->group_cnt;
 
 	for (i = disks; i--; ) {
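
Note on the hunk above: the new guard returns early only when all of its conditions hold together. As written, a journal failure on an array that is still within its redundancy budget (`s->failed <= conf->max_degraded`) falls through to normal I/O submission. The following is a minimal userspace sketch of that predicate, not kernel code. The names `model_conf`, `model_state`, and `should_abort_io` are hypothetical stand-ins for the kernel structures and are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel state the guard reads. */
struct model_conf {
	bool has_log;        /* models conf->log being non-NULL */
	bool log_disk_error; /* models r5l_log_disk_error(conf) */
	int max_degraded;    /* 1 for RAID4/5, 2 for RAID6 */
};

struct model_state {
	int failed;   /* models s->failed */
	int to_write; /* models s->to_write */
	int written;  /* models s->written */
};

/* Returns true when the guard would skip I/O submission for this stripe. */
bool should_abort_io(const struct model_conf *conf,
		     const struct model_state *s)
{
	assert(conf && s);

	return conf->has_log && conf->log_disk_error &&
	       s->failed > conf->max_degraded &&
	       (s->to_write || s->written);
}
```

With a failed journal and a pending write, a RAID5 model (`max_degraded = 1`) aborts at `failed = 2` but not at `failed = 1`.
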
@@ -3672,6 +3684,13 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
 		 * still be locked - so just clear all R5_LOCKED flags
 		 */
 		clear_bit(R5_LOCKED, &sh->dev[i].flags);
+		/* Clear R5_Want* flags to prevent stale operations
+		 * from executing on retry.
+		 */
+		clear_bit(R5_Wantwrite, &sh->dev[i].flags);
+		clear_bit(R5_Wantcompute, &sh->dev[i].flags);
+		clear_bit(R5_WantFUA, &sh->dev[i].flags);
+		clear_bit(R5_Wantdrain, &sh->dev[i].flags);
 	}
 	s->to_write = 0;
 	s->written = 0;
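
Note on the hunk above: the added calls clear four request flags per device in addition to `R5_LOCKED`, leaving every other flag untouched. The sketch below models that cleanup on a plain `unsigned long` in userspace. The `MODEL_*` bit positions and the `model_*` helpers are hypothetical and do not match the kernel's `enum r5dev_flags` values, and unlike the kernel's `clear_bit()` this helper is not atomic.

```c
#include <assert.h>

/* Hypothetical bit positions; the kernel's enum r5dev_flags differs. */
enum model_flag {
	MODEL_LOCKED,
	MODEL_WANTWRITE,
	MODEL_WANTCOMPUTE,
	MODEL_WANTFUA,
	MODEL_WANTDRAIN,
	MODEL_UPTODATE, /* unrelated flag that must survive the cleanup */
};

/* Non-atomic stand-in for the kernel's clear_bit(). */
static void model_clear_bit(int nr, unsigned long *flags)
{
	assert(nr >= 0 && nr < (int)(sizeof(*flags) * 8));
	*flags &= ~(1UL << nr);
}

/* Returns 1 if bit nr is set in flags, 0 otherwise. */
int model_test_bit(int nr, unsigned long flags)
{
	return (int)((flags >> nr) & 1UL);
}

/* Mirrors the per-device cleanup done for a failed stripe. */
void model_clear_stale(unsigned long *flags)
{
	model_clear_bit(MODEL_LOCKED, flags);
	model_clear_bit(MODEL_WANTWRITE, flags);
	model_clear_bit(MODEL_WANTCOMPUTE, flags);
	model_clear_bit(MODEL_WANTFUA, flags);
	model_clear_bit(MODEL_WANTDRAIN, flags);
}
```

After `model_clear_stale()` runs on a word that has `MODEL_WANTWRITE` and `MODEL_UPTODATE` set, only `MODEL_UPTODATE` remains, so a retry of the stripe starts without leftover write or drain requests.
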
