
@maradini77

Fixed size_of_dir_entry to recursively calculate the size of directories. Previously, the function only returned the metadata size of directory entries (typically a few KB), completely ignoring files within subdirectories. This caused size_of_dir to return incorrect sizes for any directory tree with nested folders.

The fix checks if an entry is a directory and recursively calls size_of_dir to sum up all file sizes within the directory tree.
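A minimal sketch of the recursive approach described above. It assumes size_of_dir(path: &Path) -> u64 and a size_of_dir_entry helper that takes a &DirEntry. Both signatures are assumptions for illustration and are not copied from Lighthouse. Unreadable entries are counted as 0.

```rust
use std::fs;
use std::path::Path;

/// Total size in bytes of all files under `path`, recursing into
/// subdirectories. Signature assumed for illustration.
pub fn size_of_dir(path: &Path) -> u64 {
    match fs::read_dir(path) {
        Ok(entries) => entries
            .filter_map(Result::ok)
            .map(|entry| size_of_dir_entry(&entry))
            .sum::<u64>(),
        // Missing or unreadable directory: report 0 rather than failing.
        Err(_) => 0,
    }
}

/// Size contributed by a single directory entry (hypothetical helper shape).
fn size_of_dir_entry(entry: &fs::DirEntry) -> u64 {
    // DirEntry::metadata does not traverse symlinks, so a symlink to a
    // directory reports is_dir() == false and is not descended into.
    match entry.metadata() {
        // Directory: sum its contents instead of its own metadata length.
        Ok(meta) if meta.is_dir() => size_of_dir(&entry.path()),
        // Regular file (or symlink): use the entry's own length.
        Ok(meta) => meta.len(),
        Err(_) => 0,
    }
}
```

Compared with returning meta.len() for every entry, the only change is the is_dir() branch, which replaces a directory's metadata size with the size of everything inside it.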

@cla-assistant

cla-assistant bot commented Nov 21, 2025

CLA assistant check
All committers have signed the CLA.

@michaelsproul
Member

Please rebase your change on unstable.

@michaelsproul michaelsproul changed the base branch from stable to unstable December 15, 2025 04:56
Member

@michaelsproul michaelsproul left a comment

I was a little hesitant to merge this change, because:

  1. We don't need it. The structure of the beacon chain database directory is completely flat (files inside a single directory).
  2. I was concerned that this change could open the door to stack overflows or infinite loops.

Concern (2) is more important. Looking at the docs for DirEntry::metadata and Metadata::is_dir, we see that is_dir would not return true for a symlinked directory, because DirEntry::metadata does not traverse symlinks. This prevents the infinite loop attack.
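The symlink behaviour can be checked directly. The sketch below is Unix-only, and entry_kinds is a hypothetical helper written for illustration. It lists each entry's name alongside what DirEntry::metadata().is_dir() reports. A symlink to a directory comes back as false, whereas std::fs::metadata, which follows symlinks, reports the same link as a directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: for each entry directly inside `dir`, return
/// (file name, is_dir) as reported by DirEntry::metadata, which does not
/// traverse symlinks. Results are sorted by name for stable output.
pub fn entry_kinds(dir: &Path) -> io::Result<Vec<(String, bool)>> {
    let mut kinds = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let is_dir = entry.metadata()?.is_dir();
        kinds.push((entry.file_name().to_string_lossy().into_owned(), is_dir));
    }
    kinds.sort();
    Ok(kinds)
}
```

For a directory containing a real subdirectory "real" and a symlink "link" pointing at it, this returns [("link", false), ("real", true)].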

The risk of a deeply nested directory causing a DoS via stack overflow (or a very long run time) still exists. However, exploiting it would require the datadir to be writable by an attacker, which is likely only possible if permissions are set incorrectly or there is a remote code execution vulnerability (which creates far more serious problems than a simple DoS).

I'm leaning towards saying that this slightly increased risk is acceptable, given that it can be mitigated by setting file permissions correctly. This is despite the positive impact being low (there is no bug in LH's current behaviour). I think fixing the behaviour for the general case is still valuable, as it removes a footgun and saves us from potentially introducing a bug if the assumption of a flat data directory ever stops holding.

@michaelsproul
Member

@maradini77 Do you have any thoughts on the above?

@michaelsproul michaelsproul added the waiting-on-author The reviewer has suggested changes and awaits their implementation. label Dec 15, 2025
@maradini77
Author

@michaelsproul Thanks for the careful review. Agreed: symlinks won’t cause an infinite loop because is_dir is false on them. I added recursion to avoid hidden bugs if the data directory ever stops being flat (e.g., future tools/plugins). To reduce DoS risk from deep nesting, I can switch to an iterative walk with a depth/entry cap and skip symlinks. Would you be comfortable if I add those protections?
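A rough sketch of the iterative, capped walk proposed here, assuming the same size_of_dir(path: &Path) -> u64 shape. MAX_DEPTH and MAX_ENTRIES are hypothetical values picked for illustration. Symlinks are detected with DirEntry::file_type, which does not follow them, and are skipped entirely.

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Hypothetical caps, chosen for illustration only.
const MAX_DEPTH: usize = 32;
const MAX_ENTRIES: usize = 1_000_000;

/// Iterative size_of_dir: an explicit stack replaces recursion, so deep
/// nesting cannot overflow the call stack. Directories more than MAX_DEPTH
/// levels below `path` are not read, and the walk stops after MAX_ENTRIES
/// entries. Symlinks are never followed or counted.
pub fn size_of_dir(path: &Path) -> u64 {
    let mut total = 0u64;
    let mut visited = 0usize;
    // Each stack item is (directory, depth below `path`).
    let mut stack: Vec<(PathBuf, usize)> = vec![(path.to_path_buf(), 0)];

    while let Some((dir, depth)) = stack.pop() {
        // Unreadable directories are skipped.
        let Ok(entries) = fs::read_dir(&dir) else { continue };
        for entry in entries.filter_map(Result::ok) {
            visited += 1;
            if visited > MAX_ENTRIES {
                // Entry cap hit: return the partial total.
                return total;
            }
            // file_type() does not traverse symlinks.
            let Ok(file_type) = entry.file_type() else { continue };
            if file_type.is_symlink() {
                continue;
            } else if file_type.is_dir() {
                if depth < MAX_DEPTH {
                    stack.push((entry.path(), depth + 1));
                }
            } else if let Ok(meta) = entry.metadata() {
                total = total.saturating_add(meta.len());
            }
        }
    }
    total
}
```

One design choice worth flagging: when a cap is hit, this sketch silently returns a partial total. Returning an error or logging a warning instead would make the truncation visible; which behaviour fits Lighthouse is a judgment call.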

@michaelsproul
Member

@maradini77 Yeah let's make it iterative and capped, sounds good!

@maradini77
Author

@michaelsproul updated

@michaelsproul michaelsproul added ready-for-review The code is ready for review low-hanging-fruit Easy to resolve, get it before someone else does! and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels Dec 18, 2025
@mergify

mergify bot commented Dec 18, 2025

Some required checks have failed. Could you please take a look @maradini77? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Dec 18, 2025
@maradini77
Author

@michaelsproul fmt fixed

@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels Jan 6, 2026
@mergify

mergify bot commented Jan 6, 2026

Some required checks have failed. Could you please take a look @maradini77? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Jan 6, 2026
Member

@macladson macladson left a comment

Just noticed an inconsistency with the comments. Also, I think some tests would be useful here to make sure that the behaviour we want is being enforced (particularly with regards to symlinks).

tempfile is a good crate to use for testing this, and it's already used elsewhere, so it's already in our workspace dependencies. You can also create symlinks with the symlink function: https://doc.rust-lang.org/std/os/unix/fs/fn.symlink.html

@maradini77 maradini77 requested a review from macladson January 6, 2026 08:48
@maradini77 maradini77 requested a review from macladson January 6, 2026 12:15
@macladson
Member

@maradini77 are you able to write some tests for this function?

Something like this:

#[cfg(test)]
mod tests {
    use super::*;
    use tempfile::TempDir;

    #[test]
    fn size_of_dir_empty() {
        let temp_dir = TempDir::new().unwrap();
        let size = size_of_dir(temp_dir.path());
        assert_eq!(size, 0);
    }

    // ... etc
}
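Beyond the empty case, a fixture with nested files and a symlink loop would exercise the behaviour discussed above. A std-only, Unix-only sketch of such a fixture follows. build_fixture is a hypothetical helper, and the expected total assumes a size_of_dir that skips symlinks, as in the iterative proposal. The symlink points back at the root, so a walker that followed it without a cap would recurse indefinitely.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical test helper: populate `root` with a small tree and return
/// the total that a symlink-skipping size_of_dir should report.
///
/// root/
///   top.bin              (10 bytes)
///   nested/inner/deep.bin (32 bytes)
///   loop -> root         (symlink back to root; must not be followed)
pub fn build_fixture(root: &Path) -> io::Result<u64> {
    fs::write(root.join("top.bin"), [0u8; 10])?;
    let inner = root.join("nested").join("inner");
    fs::create_dir_all(&inner)?;
    fs::write(inner.join("deep.bin"), [0u8; 32])?;
    // Unix-only: a symlink pointing back at `root`. Its own size is
    // excluded from the expected total because it should be skipped.
    std::os::unix::fs::symlink(root, root.join("loop"))?;
    Ok(10 + 32)
}
```

Inside the tests module, root would come from TempDir::new().unwrap().path(), and the test would then assert that size_of_dir on that path equals the returned expected value.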

Labels

low-hanging-fruit Easy to resolve, get it before someone else does! waiting-on-author The reviewer has suggested changes and awaits their implementation.

4 participants