feat!: hardlink deduplications #291

Merged
merged 221 commits into from
Jul 26, 2025

KSXGitHub
Owner

@KSXGitHub KSXGitHub commented Jul 12, 2025

TODO:

  • Fix the algorithm.
  • Report the number of unrecorded links: stats.nlink() - links.len().
  • More efficient data structure to insert detected hardlinks (Vec randomly pauses).
  • Resolve the TODO comments.
  • Add detected hardlinks to the JSON output.
  • Print detected hardlinks from JSON input.
  • Test.
  • Multi arguments.
  • Fix: duplicated arguments.
  • Update the README.
  • Add examples to --help.
  • Add competing benchmarks.
  • ci(benchmark): update dust to include --no-progress #297
  • Refactor: Rename Hook to RecordHardlinks, DoNothing to Ignorant.
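The "unrecorded links" item in the list above is a subtraction between the link count the filesystem reports and the number of paths the scan recorded for that inode. A minimal sketch, assuming the count comes from something like `std::os::unix::fs::MetadataExt::nlink` and that `recorded` is the length of the recorded path list; the function name `unrecorded_links` is hypothetical and is not taken from this PR:

```rust
/// Number of links to an inode that the scan did not encounter.
///
/// `nlink` is the link count reported by the filesystem, and `recorded`
/// is how many paths the traversal actually recorded for that inode.
/// (Hypothetical helper for illustration, not the PR's code.)
fn unrecorded_links(nlink: u64, recorded: usize) -> u64 {
    // Saturate instead of underflowing, in case links were added or
    // removed while the scan was running.
    nlink.saturating_sub(recorded as u64)
}
```

A non-zero result suggests that some links to the same inode live outside the scanned directories, so the scan alone cannot account for every name of the file.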

github-actions bot commented Jul 12, 2025

Performance Regression Reports

commit: 4d4ceea

There are no regressions.

@KSXGitHub KSXGitHub marked this pull request as ready for review July 26, 2025 07:48
@KSXGitHub KSXGitHub requested a review from Copilot July 26, 2025 08:16
Copilot

This comment was marked as outdated.

@KSXGitHub KSXGitHub requested a review from Copilot July 26, 2025 08:59
Copilot

This comment was marked as outdated.

@KSXGitHub KSXGitHub requested a review from Copilot July 26, 2025 09:12

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements hardlink deduplication functionality to accurately calculate disk usage when hardlinks are present. When enabled via --deduplicate-hardlinks, the tool detects files with multiple links and adjusts the reported sizes to avoid double-counting the same data.

Key changes include:

  • Detection and recording of hardlinks during filesystem traversal
  • Deduplication logic that reduces sizes in the data tree for hardlinked files
  • JSON output enhancement to include hardlink information with summary and detailed reporting
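The idea behind the changes listed above can be illustrated with a simplified sketch. It is not the PR's implementation (which, per the summary, records hardlinks during traversal and then reduces sizes in the data tree); it only shows why counting each inode once avoids double-counting. The `Entry` struct and `deduplicated_total` function are hypothetical, and the sketch assumes all files live on one filesystem so that the inode number alone identifies a file:

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one scanned file.
struct Entry {
    path: &'static str,
    ino: u64,   // inode number
    nlink: u64, // link count reported by the filesystem
    size: u64,  // size in bytes
}

/// Sum file sizes, counting each hardlinked inode only once.
fn deduplicated_total(entries: &[Entry]) -> u64 {
    // Paths seen so far for each inode that has more than one link.
    let mut paths_by_inode: HashMap<u64, Vec<&'static str>> = HashMap::new();
    let mut total = 0u64;
    for entry in entries {
        if entry.nlink > 1 {
            let paths = paths_by_inode.entry(entry.ino).or_default();
            paths.push(entry.path);
            if paths.len() > 1 {
                // This inode was already counted under another path.
                continue;
            }
        }
        total += entry.size;
    }
    total
}
```

For example, two 100-byte paths sharing one inode plus an unrelated 50-byte file would total 150 bytes under this sketch, where a naive sum would report 250.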

Reviewed Changes

Copilot reviewed 48 out of 49 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/usual_cli.rs | Updated test constructors to use new FsTreeBuilder signature with hardlink recorder |
| tests/json.rs | Modified JSON tests to handle new JsonTree structure with hardlink information |
| tests/hardlinks_*.rs | Added comprehensive test suites for hardlink functionality with and without deduplication |
| tests/cli_errors.rs | Updated error test to use new FsTreeBuilder signature |
| tests/_utils.rs | Added utilities for creating test workspaces with hardlinks and symlinks |
| src/runtime_error.rs | Added error types for unsupported features on non-POSIX platforms |
| src/reporter/ | Enhanced progress reporting to include hardlink detection statistics |
| src/lib.rs | Added new modules for hardlink and inode handling |
| src/json_data.rs | Extended JSON data structure to include hardlink information |
| src/inode.rs | New module for inode number handling |
| src/hardlink/ | Complete hardlink detection, recording, and deduplication system |
| src/fs_tree_builder.rs | Updated to support hardlink recording during tree construction |
| src/data_tree/ | Added hardlink deduplication logic to data tree |
| src/args.rs | Added CLI flags for hardlink deduplication and JSON output control |
| src/app/ | Updated application logic to handle hardlink processing and reporting |

Comments suppressed due to low confidence (1)

src/hardlink/aware.rs:105

  • Silently ignoring errors from hardlink recording could mask important issues and make debugging difficult. Consider logging the error or handling it more explicitly.
            .map(|values| (*values.size(), values.paths().clone()))

@KSXGitHub KSXGitHub merged commit fa02a7c into master Jul 26, 2025
13 checks passed
@KSXGitHub KSXGitHub deleted the hardlink-awareness branch July 26, 2025 09:18