feat: resolve git file attrs #150

Merged 17 commits on Jun 11, 2024

@tisonkun (Member) commented Jun 10, 2024

This is somewhat expensive. I tested against OpenDAL, which has ~1000 files to format and 2600+ commits to analyze, and it takes about 4 seconds to finish. So I set the config value to disabled by default.

fmt/src/document/factory.rs (outdated, resolved)
Signed-off-by: tison <[email protected]>
This reverts commit e576dad.
@tisonkun merged commit 9d86fe5 into korandoru:main Jun 11, 2024
16 checks passed
@tisonkun deleted the git-file-attrs branch June 11, 2024 03:09
@@ -141,7 +141,7 @@ pub fn resolve_file_attrs(repo: &Repository) -> anyhow::Result<HashMap<String, G
Ok::<_, Infallible>(Default::default())
},
)?;
prev_commit = this_commit;
prev_tree = tree;
cache.clear_resource_cache();
}

Member Author

These changes don't seem to help. Is that because we still look up the object for each tree, and still iterate over all the changes at every location?

Member Author

For e576dad

Contributor

@Byron left a comment

Thanks for the invite - I just had time for a glimpse.

&prev_commit.tree()?,
&mut cache,
|change| {
let filepath = workdir.join(change.location.to_string());
Contributor

Here you want to use gix::path::from_bstr(change.location). Never use String for anything path-related, as strings can't represent every path that is possible on the filesystem.
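To illustrate the point about String, here is a minimal std-only sketch (Unix-only, and not using gix itself). The helpers `path_from_bytes` and `path_via_string` are hypothetical names introduced for this example; the second one assumes the lossy conversion behaves like `String::from_utf8_lossy`.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: paths are arbitrary byte sequences there
use std::path::PathBuf;

/// Hypothetical helper: build a path directly from the raw bytes,
/// without requiring them to be valid UTF-8.
fn path_from_bytes(raw: &[u8]) -> PathBuf {
    PathBuf::from(OsStr::from_bytes(raw))
}

/// Hypothetical helper: the lossy route through String. Any byte sequence
/// that is not valid UTF-8 is replaced with U+FFFD, so the result may name
/// a different (possibly nonexistent) file.
fn path_via_string(raw: &[u8]) -> PathBuf {
    PathBuf::from(String::from_utf8_lossy(raw).into_owned())
}
```

For a file name containing a byte such as 0xE9 (Latin-1 "é"), the first helper keeps the bytes as they are, while the second silently produces a different path. For plain ASCII paths like fmt/tests/tests.rs the two agree, which is why the problem tends to go unnoticed.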

Member Author

Thanks for your advice. I tried writing:

                let filepath = gix::path::from_bstring(change.location);
                match attrs.entry(filepath) {

But the resulting filepath is something like "fmt/tests/tests.rs", which does not match the selection "/Users/tison/Brittani/hawkeye-native/fmt/tests/tests.rs" in the later lookup.

Member Author

Calling filepath.canonicalize() here results in the IO error "No such file or directory". I guess that's because no base directory is specified.

Perhaps I can still use workdir.join(filepath)?

Contributor

Git tracks paths relative to the repository root, hence one will have to join them with the worktree root before using them on the filesystem.
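As a rough sketch of that join, using only the standard library: `worktree_path` is a hypothetical helper name, and the paths used with it are the ones quoted earlier in this thread.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve a repository-relative path (as Git tracks it)
/// against the worktree root, so it can be used on the filesystem or compared
/// with absolute paths collected elsewhere.
///
/// Note: `Path::join` replaces the base entirely if the second argument is
/// absolute, so this assumes `repo_relative` really is relative.
fn worktree_path(workdir: &Path, repo_relative: &Path) -> PathBuf {
    workdir.join(repo_relative)
}
```

With the worktree root /Users/tison/Brittani/hawkeye-native and the tracked path fmt/tests/tests.rs, this yields the absolute path used as the lookup key. It also suggests why canonicalize() failed on the bare relative path: relative paths are resolved against the process's current directory, which need not be the worktree root.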

Member Author

Yeah. Thanks for your information.

Iterating over these changes currently takes about 4 seconds on a repo with ~1300 files and ~2600 commits.

I wonder what the major performance factor is and whether we can improve it (e.g., does git work well in such situations?).

It seems most cycles are spent parsing the tree data, which we can do nothing to improve.

BTW, gitoxide already outperforms the git command, as:

for i in $(git ls-files); do git --no-pager log --follow --format=%ad -- "$i" > /dev/null; done

takes 1 minute and 5 seconds to finish.

Contributor

It's hard to get an apples-to-apples comparison, since Git definitely does far more work here: it goes through the whole history once per file, whereas the Rust code only has to go through it once. From that point of view, Git's performance is impressive.
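The single-pass idea can be sketched without gix at all. Everything below is a simplified assumption for illustration: `Commit`, `FileTimes` and `resolve_times` are hypothetical names, and each commit is reduced to a timestamp plus the list of paths it changed (in the real code that list comes from diffing each commit's tree against its predecessor's).

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical, simplified commit: a timestamp and the paths it changed.
struct Commit {
    time: i64,
    changed: Vec<PathBuf>,
}

/// Hypothetical per-file attributes: earliest and latest commit time seen.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileTimes {
    created: i64,
    modified: i64,
}

/// Walk the history once and update every file touched by each commit,
/// instead of walking the whole history once per file.
fn resolve_times(history: &[Commit]) -> HashMap<PathBuf, FileTimes> {
    let mut attrs: HashMap<PathBuf, FileTimes> = HashMap::new();
    for commit in history {
        for path in &commit.changed {
            attrs
                .entry(path.clone())
                // Seen before: widen the [created, modified] range.
                .and_modify(|t| {
                    t.created = t.created.min(commit.time);
                    t.modified = t.modified.max(commit.time);
                })
                // First sighting: both ends are this commit.
                .or_insert(FileTimes {
                    created: commit.time,
                    modified: commit.time,
                });
        }
    }
    attrs
}
```

The work here is proportional to the total number of changed paths across all commits, whereas the shell loop above repeats a full history walk for every tracked file. Using min and max keeps the result independent of the order in which commits are visited.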

> I wonder what the major performance factor is and whether we can improve it (e.g., does git work well in such situations?).

Object-access performance is critical, and there are some cache-related variables that can be set. They can improve performance by a couple of percent, but nothing more significant than that.
But wait: the max-performance feature can be added in this line, and the difference should be noticeable.


Successfully merging this pull request may close these issues.

Support for different years
Add file create year and file modified year