Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the NODEFS filesystem in statfs when available #23912

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

bgrgicak
Copy link
Contributor

@bgrgicak bgrgicak commented Mar 13, 2025

🚧 WIP

statfs always returns default values with NODEFS, because the default file system is MEMFS which doesn't support statfs.

In this PR we detect if NODEFS is available and use it as the node when calling statfsNode.
This ensures statfsNode has access to NODEFS and the requested path.

statfs is only supported in NODEFS and NODERAWFS, but this change won't affect NODERAWFS because when used it's the default file system and it has it's own implementation of statfsNode.

Testing instructions

I tested this by running ./test/runner other.test_unistd_fstatfs, but that only ensures we are getting values, the test doesn't check for real values.

@hoodmane
Copy link
Collaborator

Can you give an example of the behavior that you are trying to fix? test_unistd_fstatfs runs in NODEFS as well as NODERAWFS so there is basic coverage for this there. A good start point would be to update that test to add whatever additional requirements you're after and show it failing on NODEFS but succeeding on NODERAWFS.

@hoodmane
Copy link
Collaborator

the test doesn't check for real values

Right, if you could update test/unistd/fstatfs.c to have assertions about the values returned that'd be great. You may need something like:

#if NODEFS || NODERAWFS
// In nodefs and noderawfs assert that the values returned are correct.
// In wasmfs and memfs, `fstatfs` is a stub so maybe the values there are expected to be wrong?
#endif

@bgrgicak
Copy link
Contributor Author

Can you give an example of the behavior that you are trying to fix?

I'm running into this issue in WordPress Playground. When PHP calls disk_total_space, it uses statfs to fetch stats, but we always get the default value.

Right, if you could update test/unistd/fstatfs.c to have assertions about the values returned that'd be great. You may need something like:

I would love to add a test, but I'm not sure how to do it in Emscripten.
I tested it in WordPress Playground by comparing the result of statfs of statfsSync('/') (from node:fs).

In the test each FS will return the default values. I checked it by adding assert(buf.f_blocks == 1000000); the test.

To properly test NODEFS and NODERAWFS I would need to have real OS data in the response, but I'm not sure how test that.

test_unistd_fstatfs runs in NODEFS as well as NODERAWFS so there is basic coverage for this there. A good start point would be to update that test to add whatever additional requirements you're after and show it failing on NODEFS but succeeding on NODERAWFS.

The issue doesn't come up in NODERAWFS because when used it's the default FS, unlike NODEFS which isn't the default.

@bgrgicak
Copy link
Contributor Author

I see ./test/runner other.test_unistd_fstatfs_nodefs is failing, this is probably a good place for me to look at. 👀

@bgrgicak
Copy link
Contributor Author

I definitely need to research this more.
Test number of block in statfs adds tests that pass on main and confirm that fstatfs and statfs can return real values, but on this branch, fstatfs(f, &buf) starts failing.

@bgrgicak
Copy link
Contributor Author

I will get back to this PR next week.
First I want to demonstrate in a test how NODEFS is inaccessible when not mounted and how this breaks statfs.

After I have the test, I will explore how it could be improved.

src/lib/libfs.js Outdated
// file system which doesn't have a statfs function.
// To ensure statfsNode has access to the statfs function, we pass in
// the NODEFS object.
if (this.filesystems.NODEFS) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

statfs should figure out which filesystem the path is in and then return the filesystem info for that filesystem.

i.e. if path is in the MEMFS you should get the memfs info. If path is a NODEFS you should get the node FS info.

The presense of NODEFS alone should not be a factor, right? The question should be "is path part of a node FS mount" I think,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this for a week and I'm not sure how to handle it, so let me start with our usecase and why the above approach doesn't work for us.

WordPress Playground mounts all files into MEMFS because some files like the WordPress code are being downloaded from remote URL while a plugin might be imported from a local folder. So, the resulting codebase that we are running inside Emscription is composed out of multiple data sources.

Curently when the code asks for available disk space using statfs("/") it get's the default values because / is a MEMFS directory.

I was able to work around this by overriding statfs during onRuntimeInitialized.

PHPRuntime.FS.root.node_ops = {
	...PHPRuntime.FS.root.node_ops,
	statfs: PHPRuntime.FS.filesystems.NODEFS.node_ops.statfs,
};
PHPRuntime.FS.root.mount.opts.root = '/';	

Checking what FS the path belongs to, makes sense overall, but in some cases like the above it would be great if we could override it and force a FS to be used.

Specifically for statfs this makes sense to force NODEFS, because it only works in NODEFS (and NODERAWFS).If the requested path is available in that NODEFS it would be reasonable to override MEMFS and return the result from NODEFS.node_ops.statfs.statfs.

What do you think? How would you approach our usecase?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curently when the code asks for available disk space using statfs("/") it get's the default values because / is a MEMFS directory.

Which code is doing this? Asking for the available disk space by using statfs("/") seem odd since normally the root FS has very little space in it right? Where is that code that is doing this? Why is it using /?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But won't the statfs from node give very different results if you stat / vs /tmp vs /home, for example. What is the code in question interested in exactly?

} catch (e) {
// Older versions of Node don't support statfsSync
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only needed for the one test I would just do this in fstatfs.c itself. You can just do it in an EM_ASM or EM_JS block on startup?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants