@sbc100
Collaborator

@sbc100 sbc100 commented Sep 26, 2025

Instead we use pip to install the dev dependencies in the out/python_deps directory.
This means that the emscripten compiler itself does not have access to them,
only the test code which explicitly adds this path.

This means we can perform the installation as part of the bootstrap script
(just like we already do for node dev dependencies) which in turn means
we can consistently rely on dev dependencies to be available in test code
(without needing to include an opt out mechanism).
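
A minimal sketch of what such a bootstrap step could look like, assuming pip's `--target` option is used to redirect the install. The helper name `pip_install_command` is hypothetical and is not taken from this PR.

```python
import os
import sys


def pip_install_command(root, requirements='requirements-dev.txt'):
    """Build a pip command that installs into <root>/out/python_deps."""
    target = os.path.join(root, 'out', 'python_deps')
    return [
        sys.executable, '-m', 'pip', 'install',
        # Install into a local directory rather than the interpreter's
        # own site-packages.
        '--target', target,
        '-r', os.path.join(root, requirements),
    ]
```

A bootstrap script could pass the returned list to `subprocess.check_call`. Nothing installed this way is importable unless a script adds the target directory to `sys.path` itself.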

@sbc100 sbc100 force-pushed the python_bootstrap_deps branch 3 times, most recently from 958ae17 to 0bc31ad Compare September 26, 2025 22:09
@sbc100 sbc100 marked this pull request as draft September 26, 2025 22:16
@sbc100 sbc100 force-pushed the python_bootstrap_deps branch 8 times, most recently from a73b170 to 5ac23bf Compare November 5, 2025 22:56
@sbc100 sbc100 marked this pull request as ready for review November 5, 2025 23:02
@sbc100 sbc100 requested review from dschuff, juj and kripken November 5, 2025 23:02
@juj
Collaborator

juj commented Nov 5, 2025

Previously I have been able to test end user setup by leaving out installing the dev dependencies.

But because bootstrap is mandatory, and after this PR it will install python dev dependencies, it looks like I will need to develop some kind of delete step to remove the dev dependency packages from Python for testing and shipping.

This would be a divergence between node.js vs python, where we don't install node.js dev dependency packages via bootstrap either?

Also, on Linux e.g. on Debian where pip install is forbidden system-wide, won't Emscripten stop working for end users unless one operates Emscripten from inside a python virtualenv sandbox?

__rootpath__ = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, __rootpath__)
# Add `out/python_deps` to ensure that we can import our dev dependencies.
sys.path.insert(0, os.path.join(__rootpath__, 'out', 'python_deps'))
Collaborator

Hmm, isn't this causing a divergence between "what is tested" vs "what user has installed"?

I.e. before this PR, when running the test suite, it would test the user's Python installation, which could be for example the Python we ship via emsdk?

But after this change, the Emscripten test runner would stop testing the package setup from user's Python installation, and start testing a local sandboxed Python out/python_deps package path that is set up just for local testing purposes?

Collaborator Author

The only things in out/python_deps are the packages listed in requirements-dev.txt.

The emscripten compiler itself does not depend on any of these. In fact it has zero dependencies on non-standard packages.

One advantage of this change is that we no longer install our test-only/dev-only packages in the system python (or in the emsdk python), which means that when we run our tests there is no longer any risk that emscripten itself will accidentally depend on one of the dev dependencies.

@sbc100
Collaborator Author

sbc100 commented Nov 5, 2025

Previously I have been able to test end user setup by leaving out installing the dev dependencies.

I designed this change specifically with your workflow in mind.

IIRC the reason you didn't want to install the dev (test) dependencies was that you were worried about emscripten accidentally depending on these. With this system that risk has been removed. The emscripten compiler code itself cannot see or use these dependencies.

But because bootstrap is mandatory, and after this PR it will install python dev dependencies, it looks like I will need to develop some kind of delete step to remove the dev dependency packages from Python for testing and shipping.

This would be a divergence between node.js vs python, where we don't install node.js dev dependency packages via bootstrap either?

The bootstrap script does install the node dev dependencies. The place where we use --omit=dev to avoid dev node modules being installed is in tools/install.py.

Also, on Linux e.g. on Debian where pip install is forbidden system-wide, won't Emscripten stop working for end users unless one operates Emscripten from inside a python virtualenv sandbox?

Indeed, and we are not installing anything system-wide. Everything is being put in emscripten/out.

Another advantage of this PR is that we no longer depend on installing anything in the system python path.

@sbc100 sbc100 changed the title Perform pip install as part of bootstrap.py Avoid installing dev dependnecies in system python path. NFC Nov 5, 2025
@sbc100 sbc100 changed the title Avoid installing dev dependnecies in system python path. NFC Avoid installing test dependencies in system python path. NFC Nov 5, 2025
@sbc100
Collaborator Author

sbc100 commented Nov 5, 2025

I re-titled, and updated the description to be a little bit more explicit about the intent here.

@sbc100
Collaborator Author

sbc100 commented Nov 6, 2025

Another potential upside here is that we can add new test dependencies without affecting the distributed version of emscripten, or introducing new dependencies for our users.

For example, we could start using new packages in our python test code that could improve the testing experience. Right now we are artificially limited in the packages we can use in our test framework because we want to avoid new product dependencies. This change makes a clear separation and allows us to move forward without risking new deps for our users.

Also, install pip packages in local `out/python_deps` directory instead
of installing system wide.

This means we can consistently expect dev dependencies to be available
in test code without needing to include an opt out mechanism.

We were already doing this for `psutil`, but for `websockify` we made
it optional.

This means that only python scripts that explicitly add
`out/python_deps` to their python path will be able to use
the packages, and in particular it means that the emscripten compiler
itself won't end up implicitly/accidentally depending on them.
@juj
Collaborator

juj commented Nov 6, 2025

I designed this change specifically with your workflow in mind.

Thanks, that is much appreciated.

With this system that risk has been removed. The emscripten compiler code itself cannot see or use these dependencies.

I suppose the question I have is - how do we test and verify that this will remain to be the case? Since it is not possible to run tests without running bootstrap, then it is no longer possible to launch the test runner in a mode that does not have these dependencies installed?

What I would like is to keep running the Emscripten test suite in a mode that does not require dev dependencies. Those tests prove that Emscripten does not depend on unwanted packages.

I.e. even if we would statically say that Emscripten does not depend on dev packages, it would not automatically mean that we would be testing to verify that it does not depend on those packages? So we wouldn't/couldn't catch an error if that assumption regresses?

The bootstrap script does install the node dev dependencies.

That does not (fortunately) seem to be the case. Or at least if one is installing via emsdk. It is important for us to be able to do a bootstrap that does not install dev dependencies.

C:\emsdk\emscripten\main>npm ci --production
npm warn config production Use `--omit=dev` instead.
npm warn deprecated [email protected]: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
npm warn deprecated [email protected]: Glob versions prior to v9 are no longer supported

added 215 packages, and audited 216 packages in 3s

17 packages are looking for funding
  run `npm fund` for details

1 low severity vulnerability

To address all issues, run:
  npm audit fix

Run `npm audit` for details.

C:\emsdk\emscripten\main>npm list
main@ C:\emsdk\emscripten\main
+-- @babel/[email protected]
+-- @babel/[email protected]
+-- @babel/[email protected]
+-- UNMET DEPENDENCY @eslint/eslintrc@^3.3.1
+-- UNMET DEPENDENCY @eslint/js@^9.36.0
+-- [email protected]
+-- UNMET DEPENDENCY es-check@^9.4.4
+-- UNMET DEPENDENCY eslint-config-prettier@^10.1.8
+-- UNMET DEPENDENCY eslint@^9.36.0
+-- UNMET DEPENDENCY globals@^16.4.0
+-- [email protected]
+-- [email protected]
+-- UNMET DEPENDENCY prettier@^3.6.2
+-- UNMET DEPENDENCY rollup@^4.52.3
+-- UNMET DEPENDENCY [email protected]
+-- UNMET DEPENDENCY typescript@^5.9.3
+-- UNMET DEPENDENCY vite@^7.1.7
+-- UNMET DEPENDENCY webpack-cli@^6.0.1
+-- UNMET DEPENDENCY webpack@^5.102.0
`-- UNMET DEPENDENCY ws@^8.18.3

npm error code ELSPROBLEMS
npm error missing: @eslint/eslintrc@^3.3.1, required by main@
npm error missing: @eslint/js@^9.36.0, required by main@
npm error missing: es-check@^9.4.4, required by main@
npm error missing: eslint-config-prettier@^10.1.8, required by main@
npm error missing: eslint@^9.36.0, required by main@
npm error missing: globals@^16.4.0, required by main@
npm error missing: prettier@^3.6.2, required by main@
npm error missing: rollup@^4.52.3, required by main@
npm error missing: [email protected], required by main@
npm error missing: typescript@^5.9.3, required by main@
npm error missing: vite@^7.1.7, required by main@
npm error missing: webpack-cli@^6.0.1, required by main@
npm error missing: webpack@^5.102.0, required by main@
npm error missing: ws@^8.18.3, required by main@
npm error A complete log of this run can be found in: C:\Users\clb\AppData\Local\npm-cache\_logs\2025-11-06T21_22_53_946Z-debug-0.log

C:\emsdk\emscripten\main>bootstrap
Up-to-date: npm packages
Up-to-date: create entry points
Up-to-date: git submodules

C:\emsdk\emscripten\main>npm list
main@ C:\emsdk\emscripten\main
+-- @babel/[email protected]
+-- @babel/[email protected]
+-- @babel/[email protected]
+-- UNMET DEPENDENCY @eslint/eslintrc@^3.3.1
+-- UNMET DEPENDENCY @eslint/js@^9.36.0
+-- [email protected]
+-- UNMET DEPENDENCY es-check@^9.4.4
+-- UNMET DEPENDENCY eslint-config-prettier@^10.1.8
+-- UNMET DEPENDENCY eslint@^9.36.0
+-- UNMET DEPENDENCY globals@^16.4.0
+-- [email protected]
+-- [email protected]
+-- UNMET DEPENDENCY prettier@^3.6.2
+-- UNMET DEPENDENCY rollup@^4.52.3
+-- UNMET DEPENDENCY [email protected]
+-- UNMET DEPENDENCY typescript@^5.9.3
+-- UNMET DEPENDENCY vite@^7.1.7
+-- UNMET DEPENDENCY webpack-cli@^6.0.1
+-- UNMET DEPENDENCY webpack@^5.102.0
`-- UNMET DEPENDENCY ws@^8.18.3

npm error code ELSPROBLEMS
npm error missing: @eslint/eslintrc@^3.3.1, required by main@
npm error missing: @eslint/js@^9.36.0, required by main@
npm error missing: es-check@^9.4.4, required by main@
npm error missing: eslint-config-prettier@^10.1.8, required by main@
npm error missing: eslint@^9.36.0, required by main@
npm error missing: globals@^16.4.0, required by main@
npm error missing: prettier@^3.6.2, required by main@
npm error missing: rollup@^4.52.3, required by main@
npm error missing: [email protected], required by main@
npm error missing: typescript@^5.9.3, required by main@
npm error missing: vite@^7.1.7, required by main@
npm error missing: webpack-cli@^6.0.1, required by main@
npm error missing: webpack@^5.102.0, required by main@
npm error missing: ws@^8.18.3, required by main@
npm error A complete log of this run can be found in: C:\Users\clb\AppData\Local\npm-cache\_logs\2025-11-06T21_23_05_251Z-debug-0.log

@sbc100
Collaborator Author

sbc100 commented Nov 6, 2025

I suppose the question I have is - how do we test and verify that this will remain to be the case? Since it is not possible to run tests without running bootstrap, then it is no longer possible to launch the test runner in a mode that does not have these dependencies installed?

The idea is that the test runner has a hard dependency on these dev packages, but emscripten itself does not.

I.e. even if we would statically say that Emscripten does not depend on dev packages, it would not automatically mean that we would be testing to verify that it does not depend on those packages? So we wouldn't/couldn't catch an error if that assumption regresses?

The emscripten compiler itself is never run with python_deps in its PYTHONPATH, so trying to import psutil, for example, within the compiler itself would simply fail.

I suppose it's possible that someone could defeat that by adding sys.path.append to emcc... but that seems very contrived. If you like, I can add an assertion to emcc.py that enforces that this path is not in sys.path? We could even assert that sys.path contains no paths that fall under the emscripten project directory?
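
A sketch of what such an assertion could look like. The helper names `paths_inside` and `assert_no_dev_deps` are hypothetical; the check that was later added to emcc.py may be written differently.

```python
import os
import sys


def paths_inside(directory, paths):
    """Return the entries of `paths` that are `directory` or live beneath it."""
    base = os.path.realpath(directory)
    hits = []
    for entry in paths:
        if not entry:
            continue  # '' stands for the current directory; ignored in this sketch
        full = os.path.realpath(entry)
        if full == base or full.startswith(base + os.sep):
            hits.append(entry)
    return hits


def assert_no_dev_deps(project_root, paths=None):
    """Fail if out/python_deps (or anything under it) is on the import path."""
    dev_deps = os.path.join(project_root, 'out', 'python_deps')
    hits = paths_inside(dev_deps, sys.path if paths is None else paths)
    assert not hits, 'dev dependencies visible to the compiler: %s' % hits
```

The stricter variant mentioned above (no paths under the project directory at all) would need an exception for the project root itself, since tools typically add it to `sys.path` in order to import their own modules.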

@sbc100
Collaborator Author

sbc100 commented Nov 6, 2025

I added an assertion to emcc.py to ensure that the compiler itself doesn't run with those python libs in its path.

@sbc100 sbc100 force-pushed the python_bootstrap_deps branch from 5ac23bf to 996aa63 Compare November 6, 2025 23:05
@juj
Collaborator

juj commented Nov 6, 2025

The idea is that the test runner has a hard dependency on these dev packages

I guess this is the part that I feel off about. Currently there does not exist a hard dependency in the test runner to dev packages, and so I can test the majority of the test suite using the end user python configuration.

If we require a hard dependency on dev packages being installed when testing, then we lose the guarantee that we are testing the end user configuration.

In our packaging process on the CI, what I am currently doing is:

  1. use emsdk install to install Emscripten. (emsdk runs bootstrap)
  2. adjust the directory structure of the installed emsdk artifacts a bit to adhere to the Unity directory structure.
  3. delete some files we don't want to ship to end users (mostly files from the third_party/ directory that have a separate license)
  4. with emsdk active, run the test suite (ensures that my adjusted dir layout and filtered files still retain a working copy)
  5. if tests pass, zip up the emsdk directory as the Emscripten SDK artifact to Unity users.

It is beneficial to do this testing without dev dependencies installed. That ensures that the tested Python setup matches the production one.

I suppose I could introduce a second delete pass between steps 4 and 5 to delete the dev dependencies left over from running the Python unit tests (nuke out/ dir). However the thought worries me that the unit test runner installs Python packages before running the tests, so we will have tested a different Python setup than Emscripten uses.

The emscripten compiler itself is never run with python_deps in its PYTHONPATH, so trying to import psutil, for example, within the compiler itself would simply fail.

What I worry about is that not all tests start with invoking production emcc. The majority of test checks are run on the test Python side, and not via launching the production Python interpreter. Some test code (the recent binary encode test comes to mind) does not launch emcc at all, so it might be running a test only in the context of the dev packages.

Another way an issue might occur is if the test Python has a PYTHONPATH set: launching a child tool (not necessarily just emcc, but maybe another .py subtool) might inherit the PYTHONPATH in env, and leak the dev dependencies into the subprocess.

Installing the python dev dependencies to out/python_deps does sound good, since it will demarcate dev packages from the production Python better, though could we avoid making this installation a hard dependency?

@sbc100
Collaborator Author

sbc100 commented Nov 6, 2025

What I worry is that not all tests start with invoking production emcc. The majority of test checks are run on the test Python side, and not via launching the production Python interpreter.

I'm not sure what you mean by this. 99% of the tests run emcc from the outside, i.e. black box testing. That is the only way that any of the tests compile anything at all. They do things like run_process(EMCC... which launches a completely new instance of python. Not only that, it gets launched by the .bat / .sh launcher scripts, so it doesn't even use the current sys.executable used to run the tests.

Another way an issue might occur is if the test Python has a PYTHONPATH set: launching a child tool (not necessarily just emcc, but maybe another .py subtool) might inherit the PYTHONPATH in env, and leak the dev dependencies into the subprocess.

There are two reasons I think this cannot happen:

  1. We are not setting PYTHONPATH anywhere. Modifications to sys.path do not affect PYTHONPATH or subprocesses.
  2. Our .bat and .sh launchers for all our python tools run python -E, which explicitly ignores PYTHONPATH, i.e. the compiler itself always runs more or less hermetically anyway.
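
Both points can be checked with a small standalone script. This is only an illustration: `DEPS_DIR` is a temporary directory standing in for out/python_deps, and `child_sees_deps` is a hypothetical helper, not part of the test suite.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for out/python_deps: any real directory works for the demonstration.
DEPS_DIR = tempfile.mkdtemp()
CHILD_CODE = 'import sys; print(%r in sys.path)' % DEPS_DIR


def child_sees_deps(extra_args=(), env=None):
    """Start a fresh interpreter and report whether DEPS_DIR is on its sys.path."""
    cmd = [sys.executable, *extra_args, '-c', CHILD_CODE]
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip() == 'True'


# 1. A sys.path change in this process is not passed on to the child.
sys.path.insert(0, DEPS_DIR)
via_sys_path = child_sees_deps()

# 2. PYTHONPATH travels through the environment and does reach the child ...
env = dict(os.environ, PYTHONPATH=DEPS_DIR)
via_pythonpath = child_sees_deps(env=env)

# ... unless the child is started with -E, which ignores PYTHON* variables.
via_pythonpath_with_e = child_sees_deps(['-E'], env=env)

print(via_sys_path, via_pythonpath, via_pythonpath_with_e)
```

On a standard CPython this should report that only the plain PYTHONPATH case makes the directory visible to the child.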

@sbc100
Collaborator Author

sbc100 commented Nov 6, 2025

It is beneficial to do this testing without dev dependencies installed. That ensures that the tested Python setup matches the production one.

It seems like we already have a hard dependency on psutil in test code in test/browser_common.py... how does your CI handle that one?

@sbc100
Collaborator Author

sbc100 commented Nov 7, 2025

5. if tests pass, zip up the emsdk directory as the Emscripten SDK artifact to Unity users.

I think you certainly want to remove the out/ directory before creating your release.
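
A rough sketch of such a cleanup-then-zip step, assuming the scratch directory path is given relative to the directory being packaged. `package_sdk` and its parameters are hypothetical, and the real location of out/ depends on the SDK layout.

```python
import os
import shutil


def package_sdk(sdk_dir, archive_base, scratch_dirs=('out',)):
    """Delete scratch directories under sdk_dir, then zip what is left.

    archive_base is the archive path without the .zip suffix and should
    live outside sdk_dir so the archive does not end up containing itself.
    """
    for name in scratch_dirs:
        # ignore_errors: the directory may be absent if nothing was installed.
        shutil.rmtree(os.path.join(sdk_dir, name), ignore_errors=True)
    return shutil.make_archive(archive_base, 'zip', root_dir=sdk_dir)
```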

@juj
Collaborator

juj commented Nov 7, 2025

It seems like we already have a hard dependency on psutil in test code in test/browser_common.py... how does your CI handle that one?

Both pywin32 and psutil are production dependencies as per https://github.com/emscripten-core/emsdk/blob/7b4e60e4bfcba326025e373024369eaa9904af55/scripts/update_python.py#L77-L78 .

This has been the case for a very long time because the emrun tool requires the process scanning to be able to track browser shutdown. (and now more recently we extended that requirement to the parallel browser test harness)

I'm not sure what you mean by this. 99% of the tests run emcc from the outside

And the remaining 1% are unlabeled, so we don't have a good grasp of which of them would be subject to a possible regression.

Currently the known discrepancies are explicitly tracked and flagged with the EMTEST_SKIP_PYTHON_DEV_PACKAGES env. var.

By construction, we know that the tests will run on the very same Python setup that goes out to the end users.

The two points you mention require tacit knowledge. For example, I did not know that the -E flag provides the safeguard you mention. So if a refactor of that slipped through the cracks in review, nothing would flag this scenario.

For example, it looks like this test is not passing the -E flag. If a refactor set os.environ['PYTHONPATH'], that subprocess call could inherit it. So someone "needs to know" not to do that.

It seems simpler to set up Python the way that end users will have it, and then test that? Then if there are dev packages that add extras, those would be managed with the EMTEST_SKIP_PYTHON_DEV_PACKAGES label so we can see what the discrepancy is.

@sbc100
Collaborator Author

sbc100 commented Nov 7, 2025

Both pywin32 and psutil are production dependencies as per https://github.com/emscripten-core/emsdk/blob/7b4e60e4bfcba326025e373024369eaa9904af55/scripts/update_python.py#L77-L78 .

But on linux we don't have a python package in emsdk. How does it work on linux? I guess the OS python always supplies this?

The emrun usage of psutil is specifically guarded so that it can run on systems without psutil.
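
The usual shape of such a guard is an optional import with a fallback. The sketch below illustrates the pattern only; `list_process_ids` is a hypothetical function and this is not the code in emrun.

```python
try:
    import psutil  # optional: process scanning when the package is available
except ImportError:
    psutil = None


def list_process_ids():
    """Return the running process IDs, or None when psutil is unavailable."""
    if psutil is None:
        return None
    return psutil.pids()
```

Callers have to handle the `None` case, which is what keeps the tool usable on systems without psutil.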

@sbc100
Collaborator Author

sbc100 commented Nov 7, 2025

By construction, we know that the tests will run on the very same Python setup that goes out to the end users.

But this construction is very new and only exists in your CI. The emscripten CI on github and the emscripten-releases bots have always run with dev dependencies installed.

In all the years we have been doing that we have not (as far as I remember) run into an issue where the compiler code ended up accidentally depending on a dev package.

But I agree it is possible, which is why I'm going to all these lengths to make it close to impossible. I think the mitigations in this PR against this happening are very strong.

In summary, I think we have low risk and a high level of mitigation.

@sbc100
Collaborator Author

sbc100 commented Nov 7, 2025

BTW my hope is that this PR will bring the emscripten CI and emscripten-releases builders closer to your CI, in that they will no longer run with dev dependencies visible to the compiler code.

@sbc100
Collaborator Author

sbc100 commented Nov 7, 2025

I suppose we can land this change without removing the opt-out, and then we can continue to discuss the opt-out separately.

@juj
Collaborator

juj commented Nov 7, 2025

That sounds good, if you have the cycles to slice the PR.
