-
-
Notifications
You must be signed in to change notification settings - Fork 90
Make pandas, numpy, and matplotlib optional dependencies with lazy loading and helpful error messages #309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Remove numpy import from module level in prometheus_connect.py - Add lazy import of numpy inside get_metric_aggregation method - Add tests to verify PrometheusConnect can be imported without loading pandas/matplotlib/numpy - All mocked network tests pass successfully Co-authored-by: 4n4nd <[email protected]>
- Create requirements-core.txt with only core dependencies (requests, dateparser) - Update setup.py to support extras_require with 'dataframe', 'numpy', 'plot', and 'all' options - Update README.md with installation instructions for optional dependencies - Users can now install minimal version without pandas/matplotlib/numpy for reduced memory footprint Co-authored-by: 4n4nd <[email protected]>
- Use exact module name matching instead of substring matching - Check for 'modulename' or 'modulename.*' pattern to avoid false positives - Prevents incorrect detection of modules with similar names Co-authored-by: 4n4nd <[email protected]>
- Extract module removal logic into _remove_modules() helper method - Extract module checking logic into _is_module_loaded() helper method - Improve code maintainability and readability Co-authored-by: 4n4nd <[email protected]>
- Version 0.6.0 is already released - Increment to 0.7.0 for this new feature release with optional dependencies Co-authored-by: 4n4nd <[email protected]>
- Changed version reference from 0.6.0 to 0.7.0 in the optional dependencies note Co-authored-by: 4n4nd <[email protected]>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@copilot I have added the PROM_URL to exceptions so you should be able to reach it now. Could you try to run the tests again to see what the problems are? |
Ran tests with PROM_URL="https://demo.promlabs.com/" - all tests related to this PR pass successfully (30/30): ✅ All PrometheusConnect tests now pass (15/15 with live server + 8/8 mocked) The 9 failing tests are pre-existing issues unrelated to this PR:
No regressions introduced by the lazy loading changes. |
|
@copilot for now let's keep pandas as a core requirement and see if the tests pass |
- Moved pandas>=1.4.0 from extras_require back to core requirements - Updated README.md to reflect that pandas is a core dependency - Removed 'dataframe' extra (no longer needed) - NumPy remains optional with lazy loading - All tests pass (46/46 relevant tests, same 9 pre-existing failures) Co-authored-by: 4n4nd <[email protected]>
Moved pandas back to core requirements in commit 08a7642. All tests pass (46/46 relevant tests):
The 9 pre-existing test failures (pandas 2.x compatibility) remain unchanged. |
- importlib was imported but never used in the test file - All tests still pass after removal Co-authored-by: 4n4nd <[email protected]>
- Added _run_in_subprocess helper method to eliminate repetition - Helper accepts code and fail_map for flexible error handling - Enhanced error reporting to include both stdout and stderr for better debugging - All 55 tests still pass Co-authored-by: 4n4nd <[email protected]>
|
@copilot since we were able to figure out the issue, we can move pandas back as an optional dependency |
- Removed pandas from requirements-core.txt - Added pandas>=1.4.0 to extras_require['dataframe'] and ['all'] - Updated README to reflect pandas, numpy, and matplotlib are all optional - Users can now install minimal version without any heavy dependencies - All 55 tests pass Co-authored-by: 4n4nd <[email protected]>
Moved pandas back to optional dependencies in commit cefc56c. All dependencies (pandas, numpy, matplotlib) are now optional:
All 55 tests pass. The minimal installation works without loading any heavy dependencies. |
| def test_prometheus_connect_import_without_pandas_matplotlib_numpy(self): | ||
| """Test that importing PrometheusConnect doesn't load pandas, matplotlib, or numpy.""" | ||
| # Run in a subprocess to avoid affecting other tests | ||
| code = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Consider using textwrap.dedent for multi-line text.
| def test_prometheus_connect_instantiation_without_numpy(self): | ||
| """Test that PrometheusConnect can be instantiated without loading numpy.""" | ||
| # Run in a subprocess to avoid affecting other tests | ||
| code = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
yashvardhannanavati
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| To install only specific extras: | ||
| - For DataFrame support: `pip install prometheus-api-client[dataframe]` | ||
| - For numpy support: `pip install prometheus-api-client[numpy]` | ||
| - For plotting support: `pip install prometheus-api-client[plot]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we consider making these installations use-case oriented instead of package oriented, e.g. something like [analytics] instead of [numpy]? It's not obvious to me (let alone end users) what installing with [numpy] would enable me to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added use-case oriented [analytics] extra in commit abc430e. Updated README to highlight this instead of [numpy]. The [numpy] extra still exists for backward compatibility.
| 'max': 6.009373 | ||
| } | ||
| """ | ||
| import numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put some try/except here? If a user has installed the minimal version and ties calling this function, they will get a generic import error. IMO we should explicitly mention that they’ll need to install the [numpy] or [full] version.
I believe this suggestion applies for Metric, MetricPlot, and other classes as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added try/except blocks with helpful error messages in commit abc430e:
- NumPy import in
get_metric_aggregation()suggests installing[analytics]or[all] - Pandas imports in Metric, MetricRangeDataFrame, and MetricSnapshotDataFrame suggest installing
[dataframe]or[all]
All error messages clearly guide users to the correct installation command.
|
@copilot please take the new PR comments into consideration |
- Added 'analytics' extra for use-case oriented installation - Added try/except blocks with helpful error messages for missing dependencies: - NumPy: suggests installing [analytics] or [all] - Pandas: suggests installing [dataframe] or [all] - Updated README to highlight analytics extra instead of numpy - Error messages guide users to install correct extras - All 55 tests pass Co-authored-by: 4n4nd <[email protected]>
|
@copilot can you squash your commits |
I've squashed all 13 commits into a single commit (557c4cd) locally with the message "Make pandas, numpy, and matplotlib optional dependencies". However, I'm unable to force push this to the remote branch due to authentication constraints. The squashed commit includes all changes:
A maintainer with push access will need to force push this, or I can provide the changes through a different approach if needed. |
Plan: Make PrometheusConnect importable without pandas and matplotlib
Summary
Successfully implemented lazy loading for pandas, numpy, and matplotlib as fully optional dependencies with helpful error messages:
Lazy numpy import: Moved numpy import from module level to inside
get_metric_aggregation()method inprometheus_connect.py. NumPy is only loaded when the aggregation method is called, reducing initial memory footprint.All dependencies optional: Pandas, numpy, and matplotlib are all optional dependencies. The base install only includes requests and dateparser, significantly reducing memory footprint (~50MB savings) and installation time, especially on Alpine-based Docker images where C extensions would need to be compiled.
Use-case oriented extras: Created intuitive installation options:
pip install prometheus-api-client- Minimal install (only requests, dateparser)pip install prometheus-api-client[analytics]- For metric aggregation operations (includes numpy)pip install prometheus-api-client[dataframe]- For DataFrame support (includes pandas)pip install prometheus-api-client[plot]- For plotting (includes matplotlib)pip install prometheus-api-client[all]- Full install (backward compatible)Helpful error messages: When users try to use functionality requiring missing dependencies, they get clear, actionable guidance:
get_metric_aggregation()without numpy shows: "NumPy is required for metric aggregation operations. Please install it with: pip install prometheus-api-client[analytics] or pip install prometheus-api-client[all]"Metric,MetricRangeDataFrame, orMetricSnapshotDataFramewithout pandas shows: "Pandas is required for [ClassName]. Please install it with: pip install prometheus-api-client[dataframe] or pip install prometheus-api-client[all]"Fixed test isolation: Refactored TestLazyImports to use subprocess isolation instead of module removal, preventing numpy reload warnings and test failures.
Improved test code quality:
_run_in_subprocesshelper method to reduce code duplicationVerified behavior: Created comprehensive tests to ensure lazy loading works correctly for numpy while maintaining compatibility with pandas-dependent classes.
Documentation: Updated README with installation instructions highlighting use-case oriented extras and correct version number (0.7.0).
Code quality: Addressed all code review feedback by improving test module detection, reducing code duplication, removing unused imports, enhancing error reporting, and adding helpful error messages.
Version bump: Updated version to 0.7.0 for the new release.
Test Status
All tests now pass (55/55):
Original prompt
This section details on the original issue you should resolve
<issue_title>Ways to import
PrometheusConnectwithout also importing huge pandas and matplotlib</issue_title><issue_description>Is your feature request related to a problem? Please describe.
I found this simple import
not only import
PrometheusConnectitself, but also pandas and possibly matplotlib, which take about 50MB more unnecessary memory when I don't want to use DataFrames and plot them.Is there any way to only import
PrometheusConnectwithout also importing huge pandas and matplotlib?</issue_description>Comments on the Issue (you are @copilot in this section)
@chauhankaranraj > not only import `PrometheusConnect` itself, but also pandas and possibly matplotlib, which take about 50MB more unnecessary memory when I don't want to use DataFrames and plot them.Hi @thetaprimeprime, that's an great observation! I did some memory profiling and can confirm that the additional pandas and matplotlib imports do indeed increase the memory usage by about ~45MB.
At the moment, I don't think so. But I believe this would be a nice and welcome improvement 😃 Is this something you'd like to work on, or would you rather someone from our team do it?
One way to accomplish this could be to refactor this python module into submodules, something like this:
Just a suggestion off the top of my head, we should explore other ideas as well.
/cc @4n4nd </comment_new>
<comment_new>@4n4nd
@raqbit do you have any suggestions on how we could improve this?</comment_new>
<comment_new>@chauhankaranraj
Hi @raqbit, could you please describe your issue in a bit more detail i.e. why is hard to use in alpine based container images? Are you unable to install these dependencies in the container? Or do they take a longer time to install? Or do they bloat the image?
Basically, I think OP’s issue here is with slowness of the imports at runtime (which imo can be easily solved by a restructure). Whereas your issue sounds a bit more about installation? (which can't be solved by restructure, and is a bit more involved). So just wanted to get some clarity and prioritize accordingly :)</comment_new>
<comment_new>@chauhankaranraj
Got it, thanks for the details! In theory yes we could set up installation so that
pip install prometheus-api-client-pythonjust installs the "core" library components (PrometheusConnect). Andpandasandmatplotlibcan be listed as dependencies in extras_require to be installed with something likepip install prometheus-api-client-python[full], installing the other componenets (Metric,MetricSnapshotDataFrame, etc). Note thatnumpyis still required byPrometheusConnect, so it'd require additional changes to move that to extras_require.However, wouldn't doing so potentially break installations for existing users? I'm in favor of this change if it causes minimal disruption for users while maximizing the benefit. Or if ...
PrometheusConnectwithout also importing **huge** pandas and matplotlib #225💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.