-
Notifications
You must be signed in to change notification settings - Fork 574
feat: Enhance process_systems to recursively search all paths in systems list #5033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the process_systems function to handle lists of system directory paths by iterating over each path and applying expansion logic individually. Previously, when systems was a list, it was simply copied; now each item is processed through the same expansion logic as single string inputs.
Key changes:
- Modified
process_systemsto iterate over list items and expand each path individually - Updated documentation for better clarity on how the function handles both strings and lists
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| deepmd/utils/data_system.py | Refactored process_systems to expand each list item individually, improved docstring clarity, and added explicit type checking with error handling |
| deepmd/utils/argcheck.py | Updated documentation strings to clarify behavior for string and list inputs for both training and validation data |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Warning Rate limit exceeded@OutisLi has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 16 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughUpdated documentation for training and validation data argument descriptions to clarify system path handling. Modified Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/utils/data_system.py (2)
808-817: Consider TypeError and add list item validation.Two suggestions for improvement:
Use TypeError for invalid types: When the input type is incorrect,
TypeErroris more semantically appropriate thanValueError(as suggested by static analysis).Validate list items are strings: Currently, if
systemsis a list containing non-string items (e.g.,["/path", None]or["/path", 123]), the error will occur later inexpand_sys_strorrglob_sys_str, producing a potentially confusing error message.Apply this diff to improve type handling:
# Normalize input to a list of paths to search if isinstance(systems, str): search_paths = [systems] elif isinstance(systems, list): + # Validate all items are strings + for idx, item in enumerate(systems): + if not isinstance(item, str): + raise TypeError( + f"All items in systems list must be str, but systems[{idx}] is {type(item).__name__}." + ) search_paths = systems else: # Handle unsupported input types - raise ValueError( - f"Invalid systems type: {type(systems)}. Must be str or list[str]." + raise TypeError( + f"systems must be str or list[str], got {type(systems).__name__}." )
819-829: Consider deduplicating the final result for edge cases.The current implementation may return duplicate system paths if:
- Multiple paths in the input list overlap (e.g.,
["/data", "/data/subset1"])- The same path appears multiple times in the input list
While
rglob_sys_strdeduplicates within each path search,expand_sys_strdoes not, and duplicates across different search paths are not removed.If deduplication is desired, you could modify the return statement:
result_systems.extend(expanded_paths) -return result_systems +# Deduplicate while preserving order +seen = set() +deduplicated = [] +for system in result_systems: + if system not in seen: + seen.add(system) + deduplicated.append(system) +return deduplicatedAlternatively, if order doesn't matter:
-return result_systems +return list(dict.fromkeys(result_systems)) # Preserves order in Python 3.7+Or simply:
-return result_systems +return list(set(result_systems)) # May change order
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deepmd/utils/argcheck.py(2 hunks)deepmd/utils/data_system.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Always run
ruff check .andruff format .before committing changes to Python code
Files:
deepmd/utils/data_system.pydeepmd/utils/argcheck.py
🧬 Code graph analysis (1)
deepmd/utils/data_system.py (1)
deepmd/common.py (2)
expand_sys_str(191-208)rglob_sys_str(211-232)
🪛 Ruff (0.14.2)
deepmd/utils/data_system.py
815-817: Prefer TypeError exception for invalid type
(TRY004)
815-817: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (47)
- GitHub Check: CodeQL analysis (python)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Test C++ (true)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test C++ (false)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test C++ (true)
- GitHub Check: Test C++ (false)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (3)
deepmd/utils/argcheck.py (2)
2994-2998: LGTM! Clear documentation of systems parameter behavior.The updated documentation accurately describes the new behavior where both string and list inputs undergo recursive directory search, with each list item processed identically to a standalone string input.
3075-3079: LGTM! Consistent documentation across training and validation data.The validation data documentation mirrors the training data documentation, maintaining consistency and clarity.
deepmd/utils/data_system.py (1)
787-807: LGTM! Updated docstring accurately reflects new behavior.The docstring clearly describes the new uniform treatment of both string and list inputs, where each path is recursively searched for system directories.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## devel #5033 +/- ##
=======================================
Coverage 84.23% 84.23%
=======================================
Files 709 709
Lines 70073 70079 +6
Branches 3619 3620 +1
=======================================
+ Hits 59026 59032 +6
- Misses 9879 9881 +2
+ Partials 1168 1166 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
This PR modifies the
process_systemsutility function to change how it handles list inputs.Previously, if the
systemsargument was astr, the function would recursively search that path for systems. However, ifsystemswas alist, the function would return the list as-is, assuming it was already a complete list of system paths.This update unifies the logic. The function now treats every string path—whether it's a single
strinput or an item within alist—as a directory to be recursively searched. It also refactors the internal logic to first normalize the input into a list of paths and then process them uniformly, improving code clarity and maintainability.Motivation and Justification
The original implementation's inconsistent handling of
strversuslistinputs caused two significant problems:input.jsonlike"systems": ["/path/to/training_data"], would fail. The function would not search inside/path/to/training_datafor the actual system directories (e.g.,set.000,set.001, etc.)."systems": ["/path/to/dataset_A", "/path/to/dataset_B"].This change solves both problems by ensuring that paths provided in a list are searched recursively, just as a single string path would be.
Benefits
input.json.stror alist[str].Summary by CodeRabbit
Release Notes
Documentation
Improvements