-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Python: Extract files in hidden dirs by default #19424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
eef5a49
to
4d00556
Compare
Changes the default behaviour of the Python extractor so files inside hidden directories are extracted by default. Also adds an extractor option, `skip_hidden_directories`, which can be set to `true` in order to revert to the old behaviour. Finally, I made the logic surrounding what is logged in various cases a bit more obvious. Technically this changes the behaviour of the extractor (in that hidden excluded files will now be logged as `(excluded)`, but I think this makes more sense anyway.
4d00556
to
5cf3242
Compare
c3ac796
to
be88e61
Compare
be88e61
to
2ded42c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR changes the Python extractor to include files in hidden directories by default, adds an option to revert to the old behavior, and makes the logging logic more explicit.
- Default extraction now includes files in hidden directories.
- New
skip_hidden_directories
extractor option to restore previous behavior. - Refactored logging branches and updated tests, change notes, and schema.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
python/ql/test/extractor-tests/filter-option/Test.expected | Added expected line for hidden_foo.py script. |
python/ql/test/2/extractor-tests/hidden/test.expected | Updated output to include files under hidden directories. |
python/ql/lib/change-notes/2025-04-30-extract-hidden-files-by-default.md | Documented new default behavior and skip_hidden_directories option. |
python/extractor/semmle/traverser.py | Reordered exclusion checks, added env‑var‑driven is_hidden override. |
python/extractor/cli-integration-test/hidden-files/test.sh | Added CLI integration tests for default and skipped hidden behavior. |
python/extractor/cli-integration-test/hidden-files/repo_dir/foo.py | Base script for the CLI test. |
python/extractor/cli-integration-test/hidden-files/query.ql | Query to list extracted file names. |
python/extractor/cli-integration-test/hidden-files/query-default.expected | Expected output when hidden files are included. |
python/extractor/cli-integration-test/hidden-files/query-skipped.expected | Expected output when skip_hidden_directories=true . |
python/codeql-extractor.yml | Added skip_hidden_directories option to extractor schema. |
Comments suppressed due to low confidence (3)
python/codeql-extractor.yml:47
- [nitpick] Consider defining
skip_hidden_directories
as a boolean type instead of a string with a pattern, to align with typical boolean option conventions.
skip_hidden_directories:
python/extractor/semmle/traverser.py:103
- There is no test for the Windows-specific
is_hidden
branch whenskip_hidden_directories=true
onos.name == 'nt'
. Consider adding a Windows-mode test to cover that path.
if os.environ.get("CODEQL_EXTRACTOR_PYTHON_OPTION_SKIP_HIDDEN_DIRECTORIES", "false") == "false":
python/extractor/cli-integration-test/hidden-files/repo_dir/foo.py:1
- The CLI integration test expectations reference
.hidden_file.py
andvisible_file_in_hidden_dir.py
, but those files are not present inrepo_dir
. Add these files to matchquery-default.expected
andquery-skipped.expected
.
print(42)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is OK. However, setting is_hidden
to always return false is almost "too clever" as it becomes semantically a bit confusing. I would have preferred a more direct structure; something like
is_hidden
stays as is- define
is_excluded
(oris_traversed
) to deal with the logic involvingexclude_paths
,is_hidden
and the configuration regarding traversing hidden files - use
is_excluded
on line 86
is_excluded
would have to return a reason for logging so could not be a clean boolean, but it still feels like a "simpler" solution.
That's a fair point. In fact, originally my intention was to have the "do we skip hidden dirs or not" logic be located inside the Really, what I think we should do is get rid of To me, at least, it seems like a much better default behaviour to say "we extract all Python files" rather than have weird heuristics for which ones to extract or not.
I feel abstracting the exclusion logic into an Alternatively, how would you feel if I renamed I'm going to test if excluding hidden directories can be done using the existing mechanism for file filtering. If it does, then I'll just get rid of |
Changes the default behaviour of the Python extractor so that files inside hidden directories are extracted by default.
Also adds an extractor option,
skip_hidden_directories
, which can be set totrue
in order to revert to the old behaviour.Finally, I made the logic surrounding what is logged in various cases a bit more obvious.
Technically this changes the behaviour of the extractor (in that hidden excluded files will now be logged as
(excluded)
, but I think this makes more sense anyway.