Save files in a directory tree of bounded degree #171
Merged
kralka merged 5 commits into google:main on May 23, 2025
Limit the number of files in a directory and the number of subdirectories of a directory during random file path generation.
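As a rough illustration of the idea, here is a minimal sketch of a path generator that keeps every directory below the root at no more than `max_branching` entries. The class name `BoundedPathGenerator`, its constructor arguments, and the refresh rule are assumptions made for this example; only the names `get_path`, `levels`, `max_branching`, and `name_length` appear in the diff, and the actual sedpack implementation may differ.

```python
import secrets
from pathlib import Path


class BoundedPathGenerator:
    """Hypothetical sketch: random paths in a directory tree of bounded degree."""

    def __init__(self, levels: int, max_branching: int, name_length: int = 8) -> None:
        if levels < 1 or max_branching < 1 or name_length < 1:
            raise ValueError("levels, max_branching and name_length must be positive")
        self.levels = levels
        self.max_branching = max_branching
        self.name_length = name_length
        self._count = 0  # Number of paths handed out so far.
        # One random directory name per level above the file itself.
        self._dirs = [self._random_name() for _ in range(levels - 1)]

    def _random_name(self) -> str:
        # token_hex(n) yields 2 * n hex characters; trim to the requested length.
        return secrets.token_hex((self.name_length + 1) // 2)[: self.name_length]

    def get_path(self) -> Path:
        """Return a new relative path of the form name/name/.../long_name."""
        if self._count > 0:
            for depth in range(self.levels - 1):
                # A directory at this depth can hold this many files in its subtree.
                capacity = self.max_branching ** (self.levels - 1 - depth)
                if self._count % capacity == 0:
                    # The subtree is full: start a fresh directory at this depth.
                    self._dirs[depth] = self._random_name()
        self._count += 1
        file_name = secrets.token_hex(16)  # Longer name for the file itself.
        return Path(*self._dirs, file_name)
```

In this sketch the root directory is not bounded: after `max_branching ** levels` paths it simply keeps receiving new top-level directories instead of raising. Random name collisions are assumed to be negligible and are not handled.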
jmichelp reviewed on May 23, 2025
# No exception in the top level
for _ in range((max_branching**levels) + 10):
    n = generator.get_path()
    assert n.count("/") == levels - 1
Collaborator
Does the test pass on Windows, or do you need to change that to use os.sep?
Collaborator
Author
[obsolete, generating Path]
Yes, the tests passed on Windows as well (https://github.com/google/sedpack/blob/main/.github/workflows/pytest.yml#L20C18-L20C24). Constructing a pathlib path from a string with forward slashes should work on all platforms, right?
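To make the separator question concrete, here is a small sketch using the pure path classes from `pathlib`, which can be evaluated on any platform. The sample string is made up for the example. Note that `os.sep` is the directory separator, while `os.pathsep` separates entries in variables such as `PATH`.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

text = "ab/cd/long_name"  # Hypothetical relative path with forward slashes.

# Both flavours split a forward-slash string into the same components.
posix_parts = PurePosixPath(text).parts
windows_parts = PureWindowsPath(text).parts

# Converting back to str uses the native separator, so counting "/" in
# str(path) is not portable; counting parts (or using as_posix()) is.
windows_text = str(PureWindowsPath(text))
portable_text = PureWindowsPath(text).as_posix()

print(posix_parts, windows_parts)
print(windows_text, portable_text)
print(repr(os.sep), repr(os.pathsep))
```

Comparing `len(path.parts)` rather than counting `"/"` in a string keeps a test independent of the platform separator.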
assert all(len(part) == name_length for part in p.parts[:-1])

# Enforce format: name/name/name/long_name
for l in range(1, levels):
Collaborator
Isn't that a duplicate of the assert above?
Collaborator
Author
[obsolete, generating Path]
This was useful later for checking the number of subdirectories.
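One way such a check on the number of subdirectories could look is sketched below. The helper `children_per_directory` and the sample paths are hypothetical and not taken from the PR's test file; the helper counts the distinct entries directly under each directory prefix.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def children_per_directory(paths):
    """Map each directory prefix (as a tuple of parts) to its number of direct children."""
    children = defaultdict(set)
    for path in paths:
        parts = PurePosixPath(path).parts
        for depth in range(len(parts)):
            # parts[:depth] is the parent directory, parts[depth] one of its entries.
            children[parts[:depth]].add(parts[depth])
    return {parent: len(names) for parent, names in children.items()}


# Made-up sample in the name/name/long_name format.
sample = ["aa/bb/f1", "aa/bb/f2", "aa/cc/f3", "dd/ee/f4"]
counts = children_per_directory(sample)
max_branching = 2
assert all(count <= max_branching for count in counts.values())
```

The empty tuple `()` stands for the root, so the same assertion covers files per directory and subdirectories per directory.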
jmichelp previously approved these changes on May 23, 2025
    name_length=name_length,
)

seen_paths: set[str] = set()
wsxrdv approved these changes on May 23, 2025
wsxrdv added a commit to wsxrdv/sedpack that referenced this pull request on Aug 25, 2025
Pull request google#171 introduced a problem that rendered the benchmarking code useless. This commit traverses the dataset directory recursively and thus restores meaningful benchmarking. It also makes sure that we traverse the expected number of shards. The test and holdout splits are now included as well, since we might as well use that data when we have it. This will introduce a regression compared to the first benchmarks, and definitely one compared to the empty benchmarks.
github-merge-queue Bot pushed a commit that referenced this pull request on Aug 25, 2025
Pull request #171 introduced a problem that rendered the benchmarking code useless. This commit traverses the dataset directory recursively and thus restores meaningful benchmarking. It also makes sure that we traverse the expected number of shards. The test and holdout splits are now included in benchmarking as well, since we might as well use that data when we have it. This will introduce a regression compared to the first benchmarks, and definitely one compared to the empty benchmarks.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
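The recursive traversal described in this commit message might look roughly like the following sketch. The function names, the `.shard` suffix, and the directory layout are assumptions for illustration only and do not reflect the actual sedpack benchmarking code. The demo also shows why a non-recursive listing finds nothing once files live in nested directories.

```python
import tempfile
from pathlib import Path


def find_shards(dataset_root: Path, suffix: str = ".shard") -> list[Path]:
    """Recursively collect shard files below dataset_root (suffix is hypothetical)."""
    return sorted(p for p in dataset_root.rglob(f"*{suffix}") if p.is_file())


def check_shard_count(shards: list[Path], expected: int) -> None:
    """Fail loudly if the traversal did not find the expected number of shards."""
    if len(shards) != expected:
        raise ValueError(f"expected {expected} shards, found {len(shards)}")


def demo() -> tuple[int, int]:
    """Build a small nested layout and compare flat and recursive listings."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ("train/ab/cd/x.shard", "train/ab/ef/y.shard", "test/gh/ij/z.shard"):
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        flat = list(root.glob("*.shard"))  # Only looks at the top level.
        shards = find_shards(root)  # Walks the whole tree.
        check_shard_count(shards, expected=3)
        return len(flat), len(shards)
```

Checking the count against an expected value guards against a benchmark that silently iterates over zero files.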