
process archive in memory #75

Merged
merged 1 commit into main from process-archive-in-memory on Mar 12, 2025

Conversation

@pablochacin (Contributor) commented Mar 11, 2025

Fixes multiple issues when extracting the archive and processing it with k6pack, including issues with Windows paths.
It also ensures that only the content of the archive is processed.

Closes #70

Signed-off-by: Pablo Chacin <[email protected]>
@pablochacin pablochacin marked this pull request as ready for review March 11, 2025 17:30
@pablochacin pablochacin requested a review from a team as a code owner March 11, 2025 17:30
@pablochacin pablochacin requested review from szkiba and removed request for a team March 11, 2025 17:30
@szkiba (Collaborator) left a comment


LGTM

@olegbespalov (Contributor) left a comment


LGTM!

Left some non-blocking suggestions.

// analizeMetadata extracts the dependencies from the metadata.json file
func analizeMetadata(content []byte) (analyzer, error) {
	metadata := archiveMetadata{}
	if err := json.Unmarshal(content, &metadata); err != nil {
Contributor


If I'm not mistaken, we could use json.NewDecoder; then there is no need to pass bytes, and we could use an io.Reader directly here.

		continue
	}
	content := &bytes.Buffer{}
	if _, err := io.CopyN(content, reader, maxFileSize); err != nil && !errors.Is(err, io.EOF) {
Contributor


It's a corner case, but still, I'm wondering whether that's right for files that are bigger than 10 MB 🤔 Should we maybe write a warning log in that case?

If the other suggestion about analizeMetadata will be accepted, we could probably move these lines closer to the scriptAnalyzer

And perhaps not for this PR, but it's worth investigating whether, instead of copying content.Bytes(), we can just pass readers around; in other words, make it possible for scriptAnalyzer to also work with buffers.

Contributor Author


> It's a corner case, but still, I'm wondering whether that's right for files that are bigger than 10 MB 🤔 Should we maybe write a warning log in that case?

Not sure what you refer to in "isn't that right". Could you please elaborate?

Contributor


So my understanding is that this line will copy at most 10 MB, silently ignoring the rest, and I'm questioning whether that's the right way to do it. Yes, the risk that we're processing bigger files where the module usage is located after 10 MB of data is low, but still.

Contributor Author


> And perhaps not for this PR, but it's worth investigating whether, instead of copying content.Bytes(), we can just pass readers around; in other words, make it possible for scriptAnalyzer to also work with buffers.

I definitely will do this. I started, but realized it requires significant changes in other parts of the code, so I prefer to make this change in a follow-up PR.

@olegbespalov (Contributor) commented Mar 12, 2025


Sure, as I said, both my comments are non-blocking; feel free to merge this as it is and continue in follow-up PRs 👍

@pablochacin pablochacin merged commit 2050984 into main Mar 12, 2025
6 checks passed
@pablochacin pablochacin deleted the process-archive-in-memory branch March 12, 2025 11:35
Successfully merging this pull request may close these issues.

Reduce file system usage
3 participants