
A better security-wise style bot GH Action #2914

Open
wants to merge 10 commits into base: main

hanouticelina
Contributor

Following @glegendre01's feedback here (private), this PR addresses the following points:

I think the style bot action needs to be in a dedicated repo to be completely secure (currently anybody can edit setup.py on the head branch). Could it be like docbuilder?

Instead, I added a validation step at the beginning of the workflow to verify that setup.py and Makefile haven't been modified. Since the following steps only involve running pip install with extras and executing make commands, these appear to be the main "protected" files.
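A minimal sketch of what such a validation step could look like inside an actions/github-script step, assuming the PR's changed files were already fetched with github.rest.pulls.listFiles (whose entries carry a filename field). The helper name findProtectedChanges is hypothetical, not the code in this PR.

```javascript
// Hypothetical helper: report changes to files the style job would execute.
// `changedFiles` mirrors the entries returned by github.rest.pulls.listFiles,
// which is assumed to have been called beforehand.
const PROTECTED_FILES = ["setup.py", "Makefile"];

function findProtectedChanges(changedFiles, protectedFiles = PROTECTED_FILES) {
  const modifiedFiles = changedFiles.map((file) => file.filename);
  // Exact path matches only: "src/Makefile" is not treated as protected.
  return modifiedFiles.filter((name) => protectedFiles.includes(name));
}

// Possible use in the workflow step (sketch):
//   const violations = findProtectedChanges(files);
//   if (violations.length > 0) {
//     core.setFailed(`Protected files modified: ${violations.join(", ")}`);
//   }
```

Failing the job (rather than silently skipping) makes it visible to the maintainer why the bot refused to run.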

Did you test context.payload.comment.user.login even on workflows triggered by dispatch? (I just want to be sure GH is passing the context correctly...)

I added the possibility to run the workflow manually in 1831f66. You can see the context being retrieved correctly in this example run. However, if I understand correctly, the context structure differs between the two trigger types: for workflow_dispatch we use context.actor, while for issue_comment we use context.payload.comment.user.login.

I'm not sure the manual trigger is useful; I can remove it if not.

I'm not very comfortable with the style_command input. It lets us run arbitrary code, and that's not good practice.

The ability to run arbitrary style commands has been removed in favor of hardcoded commands (make style && make quality, make style, make quality) selected through a case switch.
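A sketch of that kind of allowlist, written in JavaScript to match the other snippets here (the actual workflow may well use a shell case statement instead). Only "style_only" appears in this thread; the other style_command_type values below are illustrative assumptions. The point is that the input only selects among fixed commands, so no caller-supplied string ever reaches the shell.

```javascript
// Hypothetical mapping from the style_command_type input to a hardcoded command.
// Only "style_only" is confirmed in this thread; "default" and "quality_only" are assumed names.
function resolveStyleCommand(styleCommandType) {
  switch (styleCommandType) {
    case "default":
      return "make style && make quality";
    case "style_only":
      return "make style";
    case "quality_only":
      return "make quality";
    default:
      // Unknown values fail loudly instead of being passed to the shell.
      throw new Error(`Unsupported style_command_type: ${styleCommandType}`);
  }
}
```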

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sayakpaul left a comment

Thanks! Left some comments.

Comment on lines +45 to +52
let comment_user;
if (context.eventName === 'workflow_dispatch') {
  comment_user = context.actor;
  console.log('Workflow triggered manually by:', comment_user);
} else {
  comment_user = context.payload.comment.user.login;
  console.log('Workflow triggered by comment from:', comment_user);
}

Member

Should this be reserved only for users with admin privileges?

Contributor Author

Yes, it's already checked here:

const authorized = permission.permission === 'admin';

Contributor Author

(both for workflow_dispatch and issue_comment triggers)
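For reference, a sketch of how such a check can be wired up in actions/github-script, whose github client exposes github.rest.repos.getCollaboratorPermissionLevel; the permission field of its response is one of "admin", "write", "read" or "none". The function name isAuthorized and the allowedLevels parameter are hypothetical. Passing a wider allowedLevels (e.g. including "write") is one option if not every maintainer is an admin.

```javascript
// Sketch: authorize whoever triggered the run, for both trigger types.
// `github` and `context` are the objects injected by actions/github-script.
async function isAuthorized(github, context, allowedLevels = ["admin"]) {
  const username =
    context.eventName === "workflow_dispatch"
      ? context.actor
      : context.payload.comment.user.login;

  const { data } = await github.rest.repos.getCollaboratorPermissionLevel({
    owner: context.repo.owner,
    repo: context.repo.repo,
    username,
  });
  // data.permission is "admin", "write", "read" or "none".
  return allowedLevels.includes(data.permission);
}
```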

Contributor

@ydshieh Mar 14, 2025

Well, I am not sure about the details, but I kind of feel that admin permission is not granted to every maintainer, at least for HF repositories.

For example, in transformers, I don't think admin is granted to, say, @molbap, @qubvel, etc. I am an admin because I need access to the transformers repository settings page.

It might be a good idea to double-check this.

Contributor

If my statement above is correct, what I have done so far is a lazy approach: a hard-coded list of usernames (but that isn't great for security either, as people may leave the team someday).

https://github.com/huggingface/transformers/blob/2c2495cc7b0e3e2942a9310f61548f40a2bc8425/.github/workflows/self-comment-ci.yml#L32

const modifiedFiles = pr.map(file => file.filename);
console.log("Modified files:", modifiedFiles);

const protectedFiles = ["setup.py", "Makefile"];

Member

Contributor Author

Out of curiosity, why do you check this file in the workflow? Makefile and setup.py are checked because they directly affect command execution, so I wonder why check_doc_toc.py is also protected here.

I quickly checked other HF repos, and this file doesn't seem common in other repos (except for transformers). Maybe we can add an additional_protected_files parameter so that you can pass utils/check_doc_toc.py as well. wdyt?

Member

check_doc_toc.py is part of the make style && make quality process, so if it were modified, it would expose a security risk.

Contributor

@Wauplin Mar 10, 2025

Instead of checking for protected files, what do you think of allowing the process to run only if the @bot /style comment was written by a maintainer/reviewer? Since this bot is mostly there to help reviewers with stuck PRs, I don't think it's much of a problem if we forbid external contributors from running it (we can always run it for them). On the contrary, it makes things more secure and reduces the workflow complexity by always requiring a pair of eyes "from HF" to validate.

To make the logic simple, I'd suggest adding a maintainers parameter to the GH workflow:

# example for huggingface_hub
jobs:
  style:
    uses: ./.github/workflows/style-bot-action.yml
    with:
      python_quality_dependencies: "[quality]"
      style_command_type: "style_only"
      pr_number: ${{ fromJSON(inputs.pr_number || '0') }}
      # inputs of a called workflow go under `with`; whitespace-separated GitHub logins
      maintainers: "hanouticelina wauplin julien-c" # etc.
    secrets:
      bot_token: ${{ secrets.GITHUB_TOKEN }}
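A sketch of how the called workflow could check such a maintainers input, assuming it is passed as one whitespace-separated string (workflow_call inputs are typed as string, number or boolean, so a list has to be encoded somehow). The helper name isMaintainer is hypothetical. Logins are compared case-insensitively because GitHub usernames are case-insensitive, which matters here since the example lists "wauplin" for @Wauplin.

```javascript
// Hypothetical: parse a whitespace-separated `maintainers` input and check membership.
function isMaintainer(maintainersInput, login) {
  const maintainers = (maintainersInput || "")
    .split(/\s+/)
    .filter(Boolean) // drop empty strings from leading/trailing whitespace
    .map((name) => name.toLowerCase());
  // GitHub logins are case-insensitive, so "Wauplin" matches "wauplin".
  return maintainers.includes(login.toLowerCase());
}
```

Inside the workflow, login would be the same comment_user resolved for the permission check.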

Member

Oh, I thought this was already implemented. In huggingface/diffusers#10931, Dhruv from our team already implemented it.

But if the @bot /style command is somehow triggered accidentally, we still have the risk. So I would prefer to keep the modification check, even if it adds complexity.

Contributor Author

@hanouticelina Mar 10, 2025

Also, @Wauplin reminded me that if we check for protected files, huggingface_hub has a couple of scripts in utils/ that we should check too.
If we keep this step, we can check Makefile, setup.py, and the utils/ and scripts/ folders.

@glegendre01 what do you think about this?
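Protecting whole folders needs prefix matching rather than exact file names. A sketch under two hypothetical conventions: entries ending in "/" protect a directory, and the additional_protected_files parameter floated earlier in the thread arrives as a whitespace-separated string. None of these names come from the actual workflow.

```javascript
// Hypothetical defaults: entries ending in "/" protect a whole folder, others an exact path.
const DEFAULT_PROTECTED = ["setup.py", "Makefile", "utils/", "scripts/"];

function buildProtectedList(additionalInput = "") {
  const extra = additionalInput.split(/\s+/).filter(Boolean);
  return [...DEFAULT_PROTECTED, ...extra];
}

function isProtected(path, protectedList) {
  return protectedList.some((entry) =>
    entry.endsWith("/") ? path.startsWith(entry) : path === entry
  );
}

// Returns the changed paths that touch a protected file or folder.
function protectedChanges(changedPaths, additionalInput = "") {
  const protectedList = buildProtectedList(additionalInput);
  return changedPaths.filter((path) => isProtected(path, protectedList));
}
```

Requiring the trailing "/" avoids accidentally matching a sibling such as utilsX/ through a bare "utils" prefix.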

Contributor

@Wauplin Mar 10, 2025

Oh I see! Then there's no need for an extra maintainers field. I'm not sure the concept of "protected files" makes sense then. What I fear with this added layer is that:

  1. If someone makes legit changes to a protected file, we can't use the bot (in huggingface_hub, this can happen for files under https://github.com/huggingface/huggingface_hub/tree/main/utils).
  2. It is easy to forget to add a file to this "protected files" list. Having a manual step (i.e. writing @bot /style) is similar to the "approve and run workflows" button from GH. I feel that not having a "protected files" list shifts the responsibility onto the admins, who should feel more engaged/involved in really checking what's inside the PR before approving it.
    1. Related to 2., I feel that the "protected files" list adds a broader scope of logic to think about, meaning more weaknesses: typically, someone who finds a way to execute custom code when a script is triggered, without modifying the script itself. Since the range of existing tools and scripts is large, it's harder to account for all potential cases. Whereas requiring an extra check from admins (by letting them know that they have full responsibility, not half-responsibility) makes it much harder to dodge the system, or at least as hard as any change in a test file that is also executed by the CI.

I have no strong opinion overall if you or @glegendre01 really think this is required to improve security, but IMO we should not do it.

Member

Okay, I see your point. If the workflow is meant to be used across numerous repos, then having the admins check the file changes first is the easier option, considering the trade-offs.

Contributor

(Note that this is only my 2c; I'm not a security expert. I just find that the filter might lower attention without completely removing the attack surface, and that requiring CI approval for "make style" is quite similar to requiring CI approval for the test suite: both can execute malicious code if inadvertently approved. "Half-responsibility" is not the best way to phrase it, but I couldn't find better terminology 😬)

Contributor Author

requiring CI approval for "make style" is quite similar to requiring CI approval for the tests suite. Both can execute malicious code if inadvertently approved.

agree! 👍

@ydshieh
Contributor

ydshieh commented Mar 13, 2025

I also agree with @Wauplin's points: I'm afraid a list of protected files is not scalable (just look at the Makefile in transformers), and we're likely to miss some file in the future.

Ensuring that only maintainers can trigger the workflow makes more sense to me.

And a nit: fromJSON(inputs.pr_number || '0') doesn't make a lot of sense to me, if I understand correctly. It should be a required field.

@hanouticelina
Contributor Author

hanouticelina commented Mar 13, 2025

And a nit: fromJSON(inputs.pr_number || '0') doesn't make a lot of sense to me, if I understand correctly. It should be a required field.

@ydshieh The pr_number input isn't required because the workflow can be triggered in two ways:

  • through an issue comment, where the PR number is automatically extracted from the comment context, and in that case we don't need the pr_number input.
  • through manual workflow dispatch, where the PR number needs to be provided as input.

I can also make pr_number required for manual workflow dispatch and use a condition to pass the PR number only when needed.

@ydshieh
Contributor

ydshieh commented Mar 13, 2025

It would be nice if we made pr_number required in both workflow_dispatch and workflow_call, and prepared pr_number more thoroughly in .github/workflows/style-bot.yml.

With the current version, my concern is: when workflow_dispatch triggers the run and pr_number is not set (as it's not required), then in workflow_call it gets 0 in const prNumber = context.eventName === 'workflow_dispatch', which will lead to something strange.

Any approach that prevents the PR number from not matching the intended one would work for me 🙏

(Well, I agree it's an edge case, so if you want to move on, that's also OK.)
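One way to rule out that edge case is to resolve the PR number explicitly and fail fast when it's missing, instead of falling back to 0. A sketch, assuming prNumberInput stands for the pr_number input however it is passed into the script, and that for issue_comment events the number comes from context.payload.issue.number (GitHub delivers PR comments as issue comments, and issue.pull_request is only present when the issue is a PR). The function name resolvePrNumber is hypothetical.

```javascript
// Sketch: resolve the PR number for either trigger, never silently defaulting to 0.
// `prNumberInput` stands for the pr_number input (a string, possibly empty or undefined).
function resolvePrNumber(context, prNumberInput) {
  if (context.eventName === "workflow_dispatch") {
    const raw = String(prNumberInput ?? "").trim();
    // Accept only positive integers such as "2914"; reject "", "0", "12abc".
    if (!/^[1-9]\d*$/.test(raw)) {
      throw new Error(`workflow_dispatch requires a positive pr_number, got "${raw}"`);
    }
    return Number(raw);
  }
  // issue_comment: make sure the comment was posted on a pull request, not a plain issue.
  const issue = context.payload.issue;
  if (!issue || !issue.pull_request) {
    throw new Error("Comment is not on a pull request");
  }
  return issue.number;
}
```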

@hanouticelina
Contributor Author

@ydshieh I addressed your comment in 3fc12e3

I removed the "protected" files check as discussed here: the workflow now runs only if the @bot /style comment was written by an admin, or if an admin triggered it manually. Let's wait for @glegendre01's review and opinion on this, and then we'll be good to merge.

…com:huggingface/huggingface_hub into fix-security-vulnerability-style-bot-action

5 participants