A better security-wise style bot GH Action #2914

hanouticelina · 2025-03-07T16:45:11Z

Following @glegendre01's feedback here (private), this PR addresses the following points:

I think the style bot action needs to be in a dedicated repo to be completely secure (currently anybody can edit setup.py on the head branch). Could it be like docbuilder?

Instead, I added a validation step at the beginning of the workflow to verify that setup.py and Makefile haven't been modified. Since the following steps only involve running pip install with extras and executing make commands, these appear to be the main "protected" files.

Did you test context.payload.comment.user.login even when the workflow is triggered by dispatch? (just want to be sure GH is passing the context correctly...)

I added the possibility to run the workflow manually in 1831f66. You can see the context being correctly retrieved in this example run. However, if I understand correctly, the context structure differs between the two trigger types: for workflow_dispatch we use context.actor, while for issue_comment we use context.payload.comment.user.login.

I'm not sure the manual trigger is useful; I can remove it if not.

I'm not very comfortable with the style_command input. It lets us run arbitrary code, which is not good practice.

The ability to run arbitrary style commands has been removed in favor of hardcoded commands (make style && make quality, make style, make quality) selected through a case switch.
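As a rough illustration of the allowlist idea (the workflow itself does this in a shell case switch; this is only a JavaScript sketch), the mapping could look like the following. Only "style_only" is a value that appears in this thread; "default" and "quality_only" are hypothetical names, as is the function name.

```javascript
// Sketch only: maps a style_command_type value to a fixed shell command.
// "style_only" appears in this thread; "default" and "quality_only" are
// hypothetical names chosen for illustration.
function resolveStyleCommand(styleCommandType) {
  switch (styleCommandType) {
    case "default":
      return "make style && make quality";
    case "style_only":
      return "make style";
    case "quality_only":
      return "make quality";
    default:
      // Anything outside the allowlist is rejected instead of being executed.
      throw new Error(`Unsupported style_command_type: ${styleCommandType}`);
  }
}
```

The point of the design is that user-supplied text only selects a command and is never passed to the shell itself.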

HuggingFaceDocBuilderDev · 2025-03-07T16:49:10Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul

Thanks! Left some comments.

sayakpaul · 2025-03-08T04:09:54Z

.github/workflows/style-bot-action.yml

+            let comment_user;
+            if (context.eventName === 'workflow_dispatch') {
+              comment_user = context.actor;
+              console.log('Workflow triggered manually by:', comment_user);
+            } else {
+              comment_user = context.payload.comment.user.login;
+              console.log('Workflow triggered by comment from:', comment_user);
+            }


Should this only be reserved for users having admin privileges?

Yes, it's already checked here:

huggingface_hub/.github/workflows/style-bot-action.yml

Line 65 in 3fc12e3

const authorized = permission.permission === 'admin';

(both for workflow_dispatch and issue comment)

Well, I am not sure about the details, but I kind of feel that admin is not guaranteed for every maintainer if we are talking about HF repositories.

For example, for transformers, I think admin is not guaranteed for, say, @molbap, @qubvel, etc. I am admin because I need to access the transformers repository settings page.

It might be a good idea to double check here.

If my statement above is correct, what I have done so far is a lazy approach with a hard-coded list of usernames... (but that is not good for security either, as people may leave the team someday)

https://github.com/huggingface/transformers/blob/2c2495cc7b0e3e2942a9310f61548f40a2bc8425/.github/workflows/self-comment-ci.yml#L32
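If admin turns out to be too strict for some repositories, one possible variant of the check is to accept a configurable set of collaborator permission levels (GitHub reports admin, write, read or none). This is only a sketch: `isAuthorized` and `allowedLevels` are hypothetical names, not part of the workflow in this PR, and the default keeps the current admin-only behavior.

```javascript
// Sketch only: decides whether a collaborator permission level may trigger the bot.
// `allowedLevels` is a hypothetical parameter; the default mirrors the
// admin-only check quoted above.
function isAuthorized(permissionLevel, allowedLevels = ["admin"]) {
  return allowedLevels.includes(permissionLevel);
}
```

A repository where maintainers only have write access could then call `isAuthorized(permission.permission, ["admin", "write"])`, avoiding a hard-coded list of usernames.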

sayakpaul · 2025-03-08T04:11:15Z

.github/workflows/style-bot-action.yml

+            const modifiedFiles = pr.map(file => file.filename);
+            console.log("Modified files:", modifiedFiles);
+
+            const protectedFiles = ["setup.py", "Makefile"];


We check for this too:
https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/utils/check_doc_toc.py

out of curiosity, why do you check this file in the workflow? I mean Makefile and setup.py are checked because they directly affect command execution so I wonder why check_doc_toc.py is also protected here.

quickly checked other HF repos, it seems that this file is not common in other repos (except for transformers), maybe we can add a parameter additional_protected_files so that you can pass utils/check_doc_toc.py as well. wdyt?

check_doc_toc.py is a part of the make style && make quality process. So, in the case it was modified, it exposes security risks.

instead of checking for protected files, what do you think of allowing the process to run only if the @bot /style comment has been written by a maintainer/reviewer? Since this bot is mostly there to help reviewers with stuck PRs, I don't think it's that much of a problem if we forbid external contributors to run it (we can always do it for them). And on the contrary it makes things more secure + reduces the workflow complexity by always requiring a pair of eyes "from HF" to validate.

To make the logic simple, I'd suggest adding a maintainers parameter to the GH workflow:

# example for huggingface_hub
jobs:
  style:
    uses: ./.github/workflows/style-bot-action.yml
    with:
      python_quality_dependencies: "[quality]"
      style_command_type: "style_only"
      pr_number: ${{ fromJSON(inputs.pr_number || '0') }}
    secrets:
      bot_token: ${{ secrets.GITHUB_TOKEN }}
      maintainers: hanouticelina wauplin julien-c etc.

Oh I thought this was already implemented. In huggingface/diffusers#10931, Dhruv from our team had already implemented it.

But if the @bot /style command is somehow triggered accidentally, we still have the risk. So I would prefer to keep the modification check even if it adds complexity.

Also, @Wauplin reminded me that, if we check for protected files, we also have a couple of scripts in utils/ for huggingface_hub that we should check too.
if we keep this step, we can check Makefile, setup.py and the folders utils/ and scripts/.

@glegendre01 what do you think about this?
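For reference, a check covering both individual files and whole folders could be sketched as below. This is an illustration only: the function name is hypothetical, the entries are the ones named in this thread, and the trailing "/" convention for folders is an assumption of the sketch.

```javascript
// Sketch only: returns the modified files that hit a protected file or folder.
// The entries mirror those named in this thread; a trailing "/" marks a folder.
const PROTECTED_PATHS = ["setup.py", "Makefile", "utils/", "scripts/"];

function findProtectedChanges(modifiedFiles, protectedPaths = PROTECTED_PATHS) {
  return modifiedFiles.filter((filename) =>
    protectedPaths.some((entry) =>
      // Folders match by prefix, files match by exact repo-relative path.
      entry.endsWith("/") ? filename.startsWith(entry) : filename === entry
    )
  );
}
```

The workflow would stop whenever the returned list is non-empty. Files are matched on their exact repo-relative path, so a nested `docs/Makefile` would not count as protected in this sketch.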

Oh I see! Then no need for extra maintainers field. Not sure it makes sense to have the concept of "protected files" then. What I fear with this added layer is that:

1. If someone makes legit changes to a protected file, we can't use the bot (in huggingface_hub, this can happen for files under https://github.com/huggingface/huggingface_hub/tree/main/utils).

2. It is easy to forget to add a file to this "protected files" list. Having a manual step (i.e. writing @bot /style) is similar to the "approve and run workflows" button from GH. I feel that not having a "protected files" list shifts the responsibility onto the admins, who should feel more engaged/involved to really check what's inside the PR before approving it.

3. Related to 2., I feel that the "protected files list" adds a broader scope of logic to think about, meaning more weaknesses. Typically, someone could find a way to execute custom code when a script is triggered, without modifying this script. Since the range of existing tools and scripts is large, it's harder to account for all potential cases. Whereas requiring an extra check from admins (by letting them know that they have full responsibility, not half-responsibility) makes it much harder to dodge the system. Or at least, as hard as any change in a test file that is also executed by the CI.

No strong opinion overall if you or @glegendre01 really think this is a requirement to improve security, but IMO we should not do it.

Okay I see your point. If the workflow is aimed to be used across numerous repos, then of course having the admins check the file changes first is easier considering the trade-offs.

(Note that this was only my 2c, I'm not a security expert here. I just find that the filter might lower attention without completely removing all the attack scope. And that requiring CI approval for "make style" is quite similar to requiring CI approval for the test suite. Both can execute malicious code if inadvertently approved. "Half responsibility" is not the best way to phrase it but I didn't find a better terminology 😬)

requiring CI approval for "make style" is quite similar to requiring CI approval for the tests suite. Both can execute malicious code if inadvertently approved.

agree! 👍

ydshieh · 2025-03-13T13:56:56Z

I also agree with @Wauplin's points: I'm afraid the notion of a list of protected files is not scalable (looking at the Makefile in transformers), and it's likely we would miss some file in the future.

Ensuring that only maintainers can trigger the workflow makes more sense to me.

And a nit: fromJSON(inputs.pr_number || '0') doesn't make a lot of sense to me if I understand correctly. It should be a required field.

hanouticelina · 2025-03-13T14:16:41Z

And a nit: fromJSON(inputs.pr_number || '0') doesn't make a lot of sense to me if I understand correctly. It should be a required field.

@ydshieh The pr_number input isn't required because the workflow can be triggered in two ways:

through an issue comment, where the PR number is automatically extracted from the comment context, and in that case we don't need the pr_number input.
through manual workflow dispatch, where the PR number needs to be provided as input.

I can also make pr_number required for manual workflow dispatch and use a condition to pass the PR number only when needed.

ydshieh · 2025-03-13T15:28:12Z

If we make pr_number required in both workflow_dispatch and workflow_call, and prepare pr_number in a more complete way in .github/workflows/style-bot.yml, that would be nice.

For the current version, my concern is: when workflow_dispatch triggers the run and pr_number is not set (as it's not required), then workflow_call gets 0 in const prNumber = context.eventName === 'workflow_dispatch', which will lead to something strange.

Any approach that avoids the PR number not corresponding to the desired one would work for me 🙏

(well, I agree it's edge case, so if you want to move on, also OK)
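One way to rule out the 0 fallback, sketched here under assumptions rather than taken from the PR: resolve the PR number per trigger type and fail loudly when a manual run does not supply a valid one. `resolvePrNumber` and the `inputs` object are hypothetical names standing in for the workflow inputs.

```javascript
// Sketch only: picks the PR number depending on how the workflow was triggered.
// `inputs` stands in for the workflow inputs; the function name is hypothetical.
function resolvePrNumber(context, inputs = {}) {
  if (context.eventName === "workflow_dispatch") {
    const prNumber = Number(inputs.pr_number);
    // Missing, empty, zero or non-numeric values are rejected explicitly.
    if (!Number.isInteger(prNumber) || prNumber <= 0) {
      throw new Error("pr_number is required for workflow_dispatch runs");
    }
    return prNumber;
  }
  // issue_comment events carry the PR number on the issue payload.
  return context.payload.issue.number;
}
```

With this shape, a manual run without pr_number stops immediately instead of continuing with PR 0.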

hanouticelina · 2025-03-14T17:37:32Z

@ydshieh I addressed your comment in 3fc12e3

I removed the "protected" files checking step as discussed here; the workflow now runs only if the @bot /style comment was written, or the workflow was triggered manually, by an admin. Let's wait for @glegendre01's review and opinion on this and then we will be good to merge.

hanouticelina added 4 commits March 7, 2025 16:58

better security-wise gh action

5828734

run workflow manually

1831f66

nit

61c5cbb

fix

4d42d1b

hanouticelina requested review from ydshieh, Wauplin, sayakpaul and glegendre01 March 7, 2025 16:45

sayakpaul reviewed Mar 8, 2025

Add credits comment

d81b168

hanouticelina added 3 commits March 14, 2025 18:29

make pr_number required

3fc12e3

remove file protected checking

cfb6aae

Merge branch 'main' into fix-security-vulnerability-style-bot-action

09d0b68

hanouticelina added 2 commits March 14, 2025 18:43

fix

b1568e7

Merge branch 'fix-security-vulnerability-style-bot-action' of github.com:huggingface/huggingface_hub into fix-security-vulnerability-style-bot-action

5dd2c6e
