-
Notifications
You must be signed in to change notification settings - Fork 658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A better security-wise style bot GH Action #2914
base: main
Are you sure you want to change the base?
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Left some comments.
let comment_user; | ||
if (context.eventName === 'workflow_dispatch') { | ||
comment_user = context.actor; | ||
console.log('Workflow triggered manually by:', comment_user); | ||
} else { | ||
comment_user = context.payload.comment.user.login; | ||
console.log('Workflow triggered by comment from:', comment_user); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this only be reserved for users having admin privileges?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it's already checked here :
const authorized = permission.permission === 'admin'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(both for workflow_dispatch and issue comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I am not sure about the details, but I kind feel admin
is not guaranteed to every maintainer if we are talking about HF repositories.
For example, for transformers
, I think admin
is not guaranteed to say @molbap @qubvel etc. I am admin
because I need to access transformers repository setting page
.
It might be a good idea to double check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my statement above is correct, what I did so far is a lazy approach by a hard coded list of user name .... (but that is not good neither for security as people may leave the team someday)
const modifiedFiles = pr.map(file => file.filename); | ||
console.log("Modified files:", modifiedFiles); | ||
|
||
const protectedFiles = ["setup.py", "Makefile"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, why do you check this file in the workflow? I mean Makefile
and setup.py
are checked because they directly affect command execution so I wonder why check_doc_toc.py
is also protected here.
quickly checked other HF repos, it seems that this file is not common in other repos (except for transformers
), maybe we can add a parameter additional_protected_files
so that you can pass utils/check_doc_toc.py
as well. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_doc_toc.py
is a part of the make style && make quality
process. So, in the case it was modified, it exposes security risks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of checking for protected files, what do you think of allowing the process to run only if the @bot /style
comment has been written by a maintainer/reviewer? Since this bot is mostly there to help reviewers with stuck PRs, I don't think it's that much of a problem if we forbid external contributors to run it (we can always do it for them). And on the contrary it makes things more secure + reduces the workflow complexity by always requiring a pair of eyes "from HF" to validate.
To make the logic simple, I'd suggest adding a maintainers
parameter to the GH workflow:
# example for huggingface_hub
jobs:
style:
uses: ./.github/workflows/style-bot-action.yml
with:
python_quality_dependencies: "[quality]"
style_command_type: "style_only"
pr_number: ${{ fromJSON(inputs.pr_number || '0') }}
secrets:
bot_token: ${{ secrets.GITHUB_TOKEN }}
maintainers: hanouticelina wauplin julien-c etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I thought this was already implemented. In huggingface/diffusers#10931, Dhruv from our team had already implemented it.
But accidentally, if the @bot /style
command is triggered somehow we still have the risk. So, I would prefer to have the modification check even if there's complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, @Wauplin reminded me that, if we check for protected files, we also have a couple of scripts in utils/
for huggingface_hub
that we should check too.
if we keep this step, we can check Makefile
, setup.py
and the folders utils/
and scripts/
.
@glegendre01 what do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see! Then no need for extra maintainers
field. Not sure it makes sense to have the concept of "protected files" then. What I fear with this added layer is that:
- if someone makes legit changes to a protected file, we can't use the bot (in
huggingface_hub
, this can happen for files under https://github.com/huggingface/huggingface_hub/tree/main/utils) - it is easy to forget to add a file to this "protected files" list. Having a manual step (i.e. writing
@bot /style
) is similar to the "approve and run workflows" button from GH. I feel that not having a "protected files" list shifts the responsibility on the admins who should feel more engaged/involved to really check what's inside the PR before approving it- related to 2., I feel that the "protected files list" adds a broader scope of logic to think of, meaning more weaknesses. Typically someone that find a way to execute custom code when a script is triggered, without modifying this script. Since the range of existing tools and scripts is large, it's harder to account for all potential cases. Whereas requiring an extra check from admins (by letting them know that they have full responsibility, not half-responsibility) makes it much harder to dodge the system. Or at least, as hard as any change in a test file that is also executed by the CI.
No strong opinion overall if you or @glegendre01 really think this is a requirement to improve security, but IMO we should not do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I see your point. If the workflow is aimed to be used across numerous repos, then of course having the admins check the file changes first is easier considering the trade-offs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Note that this was only my 2c, I'm not security expert here. I just find that the filter might lower attention without completely removing all the attack scope. And that requiring CI approval for "make style" is quite similar to requiring CI approval for the tests suite. Both can execute malicious code if inadvertently approved. "half responsibility" is not the best way to phrase it but didn't find a better terminology 😬)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requiring CI approval for "make style" is quite similar to requiring CI approval for the tests suite. Both can execute malicious code if inadvertently approved.
agree! 👍
I also agree @Wauplin 's points: I'm afraid of the notion of a list of protected files is not scalable, if I look at the Makefile in transformers, and it's likely to miss some file in the future. Ensure only maintainers could triggering the workflow makes more sense to me. And a nit : |
@ydshieh The
i can also make the |
If we put For the current version, the concern I have is: when Any approach that could avoid pr number not corresponding the desired one would work for me 🙏 (well, I agree it's edge case, so if you want to move on, also OK) |
@ydshieh I addressed your comment in 3fc12e3 I removed the "protected" files checking step as discussed here, the workflow is run only if the @bot /style comment has been written or triggered manually by an admin. Let's wait for @glegendre01 review and opinion on this and then we will be good to merge. |
…com:huggingface/huggingface_hub into fix-security-vulnerability-style-bot-action
following @glegendre01's feedback here (private), this PR addresses the following points:
Instead, I added a validation step at the beginning of the workflow to verify that
setup.py
andMakefile
haven't been modified. Since the following steps only involve runningpip install
with extras and executingmake
commands, these appear to be the main "protected" files.I added the possibility to run the workflow manually in 1831f66. you can see the context being correctly retrieved in this example run. However, if i understand correctly, the context structure is different between the two trigger types. For
workflow_dispatch
, we usecontext.actor
, while forissue_comment
, we usecontext.payload.comment.user.login
.I'm not sure if the manual trigger is useful, I can remove it if not.
the ability to run arbitrary style commands has been removed in favor of hardcorded commands (
make style && make quality
,make style
,make quality
) controlled through a case switch.