AI statement for contributed code #2066
base: main
Conversation
Force-pushed d5f260a to a72473a
speth left a comment:
Thanks for initiating this, @ischoegl. A couple of thoughts:
- Just as a formatting thing, I think we don't want to use check boxes to indicate "pick one of these three things", because that will affect the checklist summary shown on the PR list (i.e. it will show "6 of 8 tasks").
- I wonder whether it would be clearer to contributors if we made the distinction between the second and third items as "code completion tools" and "agent-based tools".
- Categorizing GitHub Copilot is a little bit tricky, as I think that branding now covers both a code completion tool and an agentic tool.
- On the last option, I think it's too much to ask the contributor to ensure that the AI model didn't spit out any proprietary or license-incompatible code (presumably, this means something like a block of code from a GPL project). While this would be ideal, I don't think there's any way for the user of an LLM to evaluate this.
Force-pushed a72473a to a1b2068
Thanks for the feedback, @speth. I agree on the formatting, and adopted an HTML-comment-based approach instead. Regarding your comments on licensing, I took another look at material already out there, specifically linuxfoundation.org and Matplotlib. For that reason, I ended up adding language to
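For readers unfamiliar with the pattern, an HTML-comment-based disclosure in a PR template keeps guidance invisible in the rendered description (and out of the task-list counter that speth mentioned) while prompting the contributor to uncomment one option. The category names and wording below are illustrative only, loosely pieced together from fragments quoted elsewhere in this review, and are not the actual template adopted in this PR:

```html
<!-- AI usage disclosure: uncomment exactly one of the lines below.
     (Illustrative sketch; names and wording are hypothetical.) -->
<!-- AI usage: None. All code and text written by the contributor. -->
<!-- AI usage: Code completion tools only; suggestions reviewed and understood
     by the contributor. -->
<!-- AI usage: Agent-based tools and/or substantial refactoring by LLMs
     (web-based or local); output reviewed and understood by the contributor.
     For additional information on Cantera's AI policy, see
     https://github.com/Cantera/cantera/blob/main/CONTRIBUTING.md -->
```

Because HTML comments are stripped from the rendered markdown, none of this text appears in the PR description or the PR-list checklist summary unless a line is deliberately uncommented.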
Force-pushed 75880bc to f776b0d
Force-pushed f776b0d to 96d1408
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main    #2066   +/-   ##
=======================================
  Coverage   76.75%   76.75%
=======================================
  Files         457      457
  Lines       53744    53745       +1
  Branches     9122     9122
=======================================
+ Hits        41250    41251       +1
  Misses       9386     9386
  Partials     3108     3108
```
bryanwweber left a comment:
Thanks for pushing this forward, @ischoegl. I had some questions and suggestions about the wording. I'm overall in favor of the direction, though!
> reviewed and understood by the contributor. Examples: Output from agentic coding
> tools and/or substantial refactoring by LLMs (web-based or local).
> For additional information on Cantera's AI policy, see https://github.com/Cantera/cantera/blob/main/CONTRIBUTING.md -->
nit: can you add the section to the link?
> - Do not post output from Large Language Models or similar generative AI as comments on
>   GitHub or our User Group, as such comments tend to be formulaic and low content.
I don't think we need to make a statement about LLM content. Also, many people use LLMs to help translate to English from their native tongue, and I'm loath to discourage that use if it enables more contributions. I'd prefer to soften this and leave more discretion to individuals (us and contributors).
```diff
- - Do not post output from Large Language Models or similar generative AI as comments on
-   GitHub or our User Group, as such comments tend to be formulaic and low content.
+ - Please be considerate posting content generated by AI models. Make sure that the content is valuable, concise, and expresses the message you want to share.
```
Or something similar
> the project. To preserve precious core developer capacity, we reserve the right to
> rigorously reject seemingly AI generated low-value contributions.
I'd simplify this a bit. Low value is low value regardless of source.
```diff
- the project. To preserve precious core developer capacity, we reserve the right to
- rigorously reject seemingly AI generated low-value contributions.
+ the project. To preserve core developer capacity, we reserve the right to
+ reject low-value contributions.
```
> Just taking some input, feeding it to an AI and posting the result is not of value to
> the project. To preserve precious core developer capacity, we reserve the right to
> rigorously reject seemingly AI generated low-value contributions.
> (_adopted from
Looks like there's a misplaced `_` here.
> - **Extensive:** AI suggested the approach, algorithm, or structure; you validated it
>   works.
This feels a little too broad to me. I like to have LLMs suggest approaches and algorithms, especially if I can have a conversation about trade-offs, but I generally do the entire implementation myself. That feels closer to the first option to me than the second. I don't feel very strongly about this though.
> - They have the rights to license their contribution under the project’s BSD-3-Clause
>   terms.
> - They do **not knowingly** submit AI-generated content derived from copyrighted or
>   license-incompatible sources (examples: GPL code, proprietary libraries).
> - All required attribution and copyright notices for contributed material are preserved
>   as required under BSD-3-Clause.
These statements are nominally all required regardless of whether an LLM or a human generated the code. Technically, the last one depends on the license of the contributed code, which may or may not be BSD-3-Clause; it could be, e.g., MIT and still be compatible with our license.
> As generative AI systems may emit material similar to third-party codebases,
> contributors must exercise caution. Contributors should exercise reasonable care in:
> - verifying that AI suggestions are sufficiently original,
> - avoiding verbatim snippets from incompatible licenses, and
> - ensuring that any third-party content incorporated intentionally follows proper
>   attribution and is license-compatible.
This seems very similar to the last section; is it a duplicate? How are contributors meant to ensure they've satisfied this section, especially "verbatim snippets from incompatible licenses"? Again, this is true regardless of whether the content is AI generated.
> - request clarification about the origin of AI-assisted code,
> - ask the contributor to manually rewrite suspect portions, or
> - reject contributions where provenance is unclear or license compatibility cannot be
>   reasonably established.
I think we should be able to request clarification about the origin of any code. Likewise, regardless of AI status, we may reject code where provenance is unclear if we believe it is licensed improperly or incompatible.
I know there are many other groups working on similar statements and policies. In case it's helpful, here are some relevant discussions in progress on one I've been keeping an eye on:
Changes proposed in this pull request
`CONTRIBUTING.md`

The context is that, to the best of my knowledge, the current Cantera codebase has only negligible portions that may have originated from AI. At this juncture, I believe it is prudent to formulate a statement on our stance with regard to AI. This PR drafts a 'permitted with disclosure' approach.
Checklist
(`scons build` & `scons test`) and unit tests address code coverage

AI statement
Draft original, revised by AI for clarity, and final edits by human.
Links