
Conversation

@ischoegl ischoegl commented Dec 6, 2025

Changes proposed in this pull request

  • Add AI statement for contributed code to PR template
  • Add AI section to CONTRIBUTING.md.

The context is that, to the best of my knowledge, the current Cantera codebase contains only negligible portions that may have originated from AI. At this juncture, I believe it is prudent to formulate a statement on our stance with regard to AI. This PR drafts a 'permitted with disclosure' approach.

Checklist

  • The pull request includes a clear description of this code change
  • Commit messages have short titles and reference relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • Style & formatting of contributed code follows contributing guidelines
  • The pull request is ready for review

AI statement

Original draft by a human, revised by AI for clarity, with final edits by a human.

Links

@ischoegl ischoegl requested a review from a team December 6, 2025 00:49

@speth speth left a comment

Thanks for initiating this, @ischoegl. A couple of thoughts:

  • Just as a formatting thing, I think we don't want to use check boxes to indicate "pick one of these three things", because that will affect the checklist summary shown on the PR list (i.e. it will show "6 of 8 tasks").
  • I wonder whether it would be clearer to contributors if we made the distinction between the second and third items as "code completion tools" and "agent-based tools".
  • Categorizing GitHub Copilot is a little bit tricky, as I think that branding now covers both a code completion tool and an agentic tool.
  • On the last option, I think it's too much to ask the contributor to ensure that the AI model didn't spit out any proprietary or license-incompatible code (presumably, this means something like a block of code from a GPL project). While this would be ideal, I don't think there's any way for the user of an LLM to evaluate this.

ischoegl commented Dec 6, 2025

Thanks for the feedback, @speth.

I agree on the formatting, and adopted an HTML-comment-based approach instead. Regarding your comments on licensing, I took another look at material already out there, specifically linuxfoundation.org and Matplotlib. For that reason, I ended up adding language to CONTRIBUTING.md. I'd welcome feedback: what is there is based on my understanding of the context, but I will freely admit that this was assisted by AI.

@ischoegl ischoegl force-pushed the ai-statement branch 2 times, most recently from 75880bc to f776b0d on December 6, 2025 at 20:58

codecov bot commented Dec 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.75%. Comparing base (3d86636) to head (96d1408).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2066   +/-   ##
=======================================
  Coverage   76.75%   76.75%           
=======================================
  Files         457      457           
  Lines       53744    53745    +1     
  Branches     9122     9122           
=======================================
+ Hits        41250    41251    +1     
  Misses       9386     9386           
  Partials     3108     3108           



@bryanwweber bryanwweber left a comment

Thanks for pushing this forward, @ischoegl. I had some questions and suggestions about the wording. I'm overall in favor of the direction, though!

reviewed and understood by the contributor. Examples: Output from agentic coding
tools and/or substantial refactoring by LLMs (web-based or local).
For additional information on Cantera's AI policy, see https://github.com/Cantera/cantera/blob/main/CONTRIBUTING.md -->

nit: can you add the section to the link?

Comment on lines +101 to +102
- Do not post output from Large Language Models or similar generative AI as comments on
GitHub or our User Group, as such comments tend to be formulaic and low content.

I don't think we need to make a statement about LLM content. Also, many people use LLMs to help translate to English from their native tongue, and I'm loath to discourage that use if it enables more contributions. I'd prefer to soften this and leave more discretion to individuals (us and contributors).

Suggested change
- Do not post output from Large Language Models or similar generative AI as comments on
GitHub or our User Group, as such comments tend to be formulaic and low content.
- Please be considerate when posting content generated by AI models. Make sure that the content is valuable, concise, and expresses the message you want to share.

Or something similar

Comment on lines +109 to +110
the project. To preserve precious core developer capacity, we reserve the right to
rigorously reject seemingly AI generated low-value contributions.

I'd simplify this a bit. Low value is low value regardless of source.

Suggested change
the project. To preserve precious core developer capacity, we reserve the right to
rigorously reject seemingly AI generated low-value contributions.
the project. To preserve core developer capacity, we reserve the right to
reject low-value contributions.

Just taking some input, feeding it to an AI and posting the result is not of value to
the project. To preserve precious core developer capacity, we reserve the right to
rigorously reject seemingly AI generated low-value contributions.
(_adopted from

Looks like there's some misplaced _ here.

Comment on lines +141 to +142
- **Extensive:** AI suggested the approach, algorithm, or structure; you validated it
works.

This feels a little too broad to me. I like to have LLMs suggest approaches and algorithms, especially if I can have a conversation about trade-offs, but I generally do the entire implementation myself. That feels closer to the first option to me than the second. I don't feel very strongly about this though.

Comment on lines +148 to +153
- They have the rights to license their contribution under the project’s BSD-3-Clause
terms.
- They do **not knowingly** submit AI-generated content derived from copyrighted or
license-incompatible sources (examples: GPL code, proprietary libraries).
- All required attribution and copyright notices for contributed material are preserved
as required under BSD-3-Clause.

These statements are nominally all required regardless of whether an LLM or a human generated the code. Technically, the last one depends on the license of the contributed code, which may or may not be BSD-3-Clause; it could be, e.g., MIT and still be compatible with our license.

Comment on lines +157 to +163
As generative AI systems may emit material similar to third-party codebases,
contributors must exercise caution. Contributors should exercise reasonable care in:
- verifying that AI suggestions are sufficiently original,
- avoiding verbatim snippets from incompatible licenses, and
- ensuring that any third-party content incorporated intentionally follows proper
attribution and is license-compatible.

This seems very similar to the previous section; is it a duplicate? How are contributors meant to ensure they've satisfied this section, especially "verbatim snippets from incompatible licenses"? Again, this is true regardless of whether the content is AI generated.

Comment on lines +169 to +172
- request clarification about the origin of AI-assisted code,
- ask the contributor to manually rewrite suspect portions, or
- reject contributions where provenance is unclear or license compatibility cannot be
reasonably established.

I think we should be able to request clarification about the origin of any code. Likewise, regardless of AI status, we may reject code where provenance is unclear if we believe it is licensed improperly or incompatible.

@TimothyEDawson
Contributor

I know there are many other groups working on similar statements and policies. In case it's helpful, here are some relevant discussions in progress on one I've been keeping an eye on:

pyOpenSci/pyopensci.github.io#734
