Athena: Implement verification tests for basic approach of module_text_llm #113
Merged
…e files in test/utils
…atted code to pep8 style
…ript for each module
Athena: Implemented verification tests for basic approach of the text module
maximiliansoelch
requested changes
May 16, 2025
Member
maximiliansoelch
left a comment
Running the tests locally fails.
The GH test action only succeeds as it does not execute any tests.
Please update the GH action file accordingly. It would also be great to have the action log the number of executed tests, including how many passed and how many failed.
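One way to get those counts, and to stop an empty run from passing, is sketched below. It assumes the action runs pytest with `--junitxml=<file>` and then feeds that report to a small script; the function names `summarize_junit` and `exit_code` are illustrative and not part of this PR.

```python
import sys
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Return executed/passed/failed/skipped counts from a JUnit XML report."""
    root = ET.fromstring(xml_text)
    # pytest wraps its <testsuite> in a <testsuites> root; accept both shapes.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    total = sum(int(s.get("tests", 0)) for s in suites)
    failed = sum(int(s.get("failures", 0)) + int(s.get("errors", 0)) for s in suites)
    skipped = sum(int(s.get("skipped", 0)) for s in suites)
    executed = total - skipped
    return {
        "executed": executed,
        "passed": executed - failed,
        "failed": failed,
        "skipped": skipped,
    }


def exit_code(summary: dict) -> int:
    """Non-zero when nothing ran or anything failed, so an empty run cannot pass."""
    return 1 if summary["executed"] == 0 or summary["failed"] > 0 else 0


# Hypothetical CLI entry point: python summarize.py report.xml
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        summary = summarize_junit(fh.read())
    print("executed={executed} passed={passed} failed={failed} skipped={skipped}".format(**summary))
    sys.exit(exit_code(summary))
```

Treating "zero tests executed" as a failure is the design choice that matters here: it turns a silently empty pipeline run into a visible error.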
…t markers, ensuring only mock tests run in the pipeline and avoiding configuration issues with real tests.
…de-real so real tests can be executed when needed
maximiliansoelch
approved these changes
May 30, 2025
Member
maximiliansoelch
left a comment
This PR adds real E2E tests against OpenAI, which can be run locally.
There are multiple improvements for future PRs:
- Test files should also be included in the lint config; right now all test files have many lint issues.
- Instead of the unintuitive working directory changes in the test_modules.py script (see review comment), we should figure out a way to properly set up the tests.
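
As one possible direction for the second point, the sketch below passes the working directory to the child process instead of changing it in the parent. It is not how test_modules.py currently works; `run_in_module` and its parameters are hypothetical names, and the `python` argument is assumed to point at the module's virtual environment interpreter when one is used.

```python
import subprocess
import sys
from pathlib import Path


def run_in_module(
    module_dir: Path,
    args: list[str],
    python: str = sys.executable,  # assumption: swap in the module venv's interpreter
) -> subprocess.CompletedProcess:
    """Run a Python command with module_dir as the child's working directory.

    The parent process never calls os.chdir, so its own cwd is untouched.
    """
    return subprocess.run(
        [python, *args],
        cwd=module_dir,
        capture_output=True,
        text=True,
        check=False,
    )
```

A call such as `run_in_module(Path("module_text_llm"), ["-m", "pytest"])` would then run that module's tests from its own directory, assuming pytest is installed for the chosen interpreter.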
alexjoham
pushed a commit
that referenced
this pull request
Jul 21, 2025
Motivation and Context
This PR improves test coverage for the text feedback generation system by adding test cases that verify the LLM's ability to detect and provide feedback for simple text exercises. It is a stacked PR that extends the mock tests for the same module. The mock tests run in the pipeline and pass. The real tests can be run locally from the module_text_llm directory, using the module_text_llm virtual environment.
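
The mock/real split can be sketched in a `conftest.py` along the following lines. This is an illustration under stated assumptions rather than the PR's actual configuration: the marker name `real`, the option `--run-real-tests`, and the helper `is_selected` are hypothetical, since the real marker and flag names are not shown here.

```python
# conftest.py (sketch; marker and option names are hypothetical)


def pytest_addoption(parser):
    # Opt-in flag so tests that call the real LLM never run by default.
    parser.addoption(
        "--run-real-tests",
        action="store_true",
        default=False,
        help="also run tests marked 'real' (they call the live API)",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "real: test that calls the live LLM API")


def is_selected(marker_names, run_real):
    """Keep every test unless it is marked 'real' and real tests are off."""
    return run_real or "real" not in marker_names


def pytest_collection_modifyitems(config, items):
    run_real = config.getoption("--run-real-tests")
    selected, deselected = [], []
    for item in items:
        names = {marker.name for marker in item.iter_markers()}
        (selected if is_selected(names, run_real) else deselected).append(item)
    if deselected:
        # Report the dropped tests as deselected and shrink the run in place.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

With this layout, a plain `pytest` invocation in the pipeline runs only the unmarked and mock tests, while a local run with the opt-in flag also includes the real ones.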
Description
Added three test cases that verify feedback generation for different types of programming exercises:
Steps for Testing
Screenshots