# TM031 Automated Grading Support
Goal: define and implement a (JavaScript) interface that can run an assignment's test suites against a set of implementations, exporting data as needed.
(Scroll down to #Motivation for the original beginning of the document.)
This is a list of all the things that should go into pyret-lang. Everything else can, and therefore will, exist outside of Pyret.
- Finish check results API.
- Add tests to the `checker-api` branch (see open pull request: #997).
  - I was initially waiting on Joe or Ben to comment on the proposed interface before doing so.
- Get `shared-gdrive` imports working from the command line.
  - I believe most of the work is already done on the `httplib` branch.
- Add command-line option to specify a local directory to serve as the source of `my-gdrive` imports.
  - Haven't done any work for this, but it should be a relatively straightforward addition.
After all that is done, I envision the usage to look like this.

To evaluate a student implementation, run something like:

```sh
$ make foo-tests-ta.jarr
$ node foo-tests-ta.jarr --my-gdrive $student_email/final/ --run-full-report > student_alpha_impl.json
```

To evaluate a student test suite, run:

```sh
$ make $student_email/sweep/foo-tests.jarr
$ node $student_email/sweep/foo-tests.jarr --my-gdrive foo-ta-resources/ --run-full-report > student_alpha_test.json
```
From there, the JSON data can be processed outside of Pyret, and contains all the data one would want in order to assign grades.
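As a rough illustration, a post-processing script outside Pyret might tally a report as below. The JSON schema assumed here (`checks`, `tests`, `passed`) is hypothetical, since the check results API above is not yet finalized:

```js
const fs = require("fs");

// Hypothetical schema: a report is assumed to be an object whose `checks`
// field is a list of check blocks, each holding individual test results
// with a boolean `passed` flag. The real shape depends on the unfinished
// check results API.
function summarizeReport(report) {
  let passed = 0;
  let total = 0;
  for (const block of report.checks) {   // assumed field
    for (const test of block.tests) {    // assumed field
      total += 1;
      if (test.passed) {                 // assumed field
        passed += 1;
      }
    }
  }
  return { passed, total, score: total === 0 ? 0 : passed / total };
}

// Usage: node summarize.js student_alpha_impl.json
console.log(summarizeReport(JSON.parse(fs.readFileSync(process.argv[2], "utf8"))));
```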
## Motivation

The pedagogy that Brown's CS019 and CS173 have adopted involves having students hand in two files: `foo-code.arr` and `foo-tests.arr`. The former is an implementation of some specified functions, and may contain implementation-dependent tests, while the latter contains implementation-independent tests. Evaluating a submission involves both checking `foo-code.arr` for correctness, by running the staff's test suite against it, and checking `foo-tests.arr` for its ability to classify incorrect implementations, by running it against one known-correct implementation ("gold") and some number of known-buggy implementations ("coals").
As a result, for each assignment, there's a lot of (a) iterations of Pyret-running-something that need to happen, and (b) data to be collected.
Suppose student submissions are from Captain Teach, and exporting gives you the following directory structure:
```
submissions/
├── $student_email/
│   ├── sweep/
│   │   └── foo-tests.arr
│   └── final/
│       ├── foo-code.arr
│       └── foo-tests.arr
├── $another_student_email/
│   ├── sweep/
│   │   └── foo-tests.arr
│   └── final/
│       ├── foo-code.arr
│       └── foo-tests.arr
└── ...
```
Then, the input could/would be:

- submissions directory: `submissions/` :: `DirectoryIdentifier`
- the sub-directory: `"final"` :: `String`
- implementation name: `"foo-code.arr"` :: `String`
- test name: `"foo-tests.arr"` :: `String`
- the staff test suite: `foo-tests-ta.arr` :: `FileIdentifier`
- the staff gold: `foo-gold.arr` :: `FileIdentifier`
- the staff coals: `[foo-coal-1.arr, foo-coal-2.arr]` :: `List<FileIdentifier>`, or `coals/` :: `DirectoryIdentifier`
- timeout: `x-minutes` :: `Time`
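One way to picture these inputs is as a plain configuration object handed to the grading driver. This is illustrative only: none of the field names are settled API, and the 10-minute timeout is an arbitrary stand-in for `x-minutes`:

```js
// Illustrative shape only; the comments echo the types from the list above.
const config = {
  submissions: "submissions/",                      // DirectoryIdentifier
  subDirectory: "final",                            // String
  implementationName: "foo-code.arr",               // String
  testName: "foo-tests.arr",                        // String
  staffTestSuite: "foo-tests-ta.arr",               // FileIdentifier
  staffGold: "foo-gold.arr",                        // FileIdentifier
  staffCoals: ["foo-coal-1.arr", "foo-coal-2.arr"], // List<FileIdentifier>, or a coals/ directory
  timeoutMs: 10 * 60 * 1000,                        // Time, here hard-coded to 10 minutes
};

module.exports = config;
```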
From there, it should:

- For each `$student_email`, run `foo-tests-ta.arr` where its `import my-gdrive("foo-code.arr")` resolves to `submissions/$student_email/final/foo-code.arr`.
- For each `$student_email`, for each `$staff_impl` in `[foo-gold.arr, foo-coal-1.arr, foo-coal-2.arr]`, run `submissions/$student_email/final/foo-tests.arr`, where its `import my-gdrive("foo-code.arr")` resolves to `$staff_impl`.
- Any time running Pyret takes longer than `x-minutes`, halt, report `timeout` as an error, and move on.
- Output organized data (see the driver sketch after this list).
- Not require all these arguments; e.g., when grading sweeps, we can skip the first step.
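A minimal sketch of that driver loop in Node.js might look like the following. It assumes the `--my-gdrive` flag and compiled `.jarr` files described above (compilation via `make` is assumed to have happened already), and that each staff implementation has been staged in its own directory under the name `foo-code.arr`; `runPyret`, `gradeAll`, and the `staff-impls/` staging layout are all hypothetical:

```js
const { execFile } = require("child_process");
const fs = require("fs");
const path = require("path");

// Run one compiled .jarr with a my-gdrive directory, capturing the JSON
// report from stdout. A timeout is reported as data rather than thrown,
// so the sweep can move on to the next run.
function runPyret(jarr, myGdriveDir, timeoutMs) {
  return new Promise((resolve) => {
    execFile(
      "node",
      [jarr, "--my-gdrive", myGdriveDir, "--run-full-report"],
      { timeout: timeoutMs, maxBuffer: 64 * 1024 * 1024 },
      (err, stdout) => {
        if (err && err.killed) resolve({ error: "timeout" });
        else if (err) resolve({ error: String(err) });
        else resolve({ report: JSON.parse(stdout) });
      }
    );
  });
}

async function gradeAll(config) {
  const results = {};
  for (const student of fs.readdirSync(config.submissions)) {
    const finalDir = path.join(config.submissions, student, config.subDirectory);
    // Step 1: staff suite against the student's implementation.
    const impl = await runPyret("foo-tests-ta.jarr", finalDir, config.timeoutMs);
    // Step 2: the student's tests against gold and each coal. Each staff
    // implementation is assumed to sit in its own directory as foo-code.arr,
    // so that import my-gdrive("foo-code.arr") picks it up.
    const tests = {};
    for (const staffImpl of [config.staffGold, ...config.staffCoals]) {
      tests[staffImpl] = await runPyret(
        path.join(finalDir, "foo-tests.jarr"),
        path.join("staff-impls", staffImpl), // hypothetical staging layout
        config.timeoutMs
      );
    }
    results[student] = { impl, tests };
  }
  return results;
}
```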
 
Optionally, it could:

- Output summarized grade data for each student, based on some specified grading heuristic.
- Enforce internal consistency: create a "submission" with `foo-gold.arr` and `foo-tests-ta.arr`, and make sure that "submission" gets a 100% score (see the sketch after this list).
- Collect data about external consistency: for each `$student_email`, run `submissions/$student_email/final/foo-tests.arr` where its `import my-gdrive("foo-code.arr")` resolves to `submissions/$student_email/final/foo-code.arr`.
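For the internal-consistency check in particular, a sketch reusing the hypothetical `runPyret` and `summarizeReport` helpers above could be:

```js
// Internal-consistency sketch. The gold implementation is assumed to be
// staged in gold-as-submission/ under the name foo-code.arr, so the staff
// suite's import my-gdrive("foo-code.arr") resolves to it.
async function checkInternalConsistency(config) {
  const result = await runPyret("foo-tests-ta.jarr", "gold-as-submission/", config.timeoutMs);
  if (result.error) {
    throw new Error("staff suite failed to run: " + result.error);
  }
  const { passed, total } = summarizeReport(result.report);
  if (passed !== total) {
    throw new Error("staff suite disagrees with gold: " + passed + "/" + total);
  }
}
```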
Concretely, this plan depends on:

- The check result API.
- The ability to have `import my-gdrive("foo-code.arr")` resolve to a specific, chosen replacement for `foo-code.arr`.
- The ability to have `shared-gdrive` imports resolve correctly from the command line.
- Awareness of and/or integration with Captain Teach, including awareness of and robustness against common hand-in issues.
- A web interface. There's some work on the `grade` branch of `code.pyret.org`, which was able to get the job done this semester. It wasn't great, but it worked.