Still writing this! ...
We expect there to be a wide variety of test specification formats, and they may not all be supported by all automated grading tools. Many may be programming-language-specific, and only suitable for exercises where solutions are written in that one language. However, we feel that supporting some variety here is much better than forcing everyone to conform to a single format, and limiting assessment actions (and feedback possibilities) to that one style only.
Although we're still missing a lot of the details here, we expect that this section will include (but not be limited to) at least the following:
Stdin/stdout text segments, for testing main programs that read/write text. This will probably include CloudCoder-style regular expression support plus some Web-CAT-inspired normalization options so that less-brittle matching can be supported with little effort.
Tables of input values and expected results, which are commonly used for assignments that are more functional in nature, rather than focused on objects or whole programs. In some ways, stdin/stdout testing is a special case of flexible input/output tables (just with larger, string-only values).
Native XUnit-style executable software tests (such as JUnit, CxxTest, NUnit, pyunit, etc.). These would be language-specific, but are already used widely enough, and are expressive enough, that they play an important role.
A custom testing DSL called PEMLtest, designed to express a wide variety of software tests in a more concise, lighter-weight way than XUnit programming. This DSL supports writing tests for multiple programming languages, and test specs can be translated down to executable XUnit-style tests in the target language.
Generator-based strategies where inputs of some form and expected outputs are programmatically generated fresh for every attempt.
Reference solution strategies, where input is supplied by one of the other approaches listed here, but the expected output is produced by running a reference solution provided by the exercise author instead of being described as part of the tests.
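As a rough illustration of how the last two strategies might combine, the sketch below generates fresh inputs for an attempt and derives the expected outputs by running a reference solution. Every name in it (`reference_solution`, `generate_cases`, `grade`, the `seed` parameter, and the `input`/`expected` keys) is hypothetical and not part of any specification described here; it only shows the general shape such a strategy could take.

```python
import random


def reference_solution(values):
    """Hypothetical author-provided solution: sum of the even values."""
    return sum(v for v in values if v % 2 == 0)


def generate_cases(seed, count=5):
    """Generate `count` test cases; the same seed reproduces the same cases."""
    rng = random.Random(seed)
    cases = []
    for _ in range(count):
        # Random-length list of small integers as the generated input.
        values = [rng.randint(-10, 10) for _ in range(rng.randint(0, 6))]
        # The expected output is computed, not written by hand.
        cases.append({"input": values, "expected": reference_solution(values)})
    return cases


def grade(student_solution, cases):
    """Return how many cases the student's solution answers correctly."""
    return sum(1 for c in cases if student_solution(c["input"]) == c["expected"])
```

Seeding the generator per attempt is one possible design choice: it keeps each attempt's cases fresh while still letting a grader regenerate them later to explain feedback.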
We believe that many such test expression formats can all be represented as a sequence (array) of hashes, and should naturally fit into a data model that is readily expressible in PEML, YAML, or JSON.
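To make the "sequence of hashes" idea concrete, here is a minimal sketch using JSON as the carrier. The key names (`format`, `stdin`, `stdout`, `input`, `expected`) are assumptions chosen for illustration only; the actual keys have not been defined in this section.

```python
import json

# Two tests in different hypothetical formats, each represented as a hash.
tests = [
    {"format": "stdin-stdout", "stdin": "3 4\n", "stdout": "7\n"},
    {"format": "table", "input": [3, 4], "expected": 7},
]

# Because the model is only arrays, hashes, strings, and numbers,
# it survives a round trip through JSON unchanged.
encoded = json.dumps(tests, indent=2)
decoded = json.loads(encoded)
```

The same structure could equally be written in YAML or PEML; JSON is used here only because Python's standard library can parse it without extra dependencies.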
We also expect that a variety of test case attributes (visibility, weight, scoring, etc.) may cleanly fit in as additional keys within a hash representing a test, and thus may be applicable to many of the formats listed above.
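A small sketch of how such attributes might ride along as extra keys and be consumed by a grader follows. The key names `weight` and `visibility`, their default values, and both helper functions are assumptions made for illustration, not a defined part of the format.

```python
def weighted_score(tests, results):
    """Fraction of total weight earned; results[i] is True if tests[i] passed.

    A test without a "weight" key is assumed to count 1.0.
    """
    total = sum(t.get("weight", 1.0) for t in tests)
    earned = sum(t.get("weight", 1.0) for t, ok in zip(tests, results) if ok)
    return earned / total if total else 0.0


def visible_tests(tests):
    """Tests whose details may be shown to students (assumed default: public)."""
    return [t for t in tests if t.get("visibility", "public") == "public"]


# The attributes sit beside whatever format-specific keys a test already has.
example_tests = [
    {"input": [1, 2], "expected": 3, "weight": 2.0},
    {"input": [0, 0], "expected": 0, "visibility": "hidden"},
    {"input": [5, 5], "expected": 10},
]
```

Because the helpers read only the attribute keys, they would work unchanged across any of the test formats that share this hash-based model.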
OK, more to come later! ...