# PEML

The Programming Exercise Markup Language (PEML) is designed to provide an ultra-human-friendly authoring format for describing automatically graded programming assignments.

## Purpose

The Programming Exercise Markup Language (PEML) is intended to be a simple, easy format for CS and IT instructors of all kinds (college, community college, high school, whatever) to describe programming assignments and activities. We want it to be so easy (and obvious) to use that instructors won't see it as a technological or notational barrier to expressing their assignments.

## TL;DR: Wanna Jump In?

If you want to know more about PEML's design and motivations, read on below. Otherwise, if you just want to dive in, start with these links and come back when you want a deeper view:

Looking for software to add PEML support to your application? Consider these:

## PEML's Goals

We intend for this format to be something that authors of automated grading tools can adopt, so they can provide a very easy, low-energy onboarding path for existing instructors to get programming activities into such tools. As a result, this notation leans heavily on supporting authors and streamlining common cases, even if this may require more work on the part of tool developers--the goal is to make it super easy for authors of programming activities, not to fit into a specific auto-grader or simplify tasks for tool writers.

PEML is designed to achieve the following goals:

1. Minimal learning curve
2. Plain-text file representation
3. Supports references to external resources
4. Directory-structured organization of associated assets
5. Zip file packaging of multi-file assets with description
6. Programming language neutral
7. Minimal technology support

## Basic Format

The remainder of this description is split into two main parts: first, the format for describing key/value pairs (in this section), and second, the data model (on the following pages). We view these two as independent. As indicated in the "Why Not YAML?" discussion, we view the data described for a programming assignment as directly representable in PEML, YAML, JSON, etc. We also expect that most tools will support either YAML or JSON directly for tooling purposes, and that conversions between PEML <=> YAML or PEML <=> JSON will be easy (in fact, we already have a REST service that will do it for you!). So users who strongly prefer an alternate notation can probably freely use one. However, we strongly believe that a representation optimized for human authoring of structured text consisting primarily of many multi-line text values is warranted to make authoring easier for those who don't think/write in YAML or JSON regularly.

OK, on to the format itself.

PEML uses a plain-text representation for describing exercises. This format is designed to be easy to edit in a plain text editor. It is based on ArchieML, with a few minor modifications.

### Key/Value Pairs

Like YAML, we describe a programming exercise as a series of key/value pairs. Wow, big deal.

In YAML terms, that means the top-level structure of an exercise is a mapping (a hash or dictionary).

Keys are alphanumeric identifiers (starting with a letter, and including underscores). This is more restrictive than YAML, but the more general idea of allowing any representable value to be a key has little utility here and requires more careful parsing and fancier quoting rules that only decrease writability and increase the potential learning curve ... so, PEML uses the simpler notion that is common in many programming language identifier token classes. Note that periods can be used to form dotted names to refer to nested keys, as in ArchieML.

Also as in ArchieML, each key must start at the beginning of a line and be followed by a colon (for single-valued keys; keys that map to collections will instead be either: (a) surrounded by square brackets, or (b) surrounded by curly braces, still following ArchieML).

The corresponding value follows the colon. All values are potentially multi-line values, and extend up to the beginning of the next property. Any leading/trailing white space is trimmed (including newlines), and multi-line values (i.e., those containing embedded newline(s) after trimming) are automatically terminated with a single newline. As a result, blank lines can appear immediately before any key (or before any unquoted value) for visual spacing/chunking as desired without affecting the meaning.

Like ArchieML, PEML is intended to be parsed line by line, with the first non-whitespace sequence on the line determining its role. A simple, line-oriented parsing strategy using a basic state machine should be sufficient, without requiring complex grammar-based parsing strategies.
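
To make the line-by-line strategy concrete, here is a minimal Python sketch of such a state machine for the simplest case only: unquoted, single-valued keys plus whole-line comments. The `parse_flat` helper and the `KEY_LINE` pattern are illustrative names, not part of PEML or of any PEML library, and the sketch assumes that a key may be preceded by indentation. Quoting, arrays, and object blocks are deliberately left out.

```python
import re

# Assumed key shape: a letter, then letters/digits/underscores,
# optionally repeated as dotted segments, followed by a colon.
KEY_LINE = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*)*)\s*:(.*)$"
)


def parse_flat(text):
    """Parse unquoted key/value pairs into a dict (hypothetical helper)."""
    result = {}
    key, buf = None, []

    def flush():
        # Store the value collected so far for the current key, if any.
        if key is not None:
            value = "\n".join(buf).strip()  # trim leading/trailing whitespace
            if "\n" in value:
                value += "\n"  # multi-line values end with one newline
            result[key] = value

    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # whole-line comment: ignored entirely
        m = KEY_LINE.match(line)
        if m:
            flush()  # a new key ends the previous value
            key, buf = m.group(1), [m.group(2)]
        elif key is not None:
            buf.append(line)  # continuation of the current value
    flush()
    return result
```

For example, `parse_flat("title: Hello\n\n# note\nauthor: Someone\n")` yields a dictionary with the two keys `title` and `author`; the blank line and the comment line do not affect either value.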

PEML allows single-line comments using the # character, as in YAML. The # character must be the first (non-whitespace) character on the line (i.e., only whole-line comments are supported), and the corresponding line is completely ignored for the purposes of interpreting the meaning of the PEML. Any line beginning with a # character (and any leading indentation) is interpreted as a comment line, except in quoted values.

Inspired by YAML's document start and end markers, PEML uses a specific comment line ("#---", a pound sign followed by three dashes) to signal the start of a PEML description. This marker is optional for the first PEML description in a text stream, but serves as the delimiter between exercises if multiple PEML descriptions are presented in a single file or stream. The current PEML description continues until the next occurrence of this marker (signaling the beginning of a new exercise), or the end of input.
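
As a rough illustration of how a tool might use this marker, the following Python sketch splits a text stream into individual PEML descriptions. The function name `split_exercises` is hypothetical, and the sketch assumes the marker line contains nothing but `#---` (ignoring surrounding whitespace). It also does not check whether a marker appears inside a quoted value, which a real parser would need to handle.

```python
def split_exercises(stream_text):
    """Split a stream into PEML descriptions at '#---' lines (sketch)."""
    chunks, current = [], []
    for line in stream_text.splitlines():
        if line.strip() == "#---":
            # The marker starts a new description, so close the current one.
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))
    # An optional leading marker produces an empty first chunk; drop it.
    return [c for c in chunks if c.strip()]
```

Because empty chunks are discarded, the result is the same whether or not the first description is preceded by the marker.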

### Quoting

On occasion, one may end up including text as part of a value that might also be recognizable as the start of a key. You can see this where the word "format:" appears in the example above, as part of the value given for the key "instructions:". In those cases, PEML uses a variant of HereDoc-style syntax, adapted to be more like triple quotes in languages like Python, Scala, R, etc.:

Any key where the colon is immediately followed by three or more repetitions of the same printing character is treated as having a HereDoc-style quoted value, with the provided sequence of repeated characters serving as the delimiter. This is more flexible than triple-quoting, since triple quotes themselves may appear in program fragments for exercises using particular programming languages. This technique allows authors to choose a custom delimiter (as with HereDocs), but allows them to use repeated punctuation symbols to provide a more identifiable/scannable horizontal delimiter around the value, rather than using a custom identifier.

As with HereDocs in many programming languages, the quoted value is terminated by the first subsequent occurrence of a line containing only the delimiter character sequence.

Of course, many programming languages also use # as a comment character. In PEML, # has no special meaning inside a quoted value. As a result, we recommend HereDoc-quoting any values that contain source code from such a programming language, to prevent a program's comment lines from being interpreted as PEML comments.
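
The quoting rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `read_quoted` and `HEREDOC_OPEN` are made-up names, the delimiter is assumed to occupy the rest of the opening line, the closing line is matched after stripping surrounding whitespace, and the quoted text is returned verbatim. The key `starter_code` in the usage note is hypothetical as well.

```python
import re

# A key, a colon, then three or more repetitions of one non-space character.
HEREDOC_OPEN = re.compile(r"^\s*([A-Za-z][A-Za-z0-9_.]*):((\S)\3{2,})\s*$")


def read_quoted(lines, start):
    """Return (key, value, next_index) if lines[start] opens a quoted value.

    Returns None when the line does not open a quoted value.
    """
    m = HEREDOC_OPEN.match(lines[start])
    if not m:
        return None
    key, delim = m.group(1), m.group(2)
    body = []
    for i in range(start + 1, len(lines)):
        if lines[i].strip() == delim:
            # First line holding only the delimiter ends the value.
            return key, "\n".join(body), i + 1
        body.append(lines[i])  # '#' and 'key:' lines are kept verbatim here
    raise ValueError(f"unterminated quoted value for key {key!r}")
```

Given the lines `starter_code:----------`, `# a Python comment`, `format: not a key`, and `----------`, the sketch returns both middle lines as the value, so neither the comment character nor the key-like text is interpreted.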

### Embedding Markdown (and HTML)

Special formatting in the textual description of the exercise can be written using Markdown, which also supports embedding HTML directly in exercise descriptions. So use Markdown or HTML for adding formatting to your text. Plain, unformatted text also works, when no special formatting markup is desired. Here, we specify GitHub's flavor of Markdown.

Note: It is easy to consider adding a key for text_format:, specifying markdown as the default but allowing individual users to use other markup formats (such as reStructuredText, AsciiDoc, POD, LaTeX, etc.). In fact, this is already blocked out in the options.text_format field within the data model, although it needs more refinement.

Another Note: Actually, by using pandoc and a PEML parsing wrapper, it should be possible to create a web service that can read a PEML document using a wide variety of text markup formats and render any of them to HTML, including reStructuredText, many dialects of markdown, many wiki markup languages, Docbook, LaTeX, and even Microsoft Word docx files (!). Unfortunately, this doesn't address AsciiDoc :-(. At this point, it is plausible to consider supporting other markup formats along with the options.text_format: key if community effort can generate the necessary support for adopting tools to make use of it (i.e., adding support under the render option of our PEML REST micro-service).

### External Resources

External resources might be referenced in three different ways in PEML. First, for any key/value pair, the value to the right of the colon can be provided by using an external reference, rather than by providing the value directly in the PEML file. Values that are provided externally can be expressed as absolute or relative URLs using the "url(...)" construct (similar to its use in CSS).

While we strongly discourage the use of PDF assignment descriptions, any key value can be farmed out into an external file (or directory of files!). This approach might be most used for source code content stored in separate files, test data stored in separate files, code libraries, and so on.

Here, an absolute URL would specify the web location of the resource, while a relative URL would be resolved relative to the location of the PEML file containing it. As discussed above under PEML's Goals, if a PEML description is packaged in a zip file so that other resources can be transferred along with it, relative URLs could be used to refer to other contents within the zip file. Similarly, PEML files stored on local disk could refer to local files stored adjacently, and PEML files stored in git repositories or other systems could use the same technique.
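
One way a tool might resolve such references is sketched below in Python, using the standard library's `urljoin`. The helper name `resolve_reference` and all example URLs are invented for illustration, and the sketch assumes the `url(...)` expression is the entire value and contains no quoting of its own.

```python
import re
from urllib.parse import urljoin

# Assumes the whole (trimmed) value is a single url(...) expression.
URL_REF = re.compile(r"^url\((.*)\)$")


def resolve_reference(value, peml_location):
    """Resolve a url(...) value against the PEML file's own location.

    Returns None when the value is not an external reference.
    """
    m = URL_REF.match(value.strip())
    if not m:
        return None
    # urljoin leaves absolute URLs unchanged and resolves relative ones
    # against the location of the PEML file.
    return urljoin(peml_location, m.group(1).strip())
```

With a PEML file at `https://example.org/exercises/ex1/exercise.peml`, the value `url(tests/data.txt)` would resolve to `https://example.org/exercises/ex1/tests/data.txt`, while an absolute URL inside `url(...)` would come back unchanged.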

Second, it is likely that some embedded Markdown or HTML content (such as the instructions for the exercise) may include HTML tags that use relative or absolute URLs. This may be appropriate for referring to images, downloadable resources accessible to the student, etc. While authors can use absolute URLs in these contexts, it may be preferable in some circumstances to bundle those resources along with the PEML description. By convention, we encourage authors to place such files in a directory called public_html/ that is located alongside the PEML file in the same folder, zip file, or repository. Within Markdown or HTML keys, relative URLs that start with "public_html/..." will then be correctly resolved to these resources. By adhering to this convention, tools can immediately determine that external web-accessible resources must also be provided and also be able to systematically rewrite URLs for user presentation.

Third, it is plausible that feedback generated when processing author-provided reference tests may wish to use similar relative URLs to point to images or other resources included as part of the feedback. Again, any such resources should be placed in the public_html/ folder.

Note: We are considering using a "convention over configuration" approach to saying that when a PEML exercise is bundled with files into a single zip, contents for nested keys can be provided implicitly by placing files under relative pathnames that mirror the key structure. Path segments that correspond to array indices can be taken from the name, title, or language key (in that preferred order) of the dictionary items inside the array, or numeric suffixes can be used as path names. This could be particularly useful for adding files in places where file-based content is desirable (src and environment keys, for example, or anywhere a .files nested key appears). Avoiding the requirement that external files co-located with the PEML description be explicitly declared inside the PEML is more desirable in these cases. We need to work up some examples, though, and a more precise description of what the implicit mapping is. A short example: a public_html/ folder located alongside the PEML file can implicitly be interpreted as the set of resources/files attached to the instructions, without requiring a redundant line inside the PEML itself saying: "public_html: url(public_html)"

### Convention Over Configuration

While all of the settings and resources associated with an exercise can be directly embedded inside the PEML file itself, often it is easier to provide file-like content as, well, separate files. While you can explicitly list files as external references using the url(...) operator, it is often easier and simpler to just provide files "alongside" the PEML description as separate files themselves. To support this, PEML uses a convention for naming subdirectories to locate sets of files (which can always be overridden using an explicit url(...) expression as the value for the file set).

For example, the instructions in an exercise might need to refer to external images. The PEML format also provides a public_html key to refer to a set of files that are intended to be public web resources referenced in the instructions. Relative path names for images and links inside the instructions refer to file resources in the public_html file set. While all of these files could be explicitly listed, if no public_html key is provided in the PEML description, then by convention the subdirectory with the same name as this key "public_html/" located alongside/adjacent to this PEML description (for example, packaged in the same zip file, or located in the same directory/folder) is assumed to contain the files in the public_html file set.

Similarly, instead of specifying a src.starter.files file set, the author can just place files in the src/starter subdirectory adjacent to the PEML source (for any key representing a file set with a name ending in ".files", that suffix is omitted from the corresponding directory name). In most cases, we envision that authors will generally provide external resources by convention, rather than explicitly specifying them. In cases where the same set of physical files will be shared across multiple exercises, explicit url(...) locations can be used to refer to shared file sets without a huge effort. Placing these in a PEML fragment and using the :include directive (discussed in the next section) may also be useful.
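
The naming convention described above is mechanical enough to sketch in Python. The function name `conventional_dir` is hypothetical; the sketch only covers the two rules stated here (dotted key segments become path segments, and a trailing `.files` segment is dropped) and does not model explicit `url(...)` overrides.

```python
from pathlib import PurePosixPath


def conventional_dir(key):
    """Map a file-set key to its conventional subdirectory name (sketch)."""
    parts = key.split(".")
    if len(parts) > 1 and parts[-1] == "files":
        parts = parts[:-1]  # the ".files" suffix is omitted from the path
    return str(PurePosixPath(*parts))
```

So `conventional_dir("src.starter.files")` gives `src/starter`, and `conventional_dir("public_html")` gives `public_html`, each interpreted relative to the location of the PEML description.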

### Splitting Up PEML Descriptions

In addition to allowing individual key values to be provided in external files, PEML adds an :include directive that allows parts of the PEML description to be included from another external location. While this directive is not strictly necessary, it might be used by some authors to factor out repeated key/value pairs (for the license, author, environment definitions, etc.) so they can be written once and reused across multiple PEML descriptions without repeating the content.

Another use for :include is to allow an author to separate out the definition of the test cases and test environment for an exercise so they are placed in a separate file. This might be useful so that the exercise description itself might be public/accessible, but the test cases or grading criteria applied to the exercise are managed separately and only available to some users.

### String Interpolation with Variable Values

Note: it is possible that some tools may choose not to implement this feature, since it has to do with use of exercises as opposed to simply parsing PEML descriptions.

In some cases, authors may wish to write "parameterized" exercise descriptions where many instances of the exercise can be produced using different parameter values. For example, a parameterized exercise may allow for individualized or unique instances of the exercise to be programmatically generated on demand for each new user/student. To allow for tools that support such features, PEML allows for parameterized contents in instructions, tests, code, etc.

PEML uses mustache-compatible notation for string interpolation, which is also compatible with a number of templating systems. It is analogous to Ruby's #{...} string interpolation syntax, Python's string interpolation syntax, and similar to using braces in string interpolation in Perl. For any exercise, the author can use any desired number of user-defined variables, and any occurrences of {{variable-name}} in the title, instructions, src code assets, or test suites will be substituted when an instance of the exercise is needed.

Since different tools implementing PEML may use different templating implementations to achieve interpolation, extensions or variants of {{mustache}} syntax might be supported, so check your tool's documentation when in doubt.

PEML does not support escaping of literal "{{" and "}}" marking interpolated values (although PEML-supporting tools may support custom notational extensions that allow this, it isn't part of the PEML definition). PEML authors are then advised to ensure that if their instructions or code use {{...}} notation, they keep the variable names used for substitution in the PEML description disjoint from those appearing natively in the text. Where necessary, use the options.interpolation.delimiters key to set the delimiters to something different (similar to the mustache set delimiter feature). The options.interpolation.enable key can be used to enable/disable interpolation if necessary (default is enabled, for tools that support this feature).
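
A minimal Python sketch of this substitution follows. It is not a mustache implementation: `interpolate` is an illustrative name, the `delimiters` and `enable` parameters merely mirror the two option keys mentioned above (how a tool actually reads those options is not specified here), and leaving unknown names untouched is an assumption made so that natively occurring `{{...}}` text survives.

```python
import re


def interpolate(text, variables, delimiters=("{{", "}}"), enable=True):
    """Substitute {{name}} occurrences with values from `variables` (sketch)."""
    if not enable:
        return text
    open_d, close_d = (re.escape(d) for d in delimiters)
    pattern = re.compile(
        open_d + r"\s*([A-Za-z_][A-Za-z0-9_-]*)\s*" + close_d
    )
    # Names not present in `variables` are left exactly as written.
    return pattern.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), text
    )
```

Switching the delimiters, for instance to `("<%", "%>")`, leaves any `{{...}}` text in the exercise alone, which is the situation the delimiter option is meant for.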

### Nested Structure

Beyond these basics, nested properties follow ArchieML's conventions for dotted keys (nested key structure), object blocks, and arrays. The main differences here compared to ArchieML are the use of multi-line values by default, the use of a HereDoc/triple-quote hybrid rather than a specific end marker with escaping of special characters when a delimiter is necessary, and support for comments.
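
For tool writers, the effect of a dotted key can be sketched in Python as a walk through nested dictionaries. The helper `set_dotted` and the key names in the usage note are hypothetical, and the sketch does not handle the case where a dotted key collides with an existing non-dictionary value.

```python
def set_dotted(target, dotted_key, value):
    """Store `value` in nested dicts under a dotted key (sketch)."""
    *parents, leaf = dotted_key.split(".")
    node = target
    for name in parents:
        # Create intermediate mappings on demand.
        node = node.setdefault(name, {})
    node[leaf] = value
    return target
```

Applying `set_dotted` with the keys `license.id` and `license.owner.name` to an empty dictionary produces one `license` mapping holding `id` plus a nested `owner` mapping, which is the structure a JSON or YAML rendering would show.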

Nested keys:
JSON equivalent:
Array (list):
JSON equivalent:

As in ArchieML, an array is signified with a key enclosed in square brackets ([...]), and is terminated with a pair of empty square brackets ([]). ArchieML allows any trailing empty bracket pairs (or brace pairs) at the end of the file to be omitted, but all closing array delimiters have been included here for clarity.

As in YAML and JSON, structures can be arbitrarily nested in PEML. Array keys that start with a period (.) are used to indicate arrays nested inside other arrays (from ArchieML).

Nested Arrays (list of lists):
JSON equivalent:

When providing arrays, remember that PEML (like ArchieML) uses repeated occurrences of the first key provided for the first array item to mark where each new item starts, so whichever key is provided first should consistently be used to start each new item in the array.
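
The item-splitting rule can be illustrated with a short Python sketch that works on key/value pairs already read from inside an array block. The name `group_items` and the sample keys are hypothetical, and the sketch ignores nested arrays and object blocks.

```python
def group_items(pairs):
    """Group (key, value) pairs into array items by the first key seen (sketch)."""
    items = []
    first_key = None
    for key, value in pairs:
        if first_key is None:
            first_key = key  # the first key in the array marks item starts
        if key == first_key:
            items.append({})  # each repeat of that key opens a new item
        items[-1][key] = value
    return items
```

For the pairs `name`, `value`, `name`, `value`, this yields two items. If the second item instead began with `value`, that pair would be folded into the first item, which is why the starting key should be used consistently.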

Further details about nested mappings and sequences (and how they are terminated) are available in the ArchieML definition.

## Side by Side

The (very brief) example shown above can be directly represented in JSON (or YAML):

PEML:
JSON equivalent: