PEML Exercise Data Model

 

The Programming Exercise Markup Language (PEML) is designed to provide an ultra-human-friendly authoring format for describing automatically graded programming assignments.

Data Model Schema for PEML Exercises

This page presents the data model for PEML. While PEML is its own notation, the data model's structure is also described in the form of a JSON Schema for PEML:

http://cssplice.github.io/peml/schemas/PEML.json

The data model's structure maps easily into JSON or YAML, and a JSON Schema provides a program-checkable way of expressing the intended structure. Snippets from the schema are included below to show the definition of each key/value field in PEML. PEML fields that have their own substructure are described separately as building blocks in the definitions of recurring model elements.

The main attributes of a PEML exercise description are broken down into three groups:

  • Required keys

  • Recommended keys

  • Optional keys

It is important to note that PEML allows keys beyond those described here, which specific tools may support as custom extensions. The keys described here are intended to provide a common vocabulary that many tools can use to represent programming exercises, to facilitate authoring, importing, and exporting them. Some keys may relate to features or content that not every tool supports, but the goal is to streamline the ability of instructors (or "people" in general) to get exercises into (and potentially out of) educational tools.

Required Keys

Required keys must be present in each PEML description. We keep these to a minimum, but requiring them on every exercise promotes interoperability and data management, and helps authors keep information organized when exercises are imported into tools.

exercise_id required: string

Schema:

"exercise_id": { "type": "string", "minLength": 1, "pattern": "^[^\\s]+$" }

The exercise_id is a globally unique, human-written identifier that the exercise author creates to identify this exercise on any system. Any non-empty sequence of non-whitespace Unicode characters can be used. We imagine that authors might construct these identifiers in similar ways to programming package names or URLs. For example, the following ID prefixes an exercise name, presumed to be unique within its course, with university and course identifiers. By combining the context and the name, a globally unique identifier can be formed:

PEML example:

exercise_id: edu.vt.cs.cs1114.palindromes
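A tool could check an author-supplied exercise_id against the schema's constraints before accepting it. The following is a minimal Python sketch; the function name is_valid_exercise_id is hypothetical and not part of any existing PEML tool.

```python
import re

# Mirrors the schema: one or more characters, none of them whitespace.
# fullmatch() anchors at both ends, like the ^...$ in the schema pattern.
_EXERCISE_ID = re.compile(r"[^\s]+")


def is_valid_exercise_id(value: str) -> bool:
    """Return True if value satisfies the exercise_id schema constraints."""
    return _EXERCISE_ID.fullmatch(value) is not None


print(is_valid_exercise_id("edu.vt.cs.cs1114.palindromes"))  # True
print(is_valid_exercise_id("cs1114 palindromes"))            # False: contains a space
```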

By convention, each exercise description should begin with the exercise_id key, so that multiple exercises can be concatenated in a single file and still be easily sliced apart and parsed as separate exercises.
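This convention makes it possible to split a multi-exercise file with a very simple scan. The sketch below assumes, for illustration only, that every exercise begins with an unindented exercise_id: line and that no multi-line value (such as a delimited instructions block) contains a line beginning the same way; a real PEML parser would not need that assumption. The helper split_exercises is hypothetical, and the second exercise in the sample text is invented for illustration.

```python
from __future__ import annotations


def split_exercises(text: str) -> list[str]:
    """Split concatenated PEML text into one string per exercise.

    Hypothetical helper: assumes each exercise starts with an unindented
    'exercise_id:' line. Any lines before the first such line stay with
    the first exercise.
    """
    chunks: list[str] = []
    current: list[str] = []
    seen_id = False  # does `current` already contain an exercise_id line?
    for line in text.splitlines(keepends=True):
        starts_exercise = line.startswith("exercise_id:")
        if starts_exercise and seen_id:
            # A second exercise_id marks the start of the next exercise.
            chunks.append("".join(current))
            current = []
        if starts_exercise:
            seen_id = True
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


sample = (
    "exercise_id: edu.vt.cs.cs1114.palindromes\n"
    "title: Palindromes\n"
    "\n"
    "exercise_id: edu.vt.cs.cs1114.anagrams\n"
    "title: Anagrams\n"
)
print(len(split_exercises(sample)))  # 2
```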

The purpose of the exercise_id is to serve as an external identifier that tools (and people) can use to determine whether two PEML descriptions describe "the same thing". Exercises with different ids should be considered as distinct entities, rather than as different versions of the "same thing". The version key is used to identify basic version info for long-lived exercises.

When a user externally edits an exercise representation and re-imports it into a tool, or exports an exercise from one tool to use in another context, tools can use the exercise_id to determine whether imported information is an update to an existing artifact or whether it defines a new exercise.

title required: string

Schema:

"title": { "type": "string", "minLength": 1 }

The title is a string name or title used as a human-readable label for the exercise. The intent is for this to be the "title" shown to students in various contexts, either when viewing a single exercise or when viewing lists of exercises. While there is no specific length limit, ideally titles should be no more than "one line" in size, because of the various contexts where they might be displayed. How much of the title is displayed (or truncated) when collections of exercises are shown is tool-dependent.

PEML example:

title: Palindromes (A Simple PEML Example)

version.timestamp required: string

Schema:

"timestamp": {
  "type": "string",
  "minLength": 1,
  "pattern": "^(?:[1-9]\\d{3}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1\\d|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[1-9]\\d(?:0[48]|[2468][048]|[13579][26])|(?:[2468][048]|[13579][26])00)-02-29)T(?:[01]\\d|2[0-3]):[0-5]\\d:[0-5]\\d(?:\\.\\d{1,9})?(?:Z|[+-][01]\\d:[0-5]\\d)$"
}

Just as in YAML or JSON, a PEML description represents a set of key/value pairs (a dictionary, hash, or map, also called an "object" in JSON terms), where keys can map to nested structured values. In PEML, dotted names represent nesting structure. The version key maps to a nested dictionary (or object) containing one required subkey: timestamp. The other fields inside version are optional and described below under recommended keys.
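Expanding dotted names into nested structure can be sketched in a few lines of Python. This assumes the PEML text has already been parsed into a flat dictionary of dotted keys (the helper nest_dotted_keys is hypothetical), and it does not handle conflicts such as a key that is used both as a plain value and as the parent of dotted subkeys.

```python
from typing import Any, Dict


def nest_dotted_keys(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand dotted keys such as 'version.timestamp' into nested dictionaries."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # setdefault() creates each intermediate dictionary on first use.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


nested = nest_dotted_keys({
    "exercise_id": "edu.vt.cs.cs1114.palindromes",
    "version.timestamp": "2018-08-25T15:23:22.635-05:00",
    "version.type": "git",
})
print(nested["version"])
# {'timestamp': '2018-08-25T15:23:22.635-05:00', 'type': 'git'}
```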

The version.timestamp is a human-readable timestamp indicating when this version of the exercise was last modified. For lack of a better option, at the moment this should be an RFC 3339/ISO 8601 timestamp with an explicit UTC offset (if you know of something more user-friendly but equally unambiguous, let us know!). That format is YYYY-MM-DDThh:mm:ss.nnn±hh:mm, where the fractional seconds (.nnn, up to nine digits) are optional and the offset may be written as Z for UTC itself.

PEML example:

version.timestamp: 2018-08-25T15:23:22.635-05:00
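Because each timestamp carries its own UTC offset, timestamps written with different offsets can still be compared directly. A minimal Python sketch using the standard datetime module follows; parse_peml_timestamp is a hypothetical helper. It rewrites a trailing Z as +00:00 because datetime.fromisoformat() only accepts Z from Python 3.11 onward, and on versions before 3.11 fromisoformat() also accepts fractional seconds only with exactly 3 or 6 digits.

```python
from datetime import datetime


def parse_peml_timestamp(value: str) -> datetime:
    """Parse a version.timestamp value into a timezone-aware datetime."""
    if value.endswith("Z"):
        # fromisoformat() accepts a literal 'Z' only from Python 3.11 on.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


local = parse_peml_timestamp("2018-08-25T15:23:22.635-05:00")
utc = parse_peml_timestamp("2018-08-25T20:23:22.635Z")
# Aware datetimes compare by the instant they denote, so differing
# offsets do not matter.
print(local == utc)  # True
```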

We expect that tool-edited exercise descriptions will likely generate this field's contents automatically. Also, tools will no doubt have to cope with the fact that authors who externally edit PEML representations might make multiple edits and re-import an exercise multiple times, while "forgetting" to manually update the timestamp. The point of the timestamp is to help authors (and tools) to distinguish between multiple edits/versions of a single exercise (with one exercise_id). However, tool developers are encouraged to keep hash fingerprints of exercise descriptions internally so that when externally edited/modified PEML descriptions are re-uploaded, they can (a) detect meaningful content changes, (b) use version.timestamp values to determine whether an import is a "newer" revision and inform users if the internally stored version is newer than what is being imported, and (c) in the case of version.timestamp "ties", prompt the author/user for more information (for example, suspecting failure to update the timestamp in a changed PEML description for an exercise that has already been imported).
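The (a)/(b)/(c) strategy above can be sketched as a small decision function. Everything here is an assumption for illustration: the StoredExercise record, the SHA-256 fingerprint of the raw text, and the result labels are hypothetical, not behavior defined by PEML or by any particular tool.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class StoredExercise:
    """What a hypothetical tool remembers about one imported exercise_id."""
    digest: str          # SHA-256 of the PEML text last imported
    timestamp: datetime  # parsed version.timestamp of that import


def content_digest(peml_text: str) -> str:
    """Fingerprint the raw text; a real tool might normalize whitespace first."""
    return hashlib.sha256(peml_text.encode("utf-8")).hexdigest()


def classify_import(store: Dict[str, StoredExercise], exercise_id: str,
                    peml_text: str, timestamp: datetime) -> str:
    """Return 'new', 'unchanged', 'update', 'stale', or 'ask-author'."""
    stored = store.get(exercise_id)
    if stored is None:
        return "new"          # unknown exercise_id: a distinct exercise
    if content_digest(peml_text) == stored.digest:
        return "unchanged"    # (a) no content change detected
    if timestamp > stored.timestamp:
        return "update"       # (b) newer revision of the same exercise
    if timestamp < stored.timestamp:
        return "stale"        # (b) stored copy is newer: inform the user
    return "ask-author"       # (c) content changed but timestamps tie


stamp = datetime.fromisoformat("2018-08-25T15:23:22.635-05:00")
store = {"edu.vt.cs.cs1114.palindromes": StoredExercise(content_digest("old text"), stamp)}
print(classify_import(store, "edu.vt.cs.cs1114.palindromes", "edited text", stamp))
# ask-author
```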

author required: string

Schema:

"author": { "type": "string", "minLength": 1 }

The author key identifies the author of the exercise (or at least of the PEML exercise description). Recommended practice is to identify the author by a unique e-mail address, and this key holds a single address. Alternatively, the authors key can be used to identify multiple authors, either as a single string of comma-separated e-mail addresses or as a PEML array (list) of addresses. Both forms have the same meaning.

If the optional license key is provided and the license.owner is the same as the author, then the author key can be omitted.

PEML example:

author: edwards@cs.vt.edu

PEML example:

# Just a comma-separated string
authors: edwards@cs.vt.edu, ayaan@vt.edu


# Alternatively, can use an array structure
[authors]
* edwards@cs.vt.edu
* ayaan@vt.edu
[]
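A tool that accepts all of these forms might normalize them to a list. This Python sketch assumes dotted keys have already been expanded into nested dictionaries and that a PEML array arrives as a Python list; author_list is a hypothetical helper. It falls back to license.owner, following the rule that author may be omitted when it matches the license owner, and it prefers authors if both keys happen to be present.

```python
from typing import Any, Dict, List


def author_list(exercise: Dict[str, Any]) -> List[str]:
    """Return the exercise's author e-mail addresses as a list."""
    raw = exercise.get("authors", exercise.get("author"))
    if raw is None:
        # author may be omitted when it is the same as license.owner.
        raw = exercise.get("license", {}).get("owner")
    if raw is None:
        return []
    # A PEML array is assumed to arrive as a list; otherwise split on commas.
    items = raw if isinstance(raw, list) else raw.split(",")
    return [item.strip() for item in items if item.strip()]


print(author_list({"authors": "edwards@cs.vt.edu, ayaan@vt.edu"}))
# ['edwards@cs.vt.edu', 'ayaan@vt.edu']
```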

Recommended Keys

instructions recommended: string

Schema:

"instructions": { "type": "string" }

The instructions key holds the exercise's instructions for the student, describing the task to complete. In many cases, this is the meat of the "assignment" or "exercise". The value is a string, but probably a long one. As with any key/value pair in PEML, quoting can be used. Instructions written in Markdown (or, as a subset, vanilla HTML) are useful, and some tools may support other markup formats.

PEML example:

instructions:----------
Write your full assignment instructions here. Inline text instead of
a separate PDF resource is preferred.
...
----------

Note that some educational tools that support PEML may not use the instructions, or may expect that the instructions are provided through some means external to the tool. In these cases, the instructions field can be omitted, although in practice either instructions (describing the assignment) or assets (describing how a solution would be tested) are required.
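Taken together, the presence rules stated in this section can be checked mechanically. The Python sketch below (missing_requirements is a hypothetical helper) assumes a nested dictionary produced by some PEML parser, and it checks only presence, not the types or patterns given in the schema snippets.

```python
from typing import Any, Dict, List


def missing_requirements(exercise: Dict[str, Any]) -> List[str]:
    """List the presence rules that a nested exercise dictionary breaks."""
    problems = []
    for key in ("exercise_id", "title"):
        if not exercise.get(key):
            problems.append(f"missing required key: {key}")
    if not exercise.get("version", {}).get("timestamp"):
        problems.append("missing required key: version.timestamp")
    has_author = exercise.get("author") or exercise.get("authors")
    if not has_author and not exercise.get("license", {}).get("owner"):
        problems.append("missing author (or authors, or license.owner)")
    # In practice, at least one of instructions or assets is needed.
    if not exercise.get("instructions") and not exercise.get("assets"):
        problems.append("need at least one of: instructions, assets")
    return problems


# Lists the three missing items: version.timestamp, author, instructions/assets.
print(missing_requirements({"exercise_id": "edu.vt.cs.cs1114.palindromes",
                            "title": "Palindromes"}))
```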

If in a specific situation the exercise's instructions are intended to be accessible through a course management system, an instructor-provided website, or some other mechanism, a URL can be used:

PEML example:

instructions: url(https://canvas.myschool.edu/courses/12345/assignments/12345)

While instructions can be farmed out to external resources in this way, we strongly discourage PDF assignment descriptions, because they limit the value and utility of a PEML resource. Still, an external document may be the fastest and cleanest way for an author to get started, and the author can then move toward embedding markup directly in PEML descriptions on future assignments as time permits.

assets recommended: object

The assets key maps to a nested dictionary (object) that defines the source code assets associated with the exercise. The full definition of what is included under assets is described in the code data model.

tags recommended: object

The tags key maps to a nested dictionary (object) that defines categorical/classification/metadata about the exercise. The full definition of what is included under tags is described in the building blocks definitions.

version recommended: object

Schema:

"version": { "type": "object" }

The version key maps to a nested dictionary (object) that identifies the version of this exercise that is described. At a minimum, the version should include a timestamp identifying when this description was last updated. However, many authors may use forms of version control to manage their sources (which is recommended), so additional fields under version can be provided to capture access paths to an exercise description's version history.

PEML example:

version.timestamp: 2018-08-25T15:23:22.635-05:00
version.type: git
version.id: 2ab880a
version.repository: url(https://github.com/CSSPLICE/peml.git)
version.location: url(test/peml/palindrome.peml)

Some tools may be able to deduce the type, repository, id, and location all from a single URL, such as with direct URLs to files on github.com. In such a situation, only the full location URL needs to be specified:

PEML example:

version.timestamp: 2018-08-25T15:23:22.635-05:00
version.location: url(https://github.com/CSSPLICE/peml/blob/master/test/peml/palindrome.peml)
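Deducing these fields from a GitHub file URL can be sketched as below. The helper version_from_github_url is hypothetical; it assumes the https://github.com/OWNER/REPO/blob/REF/PATH layout, takes the bare URL without PEML's url(...) wrapper, and treats REF as a single path segment, so branch names containing / would be split incorrectly. For the URL above, it yields the branch name master as the id, not a commit hash.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def version_from_github_url(url: str) -> Optional[Dict[str, str]]:
    """Guess version.* fields from a github.com 'blob' URL, or return None."""
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    # Expect OWNER/REPO/blob/REF/PATH..., with at least one PATH segment.
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return None
    owner, repo, _, ref, *path = segments
    return {
        "type": "git",
        "repository": f"https://github.com/{owner}/{repo}.git",
        "id": ref,
        "location": "/".join(path),
    }


info = version_from_github_url(
    "https://github.com/CSSPLICE/peml/blob/master/test/peml/palindrome.peml")
print(info["repository"])  # https://github.com/CSSPLICE/peml.git
```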

version.type optional: string

Schema:

"type": { "type": "string", "minLength": 1 }

The version.type captures the kind of version control system or repository format used for this PEML description's version history. Examples include: git, mercurial, CVS, Visual SourceSafe, etc.

PEML example:

version.type: git

version.id optional: string

Schema:

"id": { "type": "string", "minLength": 1 }

The version.id is intended as a way to identify the commit within the repository holding this description's contents. It could be a tag, a branch name, a version number, a commit hash, etc. Its exact meaning depends on the version control system being used.

PEML example:

version.id: 2ab880a

version.repository optional: string

Schema:

"repository": { "type": "string", "minLength": 1 }

The version.repository is intended to provide an access path to the repository containing this PEML description's version history. This is most likely a URL (see the discussion of URLs in Design Goals), although relative URLs that are resolved relative to the location of this PEML description can be used.

PEML example:

version.repository: url(https://github.com/CSSPLICE/peml.git)

version.location optional: string

Schema:

"location": { "type": "string", "minLength": 1 }

The version.location is intended to point to the specific location of this PEML description inside its repository. It could be, for example, a relative URL. Since this key is intended to provide a location within the version control repository, if a relative URL is provided and a version.repository is also provided, the version.location should be resolved relative to the repository, rather than relative to the PEML description.
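The resolution rule above might be implemented as follows. This is a minimal sketch with hypothetical helpers (unwrap_url, resolve_location), a simplified treatment of PEML's url(...) notation, and a hypothetical example.edu address standing in for the PEML description's own location. It treats the repository URL as a directory when joining; the result identifies the file within the repository but is not necessarily a URL that can be fetched directly.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Simplified: matches a value written entirely as url(...).
_URL_WRAPPER = re.compile(r"url\((.*)\)", re.DOTALL)


def unwrap_url(value: str) -> str:
    """Strip a url(...) wrapper if present; otherwise return the trimmed value."""
    match = _URL_WRAPPER.fullmatch(value.strip())
    return match.group(1).strip() if match else value.strip()


def resolve_location(location: str, repository: Optional[str],
                     description_url: str) -> str:
    """Resolve version.location against the repository, else the description."""
    loc = unwrap_url(location)
    if repository:
        base = unwrap_url(repository)
        # A trailing '/' makes urljoin() keep the repository's last segment.
        return urljoin(base.rstrip("/") + "/", loc)
    return urljoin(description_url, loc)


print(resolve_location("url(test/peml/palindrome.peml)",
                       "url(https://github.com/CSSPLICE/peml.git)",
                       "https://example.edu/exercises/palindrome.peml"))
# https://github.com/CSSPLICE/peml.git/test/peml/palindrome.peml
```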

PEML example:

version.location: url(test/peml/palindrome.peml)

license recommended: object

The license key maps to a nested dictionary (object) that identifies the license that applies to use of the exercise. Beyond the required fields, additional information can be provided in the optional fields if desired.

If the license is provided, both the license.id and the license.owner are required.

license.id required: string

Schema:

"id": { "type": "string", "minLength": 1 }

The license.id identifies the license used for this exercise. The id can be specified by a URL that identifies the license, or by a name (or abbreviated name) in common use, such as any of the license keywords used by GitHub (an excellent source of potential license choices).

PEML example:

license.id: cc-by-sa-4.0

license.owner required: string

Schema:

"owner": { "type": "string", "minLength": 1 }

The license.owner identifies the person who "owns" the exercise being described, in the sense of intellectual property. This could be an individual, a publisher, a corporation, or whoever. For individual authors, unique e-mail addresses are preferred as a method of identification, although any string that unambiguously identifies the copyright holder/licenser for this work can be used here.

PEML example:

license.owner: edwards@cs.vt.edu

license.book optional: string

Schema:

"book": { "type": "string", "minLength": 1 }

When an exercise is part of a textbook or another copyrighted resource, the license.book key can be used to identify the source. In such cases, the "license" for use of the exercise is presumably the same as the license for the corresponding book or containing work. For most textbooks, reuse of resources from the book presumably requires owning a copy of the book. Such exercises should normally be limited to situations where the textbook is required or optional for a given pool of users (e.g., students in a course that uses that textbook). The value of the license.book can either be a bibliographic-style citation for the book, or a URL that identifies the book.

PEML example:

license.book:
Cay S. Horstmann, _Big Java: Early Objects, 5th Edition_,
Wiley, 2013. ISBN: 9788126554010

Tool developers are expected to be flexible and forgiving in terms of allowing for a wide variety of human-authored variations in specifying books, although tools should be free to "normalize" these to a standard representation internally (and even for presentation).

license.attribution optional: string

Schema:

"attribution": { "type": "string", "minLength": 1 }

The license.attribution, if provided, contains an acknowledgement string that the license owner wishes for users of the exercise to include when using the work. The license.attribution should be provided for licenses that require users to provide attribution (such as Creative Commons licenses that include the "BY" requirement, or other licenses that require attribution).

PEML example:

license.attribution:
"Palindromes (A Simple PEML Example)" by edwards@cs.vt.edu is licensed
under <a href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>

license.acknowledgements optional: string

Schema:

"acknowledgements": { "type": "string" }

The license.acknowledgements, if provided, contains all the attributions this exercise makes for licensed use of other (separate) content. While the license.attribution contains content for the users of this exercise to include in derived works, the license.acknowledgements contains attributions acknowledging other content this exercise uses.

Both the spelling "acknowledgments" (more common in the U.S.) and "acknowledgements" (more common in Britain and elsewhere) should be supported by tools (as synonyms).

license.permissions recommended: string

Schema:

"permissions": { "enum": [
  "none",
  "read",
  "fork",
  "fork-with-tests",
  "contribute",
  "all"
] }

Although the license (as identified by the license.id) governs the rights that other users have with this exercise, the license.permissions field provides a tool-processable shorthand for the access permissions granted to others by the license terms. The value should be one of:

  • none: "all rights reserved" (for the author(s)).

  • read: all other users can read/practice the exercise, use it in assignments, etc., but may not create any derived exercises.

  • fork: In addition to read permissions, other users can "fork" this exercise, that is, use it as a starting point to create a new derived exercise. Forking includes access to all aspects of the exercise except the test suites and the test environment definition. Users are expected to be aware of and obey any licensing restrictions imposed by the license associated with the original exercise.

  • fork-with-tests: Implies the same access permissions as fork but all test suites and test environment details are also included.

  • contribute: In addition to full fork-with-tests access, contribute adds the ability to edit and/or import updated versions of the original exercise.

  • all: Implies full access, which is the permissions level of the author(s) and/or license owner.
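Because each level above grants everything the previous one does, a tool can compare permission values by their position in the list. A minimal Python sketch follows; PERMISSION_LEVELS and allows are hypothetical names, and an unrecognized value raises ValueError from tuple.index().

```python
from typing import Tuple

# Ordered from least to most access; each level includes the ones before it.
PERMISSION_LEVELS: Tuple[str, ...] = (
    "none", "read", "fork", "fork-with-tests", "contribute", "all",
)


def allows(granted: str, needed: str) -> bool:
    """Return True if the license.permissions value `granted` covers `needed`."""
    # tuple.index() raises ValueError for values not in the list above.
    return PERMISSION_LEVELS.index(granted) >= PERMISSION_LEVELS.index(needed)


print(allows("fork-with-tests", "fork"))  # True
print(allows("read", "fork"))             # False
```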

Optional Keys

difficulty optional: integer

Schema:

"difficulty": { "type": "integer", "minimum": 0, "maximum": 100 }

The difficulty is a subjective rating of question difficulty on an integer scale from 0 (easiest) to 100 (hardest). Difficulty is relative to the presumed level of the target audience intended for the exercise. Typically, an author might use tags to indicate the topics/skills that they expect the user to be familiar with, and also use tags to indicate the topics/skills that the intended audience would be practicing through this exercise. Together, the prerequisites and the topics for the exercise communicate the author's idea of the target audience, and difficulty should be interpreted relative to that target audience.

Intuitively, the difficulty can be thought of as a rough approximation of the percentage of the target audience who might be unable to complete the exercise successfully. Extreme values of 0 or 100 would not normally be used, since exercises that no one can complete (difficulty == 100) or that everyone can trivially succeed at (difficulty == 0) may have little value. Instead, a difficulty of 50 should be thought of as "average" difficulty, where an average student may have a 50/50 chance of completing the exercise successfully.
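The schema constraint and the percentage intuition above can both be expressed in a few lines. In this Python sketch, check_difficulty assumes the PEML value has already been converted to an int, and difficulty_from_results is a hypothetical illustration of the "percentage unable to complete" reading, not a rule defined by PEML; authors normally assign difficulty by judgment, as the next paragraph describes.

```python
def check_difficulty(value: object) -> int:
    """Validate a difficulty value against the schema: an integer from 0 to 100."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("difficulty must be an integer")
    if not 0 <= value <= 100:
        raise ValueError("difficulty must be between 0 and 100")
    return value


def difficulty_from_results(attempted: int, succeeded: int) -> int:
    """Rough estimate: the percentage of attempters who did not succeed."""
    if attempted <= 0 or not 0 <= succeeded <= attempted:
        raise ValueError("need attempted > 0 and 0 <= succeeded <= attempted")
    return round(100 * (attempted - succeeded) / attempted)


print(difficulty_from_results(attempted=80, succeeded=32))  # 60
```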

However, don't overthink it, since difficulty ratings are both subjective and relative. The author's intuitive answer to the question "how hard or easy is this exercise compared to an 'average' exercise for this target audience?" is a better way to quickly assign a difficulty rating that will still be useful to others reading the exercise description.

PEML example:

difficulty: 60

options optional: object

Schema:

"options": { "type": "object" }

The options key is ... (subkeys: options.text_format, options.interpolate, options.variables, options.generator, options.instances)

origin optional: object

Schema:

"origin": { "type": "object" }

The origin key is ... (subkeys: origin.derived_from, origin.family)

vendor optional: object

Schema:

"vendor": { "type": "object" }

The vendor key is intended to be a nested map containing any tool-specific keys or extension properties that individual educational tools might support but that are not intended to be portable across a wide range of tools. No restrictions are placed on the vocabulary or structure of the contents of this map.