Data Model Schema for PEML Exercises
This page presents the data model for PEML. While PEML is its own notation, the data model's structure is also described in the form of a JSON Schema for PEML.
Even though PEML uses its own notation, the data model's structure can easily be mapped into JSON or YAML, and a JSON Schema provides a program-checkable way of expressing the intended data model structure. Snippets from the schema are included below to show the definitions for each key/value field in PEML. PEML fields that have their own substructure are described separately as building blocks in the definitions of recurring model elements.
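The main attributes of a PEML exercise description are broken down into three groups, followed by keys that are still under development:
- Required keys: exercise_id, title, author (or authors)
- Recommended keys: instructions, systems, version, license, suites (at least one of instructions or suites)
- Optional keys: tag, src, difficulty, vendor
- Keys under development: options, origin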
PEML allows the use of additional keys beyond those described here, which may be custom-supported by specific tools. The list of keys described here is intended to provide a common vocabulary that can be used by many tools for representing programming exercises, to facilitate authoring, importing, and exporting these exercises. Some keys may relate to features or content that are not supported in every tool, but the goal is to streamline the ability of instructors (or "people" in general) to get exercises into (and potentially out of) educational tools.
Required keys must be present in each PEML description. We keep these to a minimum. However, to promote interoperability and data management, requiring these elements on every exercise will help authors keep information organized when it is imported into tools.
exercise_id required: string
exercise_id is a globally unique, human-written
identifier created by the exercise
author to uniquely identify this exercise on any system. Any non-empty
sequence of non-whitespace Unicode characters can be used. We imagine
that authors might construct these identifiers in similar ways to
programming package names or URLs. For example, the following ID includes
a university and course identifier around an exercise name that is
presumed to be unique within that context. By combining the context
and the name, a globally unique identifier can be formed:
By convention, exercises should start with the
exercise_id key, so that multiple exercises can be concatenated together in a
single file but still sliced/parsed as separate exercises easily.
The purpose of the
exercise_id is to serve as an external
identifier that tools (and people) can use to determine whether two
PEML descriptions describe "the same thing". Exercises with different
ids should be considered as distinct entities, rather than as different
versions of the "same thing". The
version key is used to
identify basic version info for long-lived exercises.
When a user externally edits an exercise representation and re-imports
it into a tool, or exports an exercise from one tool to use in another
context, tools can use the
exercise_id to determine whether
imported information is an update to an existing artifact or whether it
defines a new exercise.
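The rule that an exercise_id is any non-empty sequence of non-whitespace Unicode characters can be checked mechanically. The following is a minimal sketch, not part of PEML itself; the function name `is_valid_exercise_id` and the sample ID are hypothetical and chosen only for illustration.

```python
def is_valid_exercise_id(value: object) -> bool:
    """Return True if value is a non-empty string with no whitespace.

    Hypothetical helper: the name is an assumption, not a PEML API.
    """
    if not isinstance(value, str) or value == "":
        return False
    # str.isspace() is Unicode-aware, so this also rejects characters such
    # as non-breaking spaces, not only ASCII blanks, tabs, and newlines.
    return not any(ch.isspace() for ch in value)


# Hypothetical ID in a package-name style (made up for this example).
print(is_valid_exercise_id("edu.example.cs101.palindromes"))
print(is_valid_exercise_id("two words"))
```

A check like this only confirms the identifier is well formed. Global uniqueness still depends on the naming convention the author follows.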
title required: string
title is a string name or title used as a human-readable
label for the
exercise. The intent is for this to be the "title" shown to students
in various contexts, either when viewing a single exercise or when
viewing lists of exercises. While there is no specific length limit,
ideally titles should be no more than "one line" in size, because of
the various contexts where they might be displayed. How much of the
title is displayed (or truncated) when collections of exercises are
shown is tool-dependent.
author required: string or object
The author key is used to identify the author of the
exercise (or at least of the PEML exercise description). Recommended
practice is to identify the author by a unique e-mail address. This
field can be used to provide a single e-mail address. Alternatively,
authors can be used to provide an array of authors.
If the optional
license key is provided and the
license.owner is the same as the author, the
author key can be omitted.
In addition to providing an email address, the author
key can use sub-keys to specify both an email address and an author name.
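Because author may be either a plain string or an object with sub-keys, a tool will probably want a single internal shape for both. The sketch below is one possible approach, assuming the sub-keys are named `email` and `name` (as the license.owner description suggests) and assuming a bare string is an e-mail address. The function name `normalize_author` is hypothetical.

```python
def normalize_author(value):
    """Return {'email': ..., 'name': ...} for either form of author.

    Assumptions: a bare string is treated as the e-mail address, and the
    object form uses 'email' and 'name' sub-keys.
    """
    if isinstance(value, str):
        return {"email": value.strip(), "name": None}
    if isinstance(value, dict):
        return {"email": value.get("email"), "name": value.get("name")}
    raise TypeError("author must be a string or an object (dict)")


# Both forms end up in the same shape (addresses are made up).
print(normalize_author("author@example.edu"))
print(normalize_author({"email": "author@example.edu", "name": "A. Author"}))
```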
instructions recommended: string
instructions is where you can provide the exercise's
instructions for the student describing the task to complete. This
is the meat of the "assignment" or "exercise" in many cases. The value
associated with this key is a string, but probably a long one. As with
any key/value pair in PEML, quoting can be used. Instructions written
in Markdown (or, as a subset, vanilla HTML) are useful, and some
tools may support other formats.
Note that some educational tools that support PEML may not use the
instructions, or may expect that the instructions are provided through
some means external to the tool. In these cases, the
instructions field can be omitted, although in practice at least one of
instructions (describing the assignment) or
suites (describing how a solution would be tested,
either as a top-level key or nested inside one of the
systems supported) is required.
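The "instructions or suites" rule above can be expressed as a small check over an already-parsed exercise. This is a sketch under stated assumptions: the exercise has been loaded into a Python dict, systems is a list of dicts, and the function name `has_instructions_or_suites` is hypothetical.

```python
def has_instructions_or_suites(exercise: dict) -> bool:
    """Check that instructions or suites is present somewhere useful.

    Assumes `exercise` is a parsed PEML description held as nested dicts,
    with `systems` (if present) being a list of dicts.
    """
    # Top-level instructions or suites satisfy the rule directly.
    if exercise.get("instructions") or exercise.get("suites"):
        return True
    # Otherwise look for suites nested inside any of the systems entries.
    systems = exercise.get("systems") or []
    return any(isinstance(s, dict) and bool(s.get("suites")) for s in systems)


print(has_instructions_or_suites({"instructions": "Write a function."}))
print(has_instructions_or_suites({"title": "No task described"}))
```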
If in a specific situation the exercise's instructions are intended to be accessible through a course management system, an instructor-provided website, or some other mechanism, a URL can be used:
While assignments can be farmed out into external files or web pages in this way, we strongly discourage the use of PDF assignment descriptions, because they limit the value/utility of a PEML resource. However, in many cases that may be the fastest/cleanest way for an author to get started; the author may then move toward embedding markup in PEML descriptions on future assignments as time permits.
systems recommended: object
The systems key maps to an array of nested dictionaries that
describe the programming language(s) or system(s) in which the exercise
can be conducted. The full
definition of what is included under
systems is described
in the code data model.
version recommended: object
Just as in YAML or JSON, a PEML description represents a set of key/value
pairs (a dictionary, hash, or map, also called an "object" in JSON terms),
where keys can map to nested structured values. In PEML, dotted names
represent nesting structure.
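To make the dotted-name convention concrete, the sketch below expands a flat mapping of dotted keys into nested dictionaries. It is not a PEML parser; it only illustrates the nesting rule, and it assumes no key is used both as a scalar and as a parent. The function name `nest_dotted_keys` and the sample values are hypothetical.

```python
def nest_dotted_keys(flat: dict) -> dict:
    """Expand keys such as 'license.id' into nested dictionaries.

    Assumes no key is both a plain value and a parent of other keys.
    """
    nested = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            # Create the intermediate dictionary on first use.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {
    "title": "Sample",
    "license.id": "MIT",
    "license.owner.email": "owner@example.edu",
}
print(nest_dotted_keys(flat))
```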
The version key maps to a nested dictionary (object) that
identifies the version of this exercise that is described.
Many authors may use forms of
version control to manage their sources (which is recommended), so
version can be provided to
capture access paths to an exercise description's version history.
Some tools may be able to deduce the type, repository, id, and location all from a single URL, such as with direct URLs to files on github.com. In such a situation, only the full location URL needs to be specified:
version.type optional: string
version.type captures the kind of version control
system or repository format used for this PEML description's
version history. Examples include: git, mercurial, CVS, Visual
version.id optional: string
version.id is intended as a way to identify the
commit within the repository holding this description's contents.
It could be a tag, a branch name, a version number, a commit hash,
etc. Its exact meaning is dependent on the nature of the version
control system being used.
version.timestamp recommended: string
version.timestamp is a human-readable timestamp indicating the time at which this version
of the exercise was last modified. For lack of a better option, at
the moment this should be an RFC 3339/ISO 8601 UTC timestamp (if you
know of something more user-friendly but equally unambiguous, let
us know!). That format is: YYYY-MM-DDThh:mm:ss.nnn±hh:mm.
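A tool that writes this field automatically could produce the format with the standard library. The following is a minimal sketch; the function name `peml_timestamp` is hypothetical, and it assumes millisecond precision with an explicit +00:00 offset is acceptable.

```python
from datetime import datetime, timezone


def peml_timestamp(moment=None):
    """Format a datetime as YYYY-MM-DDThh:mm:ss.nnn+00:00 in UTC.

    If no moment is given, the current time is used. A naive datetime is
    interpreted as local time by astimezone().
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).isoformat(timespec="milliseconds")


fixed = datetime(2024, 3, 5, 14, 30, 15, 123000, tzinfo=timezone.utc)
print(peml_timestamp(fixed))  # 2024-03-05T14:30:15.123+00:00
```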
We expect that tool-edited exercise descriptions will likely generate
this field's contents automatically. Also, tools will no doubt have to
cope with the fact that authors who externally edit PEML representations
might make multiple edits and re-import an exercise multiple times,
while "forgetting" to manually update the timestamp. The point of the
timestamp is to help authors (and tools) to distinguish between multiple
edits/versions of a single exercise (with one exercise_id).
However, tool developers are encouraged to keep hash fingerprints of
exercise descriptions internally so that when externally edited/modified
PEML descriptions are re-uploaded, they can (a) detect meaningful content
changes, (b) use
version.timestamp values to determine
whether an import is a "newer" revision and inform users if the internally
stored version is newer than what is being imported, and (c) in the
case of version.timestamp "ties", prompt the author/user
for more information (for example, suspecting failure to update the
timestamp in a changed PEML description for an exercise that has already
been imported).
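The fingerprint-plus-timestamp strategy described above might be sketched as follows. This is one possible decision procedure, not a prescribed one. The function names, the result labels, and the choice of SHA-256 are all assumptions, and both timestamps are assumed to carry explicit UTC offsets.

```python
import hashlib
from datetime import datetime


def fingerprint(peml_text: str) -> str:
    """Hash the raw description text (SHA-256 is an arbitrary choice)."""
    return hashlib.sha256(peml_text.encode("utf-8")).hexdigest()


def classify_import(stored_text, stored_ts, incoming_text, incoming_ts):
    """Decide how to treat a re-imported description (hypothetical labels)."""
    # (a) Identical content: nothing meaningful changed.
    if fingerprint(stored_text) == fingerprint(incoming_text):
        return "unchanged"
    # fromisoformat() accepts offsets like +00:00; a trailing "Z" is only
    # accepted on newer Python versions.
    stored = datetime.fromisoformat(stored_ts)
    incoming = datetime.fromisoformat(incoming_ts)
    # (b) Use timestamps to order the two revisions.
    if incoming > stored:
        return "update"
    if incoming < stored:
        return "stale"
    # (c) Content differs but timestamps tie: ask the author.
    return "ask-user"


print(classify_import("old text", "2024-03-05T14:30:15.000+00:00",
                      "new text", "2024-03-05T14:30:15.000+00:00"))
```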
version.repository optional: object or string
version.repository is a string or object
intended to provide an
access path to the repository containing this PEML description's
version history. This is most likely a URL (see the discussion of
Design Goals), although
relative URLs that are resolved relative to the location of this
PEML description can be used.
Repositories are a recurring structure that can appear in multiple places in a PEML description. For details of how a repository can be described, see the common repository substructure definition.
license recommended: object
The license key maps to a nested dictionary (object) that
identifies the license that applies to use of the exercise. At a minimum,
license should include an
id identifying which license governs use
of the exercise. Additional information can be provided in the other
optional fields if desired.
If license is provided, both license.id and
license.owner are required.
license.id required: string
license.id identifies the license used for this
exercise. The id can be specified by a URL that identifies the
license, or by a name (or abbreviated name) that is in common
use, such as any of the
keywords used by github (an excellent source for potential
license.owner required: string or object
license.owner identifies the person who "owns"
the exercise being described, in the sense of intellectual
property. This could be an individual, a publisher, a corporation,
or whoever. For individual authors, unique e-mail addresses are
preferred as a method of identification, although any string that
unambiguously identifies the copyright holder/licenser for this
work can be used here. A separate "email" sub-key and optional
"name" sub-key can be provided, as in the author key.
license.book optional: string
In a situation where an exercise is part of a textbook or another
copyrighted resource, the
license.book key can
be used to identify the source. In such cases, the "license" for
use of the exercise is presumably the same as the license for
the corresponding book or containing work. For most textbooks,
reuse of resources from the book presumably requires owning a
copy of the book. Such exercises should normally be limited to
use in situations where the textbook is required or optional for
a given pool of users (e.g., students in a course that use that
textbook). The value of
license.book can either
be a bibliographic-style citation for the book, or a URL
that identifies the book.
Tool developers are expected to be flexible and forgiving in terms of allowing for a wide variety of human-authored variations in specifying books, although tools should be free to "normalize" these to a standard representation internally (and even for presentation).
license.attribution optional: string
license.attribution, if provided, contains an
acknowledgement string that the license owner wishes for users
of the exercise to include when using the work. The
license.attribution should be provided for licenses
that require users to provide attribution (such as Creative
Commons licenses that include the "BY" requirement, or other
licenses that require attribution).
license.acknowledgements optional: string
license.acknowledgements, if provided, contains
all the attributions this exercise makes for licensed
use of other (separate) content. While the
license.attribution contains content for the users
of this exercise to include in derived works, the
license.acknowledgements contains attributions
acknowledging other content this exercise uses.
Both the spelling "acknowledgments" (more common in the U.S.) and "acknowledgements" (more common in Britain and elsewhere) should be supported by tools (as synonyms).
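One way a tool might treat the two spellings as synonyms is to fold them into a single key when reading the license object. The sketch below assumes the license is held as a dict and picks "acknowledgements" as the canonical spelling. That choice, the tie-breaking rule, and the function name are all assumptions.

```python
def normalize_acknowledgements(license_info: dict) -> dict:
    """Return a copy of the license dict using one canonical spelling.

    Assumption: if both spellings are present, the value under
    'acknowledgements' is kept.
    """
    result = dict(license_info)
    if "acknowledgments" in result:
        value = result.pop("acknowledgments")
        result.setdefault("acknowledgements", value)
    return result


print(normalize_acknowledgements({"id": "MIT", "acknowledgments": "Thanks."}))
```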
license.permissions recommended: string
Although the license (as identified by the license.id)
governs the rights that other users have with this exercise, the
license.permissions field provides a tool-processable
shorthand notation to capture the access permissions granted to
others by the license terms. The value should be one of:
- none: "all rights reserved" (for the author(s)).
- read: all other users can read/practice the exercise, use it in assignments, etc., but may not create any derived exercises.
- fork: In addition to read permissions, other users can "fork" this exercise, that is, use it as a starting point to create a new derived exercise. Forking includes access to all aspects of the exercise except the test suites and the test environment definition. Users are expected to be aware of and obey any licensing restrictions imposed by the license associated with the original exercise.
- fork-with-tests: Implies the same access permissions as fork, but all test suites and test environment details are also included.
- contribute: In addition to full fork-with-tests permissions, contribute adds the ability to edit and/or import updated versions of the original exercise.
- all: Implies full access, which is the permissions level of the author(s) and/or license owner.
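Since each level in the list above builds on the one before it, a tool could treat the values as an ordered scale. The sketch below assumes that strictly linear ordering, which is an interpretation of the descriptions rather than something stated outright. The names `PERMISSION_LEVELS` and `permits` are hypothetical.

```python
# Ordered from least to most access; the linear ordering is an assumption.
PERMISSION_LEVELS = ["none", "read", "fork", "fork-with-tests", "contribute", "all"]


def permits(granted: str, needed: str) -> bool:
    """Return True if the granted level includes the needed level.

    Raises ValueError if either value is not a known permission level.
    """
    return PERMISSION_LEVELS.index(granted) >= PERMISSION_LEVELS.index(needed)


print(permits("fork", "read"))             # True
print(permits("fork", "fork-with-tests"))  # False
```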
tag optional: object
The tag key maps to a nested dictionary (object) that
defines categorical/classification metadata about the exercise. The full
definition of what is included under
tag is described
in the tagging definitions.
src optional: object
The src key maps to a nested dictionary (object) that
defines the source code assets associated with the exercise. The full
definition of what is included under
src is described
in the code data model.
suites recommended: object
The suites key maps to a nested dictionary (object) that
defines the test suites associated with the exercise. The full
definition of what is included under
suites is described
in the code data model.
difficulty optional: string
difficulty is a subjective rating of question
difficulty on an integer scale from 0 (easiest) to 100 (hardest).
Difficulty is relative to the presumed level of
the target audience intended for the exercise. Typically, an author
would use prerequisite tags to indicate the topics/skills that they
expect the user to be familiar with, and also use topic tags
to indicate the topics/skills that the intended audience would be
practicing through this exercise. Together, the prerequisites
and the topics for the exercise communicate the author's idea of the
target audience, and difficulty should be interpreted relative to
that target audience.
difficulty can be thought of as a rough
approximation of the percentage of the target audience who
might be unable to complete the exercise successfully. One can
imagine that extreme values of 0 or 100 would not typically be used,
since exercises that no one can complete (difficulty == 100), or that
everyone can trivially succeed at (difficulty == 0), may have little
value. Instead, a
difficulty of 50 should be thought of
as "average" difficulty, where an average student may have a 50/50
chance of completing the exercise successfully.
However, don't overthink it, since difficulty ratings are both subjective and relative. The author's intuitive reaction to the question of "how hard or easy is this exercise compared to an 'average' exercise for this target audience" is a better way to quickly assign a difficulty value that can still be of value to others reading the exercise description.
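Because difficulty is typed as a string but carries an integer from 0 to 100, a tool will likely need to parse and range-check it. A minimal sketch follows, assuming whitespace around the number should be tolerated; the function name `parse_difficulty` is hypothetical.

```python
def parse_difficulty(value) -> int:
    """Parse a difficulty value and check it lies in the range 0..100.

    Raises ValueError for non-integer text or out-of-range numbers.
    """
    number = int(str(value).strip())
    if not 0 <= number <= 100:
        raise ValueError(f"difficulty must be between 0 and 100, got {number}")
    return number


print(parse_difficulty("50"))  # 50
```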
vendor optional: object
The vendor key is intended to be a nested map containing
any tool-specific keys or extension properties that individual
educational tools might support, but that are not intended to be
portable across a wide range of tools. The vocabulary and structure for
the contents within this dictionary/map do not have any restrictions
on how they are modeled.
Note: This is a great place to build out keys/properties that identify grading schemes, late policies, submission constraints, options for processing pipelines, etc. It would be nice if there were examples of tool-specific encodings of these kinds of details that might be used here.
Keys Under Development
The keys in this section are still under active development and are not fully defined or implemented. Consider them as ideas for future work.
options optional: object
The options key represents option settings that affect
the interpretation of the PEML exercise description itself. These
control things like what markup notation is used in text fields,
whether mustache-style variable substitution is performed, support
for random exercise generation, etc.
options.text_format optional: string
options.interpolation.enable optional: boolean
options.interpolation.delimiters optional: string
options.variables optional: object
options.generator optional: object
options.instances optional: array
origin optional: object
The origin key is intended to be a nested map for
exercises that are derived from (or forked from) others. It is
intended to contain information about the original upstream
exercise that was used as the starting point for this one.
origin.derived_from optional: string
The exercise_id of the exercise this
one was "forked" from, used when one exercise is created as
a derived work based on another existing exercise.
origin.family optional: string
One exercise might be created from another by changing the form of the question. For example, one exercise might be a code-writing exercise that asks "implement code that solves the following problem". From that, one might create a different style of exercise, such as "here's a buggy implementation for this problem, find and fix the bug". Or yet a third style of exercise: "what output does this code produce on the following input(s)?"
At the same time, all these exercises are related in some way if the underlying task being performed by the artifact is the same, even if the skills the user is exercising are different. We can say these different styles of questions are all part of a related "family", where the relation is the underlying task being achieved by the code artifact at the heart of the question.
Different styles of questions might commonly be created by forking an existing question and creating a derived version using a different style (code-writing, multiple-choice, output prediction, bug finding, bug fixing, etc.). The purpose of this key is to identify the family this exercise belongs to using some kind of unique identifier.
Note: We could use some nice ideas here about how to identify these.