PEML Data Model Building Blocks - PEML

This page describes some common recurring substructures present in the PEML data model.

Describing Files

A Single File

Schema:

"file": {
  "oneOf": [
    { "type": "string", "minLength": 1 },
    {
      "type": "object",
      "required": ["content"],
      "properties": {
        "content": {
          "oneOf": [
            { "type": "string" },
            { "type": "array" },
            { "type": "object" }
          ]
        },
        "name": { "type": "string", "minLength": 1 },
        "type": { "type": "string", "minLength": 1 },
        "content_encoding": { "type": "string", "minLength": 1 }
      }
    }
  ]
}

A single file is specified as either a string (consisting of a url(...) that defines the file's name, location, and content) or an object containing a series of nested keys, as described below.

PEML example:

file: url(path/to/some/file.txt)

PEML example:

file.name: icon.gif
file.type: image/gif
file.content_encoding: base64
file.content:-----
R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAQAICTAEAOw==
-----

file.name optional: string

The nested name key provides the name of the file. This key is optional, if the name is implied by other parameters, or if the tool processing the description supplies a default name.

file.type optional: string

The nested type key provides the MIME type for the file, if needed. This may be used in some cases to ensure the tool processes the file as intended, but is optional, since its use is tool-dependent.

file.content required: string or array

The nested content key allows for the content of the file to be inlined as part of the PEML description in situations where that is simpler or more convenient, or where it allows the PEML description to be used as a single self-contained resource without requiring a zip archive containing multiple files. HereDoc-style quote may be useful for inline file content.

In addition, if the >content is being represented inline in the PEML description and actually consists of structured data, the structured data can be represented directly in PEML in the form of a nested hash or an array of records.

file.content_encoding optional: string

The nested content_encoding key describes the content transfer encoding for binary file contents. For example, when inlining the content of a binary file, base64 encoding is preferred. What content encodings beyond base64 are supported is tool-dependent (such as quoted-printable, although non-text encodings used in smtp and json seem unsuitable for inlining files in a text representation, e.g., 7bit, 8bit, or binary).

A Set of Files

Schema:

"file_set": {
  "oneOf": [
    { "type": "string", "minLength": 1 },
    {
      "type": "array",
      "items": { "$ref": "#/definitions/file" },
      "minItems": 1
    }
  ]
}

Like a single file, a file set can be specified in one of two ways. First, a file set can be a single string value, consisting of a url(...) that defines the location of a directory (or just a single file, if the file set contains only a single element). When a URL is specified, the entire subtree is considered to be the set of files intended. If the URL actually refers to a file archive (*.zip, *.jar, *.tar, *.tgz), and the file set is intended to be the contents of the archive instead of that file itself, then use extract(...) in place of url(...).

Alternatively, a file set can be a plain array, where each item in the array is a single file. In most cases, the key name files is used when a file set should be provided.

When providing an array of files, remember that PEML uses repeated occurrences of the first key provided for the first array item to mark when each new item starts, so which ever key is provided first should consistently be used to start each new item in the array.

PEML example:

files: url(path/to/some/directory)

PEML example:

[files]
name: file1.txt
content: ...

name: file2.txt
content: ...
[]

Repositories

While providing remote locations for files is useful, many authors may use forms of version control to manage the files or resources referenced in an exercise (which is recommended). As a result, it may be useful to refer to repository-based locations when a URL is not sufficient. Here, we focus on supporting git repositories, although expansions to support other useful version control repository structures are welcome.

Schema:

"repository": {
  "type": "object",
  "required": ["url"],
  "properties": {
    "url": { "type": "string", "minLength": 1 },
    "path": { "type": "string", "minLength": 1 },
    "branch": { "type": "string", "minLength": 1 },
    "tag": { "type": "string", "minLength": 1 }
  }
}

PEML example:

repository.url: https://github.com/CSSPLICE/CSSPLICE.github.io
repository.branch: production
repository.path: peml/schemas

repository.url required: string

The url key for a repository object provides an access path to the repository. This could be an absolute URL referring to a net-accessible repository, or a relative URL resolved relative to the location of the PEML description.

repository.path optional: string

The path key for a repository object provides a relative path within the repository (relative to the repository's root) to identify a specific subtree or resource within the repository. This value can be encoded directly in the url value, but is provided as an optional separate key for readability/writability.

repository.branch optional: string

The branch key for a repository object names the specific branch being referenced within the repository, if a branch other than the default is desired. This value can be encoded directly in the url value, but is provided as an optional separate key for readability/writability.

repository.tag optional: string

The tag key for a repository object names the specific tag being referenced within the repository, if desired. This value can be encoded directly in the url value, but is provided as an optional separate key for readability/writability.

repository.commit optional: string

The commit key for a repository object names the specific commit being referenced within the repository, if desired. Commits are specified using the same identification scheme supported by the underlying version control system. We strongly recommend using branches or tags where possible, since hard-coding commit identities in PEML is brittle. This value can be encoded directly in the url value, but is provided as an optional separate key for readability/writability.

Environments

While many PEML descriptions provide straightforward exercises, sometimes an exercise may require specific environmental contents, setup, or support. Some tools provide a pre-defined environment for building and executing exercise answers, but others provide different means of specifying, tailoring, or extending the resources available during processing. PEML uses the idea of an "environment" to capture the set of resources available or used during processing of an answer, although which aspects of answer processing are configurable are tool-dependent.

PEML allows exercise authors to specify environmental resources used for four separate phases of answer processing in the following order, although which of these phase(s) are recognized by individual tools are tool-dependent:

The start phase represents the environment or resources given to the learner when they start an exercise, in order for them to be able to write an answer (for example, any supporting libraries, header files, data files, etc., that should be part of the learner's setup in preparing to create an answer). Normally, this environment includes all starting resources other than skeleton source files (which are provided separately via the src key).
The build phase represents the environment used to compile or build a runnable version of the exercise answer.
The run phase represents the environment used to run an exercise answer on behalf of the learner, for example, to run learner-written software tests or to perform interactive debugging executions for the learner to inspect behavior.
The test phase represents the environment used to evaluate/assess the exercise answer, for example, to run exercise-provided reference tests to judge the correctness of an answer. Normally, this environment includes all testing resources other than actual test cases (which are provided separately via the suites key).

A Single Environment

All of the environments follow the same basic structure. Some tools may support only one environment, and may pick the "phase" that best represents that tool's view of how exercises are processed. Other tools may support a separate environment for one or more phases. PEML descriptions can specify any environment using a nested object with the following structure.

Schema:

"environment": {
  "type": "object",
  "properties": {
    "inherits": { "enum": ["start", "build", "run"] },
    "files": { "$ref": "#/definitions/file_set" },
    "repository": { "$ref": "#/definitions/repository" },
    "image": { "type": "string", "minLength": 1 },
    "registry": { "type": "string", "minLength": 1 }
  }
}

PEML example:

[environment.start.files]
name: data-file1.dat
content: ...

name: data-file2.dat
content: ...
[]

environment.run.inherits: start
environment.test.image: codeworkout/python:0.0.1

Note: Are there other missing properties/features of environments that need to be covered here? For example, a command key might be useful to provide for a customizable entry/execution command used for a given phase, for tools that provide it?

inherits optional: string

The inherits key for an environment indicates it inherits all the contents of (or subsumes) another environment (presumably one that appears earlier in the phases).

Here, "inherits" basically means that all of the files (or, more generally, contents) from the parent environment are used as the starting point for this environment, so that additional file(s) are added "on top of" the contents specified in the parent environment. That allows the new environment to add to or overwrite files or resources to define a new environment by extension (inheritance).

If the new environment specifies a container image, it is taken to replace any container image defined in the parent environment (that is, images override/replace each other). File sets or repositories (which also define sets of files, accessed in a different way) are extended/overridden in the new environment by imagining the new environment's files are copied onto a root with the same contents as the parent environment.

files optional: array

The files key for an environment is a file set that identifies the files that make up the environment. In spirit, these are additions to the "root" of the environment used for the corresponding phase, although the exact interpretation is tool-dependent. While it is easy to associate the "root" with the "current working directory" for building or running an answer, some tools may interpret subdirectories within the environment specially--for example, interpreting a ./lib nested folder as representing library dependencies, or something similar.

When files and an image are both specified, the interpretation is tool-dependent (well, all of the environmental handling is tool-dependent anyway). However, we can imagine the for some tools, the files describe the contents of a virtual directory tree that might be supplied as a parameter to the corresponding container image, say through a file system virtual mount or something similar.

repository optional: object

The repository key for an environment is a repository object that identifies the files that make up the environment by pointing to a version control repository (or possibly a path within one). In spirit, these are treated exactly the same as the files, but simply accessed in a different way by going through a version control repository instead. They should also be interpreted using the same ideas: the repository contents represent additions to the "root" of the environment used for the corresponding phase, although the exact interpretation is tool-dependent. Note that if both repository and files are specified, we can imagine them being combined by treating both of them as additions to the same concept of a virtual directory root. Repository contents should be copied first, and file sets second, so that file sets override/add to the repository's contribution. If an image is specified, the repository value is treated in the same manner as any files would be with respect to the image.

image optional: string

The complete über-solution to specifying an operating environment for building or executing student code is to provide a container image, similar to the solution used by gradescope. PEML allows for environments to be specified in the form of container images, although whether this feature is supported is tool-dependent.

The image key for an environment identifies a specific docker image that encapsulates the environment. We choose docker for this, because it's easy and ubiquitous, although if anyone has clear ideas about how we can provide broader support here, please suggest! The docker container image can be specified by using a docker image identifier, optionally including a docker image repository and/or tag (basically, anything that can be used to specify a container image to a docker run command).

registry optional: string

If an image is specified for the environment, this parameter can be used to specify the docker image registry where the image can be pulled.

Note: We need to add keys here to specify authentication information and/or tokens to access non-public repositories and/or registries.

A Set of Environments

In PEML, a set of environments is one or more for the start, build, run, or test phases.

Schema:

"environment_set": {
  "type": "object",
  "properties": {
    "start": { "$ref": "#/definitions/environment" },
    "build": { "$ref": "#/definitions/environment" },
    "run":   { "$ref": "#/definitions/environment" },
    "test":  { "$ref": "#/definitions/environment" }
  }
}

PEML example:

[environment.start.files]
name: data-file1.dat
content: ...

name: data-file2.dat
content: ...
[]

environment.run.inherits: start
environment.test.image: codeworkout/python:0.0.1