XProc From Above

Brief tutorial overviews

From a height the broader outlines come into view.

XProc from Orbit

XProc is a technology for processing digital data. Although it is XML-based and uses XML syntax, it can work with many kinds of data, including common text-based formats such as JSON.

As a language, XProc describes pipelines. A pipeline combines a sequence or set of processes and applies them to specified inputs ("sources") to create outputs ("results").

Use XProc to build and support complex workflows in document production, data conversion, and information exchange.

Pipelines and steps
How to write a pipeline

An XProc pipeline takes the form of an arrangement of steps.

We say 'arrangement' rather than 'sequence' because steps can have as many inputs and outputs as they need, and it is these that connect the steps to one another.
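As a minimal sketch of what such an arrangement looks like, here is a two-step pipeline. The choice of steps and the attribute name status are illustrative assumptions, not taken from the tutorial.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <!-- The pipeline's own interface: one document in, one document out -->
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Step 1: remove every comment from the incoming document -->
  <p:delete match="comment()"/>

  <!-- Step 2: add a (hypothetical) status attribute to the document element -->
  <p:add-attribute match="/*" attribute-name="status" attribute-value="processed"/>
</p:declare-step>
```

The pipeline is itself declared like a step, with its own source and result ports, which is what later allows a pipeline to be used as a step in another pipeline.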

XProc From 40,000 Feet
I/O, ports and documents
Ports, steps, sources and results

Ports go in only one direction: they are for input or output, never both. At most one of a step's input ports is designated as primary; any others are secondary. Similarly, at most one of its output ports may be designated as primary. Not all steps have secondary ports, but some steps make little sense without them.

The conventional name for the primary input port is source. The conventional name for the primary output port is result. These names correspond with the uses of these terms in XSLT and XQuery.

It is sometimes useful for a step to have output but no input (like p:load), or input with no output (like p:sink). Nor does every step emit a modified version of its input. For example, p:directory-list has no input but produces XML on its result port listing the contents of a file system directory.
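The three steps just named can be seen side by side in a short sketch. The file name notes.xml and the directory data/ are hypothetical placeholders.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <!-- This pipeline declares no input port; everything it reads, it fetches itself -->
  <p:output port="result"/>

  <!-- p:load has no input port: it retrieves a document by URI -->
  <p:load href="notes.xml"/>

  <!-- p:sink has no output port: it discards whatever it receives -->
  <p:sink/>

  <!-- p:directory-list has no input port: it emits an XML listing of a directory -->
  <p:directory-list path="data/"/>
</p:declare-step>
```

Here the pipeline's result is the directory listing, since that is the primary output of the last step.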

The source and result ports can carry sequences of documents when their steps define them that way. A port that permits a sequence may also be empty, with no documents bound to it.

While the primary ports will ordinarily be named source and result, the names of secondary ports may be less generic, to indicate what roles they play for their steps. For example, validation steps all have a schema secondary input port, for their schemas; the p:insert step has an insertion port for the data to be inserted, and so forth.
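A sketch of a secondary port in use, with p:insert as the example. The pipeline name main and the secondary input port notice are assumptions made up for illustration; only source, result, and insertion are the ports named above.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="main" version="3.0">
  <!-- Primary input: the document to be modified -->
  <p:input port="source" primary="true"/>
  <!-- Secondary input (hypothetical name): the content to be inserted -->
  <p:input port="notice"/>
  <p:output port="result" primary="true"/>

  <!-- p:insert reads the main document from its primary source port
       and the material to add from its secondary insertion port -->
  <p:insert match="/*" position="first-child">
    <p:with-input port="insertion" pipe="notice@main"/>
  </p:insert>
</p:declare-step>
```

The secondary port has to be connected explicitly, here with a pipe reference to the pipeline's own notice port.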

Steps with implicit port connections

XProc syntax can be fairly concise and clean - at least, as XML-based formats go - because it has sensible fallback rules and some nice ways of keeping syntax simple.

One important example: as long as the steps in your pipeline are to be applied in sequence, their connections do not have to be shown. The steps can be said to snap together.

The XProc feature in play here is called the default readable port. The concept is simple: any step with a primary input port that is not connected explicitly will be bound to the primary output of the immediately preceding step.

This works well, with the caveat that steps that have no input ports don't connect like this, even when given in sequence - and steps with no output ports can't be connected as inputs at all. Know your steps.
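To make the default readable port concrete, the sketch below writes one connection implicitly and one explicitly. The step name strip-comments and the attribute name status are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Implicit: with no connection given, this step reads the pipeline's source port -->
  <p:delete name="strip-comments" match="comment()"/>

  <!-- Explicit: this spells out the connection that would be made by default anyway -->
  <p:add-attribute match="/*" attribute-name="status" attribute-value="processed">
    <p:with-input port="source" pipe="result@strip-comments"/>
  </p:add-attribute>
</p:declare-step>
```

Removing the p:with-input element should leave the behavior unchanged, since the preceding step's primary output is already the default readable port.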

Knowing your formats

XProc comes with native support for reading and parsing XML and JSON, for plain text inputs, and for inputs described by grammars written in Invisible XML (ixml).

All of these types of data, and others, can be passed from one step to another, as long as both steps can accommodate the given format or media type.
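As a rough illustration, ports can declare which media types they accept or produce, and a step such as p:cast-content-type can convert between them. The sketch assumes a JSON document arrives on the source port; the shape of the resulting XML can vary by processor, so check the one you use.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <!-- Accept only JSON on the way in, promise XML on the way out -->
  <p:input port="source" content-types="application/json"/>
  <p:output port="result" content-types="application/xml"/>

  <!-- Convert the JSON document into an XML representation -->
  <p:cast-content-type content-type="application/xml"/>
</p:declare-step>
```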

Powerful embedded languages

XPath, XSLT and XQuery all play well with XProc, which is designed first and foremost to accommodate these kindred technologies.

XProc steps can either invoke or embed instances of these declarative processing languages, or others.
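One way to embed such a language is to place a stylesheet inline on the stylesheet port of p:xslt. The sketch below assumes an XSLT 3.0 processor is available; the transformation itself (copy everything, drop comments) is only an example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:xslt>
    <!-- The stylesheet is embedded in the pipeline instead of loaded from a file -->
    <p:with-input port="stylesheet">
      <p:inline>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
          <!-- Copy everything through unchanged by default -->
          <xsl:mode on-no-match="shallow-copy"/>
          <!-- An empty template suppresses comments -->
          <xsl:template match="comment()"/>
        </xsl:stylesheet>
      </p:inline>
    </p:with-input>
  </p:xslt>
</p:declare-step>
```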

Options

In addition to inputs and outputs (connection ports), steps can also have options.

These provide runtime configurations when invoking steps and pipelines.

For some steps, certain options are required, for example to designate which nodes to delete on a p:delete step (using the match option).

Values assigned to options can be simple (string value flags) or complex (such as map objects, or expressions to be evaluated).

Together, a step's ports and options provide the interface for using it.

Options can be set on steps using abbreviated syntax (attributes) or long syntax (p:with-option).
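The two syntaxes can be compared on p:delete and its match option. This is a sketch: both steps below set the same value, the first as an attribute and the second through p:with-option, whose select attribute holds an XPath expression (hence the extra quotes around the string).

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Abbreviated syntax: the option is given as an attribute -->
  <p:delete match="comment()"/>

  <!-- Long syntax: the option value is the result of evaluating an XPath expression -->
  <p:delete>
    <p:with-option name="match" select="'processing-instruction()'"/>
  </p:delete>
</p:declare-step>
```

The long form is useful when the value has to be computed at runtime instead of written out literally.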

Foundations of XProc

The more you know, the better you feel.

Minimal XProc

If your skills include XSLT, this might be all you ever need in XProc:
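A minimal sketch of such a pipeline, assuming a hypothetical stylesheet file named transform.xsl located next to the pipeline:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <!-- The document to transform -->
  <p:input port="source"/>
  <!-- The transformed document -->
  <p:output port="result"/>

  <!-- Apply the stylesheet; the source port is connected implicitly -->
  <p:xslt>
    <p:with-input port="stylesheet" href="transform.xsl"/>
  </p:xslt>
</p:declare-step>
```

How the source document is supplied and where the result goes depend on the XProc processor used to run the pipeline.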

Take another lesson on Day Two - learn to write a pipeline and use it as a step

Three everyday utility steps