User:Krauss/ART

...THIS PLACE IS FOR THE COLLABORATIVE CONSTRUCTION OF AN ARXIV.ORG ARTICLE ...

Introduction
The term template, when used in the context of digital documents and word processing, refers to a "fill-in-the-blank" document that can be completed either by hand or through an automated process. Form letters are a typical example: a letter written from a template, frequently used for "spam" by marketing campaigns. The template allows mass production of very similar letters. Another classic example is the "letter frame": any content is filled into the blank of a "header and footer" frame template. It allows mass production of letters with dissimilar content inside a "standard frame". Both are examples of the production of a set of "template-derived documents".

Digital documents used in a distribution or interchange context, as on the web (web pages as documents), face a different demand. Instead of a set of documents, web designers want a single "adaptive" dynamic document.

In web jargon, a dynamic document is a document that is prepared with fresh or customized information for each individual viewing. It is not static because it changes with time (e.g. news content), the user (e.g. preferences in a login session), the user interaction (e.g. a web page game), the context, or all of them.

In summary, digital documents obtained from a "templating process" yield two popular kinds of product: a set of "derived documents" or a single "dynamic document".

If the context is changed from "document" to "software source code file", other similar applications emerge: template metaprogramming, macro languages, preprocessors, documentation generators, and others. Intuitively we can label all of them "template systems".

This article was motivated by a set of observable problems in the context of what we will more rigorously name "template systems":


 * Conceptual problem: there are many things that we can consider "template systems", but there is a lack of a rigorous (mathematical) definition for them.
 * Diversity problems ("Babel problems"):
 * observable diversity: Wikipedia lists and compares about 60 distinct "web template engines", and there are more ... non-web templates...
 * methodological diversity: how to select, and when to use, one or another type of solution
 * terminological diversity: conflicts and/or ambiguities in terms such as "template engine", "template processor", "style sheet", "template script", "template oriented program", etc.

Other authors, like Parr [1], X [2], Y [3], have worked on these problems and have drawn many clues and partial solutions. We are "updating" and generalizing these authors' ideas.

From dynamic documents to templates
There are many ways to express dynamic documents; all of them require document pieces and logic pieces. This is a very simple HTML dynamic document:

Hello {$x}!

where x is an input variable.
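
As an illustration only, the document above can be rendered by a few lines of Python. This is a minimal sketch, not a reference engine: the function name `render`, the dictionary used as content, and the choice to render unknown variables as an empty string are all assumptions made for the example.

```python
import re

def render(template: str, content: dict) -> str:
    """Replace every {$name} place-holder with the matching content value."""
    def substitute(match: re.Match) -> str:
        # Assumption: an unknown variable is rendered as an empty string.
        return str(content.get(match.group(1), ""))
    return re.sub(r"\{\$(\w+)\}", substitute, template)

# The same document yields a different result for each viewing context.
print(render("Hello {$x}!", {"x": "World"}))
print(render("Hello {$x}!", {"x": "Krauss"}))
```

The template is the fixed "document piece"; the substitution rule is the (very small) "logic piece".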

...

Hooks and backgrounds
... hooks ...

Document designer perception:
 * : document-background template language
 * : script-background template language

If the language is "always balanced", there is no final "s?" or "t?" pattern. ....

System characterization
Motivations and methodological background

There are many systems promoted as being template systems. We need objective criteria to decide what is and what is not a template system.

To arrive at our formal definition, we first evaluate a "candidate system" and determine whether it meets a predetermined set of standards based on its input and output characteristics. For any "candidate system" to be characterized as a template system, we have to abstract it as a black box and then use black-box testing and use cases as our methods of evaluation. This methodology ensures that our formal definition is based solely on observable elements (input and output) and thus does not rely on informally promoted definitions.

A simple template system illustration (like that below) indicates what we need to consider in our black-box testing, based on specific inputs and outputs:
 * The template engine, a process, is the black box;
 * The template and contents are inputs;
 * The output documents are outputs.

To answer the question "What is and what is not a template system?" we need, for the black-box testing methodology, to fix a set of use-case tests. A set of positive results on the tests must reflect the essential properties of a template system.

We use a mathematical system model to specify precisely a formal analytical framework in which these properties are established.

Informal Template characterization
A simple template T is an "output document with holes", where holes are place-holders or macro references. A simple and well-accepted example of a "hole" is an HTML document with a place-holder:

Hello {$x}!

where x is an input variable. (...)

Separation-of-concerns needs (e.g. content from presentation in web templates) require a low-level separation strategy to isolate the script language from the output language. Template syntax therefore needs special care at the "border" between the languages, to avoid mixing them and to supply escaping forms. Well-defined tags, marks or characters, named "hooks", are intended to separate the two languages (and make them compatible).

There is a high diversity of styles for hook encoding, but a generative grammar (...) shows that there is a common approach. Refs X, Y, Z suggest the following types of hooks:
 * Script hooks: enclose blocks of developer-supplied program logic. Examples:
 * ASP style
 * PHP style
 * ColdFusion style
 * Sub-template hooks: fix the boundaries of the sub-template block.
 * PHPlib style
 * XSLT style
 * Expression hooks: to encode scalar variables, sub-template references, or expressions. Examples:
 * ASP style
 * XQuery style
 * XSLT style

A template T is a string that can be split (using "hook criteria") into two distinct, non-empty token types:
 * t: contiguous output document fragments.
 * s: contiguous script fragments, such as expressions or instructions (simple instructions, statements, directives, or blocks of them). Note: a sequence of repeated s, as occurs with XSLT or ColdFusion, is merged into a single "contiguous s" block.

The resulting sequence of tokens is not arbitrary and, theoretically, the "contiguous hypothesis" enforces a pattern that avoids the need for validation.
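
The split into t and s tokens can be sketched in Python as follows. The hook criterion used here (anything between `<?` and `?>`, or a `{$name}` expression) is a hypothetical choice for the example, as are the names `HOOK`, `tokenize` and `pattern`; a real template language supplies its own hook rules.

```python
import re

# Hypothetical hook criterion: "<? ... ?>" script blocks and "{$name}" expressions.
HOOK = re.compile(r"<\?.*?\?>|\{\$\w+\}", re.DOTALL)

def tokenize(template: str) -> list[tuple[str, str]]:
    """Split a template into ("t", text) and ("s", text) tokens."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    for m in HOOK.finditer(template):
        if m.start() > pos:
            tokens.append(("t", template[pos:m.start()]))
        if tokens and tokens[-1][0] == "s":
            # Adjacent script fragments are merged into one contiguous s block.
            tokens[-1] = ("s", tokens[-1][1] + m.group(0))
        else:
            tokens.append(("s", m.group(0)))
        pos = m.end()
    if pos < len(template):
        tokens.append(("t", template[pos:]))
    return tokens

def pattern(template: str) -> str:
    """Return only the token kinds, e.g. "tst"."""
    return "".join(kind for kind, _ in tokenize(template))

print(pattern("Hello {$x}!"))
print(pattern("<?php $a = 1; ?><?php $b = 2; ?>Hi"))
```

Because empty fragments are never emitted and adjacent s fragments are merged, the token kinds always alternate, which is the pattern the "contiguous hypothesis" refers to.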

Rigorously, T is specified by a generative grammar, $$G = (N, \Sigma, Q, A)$$, with $$N = \{A, X, Y \}$$, $$\Sigma = \{t, s \}$$, $$A$$ the start symbol, and $$Q$$ the following production rules:    $$A := X|Y|s|t$$;   $$X := tsX | ts | tst $$;  $$Y := stY | st | sts$$.

Informal check-list of essential properties
Writing for the Wikipedia community, the first author suggested a small set of "essential properties", restated here as:
 * 1) An empty template yields an empty document: the template system cannot add information to the empty template.
 * 2) Content independence of a template without instructions: for usual template languages, an output document file (as when an HTML file is used as a PHP file) is also a valid template.  In other template languages we cannot simply copy and paste a document into the template for this purpose; we need to add hooks (and/or headers) to express a "template container" (see XSLT). To generalize the idea we can use a kind of "cleaning function" that eliminates the script language from the template. This cleaning function can be exemplified with a little PHP code (see fig. 1): PHP source (all template):  Cleaned PHP (only the black portion of fig.1):.
 * 3) The output document is always a "cleaned document", without any trace of the script language. All the "template language code" is processed by the template engine in one step.
 * 4) No information is generated by the template engine.
 * 5) The output documents of templates that differ only in presentation also differ only in presentation, independently of the content. If we compare both, they contain the same information.
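
Properties 1 to 3 lend themselves to simple black-box checks. The sketch below is only an illustration under stated assumptions: the toy `engine`, the `{$name}` hook, the `clean` function and the sample templates are hypothetical, and the checks are one possible reading of the informal list, not a complete test suite.

```python
import re

HOOK = re.compile(r"\{\$(\w+)\}")  # hypothetical expression hook, as in "Hello {$x}!"

def engine(template: str, content: dict) -> str:
    """Toy candidate system: substitutes {$name} with content values."""
    return HOOK.sub(lambda m: str(content.get(m.group(1), "")), template)

def clean(template: str) -> str:
    """Cleaning function: remove every script fragment together with its hooks."""
    return HOOK.sub("", template)

def check_informal_properties(p, content: dict) -> dict:
    """Black-box checks for properties 1 to 3; p is the candidate system."""
    t_nop = "<p>Hello, Bye</p>"        # a template without instructions
    t = "<p>Hello {$x}, Bye</p>"       # a template with one expression hook
    r = p(t, content)
    return {
        "1_empty_template_gives_empty_document": p("", content) == "",
        "2_plain_document_is_a_valid_template": p(t_nop, content) == t_nop,
        "3_output_has_no_script_trace": clean(r) == r,
    }

def banner_engine(template: str, content: dict) -> str:
    """A counter-example: an engine that adds information of its own."""
    return "<!-- generated -->" + engine(template, content)

print(check_informal_properties(engine, {"x": "World"}))
print(check_informal_properties(banner_engine, {"x": "World"}))
```

The toy engine passes all three checks, while the banner engine fails property 1 because it adds text to the empty template.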

-- Note: see also FAQ about use of the definitions.

Formal characterization


To define it formally, we can model the (dataflow) template system as a function and its parameters:
 * Inputs:
 * Template library, L = {T1, T2, … Ti, … TN}
 * There are N templates, identified by the index i. The template T1 is also used as the "default root template" when the engine needs a predefined root. To express an input with a single template (N=1), the alternative notation L={T1}={T} can be used.
 * Content, C $$\subseteq$$ D
 * C is a set (or a sequence) of incoming data values from the content resource. C is in principle read-only, but the engine can assign values to simplify the variable declaration feature. Alternative notation to express elements (attributes): C = {c1, c2, … cj, … cM}
 * D is the data model, formally a (universe) set of all possible contents. It can also specify the data structure, for cases where C is not a simple set of values but, for example, a sequence (ordered set) or an XML input.
 * Process, P(L, C)
 * It is a black-box model of the template engine. P is treated as overloaded to express the singular case, P(T, C)=P({T},C). Systems that only work as a P(T, C) process (perhaps using implicit sub-templates, but not a template library) can be called "lib-less" systems.
 * Process output, R = P(L, C)
 * In web template systems, R is a web document. In (generic) template systems, R is any kind of document.
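
The signature of the model can be sketched in Python. This is a toy under stated assumptions: it uses the standard-library `string.Template` (`${name}` place-holders) as the template language, and the convention that sub-templates T2..TN are visible to the root as `${T2}`..`${TN}` is invented for the example.

```python
from string import Template as StrTemplate
from typing import Mapping, Sequence, Union

Library = Sequence[str]          # L = {T1, ..., TN}; L[0] plays the role of root T1
Content = Mapping[str, object]   # C, taken from the data model D

def P(templates: Union[str, Library], content: Content) -> str:
    """Toy process R = P(L, C), overloaded so that P(T, C) = P({T}, C)."""
    library = [templates] if isinstance(templates, str) else list(templates)
    values = dict(content)  # C itself is treated as read-only
    # Assumption: sub-template Ti is exposed to the root as the variable ${Ti}.
    for i, sub in enumerate(library[1:], start=2):
        values[f"T{i}"] = StrTemplate(sub).safe_substitute(content)
    return StrTemplate(library[0]).safe_substitute(values)

C = {"x": "World"}
print(P("Hello ${x}!", C))                      # lib-less use, P(T, C)
print(P(["<b>${T2}</b>", "Hello ${x}!"], C))    # library use, P(L, C)
```

The `isinstance` test is what realizes the overloading; everything else is specific to the toy template language.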

Essential properties

Let: Essential properties (for all L, La, Lb, Lnop, and C), that template systems satisfy: Notes:
 * ø: an empty content, empty template or empty document.
 * Tnop: a template with no programming operation (without template instructions).
 * Lnop = { T | T is a Tnop}
 * Ta, Tb are "presentation variants".
 * (La, Lb) is a library relationship where every Ta,i from La has a corresponding Tb,j from Lb.
 * Clean(T) is a "clean function" that removes all fragments (with their respective hooks) of the script language.
 * For template languages like Haml, which specify the output language in an "alternative syntax", this syntax must be reversible to the output language; the clean operator then also embodies this reversion.
 * I(X) is an "information content set" function, such as the set of words produced by a plain-text converter. If X is a template, the I(X) process starts with Clean(X). If it is a library, it is applied to all library templates. For T in the above example, I(T) = {Hello, Bye}.
 * The P' process returns an "expanded equivalent template": an engine (or manual procedure) that only finds and expands sub-template references.
 * See corresponding informal properties.
 * Template systems modeled as complex dataflows must use simplifying hypotheses for the characterization.
 * Modeling the template system is the first and fundamental step of the characterization. Example: to characterize a documentation generator as a template system, the source code (commented or not) is modeled as (structured) content; usually the templates are internal to the system (not customizable), and the configuration files are modeled as content.
 * A composition of P (pipeline) is possible if the content, C, and the output, R, are in the same format, such as XML. Example: a  composition   PXSLT(L, PXQuery(T, C))   in a Cocoon pipeline (see also XML transformation languages).
 * If there is no content in the template, I(T)=ø, and the script language is regular, then T is like a schema. There are cases where the template language is like a "programmable schema specification" (compare Haml with the RELAX NG compact syntax).
 * To validate a "candidate system" we need the corresponding Clean and P' as specific checking tools, and I as a generic checking tool.
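
The generic checking tool I can be sketched as a word-set extractor. Everything in this block is an assumption made for illustration: the `{$name}` hook, the tag-stripping "txt converter", the sample template, and in particular the final subset test, which is only one possible reading of the informal property "no information is generated by the template engine" and not a formula taken from this article.

```python
import re

SCRIPT = re.compile(r"\{\$\w+\}")   # hypothetical script hook
TAG = re.compile(r"<[^>]+>")        # crude stand-in for a "txt converter"
WORD = re.compile(r"\w+")

def clean(template: str) -> str:
    """Clean(T): remove script fragments together with their hooks."""
    return SCRIPT.sub("", template)

def info(x: str, is_template: bool = False) -> set:
    """I(X): the set of words of X; templates are cleaned first."""
    text = clean(x) if is_template else x
    return set(WORD.findall(TAG.sub(" ", text)))

def info_library(library) -> set:
    """I(L): I applied to all library templates."""
    result = set()
    for t in library:
        result |= info(t, is_template=True)
    return result

T = "<p>Hello {$x}</p><p>Bye</p>"
C = {"x": "World"}
R = "<p>Hello World</p><p>Bye</p>"   # what a substitution engine would output

print(info(T, is_template=True))
# Assumed reading of "no information is generated by the engine":
content_words = set().union(*(info(str(v)) for v in C.values()))
print(info(R) <= info_library([T]) | content_words)
```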

Important conclusions from definitions:
 * The simplest "string substitution system" can be characterized as a template system.
 * Every template language needs clear rules and "syntax facilities" (simple for human and machine) to evaluate the Clean(T) function.
 * Template files from languages like XSLT or XQuery are templates, but files like a Perl script, evaluated by a usual Pperl interpreter, that need an output instruction like  and have no hook notation, are not.
 * The process cannot be arbitrary.
 * The possibility of using sub-templates is a feature of some template languages, not a general characteristic.
 * The possibility of a sub-template recurrence relation is also a feature (it may be formalized by P' but is not modeled, to preserve the simplicity of the template system definition).

Referential "driven types"
To specify projects and the division of tasks, designers and programmers need to adopt an objective point of view from which to see and organize a template set; this results in a certain dichotomy of "referential types of systems". The types of system strategy are distinguished by their respective P process and by how engines make decisions about sub-template choices:


 * 1) Script-driven template systems:
 * Designer's perception: the template engine "selects template fragments and fills them with content".
 * Programmer's perception: all the logic about sub-template choices (typically if/then or switch/case logic) is explicit in the script.
 * Examples: SSI (simplest lib-less), XQuery (sophisticated lib-less), Smarty (sophisticated).
 * Notes: needs a "root template" to express the logic for selecting "first-level sub-templates".
 * 2) Content-driven template systems:
 * Designer's perception: the template engine "selects the desired content and frames it with template fragments".
 * Programmer's perception: part of the logic is implicit (not expressed in the script) and resides in the engine as predefined rules, which let the content choose (select) which sub-template will be used.
 * Example: XSLT (lib), attribute languages like Zope (lib-less in typical uses).
 * Notes: to supply the content-driven sub-template reference language feature (content choice by a dynamic context), the engine uses a dispatcher or other event-driven algorithms.

There are also mixed types: a script-driven template system augmented with a kind of "match and refer to sub-template by ID" (a simple hash dispatcher can do it) instead of direct control. On the other hand, a content-driven template system can express traditional logic in a root template, used as a script-driven template.
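
The contrast between the two referential types can be sketched in Python. The content structure, the `type` field, the library keys and both function names are hypothetical; the dictionary lookup stands in for the "simple hash dispatcher" mentioned above and is only loosely analogous to what a real content-driven engine does.

```python
# Content: a sequence of typed items (a hypothetical data model).
content = [
    {"type": "title", "text": "News"},
    {"type": "para", "text": "Hello"},
]

def script_driven(items) -> str:
    """Script-driven: the root script holds the explicit if/else selection logic."""
    out = []
    for item in items:
        if item["type"] == "title":
            out.append("<h1>" + item["text"] + "</h1>")
        else:
            out.append("<p>" + item["text"] + "</p>")
    return "".join(out)

# Content-driven: the library only declares which sub-template matches which
# kind of content; the selection logic lives in the engine.
LIBRARY = {
    "title": "<h1>{text}</h1>",
    "para": "<p>{text}</p>",
}

def content_driven(library, items) -> str:
    """Content-driven: each content item selects its own sub-template."""
    return "".join(library[item["type"]].format(**item) for item in items)

print(script_driven(content))
print(content_driven(LIBRARY, content))
```

Both functions produce the same document; what differs is where the sub-template choice is written down.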

Architecture characterization
The architecture of template systems, within a client-server reference model, is the main criterion for grouping them. Three groups are illustrated (see the links): outside server systems, server-side systems, and distributed systems. A formal characterization avoids mistakes about systems with cache strategies and remote references.

Using the system notation (defs. for R, T, C, and P above) and adding:

Outside server systems (or "local systems")
 * R ← P (L ,C )

The system acts only on the local transfer process. The "global transfer process" needs two stages:
 * 1) R ← P (L ,C )    Output production, with the system.
 * 2) R ← R  ← R     Publication (using another system or something like manual FTP) and distribution (e.g. HTTP browsing).

Server-side systems
There is no flow between networks; everything is on the server-side network (or server machine).
 * R ← P (L ,C )    Output "on-demand production", "on-the-fly publication" and receiving (via the distribution method).

Or caching on server:
 * 1) R ← Rcache  ← P (L ,C )    Output "on demand production" (first request) and caching.
 * 2) R ← Rcache     (next request), using the cache.
 * Note: systems generating meta-templates (like a CMS generating PHP output) also use the cache for the first request.
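
The two-step caching flow can be sketched as a small memoizing wrapper around the engine. The class `CachingServer`, the cache key, and the toy `process` based on `str.format` are assumptions for the example, not a description of any particular server.

```python
def process(library, content):
    """Toy P(L, C): fill the root template T1 with the content values."""
    return library[0].format(**content)

class CachingServer:
    """Serve R from Rcache, calling P(L, C) only on the first request."""

    def __init__(self, engine):
        self.engine = engine
        self.cache = {}          # Rcache, indexed by a request key
        self.engine_calls = 0    # counts how often P was actually run

    def request(self, key, library, content):
        if key not in self.cache:
            # Step 1: on-demand production (first request) and caching.
            self.engine_calls += 1
            self.cache[key] = self.engine(library, content)
        # Step 2: this and every later request is answered from the cache.
        return self.cache[key]

server = CachingServer(process)
first = server.request("/hello", ["Hello {x}!"], {"x": "World"})
second = server.request("/hello", ["Hello {x}!"], {"x": "World"})
print(first, second, server.engine_calls)
```

A real cache also needs an invalidation rule for when L or C changes; that is left out of this sketch.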

Distributed systems
All other combinations, with one or more elements, but not all, on the server:
 * R ← P (L ,C )    Generic case.
 * R  ← P (L ,C )    Typical case.

In distributed systems there is also the possibility of using a "distributed library", L, where the templates are not all at the same resource.

Note: when a system with an outside server engine also does the publication, it is characterized as a distributed system:
 * R ← R  ← P (L ,C )
 * Typical server-side systems, when producing for a predetermined demand, can also use a similar strategy to "cache by publication".

Language characterization
Informally, a simple template T is a "document with holes", where holes are place-holders or macro references. A template T, in the template system reference model, is an input itself, or an element of the library.

Template characterization

A template T is a string that can be split (using "hook criteria") into two distinct, non-empty token types:
 * t: contiguous output document fragments.
 * s: contiguous script fragments, such as expressions or instructions (simple instructions, statements, directives, or blocks of them). Note: a sequence of repeated s, as occurs with XSLT or ColdFusion, is merged into a single "contiguous s" block.

The resulting sequence of tokens is not arbitrary and, theoretically, the "contiguous hypothesis" enforces a pattern that avoids the need for validation. Technically it is validated by a regular expression:.

Formally it is specified by a generative grammar, $$G = (N, \Sigma, Q, A)$$, with $$N = \{A, X, Y \}$$, $$\Sigma = \{t, s \}$$, $$A$$ the start symbol, and $$Q$$ the following production rules:    $$A := X|Y|s|t$$;   $$X := tsX | ts | tst $$;  $$Y := stY | st | sts$$.
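
The grammar generates exactly the non-empty strings over {t, s} in which t and s alternate. The Python sketch below checks this on short strings. The regular expression is one that matches this language, written for the example; it is not necessarily the expression originally intended in the text, and the function names are hypothetical.

```python
import re
from itertools import product

# Assumed regular expression for non-empty alternating sequences of t and s.
TOKEN_PATTERN = re.compile(r"t(?:st)*s?|s(?:ts)*t?")

def is_valid(sequence: str) -> bool:
    """Validate a token-kind sequence such as "tst" with the regular expression."""
    return TOKEN_PATTERN.fullmatch(sequence) is not None

def derives_x(w: str) -> bool:
    """X := tsX | ts | tst"""
    return w in ("ts", "tst") or (w.startswith("ts") and derives_x(w[2:]))

def derives_y(w: str) -> bool:
    """Y := stY | st | sts"""
    return w in ("st", "sts") or (w.startswith("st") and derives_y(w[2:]))

def derives_a(w: str) -> bool:
    """A := X | Y | s | t"""
    return w in ("s", "t") or derives_x(w) or derives_y(w)

# Cross-check grammar and regular expression on all strings up to length 8.
words = ["".join(p) for n in range(0, 9) for p in product("ts", repeat=n)]
print(all(is_valid(w) == derives_a(w) for w in words))
```

The exhaustive comparison is only evidence for short strings, not a proof, but both definitions describe the same alternation pattern.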

Notes:
 * About the convention for "embed" terminology: if the template T is generated by the $$X$$ productions (starts with t), it is a template with an "output language embedded with the script"; otherwise (starts with s) it is a template with a "script embedded with the output language". Languages like XQuery permit both "template embedding modes".
 * About point of view: designers see the script fragments as "holes"; therefore designers always see (by a background effect or by viewer/editor choice) a template as an "output language embedded with a script".
 * About Parr's definition: this definition is a generalization of the "Parr split model", which must start with t and is not subject to system context considerations.

... Only cite Parr and others, without going into depth, deferring to future work...