X-Machine Testing

The (Stream) X-Machine Testing Methodology is a complete functional testing approach to software- and hardware testing that exploits the scalability of the Stream X-Machine model of computation. Using this methodology, it is likely to identify a finite test-set that exhaustively determines whether the tested system's implementation matches its specification. This goal is achieved by a divide-and-conquer approach, in which the design is decomposed by refinement into a collection of Stream X-Machines, which are implemented as separate modules, then tested bottom-up. At each integration stage, the testing method guarantees that the tested components are correctly integrated.

The methodology overcomes formal undecidability limitations by requiring that certain design for test principles are followed during specification and implementation. The resulting scalability means that practical software and hardware systems consisting of hundreds of thousands of states and millions of transitions have been tested successfully.

Motivation
Much software testing is merely hopeful, seeking to exercise the software system in various ways to see whether any faults can be detected. Testing may indeed reveal some faults, but can never guarantee that the system is correct, once testing is over. Functional testing methods seek to improve on this situation, by developing a formal specification describing the intended behaviour of the system, against which the implementation is later tested (a kind of conformance testing). The specification can be validated against the user-requirements and later proven to be consistent and complete by mathematical reasoning (eliminating any logical design flaws). Complete functional testing methods exploit the specification systematically, generating test-sets which exercise the implemented software system exhaustively, to determine whether it conforms to the specification. In particular: This level of testing can be difficult to achieve, since software systems are extremely complex, with hundreds of thousands of states and millions of transitions. What is needed is a way of breaking down the specification and testing problem into parts which can be addressed separately.
 * Full positive testing: confirms that all desired behaviour is present in the system;
 * Full negative testing: confirms that no unintended behaviour is present in the system.

Scalable, Abstract Specifications
Mike Holcombe first proposed using Samuel Eilenberg's theoretical X-machine model as the basis for software specification in the late 1980s. This is because the model cleanly separates the control flow of a system from the processing carried out by the system. At a given level of abstraction, the system can be viewed as a simple finite-state machine consisting of a few states and transitions. The more complex processing is delegated to the processing functions on the transitions, which modify the underlying fundamental data type X. Later, each processing function may be separately exposed and characterized by another X-machine, modelling the behaviour of that system operation.

This supports a divide-and-conquer approach, in which the overall system architecture is specified first, then each major system operation is specified next, followed by subroutines, and so forth. At each step, the level of complexity is manageable, because of the independence of each layer. In particular, it is easy for software engineers to validate the simple finite-state machines against user requirements.

Incrementally Testable Specifications
Gilbert Laycock first proposed a particular kind of X-machine, the Stream X-Machine, as the basis for the testing method. The advantage of this variant was the way in which testing could be controlled. In a Stream X-Machine, the fundamental data type has a particular form: X = Out* × Mem × In*, where In* is an input stream, Out* is an output stream, and Mem is the internal memory. The transitions of a Stream X-Machine are labelled with processing functions of the form φ: Mem × In → Out × Mem, that is, they consume one input from the input stream, possibly modify memory, and produce one output on the output stream (see the associated article for more details).

The benefits for testing are that software systems designed in this way are observable at each step. For each input, the machine takes one step, producing an output, such that input/output pairs may be matched exactly. This contrasts with other approaches in which the system runs to completion (taking multiple steps) before any observation is made. Furthermore, layered Stream X-Machines offer a convenient abstraction. At each level, the tester may forget about the details of the processing functions and consider the (sub-)system just as a simple finite-state machine. Powerful methods exist for testing systems that conform to finite-state specifications, such as Chow's W-method.

Specification Method
When following the (Stream) X-Machine methodology, the first stage is to identify the various types of data to be processed. For example, a word processor will use basic types Character (keyboard input), Position (mouse cursor position) and Command (mouse or menu command). There may be other constructed types, such as Text ::= Character* (a sequence of characters), Selection ::= Position × Position (the start and end of the selection) and Document ::= Text × Selection × Boolean (the text, a possible selection, and a flag to signal if the document has been modified).

High-Level Specification
The top-level specification is a Stream X-Machine describing the main user interaction with the system. For example, the word processor will exist in a number of states, in which keystrokes and commands will have different effects. Suppose that this word processor exists in the states {Writing, Selecting, Filing, Editing}. We expect the word processor to start in the initial Writing state, but to move to the Selecting state if either the mouse is dragged, or the shift-key is held down. Once the selection is established, it should return to the Writing state. Likewise, if a menu option is chosen, this should enter the Editing or Filing state. In these states, certain keystrokes may have different meanings. The word processor eventually returns to the Writing state, when any menu command has finished. This state machine is designed and labelled informally with the various actions that cause it to change state.

The input, memory and output types for the top-level machine are now formalised. Suppose that the memory type of the simple word processor is the type Document defined above. This treats a document as a text string, with two positions marking a possible selection and a flag to indicate modification since the last save-command. A more complex word processor might support undoable editing, with a sequence of document states: Document ::= (Text × Selection)*, which are collapsed to one document every time a save-command is performed.

Suppose that the input type for the machine is: In ::= Command × Character × Position. This recognises that every interaction could be a simple character insertion, a menu command or a cursor placement. Any given interaction is a 3-tuple, but some places may be empty. For example, (Insert, 'a', ε) would represent typing the character 'a'; while (Position, ε, 32) would mean placing the cursor between characters 32 and 33; and (Select, ε, 32) would mean selecting the text between the current cursor position and the place between characters 32 and 33.

The output type for the machine is designed so that it is possible to determine from the output which processing function was executed, in response to a given input. This relates to the property of output distinguishability, described below.

Low-Level Specification
If a system is complex, then it will most likely be decomposed into several Stream X-Machines. The most common kind of refinement is to take each of the major processing functions (which were the labels on the high-level machine) and treat these as separate Stream X-Machines. In this case, the input, memory and output types for the low-level machines will be different from those defined for the high-level machine. Either, this is treated as an expansion of the data sets used at the high level, or there is a translation from more abstract data sets at the high level into more detailed data sets at the lower level. For example, a command Select at the high level could be decomposed into three events: MouseDown, MouseMove, MouseUp at the lower level.

Ipate and Holcombe mention several kinds of refinement, including functional refinement, in which the behaviour of the processing functions is elaborated in more detail, and state refinement, in which a simple state-space is partitioned into a more complex state-space. Ipate proves these two kinds of refinement to be eventually equivalent

Systems are otherwise specified down to the level at which the designer is prepared to trust the primitive operations supported by the implementation environment. It is also possible to test small units exhaustively by other testing methods.

Design-For-Test Conditions
The (Stream) X-Machine methodology requires the designer to observe certain design for test conditions. These are typically not too difficult to satisfy. For each Stream X-Machine in the specification, we must obtain:


 * Minimal Specification: The specification must be a minimal finite-state machine. This means that the state machine should not contain redundant states, that is, states in which the observable transition behaviour is identical to that in some other state.
 * Deterministic Specification: For each state of the machine, at most one of the processing functions φ should be enabled for the current memory and next input value. This ensures that the required behaviour to be tested is predictable.
 * Test Completeness: Each processing function φ must be executable for at least one input, with respect to all memory states. This ensures that there are no deadlocks, where the machine is blocked by the current state of memory.  To ensure test completeness, the domain of a function φ may be extended with special test inputs that are only used during testing.
 * Output Distinguishability: It must be possible to distinguish which processing function was invoked from its output value alone, for all memory-input pairs. This ensures that the state machine can be decoupled from the processing functions.  To ensure output distinguishability, the codomain of a function φ may be extended with special test outputs that are only relevant during testing.

A minimal machine is the machine with the fewest states and transitions for some given behaviour. Keeping the specification minimal simply ensures that the test sets are as small as possible. A deterministic machine is required for systems that are predictable. Otherwise, an implementation could make an arbitrary choice regarding which transition was taken. Some recent work has relaxed this assumption to allow testing of non-deterministic machines.

Test completeness is needed to ensure that the implementation is testable within tractable time. For example, if a system has distant, or hard-to-reach states that are only entered after memory has reached a certain limiting value, then special test inputs should be added to allow memory to be bypassed, forcing the state machine into the distant state. This allows all (abstract) states to be covered quickly during testing. Output distinguishability is the key property supporting the scalable testing method. It allows the tester to treat the processing functions φ as simple labels, whose detailed behaviour may be safely ignored, while testing the state machine of the next integration layer. The unique outputs are witness values, which guarantee that a particular function was invoked.

Testing Method
The (Stream) X-Machine Testing Method assumes that both the design and the implementation can be considered as (a collection of) Stream X-Machines. For each pair of corresponding machines (Spec, Imp), the purpose of testing is to determine whether the behaviour of Imp, the machine of the implementation, exactly matches the behaviour of Spec, the machine of the specification. Note that Imp need not be a minimal machine - it may have more states and transitions than Spec and still behave in an identical way.

To test all behaviours, it must be possible to drive a machine into all of its states, then attempt all possible transitions (those which should succeed, and those which should be blocked) to achieve full positive and negative testing (see above). For transitions which succeed, the destination state must also be verified. Note that if Spec and Imp have the same number of states, the above describes the smallest test-set that achieves the objective. If Imp has more states and transitions than Spec, longer test sequences are needed to guarantee that redundant states in Imp also behave as expected.

Testing all States
The basis for the test generation strategy is Tsun S. Chow's W-Method for testing finite-state automata, chosen because it supports the testing of redundant implementations. Chow's method assumes simple finite-state machines with observable inputs and outputs, but no directly observable states. To map onto Chow's formalism, the functions φi on the transitions of the Stream X-Machines are treated simply as labels (inputs, in Chow's terms) and the distinguishing outputs are used directly. (Later, a mapping from real inputs and memory (mem, in) is chosen to trigger each function φ, according to its domain).

To identify specific states in Imp, Chow chooses a characterization set, W, the smallest set of test sequences that uniquely characterizes each state in Spec. That is, when starting in a given state, exercising the sequences in W should yield at least one observable difference, compared to starting in any other state.

To reach each state expected in Spec, the tester constructs the state cover, C, the smallest set of test sequences that reaches every state. This can be constructed by automatic breadth-first exploration of Spec. The test-set which validates all the states of a minimal Imp is then: $$C \otimes W$$, where $$\otimes$$ denotes the concatenated product of the two sets. For example, if C = {$⟨a⟩$, $⟨b⟩$} and W = {$⟨c⟩$, $⟨d⟩$}, then $$C \otimes W = \{\langle ac \rangle, \langle ad \rangle, \langle bc \rangle, \langle bd \rangle \}$$.

Testing all Transitions
The above test-set determines whether a minimal Imp has the same states as Spec. To determine whether a minimal Imp also has the same transition behaviour as Spec, the tester constructs the transition cover, K. This is the smallest set of test sequences that reaches every state and then attempts every possible transition once, from that state. Now, the input alphabet consists of (the labels of) every function φ in Φ. Let us construct a set of length-1 test sequences, consisting of single functions chosen from Φ, and call this Φ1. The transition cover is defined as $$K :=C \cup C \otimes \Phi_{1}$$.

This will attempt every possible transition from every state. For those which succeed, we must validate the destination states. So, the smallest test-set T1 which completely validates the behaviour of a minimal Imp is given by: $$T_{1} ::=C \otimes W \cup C \otimes \Phi_{1} \otimes W$$. This formula can be rearranged as:


 * $$T_{1} ::=C \otimes\left(\Phi_{0} \cup \Phi_{1}\right) \otimes W$$,

where Φ0 is the set containing the empty sequence $\{{{angbr}}\}$. If Imp has more states than Spec, the above test-set may not be sufficient to guarantee the conformant behaviour of replicated states in Imp. So, sets of longer test sequences are chosen, consisting of all pairs of functions Φ2, all triples of functions Φ3 up to some limit Φk, when the tester is satisfied that Imp cannot contain chains of duplicated states longer than k-1. The final test formula is given by:


 * $$T_{\mathrm{k}} ::=C \otimes\left(\Phi_{0} \cup \Phi_{1} \ldots \cup \Phi_{\mathrm{k}}\right) \otimes W$$.

This test-set completely validates the behaviour of a non-minimal Imp in which chains of duplicated states are expected to be no longer than k-1. For most practical purposes, testing up to k=2, or k=3 is quite exhaustive, revealing all state-related faults in really poor implementations.