Conformance checking

Business process conformance checking (conformance checking for short) is a family of process mining techniques that compare a process model with an event log of the same process. It is used to check whether the actual execution of a business process, as recorded in the event log, conforms to the model and vice versa.

For instance, there may be a process model indicating that purchase orders of more than one million euros require two checks. Analysis of the event log will show whether this rule is followed or not.

Another example is the checking of the so-called “four-eyes” principle stating that particular activities should not be executed by one and the same person. By scanning the event log using a model specifying these requirements, one can discover potential cases of fraud. Hence, conformance checking may be used to detect, locate and explain deviations, and to measure the severity of these deviations.
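As an illustration, a four-eyes check can be sketched in a few lines of Python. The log format (tuples of case id, activity and resource), the activity names and the function `four_eyes_violations` are assumptions made for this example and do not come from any particular tool.

```python
from collections import defaultdict

# Hypothetical event log: (case id, activity, resource) tuples.
events = [
    ("po-1", "create order", "alice"),
    ("po-1", "approve order", "bob"),
    ("po-2", "create order", "carol"),
    ("po-2", "approve order", "carol"),  # same person performed both steps
]

def four_eyes_violations(events, first, second):
    """Return the case ids in which one resource executed both activities."""
    who = defaultdict(lambda: defaultdict(set))
    for case_id, activity, resource in events:
        who[case_id][activity].add(resource)
    return sorted(
        case_id
        for case_id, by_activity in who.items()
        if by_activity[first] & by_activity[second]
    )

violations = four_eyes_violations(events, "create order", "approve order")
print(violations)  # ['po-2']
```

A flagged case is only a candidate for further inspection; whether it amounts to fraud cannot be decided from the log alone.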

Overview
Conformance checking techniques take as input a process model and an event log and return a set of differences between the behavior captured in the process model and the behavior captured in the event log. These differences may be represented visually (e.g., overlaid on top of the process model) or textually as lists of natural language statements (e.g., activity x is executed multiple times in the log, but this is not allowed according to the model). Some techniques may also produce a normalized measure (between 0 and 1) indicating to what extent the process model and the event log match each other.

The interpretation of non-conformance depends on the purpose of the model:
 * If the model is intended to be descriptive, discrepancies between model and log indicate that the model needs to be improved to capture reality better.
 * If the model is normative, then such discrepancies may be interpreted in two ways: they may expose undesirable deviations (i.e., conformance checking signals the need for better control of the process), or they may reveal desirable deviations (i.e., workers may deviate to serve the customers better or to handle circumstances not foreseen by the process model).

Techniques
The purpose of conformance checking is to identify two types of discrepancies:
 * Unfitting log behavior: behavior observed in the log that is not allowed by the model.
 * Additional model behavior: behavior allowed in the model but never observed in the log.

There are broadly three families of techniques for detecting unfitting log behavior: replay, trace alignment and behavioral alignment.

In replay techniques, each trace is replayed against the process model one event at a time. When a replay error is detected, it is reported and a local correction is made to resume the replay procedure. The local correction may be for example to skip/ignore a task in the process model or to skip/ignore an event in the log.
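The replay procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the model is a hypothetical deterministic transition system (`MODEL`), and the two local corrections are tried in a fixed order (first skip one model task, then ignore the log event). Real replay techniques differ in how they choose the correction.

```python
# Hypothetical model as a transition system: state -> {activity: next state}.
MODEL = {
    "s0": {"A": "s1"},
    "s1": {"B": "s2"},
    "s2": {"C": "s3"},
    "s3": {"D": "end"},
    "end": {},
}

def replay(trace, model, start="s0"):
    """Replay a trace one event at a time, repairing locally on each error."""
    state, errors = start, []
    for position, event in enumerate(trace):
        if event in model[state]:
            state = model[state][event]
            continue
        # Local correction 1: skip exactly one model task if that enables the event.
        skipped = next(
            (task for task, nxt in model[state].items() if event in model[nxt]),
            None,
        )
        if skipped is not None:
            errors.append((position, f"skipped model task {skipped}"))
            state = model[model[state][skipped]][event]
        else:
            # Local correction 2: ignore the log event and stay in the same state.
            errors.append((position, f"ignored log event {event}"))
    return state, errors

missing_b = replay(["A", "C", "D"], MODEL)            # B is absent from the trace
extra_x = replay(["A", "B", "X", "C", "D"], MODEL)    # X is unknown to the model
print(missing_b)
print(extra_x)
```

Because each correction is chosen greedily at the point of failure, the reported errors are not guaranteed to be the smallest set that explains the trace.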

A general limitation of replay methods is that error recovery is performed locally each time that an error is encountered. Hence, these methods might not identify the minimum number of errors that can explain the unfitting log behavior. This limitation is addressed by trace alignment techniques. These latter techniques identify, for each trace in the log, the closest corresponding trace that can be parsed by the model. Trace alignment techniques also compute an alignment showing the points of divergence between these two traces. The output is a set of pairs of aligned traces. Each pair shows a trace in the log that does not match exactly a trace in the model, together with the corresponding closest trace(s) produced by the model.
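A simplified version of trace alignment can be sketched with dynamic programming. The sketch assumes that the model's behavior is available as a finite, explicitly listed set of traces (`model_traces`), which is not the case for models with loops, and it assigns a cost of 1 to every move that appears on only one side. The names `align`, `closest_model_trace` and the `>>` marker are illustrative choices.

```python
SKIP = ">>"  # marks a move that has no counterpart on the other side

def align(log_trace, model_trace):
    """Return (cost, alignment) using only insertions and deletions."""
    n, m = len(log_trace), len(model_trace)
    # cost[i][j] = minimal number of one-sided moves for the two prefixes
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1
            if log_trace[i - 1] == model_trace[j - 1]:
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = best
    # Walk back from the bottom-right corner to recover the moves.
    moves, i, j = [], n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and log_trace[i - 1] == model_trace[j - 1]
        if same and cost[i][j] == cost[i - 1][j - 1]:
            moves.append((log_trace[i - 1], model_trace[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            moves.append((log_trace[i - 1], SKIP))    # event only in the log
            i -= 1
        else:
            moves.append((SKIP, model_trace[j - 1]))  # task only in the model
            j -= 1
    return cost[n][m], moves[::-1]

def closest_model_trace(log_trace, model_traces):
    """Pick the model trace with the cheapest alignment (first one on ties)."""
    return min((align(log_trace, t) for t in model_traces), key=lambda r: r[0])

model_traces = [["A", "B", "C", "D"], ["A", "C", "B", "D"]]
cost, alignment = closest_model_trace(["A", "C", "D"], model_traces)
print(cost, alignment)
```

In the resulting list of pairs, a pair such as `(">>", "B")` marks a point of divergence: the model requires B, but the log trace does not contain it at that position.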

Trace alignment techniques do not explicitly handle concurrent tasks or cyclic behavior (repetition of tasks). If, for example, four tasks can occur only in a fixed order in the process model (e.g., [A, B, C, D]), but they can occur concurrently in the log (i.e., in any order), this difference cannot be directly detected by trace alignment, because it cannot be observed at the level of individual traces.

Other methods to identify additional behavior are based on negative events. These methods start by enhancing the traces in the log by inserting fake (negative) events in all or some traces of the log. A negative event is inserted after a given prefix of a trace if this event is never observed preceded by that prefix anywhere in the log.

For example, if event C is never observed after prefix AB, then C can be inserted as a negative event after AB. Thereafter, the log enhanced with negative events is replayed against the process model. If the process model can replay the negative events, it means that there is behavior captured in the process model that is not captured in the log (since the negative events correspond to behavior that is never observed in the log).
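The negative-event idea can be sketched as follows. For simplicity, the sketch assumes that the model is given as a finite list of allowed traces (`model_traces`) and that "replaying a negative event" means that some model trace starts with the prefix followed by that event. Both the representation and the function names are assumptions made for illustration.

```python
def negative_events(log):
    """Map each observed prefix to the activities never seen right after it."""
    activities = {a for trace in log for a in trace}
    followers = {}
    for trace in log:
        for i in range(len(trace)):
            followers.setdefault(tuple(trace[:i]), set()).add(trace[i])
    return {prefix: activities - seen for prefix, seen in followers.items()}

def model_allows(model_traces, prefix, event):
    """True if some model trace starts with prefix followed by event."""
    candidate = prefix + (event,)
    return any(tuple(t[:len(candidate)]) == candidate for t in model_traces)

def additional_behavior(log, model_traces):
    """List the (prefix, negative event) pairs that the model can replay."""
    return sorted(
        (prefix, event)
        for prefix, negatives in negative_events(log).items()
        for event in negatives
        if model_allows(model_traces, prefix, event)
    )

log = [["A", "B", "D"], ["A", "C", "D"]]
model_traces = [["A", "B", "D"], ["A", "C", "D"], ["A", "B", "C", "D"]]
extra = additional_behavior(log, model_traces)
print(extra)  # [(('A', 'B'), 'C')]
```

Here C is a negative event after the prefix AB, and the model accepts it, so the model contains behavior that the log never exhibits.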

Comparing footprint matrices
Footprint matrices display the causal dependency between two activities in an event log, e.g., when activity a is followed by activity b in all traces of the log, but activity b is never followed by a. To capture this kind of dependency, the following ordering relations are defined:

Let L be an event log associated with the list A of all activities. Let a, b be two activities in A.


 * a ᐳL b if and only if there is a trace σ in L, in which the pattern (a, b) occurs.
 * a →L b if and only if a ᐳL b and not b ᐳL a.
 * a #L b if and only if not a ᐳL b and not b ᐳL a.
 * a ||L b if and only if a ᐳL b and b ᐳL a.
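The four relations can be computed directly from a log. The sketch below assumes that "the pattern (a, b) occurs" means that a is immediately followed by b in some trace, and it adds the symbol `<-` for the mirrored cell of `->`, which is a presentational choice rather than part of the definitions above. The example log is hypothetical.

```python
def footprint(log):
    """Return {(a, b): symbol} with symbols '->', '<-', '||' and '#'."""
    activities = sorted({a for trace in log for a in trace})
    # Assumption: the pattern (x, y) means x is immediately followed by y.
    follows = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}
    matrix = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            if ab and ba:
                matrix[a, b] = "||"
            elif ab:
                matrix[a, b] = "->"
            elif ba:
                matrix[a, b] = "<-"
            else:
                matrix[a, b] = "#"
    return matrix

log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
matrix = footprint(log)
for a in "abcd":
    print(a, [matrix[a, b] for b in "abcd"])
```

In this log, b and c occur in both orders, so they are related by `||`, whereas a and d never follow each other directly and are related by `#`.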

For a process model, such a matrix can also be derived from its execution sequences by using the play-out technique. Based on the footprint matrices, one can therefore reason that if an event log conforms to the process model under consideration, the two footprint matrices representing the log and the model are identical, i.e., every behavior allowed by the model (here, the causal dependencies) appears at least once in the event log.

Example: Let L be: ${, }$ and let M be a model of L. Assume the two matrices are as follows: In the footprint matrix of model M, the pattern (a, d) is allowed to occur, whereas it does not occur in the event log, so it counts as a deviation. The fitness between the event log and the model is computed as follows:

$$1-\frac{\#\text{deviations}}{\#\text{relations}} $$

In this example, the fitness is $$1-\frac{2}{16} = 0.875$$.
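The computation can be reproduced with a short sketch. The two footprints below are hypothetical stand-ins chosen so that the model additionally allows the pattern (a, d); they are not the matrices of the example above. The helper names and the `<-` symbol for the mirrored cell are assumptions as well.

```python
def footprint_from_pairs(activities, follows):
    """Build a footprint matrix from a set of directly-follows pairs."""
    def cell(a, b):
        ab, ba = (a, b) in follows, (b, a) in follows
        return "||" if ab and ba else "->" if ab else "<-" if ba else "#"
    return {(a, b): cell(a, b) for a in activities for b in activities}

def footprint_fitness(log_fp, model_fp):
    """Return (fitness, deviating cells) for two footprint matrices."""
    cells = set(log_fp) | set(model_fp)
    deviations = sorted(
        c for c in cells if log_fp.get(c, "#") != model_fp.get(c, "#")
    )
    return 1 - len(deviations) / len(cells), deviations

activities = ["a", "b", "c", "d"]
# Hypothetical directly-follows pairs observed in the log.
log_pairs = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "b"), ("b", "d"), ("c", "d")}
# The hypothetical model additionally allows a to be followed by d.
model_pairs = log_pairs | {("a", "d")}

fitness, deviations = footprint_fitness(
    footprint_from_pairs(activities, log_pairs),
    footprint_from_pairs(activities, model_pairs),
)
print(fitness, deviations)  # 0.875 [('a', 'd'), ('d', 'a')]
```

A single extra pattern changes two cells, (a, d) and its mirror (d, a), which is why two deviations out of 16 relations are counted.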

Token-replay technique
Token-based replay is a technique that uses four counters (produced tokens, consumed tokens, missing tokens and remaining tokens) to compute the fitness of an observed trace with respect to a given process model in Petri net notation. These four counters record the status of tokens when a trace is replayed on the Petri net. When a token is produced by a transition, produced tokens is increased by 1. When a token is consumed to fire a transition, consumed tokens is increased by 1. When a token is missing to fire a transition, missing tokens is increased by 1. Remaining tokens records the total number of tokens left after the trace is complete. The trace conforms to the process model if and only if there are no missing tokens during the replay and no remaining tokens at the end.

The fitness between an event log and a process model is computed as follows:

$$\frac{1}{2}\biggl(1-\frac{m}{c}\biggr) + \frac{1}{2}\biggl(1-\frac{r}{p}\biggr)$$ where m is the number of missing tokens, c is the number of consumed tokens, r is the number of remaining tokens, p is the number of produced tokens.
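The counters and the fitness formula can be illustrated with a small sketch. It assumes a hypothetical sequential Petri net (`NET`), counts the initial token in the source place as produced and the final token in the sink place as consumed (a common convention, assumed here), and requires every activity in a trace to be a transition of the net. Log-level fitness is obtained by summing the four counters over all traces.

```python
from collections import Counter

# Hypothetical Petri net: transition -> (input places, output places).
NET = {
    "A": (["start"], ["p1"]),
    "B": (["p1"], ["p2"]),
    "C": (["p2"], ["end"]),
}

def token_replay(trace, net, source="start", sink="end"):
    """Replay one trace and return the counters (p, c, m, r)."""
    marking = Counter({source: 1})
    p, c, m = 1, 0, 0  # assumption: the initial token counts as produced
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] == 0:
                m += 1                # token is missing: add it artificially
                marking[place] += 1
            marking[place] -= 1
            c += 1
        for place in outputs:
            marking[place] += 1
            p += 1
    # Assumption: the final token is consumed from the sink place at the end.
    if marking[sink] == 0:
        m += 1
        marking[sink] += 1
    marking[sink] -= 1
    c += 1
    r = sum(marking.values())         # tokens left behind in the net
    return p, c, m, r

def fitness(p, c, m, r):
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)

def log_fitness(log, net):
    """Sum the four counters over all traces, then apply the formula."""
    totals = [sum(x) for x in zip(*(token_replay(t, net) for t in log))]
    return fitness(*totals)

print(token_replay(["A", "B", "C"], NET))  # (4, 4, 0, 0)
print(token_replay(["A", "C"], NET))       # (3, 3, 1, 1)
print(log_fitness([["A", "B", "C"], ["A", "C"]], NET))
```

In the second trace, C fires without a token in `p2` (one missing token), and the token produced by A stays in `p1` (one remaining token), so the trace fitness is 2/3.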

Alignments
Although the token-replay technique is efficient and easy to understand, the approach is designed for Petri net notation and does not indicate which path through the model best matches a non-fitting case. Alignments were introduced to address these limitations. They are considered a highly accurate conformance checking technique and can be applied to any process modeling notation. The idea is that the algorithm performs an exhaustive search to find the optimal alignment between the observed trace and the process model. Hence, it is guaranteed to find the model run that is closest to the trace.
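One way to picture the search is as a shortest-path problem over pairs of (position in the trace, state of the model). The sketch below uses Dijkstra's algorithm on a hypothetical transition system with a loop on B; synchronous moves cost 0, and moves on the log only or on the model only cost 1. The model, the cost function and the function name are assumptions for illustration and are not the procedure of any specific tool.

```python
import heapq

# Hypothetical model as a transition system; B may repeat (a loop).
MODEL = {
    "s0": {"A": "s1"},
    "s1": {"B": "s1", "C": "s2"},
    "s2": {},
}
FINAL = {"s2"}
SKIP = ">>"  # marks a move that has no counterpart on the other side

def optimal_alignment(trace, model, start="s0", final=FINAL):
    """Dijkstra over (trace position, model state); cost 1 per one-sided move."""
    queue = [(0, 0, start, [])]  # (cost, position, state, moves so far)
    settled = set()
    while queue:
        cost, pos, state, moves = heapq.heappop(queue)
        if pos == len(trace) and state in final:
            return cost, moves
        if (pos, state) in settled:
            continue
        settled.add((pos, state))
        if pos < len(trace):
            event = trace[pos]
            if event in model[state]:
                # Synchronous move: log and model agree, no cost.
                step = moves + [(event, event)]
                heapq.heappush(queue, (cost, pos + 1, model[state][event], step))
            # Move on log only: the event has no counterpart in the model.
            step = moves + [(event, SKIP)]
            heapq.heappush(queue, (cost + 1, pos + 1, state, step))
        for task, nxt in model[state].items():
            # Move on model only: the model executes a task absent from the log.
            step = moves + [(SKIP, task)]
            heapq.heappush(queue, (cost + 1, pos, nxt, step))
    return None  # no final state is reachable

trace = ["A", "B", "B", "D"]
cost, moves = optimal_alignment(trace, MODEL)
model_run = [task for _, task in moves if task != SKIP]
print(cost, moves)
print(model_run)
```

For this trace, the cheapest alignment has cost 2: D is a move on the log only, C is a move on the model only, and the closest model run is A, B, B, C. Because the search explores states in order of increasing cost, the first final state it reaches is optimal under the chosen cost function.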