Runtime predictive analysis

Runtime predictive analysis (or predictive analysis) is a runtime verification technique in computer science for detecting property violations in program executions inferred from an observed execution. An important class of predictive analysis methods has been developed for detecting concurrency errors (such as data races) in concurrent programs, where a runtime monitor is used to predict errors which did not happen in the observed run, but can happen in an alternative execution of the same program. The predictive capability comes from the fact that the analysis is performed on an abstract model extracted online from the observed execution, which admits a class of executions beyond the observed one.

Overview
Informally, given an execution $$t$$, predictive analysis checks for errors in a reordered trace $$t'$$ of $$t$$. The trace $$t'$$ is called feasible from $$t$$ (alternatively, a correct reordering of $$t$$) if any program that can generate $$t$$ can also generate $$t'$$.

In the context of concurrent programs, a predictive technique is sound if it only predicts concurrency errors in feasible executions of the causal model of the observed trace. Assuming the analysis has no knowledge of the program's source code, the analysis is complete (also called maximal) if the inferred class of executions contains all executions that have the same program order and communication order prefix as the observed trace.
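The notion of a correct reordering can be illustrated with a small checker. The following is a minimal sketch under stated assumptions, not a definitive definition: a trace is a list of hypothetical `(thread, op, target)` tuples with `op` in `"r"`, `"w"`, `"acq"`, `"rel"`, a candidate reordering is a list of indices into that trace, and feasibility is approximated by three conditions commonly used when no source code is available (per-thread order is preserved, lock semantics hold, and every read observes the same write as in the original trace). The function names are illustrative.

```python
def last_writers(trace, order):
    """Map each read (by event index) to the index of the write it observes."""
    last, seen = {}, {}
    for i in order:
        _, op, target = trace[i]
        if op == "w":
            last[target] = i
        elif op == "r":
            seen[i] = last.get(target)  # None means "initial value"
    return seen


def is_correct_reordering(trace, order):
    """Check an approximation of feasibility for `order` (indices into `trace`)."""
    # 1. Program order: each thread contributes a prefix of its own events.
    per_thread, picked = {}, {}
    for i, (thread, _, _) in enumerate(trace):
        per_thread.setdefault(thread, []).append(i)
    for i in order:
        picked.setdefault(trace[i][0], []).append(i)
    for thread, idxs in picked.items():
        if idxs != per_thread[thread][:len(idxs)]:
            return False

    # 2. Lock semantics: a lock is held by at most one thread at a time.
    holder = {}
    for i in order:
        thread, op, target = trace[i]
        if op == "acq":
            if target in holder:
                return False
            holder[target] = thread
        elif op == "rel":
            if holder.get(target) != thread:
                return False
            del holder[target]

    # 3. Communication order: every read sees the same write as before.
    original = last_writers(trace, range(len(trace)))
    reordered = last_writers(trace, order)
    return all(original[i] == w for i, w in reordered.items())


# Thread 1 writes x and then uses lock l; thread 2 uses l and then writes x.
trace = [
    (1, "w", "x"), (1, "acq", "l"), (1, "rel", "l"),
    (2, "acq", "l"), (2, "rel", "l"), (2, "w", "x"),
]
```

In this example, `is_correct_reordering(trace, [3, 4, 0, 5])` accepts a reordering in which the two writes to `x` are adjacent, whereas `[1, 3]` is rejected because both threads would hold `l` at once.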

Applications
Predictive analysis has been applied to detect a wide class of concurrency errors, including:
 * Data races
 * Deadlocks
 * Atomicity violations
 * Order violations, e.g., use-after-free errors

Implementation
As is typical with dynamic program analysis, predictive analysis first instruments the source program. At runtime, the analysis can be performed online, in order to detect errors on the fly. Alternatively, the instrumentation can simply dump the execution trace for offline analysis. The latter approach is preferred for expensive refined predictive analyses that require random access to the execution trace or take more than linear time.

Incorporating data and control-flow analysis
Static analysis can first be conducted to gather data and control-flow dependence information about the source program, which can help construct the causal model during online executions. This allows predictive analysis to infer a larger class of executions based on the observed execution. Intuitively, a feasible reordering can change the last writer of a memory read (data dependence) if the read, in turn, cannot affect whether any accesses execute (control dependence).

Partial order based techniques
Partial order based techniques are most often employed for online race detection. At runtime, a partial order over the events in the trace is constructed, and any unordered pairs of critical events are reported as races. Many predictive techniques for race detection are based on the happens-before relation or a weakened version of it. Such techniques can typically be implemented efficiently with vector clock algorithms, which require only one pass over the input trace as it is being generated, and are thus suitable for online deployment.
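A one-pass vector clock check can be sketched as follows. This is a simplified illustration rather than any particular tool's algorithm: it assumes a trace of hypothetical `(thread, op, target)` tuples with `op` in `"r"`, `"w"`, `"acq"`, `"rel"`, considers only lock-based synchronization, and keeps full vector clocks per variable instead of the compact representations used by optimized detectors.

```python
def join(a, b):
    """Pointwise maximum of two vector clocks (dicts: thread -> counter)."""
    return {t: max(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b)}


def leq(a, b):
    """True if vector clock a is pointwise less than or equal to b."""
    return all(v <= b.get(t, 0) for t, v in a.items())


def hb_races(trace):
    """Return (event index, variable) pairs unordered by happens-before."""
    clocks = {}      # thread -> current vector clock
    lock_clock = {}  # lock -> clock of its most recent release
    write_vc = {}    # variable -> per-thread time of last write
    read_vc = {}     # variable -> per-thread time of last read
    races = []
    for i, (t, op, x) in enumerate(trace):
        c = clocks.setdefault(t, {t: 1})
        if op == "acq":
            # Acquire: learn everything known at the last release.
            clocks[t] = join(c, lock_clock.get(x, {}))
        elif op == "rel":
            # Release: publish the current clock, then start a new epoch.
            lock_clock[x] = dict(c)
            c[t] += 1
        elif op == "r":
            if not leq(write_vc.get(x, {}), c):
                races.append((i, x))
            read_vc[x] = join(read_vc.get(x, {}), {t: c[t]})
        elif op == "w":
            if not (leq(write_vc.get(x, {}), c) and leq(read_vc.get(x, {}), c)):
                races.append((i, x))
            write_vc[x] = join(write_vc.get(x, {}), {t: c[t]})
    return races


unsynchronized = [(1, "w", "x"), (2, "w", "x")]
ordered_by_lock = [
    (1, "w", "x"), (1, "acq", "l"), (1, "rel", "l"),
    (2, "acq", "l"), (2, "rel", "l"), (2, "w", "x"),
]
```

Here `hb_races(unsynchronized)` flags the second write, while `hb_races(ordered_by_lock)` reports nothing because the release and acquire of `l` order the two writes. The two critical sections in the second trace are empty, so the writes could be adjacent in another schedule; cases of this kind are what weakened versions of happens-before aim to capture.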

SMT-based techniques
SMT encodings allow the analysis to extract a refined causal model from an execution trace, as a (possibly very large) mathematical formula. Furthermore, control-flow information can be incorporated into the model. SMT-based techniques can achieve soundness and completeness (also called maximal causality), but they have exponential-time complexity with respect to the trace size. In practice, the analysis is typically applied to bounded segments of an execution trace, thus trading completeness for scalability.
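The flavor of such an encoding can be conveyed without a solver. The sketch below is an illustration under several assumptions, not an actual SMT-based tool: each event gets an order variable, the constraints (program order, lock mutual exclusion, read-write consistency, and adjacency of the two candidate events) are ordinary Python predicates, and a brute-force search over permutations stands in for the SMT solver. It uses hypothetical `(thread, op, target)` tuples, reorders whole traces rather than prefixes, ignores control flow, and assumes every acquire has a matching release.

```python
from itertools import combinations, permutations


def build_constraints(trace):
    """Return predicates over O, where O[event index] is its position."""
    cs, prev = [], {}
    open_acq, sections = {}, {}
    writes, last_w, observed = {}, {}, {}
    for i, (t, op, x) in enumerate(trace):
        if t in prev:  # program order within a thread
            cs.append(lambda O, a=prev[t], b=i: O[a] < O[b])
        prev[t] = i
        if op == "acq":
            open_acq[(t, x)] = i
        elif op == "rel":
            sections.setdefault(x, []).append((open_acq.pop((t, x)), i))
        elif op == "w":
            writes.setdefault(x, []).append(i)
            last_w[x] = i
        elif op == "r":
            observed[i] = last_w.get(x)
    # Critical sections on the same lock must not overlap.
    for secs in sections.values():
        for (a1, r1), (a2, r2) in combinations(secs, 2):
            cs.append(lambda O, a1=a1, r1=r1, a2=a2, r2=r2:
                      O[r1] < O[a2] or O[r2] < O[a1])
    # Each read must still observe the write it observed originally.
    for r, w in observed.items():
        for other in writes.get(trace[r][2], []):
            if other == w:
                continue
            if w is None:
                cs.append(lambda O, r=r, o=other: O[r] < O[o])
            else:
                cs.append(lambda O, r=r, w=w, o=other:
                          O[o] < O[w] or O[r] < O[o])
        if w is not None:
            cs.append(lambda O, r=r, w=w: O[w] < O[r])
    return cs


def find_race_witness(trace, a, b):
    """Search for an ordering in which events a and b are adjacent."""
    cs = build_constraints(trace)
    cs.append(lambda O: abs(O[a] - O[b]) == 1)
    n = len(trace)
    for perm in permutations(range(n)):  # exponential, like the worst case
        O = [0] * n
        for pos, event in enumerate(perm):
            O[event] = pos
        if all(c(O) for c in cs):
            return list(perm)
    return None


empty_sections = [
    (1, "w", "x"), (1, "acq", "l"), (1, "rel", "l"),
    (2, "acq", "l"), (2, "rel", "l"), (2, "w", "x"),
]
guarded = [
    (1, "acq", "l"), (1, "w", "x"), (1, "rel", "l"),
    (2, "acq", "l"), (2, "w", "x"), (2, "rel", "l"),
]
```

For `empty_sections`, `find_race_witness(empty_sections, 0, 5)` returns an ordering that places the two writes next to each other; for `guarded`, `find_race_witness(guarded, 1, 4)` returns `None` because the lock constraints keep the writes apart. The factorial search space also hints at why such analyses are usually run on bounded windows of the trace.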

Lockset-based approaches
In the context of data race detection for programs using lock-based synchronization, lockset-based techniques provide an unsound, yet lightweight mechanism for detecting data races. These techniques primarily detect violations of the lockset principle, which says that all accesses of a given memory location must be protected by a common lock. Such techniques are also used to filter out candidate race reports in more expensive analyses.
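The lockset principle can be checked in a single pass by intersecting, for each variable, the sets of locks held at every access. The following is a minimal sketch assuming a trace of hypothetical `(thread, op, target)` tuples with `op` in `"r"`, `"w"`, `"acq"`, `"rel"`. As an additional simplifying assumption, a variable is flagged only once it has been accessed by at least two threads and written at least once; practical detectors refine this in various ways.

```python
def lockset_violations(trace):
    """Return variables whose accesses share no common lock."""
    held = {}      # thread -> locks currently held
    common = {}    # variable -> locks held on every access so far
    threads = {}   # variable -> threads that have accessed it
    written = set()
    flagged = []
    for thread, op, target in trace:
        locks = held.setdefault(thread, set())
        if op == "acq":
            locks.add(target)
        elif op == "rel":
            locks.discard(target)
        else:
            # Refine the candidate lockset with the locks held right now.
            if target in common:
                common[target] &= locks
            else:
                common[target] = set(locks)
            threads.setdefault(target, set()).add(thread)
            if op == "w":
                written.add(target)
            if (not common[target] and len(threads[target]) > 1
                    and target in written and target not in flagged):
                flagged.append(target)
    return flagged


same_lock = [
    (1, "acq", "l"), (1, "w", "x"), (1, "rel", "l"),
    (2, "acq", "l"), (2, "w", "x"), (2, "rel", "l"),
]
different_locks = [
    (1, "acq", "l1"), (1, "w", "x"), (1, "rel", "l1"),
    (2, "acq", "l2"), (2, "w", "x"), (2, "rel", "l2"),
]
```

With these traces, `lockset_violations(same_lock)` is empty and `lockset_violations(different_locks)` flags `x`. Because the check looks only at locks, it can also flag accesses that are ordered by some other form of synchronization, which is one reason such reports are treated as candidates.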

Graph-based techniques
In the context of data race detection, sound polynomial-time predictive analyses based on graphs have been developed, with good, close to maximal predictive capability.

Computational complexity
Given an input trace of size $$n$$ executed by $$k$$ threads, general race prediction is NP-complete and even W[1]-hard parameterized by $$k$$, but admits a polynomial-time algorithm when the communication topology is acyclic. Happens-before races are detected in $$O(n\cdot k)$$ time, and this bound is optimal. Lockset races over $$d$$ variables are detected in $$O(n\cdot d)$$ time, and this bound is also optimal.

Tools
Here is a partial list of tools that use predictive analyses to detect concurrency errors, sorted alphabetically.


 * : a lightweight framework for implementing dynamic race detection engines.
 * : a dynamic analysis framework designed to facilitate rapid prototyping and experimentation with dynamic analyses for concurrent Java programs.
 * : SMT-based predictive race detection.
 * : SMT-based predictive use-after-free detection.