Interactive visual analysis

Interactive Visual Analysis (IVA) is a set of techniques for combining the computational power of computers with the perceptive and cognitive capabilities of humans, in order to extract knowledge from large and complex datasets. The techniques rely heavily on user interaction and the human visual system, and lie at the intersection of visual analytics and big data. IVA is a branch of data visualization. It is suited to analyzing high-dimensional data with a large number of data points, where simple graphing and non-interactive techniques give an insufficient understanding of the information.

These techniques involve looking at datasets through different, correlated views and iteratively selecting and examining features the user finds interesting. The objective of IVA is to gain knowledge which is not readily apparent from a dataset, typically in tabular form. This can involve generating, testing or verifying hypotheses, or simply exploring the dataset to look for correlations between different variables.

History
Focus + Context visualization and its related techniques date back to the 1970s. Early attempts at combining these techniques for Interactive Visual Analysis occurred in the WEAVE visualization system for cardiac simulation in 2000. SimVis appeared in 2003, and multiple PhD projects have explored the concept since then, notably those of Helmut Doleisch in 2004, Johannes Kehrer in 2011 and Zoltan Konyha in 2013. ComVis, which is used in the visualization community, appeared in 2008.

Basics
The objective of Interactive Visual Analysis is to discover information in data which is not readily apparent. The goal is to move from the data itself to the information contained in the data, ultimately uncovering knowledge which was not apparent from looking at the raw numbers.

The most basic form of IVA is to use coordinated multiple views displaying different columns of the dataset. At least two views are required for IVA. The views are usually among the common tools of information visualization, such as histograms, scatterplots or parallel coordinates, but volume-rendered views can also be used where appropriate for the data. Typically, one view will display the independent variables of the dataset (e.g. time or spatial location), while the others display the dependent variables (e.g. temperature, pressure or population density) in relation to each other. If the views are linked, the user can select data points in one view and have the corresponding data points automatically highlighted in the other views. This technique, which allows intuitive exploration of higher-dimensional properties of the data, is known as linking and brushing.
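The mechanism can be illustrated with a minimal sketch in Python. It assumes a small in-memory table; the column names and the helper functions brush and linked_view are hypothetical and do not come from any particular IVA package. A brush is modelled as a set of row indices, and every linked view uses that same set to decide which points to highlight.

```python
# Hypothetical toy table: each row is one data point (values are made up).
rows = [
    {"time": 0, "temperature": 15.0, "pressure": 1012.0},
    {"time": 1, "temperature": 22.5, "pressure": 1008.0},
    {"time": 2, "temperature": 30.1, "pressure": 1001.0},
    {"time": 3, "temperature": 18.4, "pressure": 1010.0},
]


def brush(rows, column, low, high):
    """Return the indices of rows whose value in `column` lies in [low, high]."""
    return {i for i, row in enumerate(rows) if low <= row[column] <= high}


def linked_view(rows, column, selected):
    """Pair each value shown in a view with a flag saying whether it is highlighted."""
    return [(row[column], i in selected) for i, row in enumerate(rows)]


# Brush warm temperatures in the temperature view ...
selected = brush(rows, "temperature", 20.0, 35.0)
# ... and the same data points are highlighted in the linked pressure view.
pressure_view = linked_view(rows, "pressure", selected)
```

Because the selection is stored per data point rather than per view, any number of additional views can be linked by passing them the same index set.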

The selection made in one of the views does not have to be binary. Software packages for IVA can allow for a gradual “degree of interest” in the selection, where data points are highlighted more strongly as their degree of interest increases. This allows for an inherent “focus+context” aspect to the search for information. For instance, when examining a tumor in a magnetic resonance imaging dataset, the tissue surrounding the tumor might also be of some interest to the operator.
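One possible way to express such a gradual selection is a brush that returns a value between 0 and 1 instead of a yes/no answer. The following Python sketch assumes a linear fall-off outside the brushed range; the function degree_of_interest, its margin parameter and the intensity values are illustrative assumptions, and actual tools may use other fall-off shapes.

```python
def degree_of_interest(value, low, high, margin):
    """Return 1.0 inside [low, high], falling linearly to 0.0 over `margin` outside."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    if low <= value <= high:
        return 1.0
    # Distance from the nearest edge of the fully selected range.
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / margin)


# Made-up normalized image intensities; the brush fully selects [0.6, 0.8].
intensities = [0.10, 0.45, 0.60, 0.80, 0.95]
doi = [degree_of_interest(v, 0.6, 0.8, 0.2) for v in intensities]
```

A view could map these values to opacity or color saturation, so that points just outside the brush remain visible as context.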

The IVA loop
Interactive Visual Analysis is an iterative process. Discoveries made after brushing of the data and looking at the linked views can be used as a starting point for repeating the process, leading to a form of information drill-down. As an example, consider the analysis of data from a simulation of a combustion engine. The user brushes a histogram of temperature distribution, and discovers that one specific part of one cylinder has dangerously high temperatures. This information can be used to formulate the hypothesis that all cylinders have a problem with heat dissipation. This could be verified by brushing the same region in all other cylinders and seeing in the temperature histogram that these cylinders also have higher temperatures than expected.

Data model
The data source for IVA is usually tabular data, represented in columns and rows. The data variables can be divided into two categories: independent and dependent variables. The independent variables represent the domain of the observed values, such as time and space. The dependent variables represent the data being observed, for instance temperature, pressure or height.

IVA can help the user uncover information and knowledge about data sources that have only a few dimensions as well as datasets that have a very large number of dimensions.

Levels of IVA
The IVA tools can be divided into several levels of complexity. These levels provide the user with different interaction tools to analyze the data. For most uses, the first level will be sufficient, and this is also the level that gives the user the fastest response to interaction. The higher levels make it possible to uncover more subtle relationships in the data. However, they require more knowledge about the tools, and the interaction process has a longer response time.

Base level
The simplest form of IVA is the base level, which consists of brushing and linking. Here the user can set up several views with different dataset variables and mark an interesting area in one of the views. The data points corresponding to the selection are marked automatically in the other views. A lot of information can be derived from this level of IVA. For datasets where the relationships between the variables are reasonably simple, this technique is usually sufficient for the user to achieve the required level of understanding.

Second level
Brushing and linking with logical combination of brushes is a more advanced form of IVA. This makes it possible for the user to mark several areas in one or several views and combine these areas with the logical operators and, or, and not. This makes it possible to explore the dataset in more depth and reveal information that a single brush would not show. A simple example would be the analysis of weather data: the analyst might want to discover regions that have both warm temperatures and low precipitation.
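If each brush is represented as a set of selected data points, the logical operators correspond directly to set operations. The Python sketch below follows the weather example; the region names, values, thresholds and the helper brush are hypothetical and chosen only for illustration.

```python
# Hypothetical weather table; region names and values are made up.
regions = {
    "A": {"temperature": 28.0, "precipitation": 10.0},
    "B": {"temperature": 12.0, "precipitation": 15.0},
    "C": {"temperature": 30.0, "precipitation": 120.0},
    "D": {"temperature": 8.0, "precipitation": 200.0},
}


def brush(table, column, low, high):
    """Return the keys of rows whose value in `column` lies in [low, high]."""
    return {key for key, row in table.items() if low <= row[column] <= high}


warm = brush(regions, "temperature", 25.0, 40.0)
dry = brush(regions, "precipitation", 0.0, 50.0)

warm_and_dry = warm & dry       # AND: intersection of the two brushes
warm_or_dry = warm | dry        # OR: union of the two brushes
not_warm = set(regions) - warm  # NOT: complement within the dataset
```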

Third level
The logical combination of selections might not be sufficient to uncover meaningful information from the dataset. There are multiple techniques available that make hidden relationships in the data more apparent. One of these is attribute derivation. This allows the user to derive additional attributes from the data, such as derivatives, clustering information or other statistical properties. In principle, the operator can perform any set of calculations on the raw data. The derived attributes can then be linked and brushed like any other attribute.
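As an illustration of attribute derivation, the following Python sketch adds a rate-of-change column to a small time series and then brushes the derived column like any other attribute. The data, the column name rate and the helper add_rate_of_change are assumptions made for this example; the derivative is estimated with a simple finite difference, which is only one of many possible derivations.

```python
# Made-up time series of temperature readings.
samples = [
    {"time": 0.0, "temperature": 20.0},
    {"time": 1.0, "temperature": 21.0},
    {"time": 2.0, "temperature": 25.0},
    {"time": 3.0, "temperature": 25.5},
]


def add_rate_of_change(rows, x, y, name):
    """Store a finite-difference estimate of dy/dx in column `name` (needs >= 2 rows)."""
    if len(rows) < 2:
        raise ValueError("at least two rows are required")
    for i, row in enumerate(rows):
        j = min(i, len(rows) - 2)  # the last row reuses the final interval
        a, b = rows[j], rows[j + 1]
        row[name] = (b[y] - a[y]) / (b[x] - a[x])


add_rate_of_change(samples, "time", "temperature", "rate")

# The derived attribute can now be brushed like an original column.
fast_changes = {i for i, row in enumerate(samples) if row["rate"] >= 2.0}
```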

The second tool in level three of IVA is advanced brushing techniques, such as angular brushing, similarity brushing or percentile brushing. These brushing tools select data points in a more advanced fashion than plain "point and click" selection. Advanced brushing generates a faster response than attribute derivation, but has a steeper learning curve and requires a deeper understanding of the dataset.
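A percentile brush can be sketched as a selection on the rank of a value rather than on the value itself. The Python example below assumes one simple definition, in which the smallest value has percentile 0 and the largest has percentile 100; the function percentile_brush and the pressure values are hypothetical, and individual tools may define the brush differently.

```python
def percentile_brush(values, low_pct, high_pct):
    """Return indices of values whose percentile rank lies in [low_pct, high_pct]."""
    n = len(values)
    # Indices ordered from the smallest to the largest value.
    order = sorted(range(n), key=lambda i: values[i])
    selected = set()
    for rank, index in enumerate(order):
        pct = 100.0 * rank / (n - 1) if n > 1 else 0.0
        if low_pct <= pct <= high_pct:
            selected.add(index)
    return selected


# Made-up pressure readings; select the upper quarter of the ranking.
pressures = [1012.0, 998.0, 1005.0, 1020.0, 1001.0]
top = percentile_brush(pressures, 75.0, 100.0)
```

Selecting by rank keeps the brush meaningful when the units or the range of the attribute are not known in advance.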

Fourth level
The fourth level of IVA is specific to each dataset and varies depending on the dataset and the purpose of the analysis. Any calculated attribute which is specific to the data under consideration belongs to this category. An example from the analysis of flow data would be the detection and categorization of vortices or other structures present in the flow. This means that fourth-level IVA techniques must be individually tailored to the specific application. After detection of higher-order features, the calculated attributes are connected to the original dataset and subjected to the normal technique of linking and brushing.

Patterns of IVA
The "linking and brushing" (selection) concept of IVA can be used between different types of variables in the dataset. Which pattern to use depends on which aspect of the correlations in the dataset is of interest.

Feature localization
Brushing data points from the set of dependent variables (e.g. temperature) and seeing where among the independent variables (e.g. space or time) these data points show up is called "feature localization". With feature localization, the user can easily identify the location of features in the dataset. Examples from a meteorological dataset would be finding which regions have a warm climate or which times of the year have a lot of precipitation.
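In terms of a tabular dataset, feature localization amounts to filtering on a dependent column and reading off the values of an independent column for the rows that remain. The Python sketch below assumes a made-up meteorological table; the helper localize and all values are illustrative.

```python
# Made-up observations: region and month are independent, temperature is dependent.
observations = [
    {"region": "north", "month": 1, "temperature": -5.0},
    {"region": "north", "month": 7, "temperature": 16.0},
    {"region": "south", "month": 1, "temperature": 14.0},
    {"region": "south", "month": 7, "temperature": 31.0},
]


def localize(rows, dependent, low, high, independent):
    """Brush a dependent variable and return where the brushed points lie."""
    return sorted({row[independent] for row in rows if low <= row[dependent] <= high})


# Where and when do warm temperatures occur?
warm_regions = localize(observations, "temperature", 15.0, 40.0, "region")
warm_months = localize(observations, "temperature", 15.0, 40.0, "month")
```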

Local investigation
If independent variables are brushed and the corresponding data points are examined in a view of dependent variables, this is termed "local investigation". This makes it possible to investigate the characteristics of, for example, a specific region or a specific time. In the case of meteorological data, one could for instance discover the temperature distribution during the winter months.
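Local investigation reverses the direction of the query: the brush is placed on an independent column and the result is a distribution of a dependent column. The Python sketch below bins the temperatures of the brushed winter months into a small histogram; the data, the helper investigate and the bin width are assumptions made for the example.

```python
from collections import Counter

# Made-up observations: month is independent, temperature is dependent.
observations = [
    {"month": 1, "temperature": -4.0},
    {"month": 2, "temperature": -1.5},
    {"month": 7, "temperature": 21.0},
    {"month": 12, "temperature": 3.0},
    {"month": 12, "temperature": -6.5},
]


def investigate(rows, independent, wanted, dependent, bin_width):
    """Brush an independent variable and histogram a dependent one for the brushed rows."""
    counts = Counter()
    for row in rows:
        if row[independent] in wanted:
            # Key each bin by its lower edge.
            lower_edge = (row[dependent] // bin_width) * bin_width
            counts[lower_edge] += 1
    return dict(counts)


# Temperature distribution during the winter months, in bins of 5 degrees.
winter_histogram = investigate(observations, "month", {12, 1, 2}, "temperature", 5.0)
```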

Multivariate analysis
Brushing dependent variables and watching the connection to other dependent variables is called multivariate analysis. This could for example be used to find out if high temperatures are correlated with pressure by brushing high temperatures and watching a linked view of pressure distributions.

Since each of the linked views usually has two or more dimensions, multivariate analysis can implicitly uncover higher-dimensional features of the data which would not be readily apparent from e.g. a simple scatterplot.
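The temperature and pressure example can be sketched in Python by comparing the pressure values of the brushed points with those of the whole dataset. The readings are made up, and the comparison of means is only a stand-in for what an analyst would see in a linked pressure view; a difference between the two values suggests, but does not establish, a relationship.

```python
from statistics import mean

# Made-up readings of two dependent variables.
readings = [
    {"temperature": 12.0, "pressure": 1018.0},
    {"temperature": 15.0, "pressure": 1015.0},
    {"temperature": 27.0, "pressure": 1004.0},
    {"temperature": 31.0, "pressure": 1001.0},
]

# Brush high temperatures ...
brushed = [r for r in readings if r["temperature"] >= 25.0]

# ... and compare the pressure of the brushed points with the full dataset.
mean_all = mean(r["pressure"] for r in readings)
mean_brushed = mean(r["pressure"] for r in brushed)
```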