Scan statistic

In statistics, a scan statistic or window statistic is a problem relating to the clustering of randomly positioned points. Typical problems concern the maximum size of a cluster of points on a line, or the longest series of successes recorded by a moving window of fixed length.
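The two examples above can be illustrated with a minimal Python sketch. It assumes the simplest fixed-window setting: the first function counts the largest number of points on a line covered by any closed window of a given width, and the second reads the moving-window example as the largest number of successes in any m consecutive trials. The function names and this second interpretation are assumptions made for illustration, not definitions taken from the literature.

```python
from bisect import bisect_right


def scan_statistic(points, window):
    """Largest number of points covered by any closed interval [x, x + window].

    It is enough to try windows whose left edge sits on a data point,
    because sliding a window right until it touches a point never loses points.
    """
    pts = sorted(points)
    best = 0
    for i, start in enumerate(pts):
        # Index one past the last point that is <= start + window.
        j = bisect_right(pts, start + window)
        best = max(best, j - i)
    return best


def max_successes_in_window(trials, m):
    """Largest number of successes (1s) in any m consecutive 0/1 trials."""
    if m <= 0 or m > len(trials):
        raise ValueError("window length m must satisfy 1 <= m <= len(trials)")
    current = sum(trials[:m])  # successes in the first window
    best = current
    for k in range(m, len(trials)):
        # Slide the window one step: add the new trial, drop the oldest one.
        current += trials[k] - trials[k - m]
        best = max(best, current)
    return best
```

For instance, `scan_statistic([1, 2, 3, 10, 11, 20], 2)` counts the three points 1, 2 and 3 that fit inside a window of width 2. Judging whether such a count is unusually large requires the distribution of the statistic under a null model, which this sketch does not compute.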

Joseph Naus first published on the problem in the 1960s, and has been called the "father of the scan statistic" in honour of his early contributions. The results can be applied in epidemiology, public health and astronomy to find unusual clusters of events.

The scan statistic was extended by Martin Kulldorff to multidimensional settings and varying window sizes in a 1997 paper, which is the most cited article in its journal, Communications in Statistics – Theory and Methods. This work led to the creation of SaTScan, a software program trademarked by Martin Kulldorff that applies his methods to data.

Recent results have shown that using scale-dependent critical values for the scan statistic makes it possible to attain asymptotically optimal detection simultaneously for all signal lengths, thereby improving on the traditional scan. This procedure has nevertheless been criticized for losing too much power for short signals. Walther and Perry (2022) considered the problem of detecting an elevated mean on an interval of unknown location and length in the univariate Gaussian sequence model. They explained the discrepancy by showing that such asymptotic optimality results are necessarily too imprecise to discern the performance of scan statistics in a practically relevant way, even in large samples. They instead proposed assessing performance with a new finite-sample criterion, and presented three new calibration techniques for scan statistics that perform well across a range of relevant signal lengths, with the aim of improving performance for short signals.
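The idea of a scale-dependent calibration can be sketched in Python as follows. The sketch assumes independent observations with unit-variance Gaussian noise, scans every interval, standardizes each interval sum by the square root of its length, and subtracts a penalty that depends only on the interval length. The default penalty, sqrt(2 log(e n / length)), is one form commonly used in the literature and is an assumption here; it is not claimed to be any of the calibrations proposed by Walther and Perry. The function name `penalized_scan` is hypothetical.

```python
import math


def penalized_scan(y, penalty=None):
    """Return (value, start, length) of the interval maximizing
    sum(y[start:start + length]) / sqrt(length) - penalty(length).

    Assumes unit-variance noise. Passing penalty=lambda length: 0.0
    gives the unpenalized scan over all interval lengths.
    """
    n = len(y)
    if n == 0:
        raise ValueError("y must not be empty")
    if penalty is None:
        # Assumed scale-dependent penalty: larger for short intervals,
        # of which there are many, and smaller for long ones.
        def penalty(length):
            return math.sqrt(2.0 * math.log(math.e * n / length))

    # Prefix sums let each interval sum be read off in constant time.
    prefix = [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)

    best_value, best_start, best_length = -math.inf, 0, 1
    for length in range(1, n + 1):
        pen = penalty(length)
        scale = math.sqrt(length)
        for start in range(n - length + 1):
            stat = (prefix[start + length] - prefix[start]) / scale - pen
            if stat > best_value:
                best_value, best_start, best_length = stat, start, length
    return best_value, best_start, best_length
```

This brute-force version examines all O(n^2) intervals, which is adequate for illustration but slow for long sequences. Turning the returned value into a test still requires a critical value, for example one obtained by simulating the statistic under pure noise.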

Scan-statistic-based methods have been developed specifically to detect rare variant associations in the noncoding genome, especially in intergenic regions. Compared with fixed-size sliding window analysis, these methods use a dynamic window of data-adaptive size to scan the genome continuously, and increase the power of the analysis by flexibly selecting the locations and sizes of the signal regions. Examples of these methods include Q-SCAN, SCANG and WGScan.