Phase dispersion minimization



Phase dispersion minimization (PDM) is a data analysis technique that searches for periodic components of a time-series data set. It is useful for data sets with gaps, non-sinusoidal variations, poor time coverage or other problems that would make Fourier techniques unusable. It was first developed by Stellingwerf in 1978 and has been widely used for astronomical and other types of periodic data analyses. Source code for PDM analysis and the current version of the application are available for download.

Background
PDM is a variant of a standard astronomical technique called data folding. This involves guessing a trial period for the data and cutting, or "folding", the data into multiple sub-series, each with a duration equal to the trial period. The data are then plotted versus "phase", on a scale of 0 to 1, relative to the trial period. If the data are truly periodic with this period, a clean functional variation, or "light curve", will emerge. If not, the points will be randomly distributed in amplitude.
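
The folding step can be sketched in a few lines of Python. This is only an illustration: the function name `fold`, the reference epoch `t0` and the sample time stamps are assumptions made for the example and do not come from any PDM package.

```python
def fold(times, period, t0=0.0):
    """Return the phase, nominally in [0, 1), of each time stamp for a trial period.

    t0 is an arbitrary reference epoch (an assumption of this sketch).
    """
    return [((t - t0) / period) % 1.0 for t in times]


# Illustrative time stamps covering three cycles of a trial period of 2.0.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
phases = fold(times, period=2.0)

# Pairing each phase with its measurement and sorting by phase gives the
# folded "light curve" that is then inspected or binned.
measurements = [1.0, 2.0, 1.0, 0.0, 1.0, 2.0]
folded = sorted(zip(phases, measurements))
```

Points observed one or more whole periods apart land on the same phase, which is why a correct trial period stacks the cycles on top of one another.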

As early as 1926, Whittaker and Robinson proposed an analysis technique of this type based on maximizing the amplitude of the mean curve. Another technique, focusing on the variation of data at adjacent phases, was proposed in 1964 by Lafler and Kinman. Both techniques had difficulties, particularly in estimating the significance of a possible solution.

PDM analysis
PDM divides the folded data into a series of bins and computes the variance of the amplitude within each bin. The bins can overlap to improve phase coverage, if needed. The bin variances are combined and compared to the overall variance of the data set. For a true period, the ratio of the combined bin variance to the total variance will be small. For a false period, the ratio will be approximately unity. A plot of this ratio versus trial period will usually indicate the best candidates for periodic components. Analyses of the statistical properties of this approach have been given by Nemec & Nemec and Schwarzenberg-Czerny.
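
A minimal sketch of this variance ratio, assuming non-overlapping bins of equal width in phase, might look as follows. The function name `pdm_theta`, the synthetic data and the trial-period grid are illustrative assumptions; overlapping bins and the refinements of the published codes are not reproduced here.

```python
import math


def pdm_theta(times, values, period, n_bins=10):
    """Pooled within-bin variance divided by the total variance.

    Sketch only: non-overlapping, equal-width phase bins; empty bins are skipped.
    """
    def sum_sq(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        phase = (t / period) % 1.0
        bins[min(int(phase * n_bins), n_bins - 1)].append(v)

    filled = [b for b in bins if b]
    dof = len(values) - len(filled)  # sum over filled bins of (n_j - 1)
    total_variance = sum_sq(values) / (len(values) - 1)
    if dof <= 0 or total_variance == 0.0:
        raise ValueError("need repeated points per bin and non-constant data")
    return (sum(sum_sq(b) for b in filled) / dof) / total_variance


# Synthetic, unevenly sampled sine wave with a true period of 2.5 (made-up data).
times = [0.173 * i + 0.05 * ((7 * i) % 11) for i in range(200)]
values = [math.sin(2 * math.pi * t / 2.5) for t in times]

# Scan a grid of trial periods and keep the one with the smallest ratio.
trial_periods = [2.0 + 0.01 * k for k in range(101)]
thetas = [pdm_theta(times, values, p) for p in trial_periods]
best_period = trial_periods[thetas.index(min(thetas))]
```

Plotting `thetas` against `trial_periods` gives the kind of diagram described above, with a dip near the true period and values near unity elsewhere.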

PDM2 updates
The original PDM technique has been updated (PDM2) in several areas:
 * 1) The bin variance calculation is equivalent to a curve fit with step functions across each bin. This can introduce errors in the result if the underlying curve is non-symmetric, since deviations toward the right side and left side of each bin will not exactly cancel. This low-order error can be eliminated by replacing the step function by a linear fit drawn between bin means (see figure, above), or a B-Spline fit to the bin means. In either case, the smoothed fits should not be used for frequencies in the "noise" portion of the spectrum.
 * 2) The original test of significance was based on an F test, which has been shown to be incorrect. The correct statistic is an incomplete beta distribution for well-behaved data sets, and a Fisher randomization / Monte Carlo analysis for "clumpy" data (i.e. data with non-uniform time distribution).
 * 3) To accommodate new data sets with many data points, a new "Rich Data" version of PDM, called PDM2b, has been developed. This version uses 100 bins per period, rather than the default value of 10 bins per period. An example of this option is shown here.
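
The randomization approach mentioned in item 2 can be illustrated with a generic permutation test. The sketch below is not the PDM2 implementation: it assumes a simple non-overlapping-bin statistic, and all names (`pdm_theta`, `min_theta`, `randomization_p_value`), the noise level and the number of shuffles are chosen only for the example. The idea is to shuffle the measurements among the observation times, which destroys any periodicity while keeping the sampling pattern, and to count how often the shuffled data produce a minimum as deep as the observed one.

```python
import math
import random


def pdm_theta(times, values, period, n_bins=10):
    """Pooled within-bin variance over total variance (non-overlapping bins)."""
    def sum_sq(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        bins[min(int(((t / period) % 1.0) * n_bins), n_bins - 1)].append(v)
    filled = [b for b in bins if b]
    dof = len(values) - len(filled)
    return (sum(sum_sq(b) for b in filled) / dof) / (sum_sq(values) / (len(values) - 1))


def min_theta(times, values, periods):
    """Deepest minimum of the statistic over all trial periods."""
    return min(pdm_theta(times, values, p) for p in periods)


def randomization_p_value(times, values, periods, n_shuffles=99, seed=0):
    """Estimate how often shuffled data match or beat the observed minimum."""
    rng = random.Random(seed)
    observed = min_theta(times, values, periods)
    shuffled = list(values)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the time ordering, keeps the sampling
        if min_theta(times, shuffled, periods) <= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Noisy synthetic sine wave with a true period of 2.5 (made-up data).
noise = random.Random(1)
times = [0.173 * i + 0.05 * ((7 * i) % 11) for i in range(120)]
values = [math.sin(2 * math.pi * t / 2.5) + noise.gauss(0.0, 0.3) for t in times]
periods = [2.0 + 0.05 * k for k in range(21)]
p_value = randomization_p_value(times, values, periods)
```

Taking the minimum over all trial periods inside each shuffle accounts for the number of periods searched; a small `p_value` suggests that the observed minimum is unlikely to arise from the sampling pattern alone.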



See reference (2) for a detailed technical discussion, test cases, C source code, and a Windows application package.

Binless PDM
Plavchan et al. 2008 introduced a binless version of the phase dispersion minimization algorithm. The algorithm was further revised in Parks, Plavchan et al. 2014, and is available for highly parallel use online at the NASA Exoplanet Archive. The binned PDM approach is susceptible to period aliases when the cadence is semi-regular (e.g., nightly observations of a star's brightness). Plavchan and colleagues avoided this aliasing by computing a box-car smoothed phased time series, where the box-car width can be thought of as the old bin size. The original folded time series is compared to the smoothed time series, and the best period is found when the two are most similar. See the NASA Exoplanet Archive for more information on statistical significance and approaches.
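
The box-car idea can be illustrated with the simplified sketch below. It is not the Plavchan et al. or NASA Exoplanet Archive implementation: the similarity measure (mean squared difference between the folded data and a running mean, normalized by the total scatter), the function name `boxcar_dispersion` and the window width are assumptions made for the example.

```python
import math


def boxcar_dispersion(times, values, period, width=0.1):
    """Squared difference between folded data and its box-car smoothed version,
    divided by the total scatter. Values must not all be equal.

    Sketch only: each point is compared with the mean of all points (itself
    included) lying within width / 2 of it in phase, wrapping around at 0 and 1.
    """
    phases = [(t / period) % 1.0 for t in times]
    mean = sum(values) / len(values)
    total = sum((v - mean) ** 2 for v in values)
    residual = 0.0
    for i, phase_i in enumerate(phases):
        window = []
        for j, phase_j in enumerate(phases):
            d = abs(phase_i - phase_j)
            d = min(d, 1.0 - d)  # circular distance in phase
            if d <= width / 2:
                window.append(values[j])
        smoothed = sum(window) / len(window)
        residual += (values[i] - smoothed) ** 2
    return residual / total


# Synthetic, unevenly sampled sine wave with a true period of 2.5 (made-up data).
times = [0.173 * i + 0.05 * ((7 * i) % 11) for i in range(120)]
values = [math.sin(2 * math.pi * t / 2.5) for t in times]

trial_periods = [2.0 + 0.02 * k for k in range(51)]
scores = [boxcar_dispersion(times, values, p) for p in trial_periods]
best_period = trial_periods[scores.index(min(scores))]
```

Because the window is centred on each point rather than on a fixed bin edge, the result does not depend on where the bins happen to start in phase. The double loop is quadratic in the number of points, which is acceptable for a sketch but not for large surveys.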