User:Ahoriguchi/sandbox

ClusterZ is a tool for detecting clustering patterns in binding sites of two transcription factors.

ClusterZ takes as input, for each of the two transcription factors, the genes it binds and the positions of its binding sites. Each binding site position is represented as a single positive integer, so the length of the binding site is not taken into account. The output consists of two statistics, a Z-score and its corresponding p-value, that measure how strongly the binding sites are clustered.

Definition
To calculate p-values, ClusterZ uses the following null model: the binding site locations of a transcription factor are modelled by an inhomogeneous Poisson point process that is independent of the binding of other transcription factors. The interval of interest represents an upstream region of a gene. Ripley's K-function is used to score how strongly the binding sites are clustered. To detect a potential clustering pattern, the K-function is calculated for each upstream region. A normalized Z-score is then computed from these K-functions; by Lindeberg's central limit theorem, it approximately follows a standard normal distribution under the null model. The corresponding p-value is calculated from this Z-score.
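The scoring and normalization steps can be illustrated with a minimal Python sketch. It is not ClusterZ's actual implementation: the function names are hypothetical, the pair count is only an unnormalized K-function-style score, and the per-region null means and variances are assumed to be supplied by the caller, since their derivation from the inhomogeneous Poisson model is not given here.

```python
import math

def pair_count(sites_a, sites_b, r):
    """Count (a, b) pairs with |a - b| <= r in one upstream region.

    An unnormalized, K-function-style clustering score (assumption:
    the real ClusterZ score may be normalized differently).
    """
    return sum(1 for a in sites_a for b in sites_b if abs(a - b) <= r)

def z_score(observed, null_means, null_vars):
    """Normalize the summed per-region scores by their null mean and variance.

    null_means and null_vars are assumed inputs, one value per region.
    """
    total = sum(observed) - sum(null_means)
    return total / math.sqrt(sum(null_vars))

def upper_tail_p(z):
    """One-sided p-value P(Z >= z) under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

For example, `pair_count([10, 50], [12, 200], 5)` counts only the pair (10, 12). Summing over regions before normalizing is what makes the central limit theorem argument applicable, since the Z-score is then a normalized sum of independent per-region terms.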

Use
ClusterZ calculates how strongly the binding sites of two transcription factors are clustered. The stronger the clustering, the more likely the two transcription factors are to interact biologically in some way. Hence, given $$n$$ transcription factors, one can narrow down the $$\binom{n}{2}$$ possible pairs to a smaller set of candidate pairs. This can reduce the number of wet-lab experiments needed.
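The screening step might be sketched as follows in Python. The function `candidate_pairs`, the callable `z_of_pair`, and the threshold are hypothetical names introduced for illustration only. The sketch assumes a Z-score is already available for every pair.

```python
from itertools import combinations
from math import comb

def candidate_pairs(tfs, z_of_pair, z_threshold):
    """Return (z, tf_a, tf_b) for pairs meeting the threshold, strongest first.

    z_of_pair is an assumed callable giving the Z-score of one pair.
    """
    scored = [(z_of_pair(a, b), a, b) for a, b in combinations(tfs, 2)]
    kept = [item for item in scored if item[0] >= z_threshold]
    return sorted(kept, reverse=True)

def n_pairs(n):
    """Number of unordered pairs among n transcription factors."""
    return comb(n, 2)
```

With four transcription factors there are `n_pairs(4) == 6` pairs to score, and only those at or above the chosen threshold would be carried forward to the wet lab.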

Shortcomings
ClusterZ does not take into account the length of each transcription factor's binding sites, because each site is represented only by a single integer position.