HRDetect

HRDetect (Homologous Recombination Deficiency Detect) is a whole-genome sequencing (WGS)-based classifier designed to predict BRCA1 and BRCA2 deficiency from six mutational signatures. The classifier can also identify tumors whose mutational profiles resemble those of tumors with BRCA1 and BRCA2 defects, a phenotype known as BRCAness. It can be used to help select patients with BRCA1/BRCA2 deficiency for treatment with PARP inhibitors. The final output is a probability of BRCA1/2 deficiency.

BRCA1/BRCA2
BRCA1 and BRCA2 play crucial roles in maintaining genome integrity, mainly through homologous recombination (HR) repair of DNA double-strand breaks (DSBs). Mutations in BRCA1 and BRCA2 can reduce the capacity of the HR machinery, increase genomic instability, and predispose to malignancy. People with BRCA1 and BRCA2 deficiency have a higher risk of developing certain cancers, such as breast and ovarian cancers. Germline defects in the BRCA1/BRCA2 genes account for up to 5% of breast cancer cases.

PARP inhibitors
Poly (ADP-ribose) polymerase (PARP) inhibitors are designed to treat BRCA1- and BRCA2-defective tumors by exploiting their homologous recombination deficiency. These drugs have mainly been used in breast and ovarian cancers, and their clinical efficacy in other cancer types, such as pancreatic cancer, is still being investigated. PARP inhibitors operate on the concept of synthetic lethality: they selectively cause cell death in BRCA-mutant cells while sparing normal cells. Identifying the patients with BRCA1/BRCA2 deficiency is therefore vital to using PARP inhibitors optimally.

HRDetect
HRDetect was developed to detect tumors with BRCA1/BRCA2 deficiency using whole-genome sequencing data. The model quantitatively aggregates six HRD-associated signatures into a single score, the HRDetect score, to classify breast cancers by their BRCA1 and BRCA2 status. A machine learning algorithm assigns a weight to each signature before the final score is computed. The six signatures, ranked by decreasing weight, are microhomology-mediated indels, the HRD index, base-substitution signature 3, rearrangement signature 3, rearrangement signature 5, and base-substitution signature 8. This weighted approach can also identify BRCAness, which refers to mutational phenotypes displaying homologous recombination deficiency similar to that of tumors with BRCA1/BRCA2 germline defects.

Input
HRDetect requires four types of inputs:


 * 1) Counts of mutations associated with each signature of single-base substitutions
 * 2) Indels with microhomology at the indel breakpoint junction, indels at polynucleotide-repeat tracts and other complex indels as proportions
 * 3) Counts of rearrangements associated with each signature
 * 4) HRD index (Arithmetic sum of loss of heterozygosity (LOH), telomeric-allelic imbalance (TAI), and large-scale state transitions (LST) scores)
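
As a rough illustration of how these four inputs might be gathered for one sample, the sketch below builds a plain dictionary. The function names, dictionary keys, and all numeric values are hypothetical and chosen only for illustration; only the definition of the HRD index as the sum of the LOH, TAI, and LST scores comes from the list above.

```python
def hrd_index(loh, tai, lst):
    """HRD index as the arithmetic sum of the LOH, TAI and LST scores."""
    return loh + tai + lst


def indel_proportions(microhomology, repeat_mediated, other):
    """Convert indel counts per class into proportions of all indels."""
    total = microhomology + repeat_mediated + other
    if total == 0:
        # No indels observed: report zero for every class (an assumption).
        return {"microhomology": 0.0, "repeat_mediated": 0.0, "other": 0.0}
    return {
        "microhomology": microhomology / total,
        "repeat_mediated": repeat_mediated / total,
        "other": other / total,
    }


# Hypothetical sample; none of these numbers come from a real tumor.
example_inputs = {
    "substitution_counts": {"signature_3": 1200, "signature_8": 300},
    "indel_proportions": indel_proportions(60, 30, 10),
    "rearrangement_counts": {"signature_3": 40, "signature_5": 25},
    "hrd_index": hrd_index(loh=15, tai=20, lst=25),
}
```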

Statistical Analysis
HRDetect is based on a supervised learning method: a lasso logistic regression model that classifies samples as having or lacking BRCA1/2 deficiency. Optimal coefficients are obtained by minimizing the objective function.

Log Transformation
To account for a high substitution count in samples, the genomic data is first log transformed:

$$ x=\ln (x+1) $$
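
A minimal sketch of this transformation in Python, using the standard library's `math.log1p`, which computes ln(1 + x). The function name `log_transform` is an illustrative choice.

```python
import math


def log_transform(values):
    """Apply x -> ln(x + 1) to every value in a list of counts."""
    return [math.log1p(v) for v in values]


# Hypothetical substitution counts for three samples.
transformed = log_transform([0, 9, 999])
```

Adding 1 before taking the logarithm keeps a count of zero defined, since it maps to ln(1) = 0.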

Standardization
The transformed data are then standardized so that the values of the mutational classes are comparable, giving each feature a mean of 0 and a standard deviation (sd) of 1:

$$ x=\frac{x-\operatorname{mean}\left(x\right)}{\operatorname{sd}\left(x\right)} $$
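
The standardization can be sketched as follows for a single feature measured across several samples. This assumes the sample standard deviation (`statistics.stdev`); the text does not say whether the sample or population form is intended, and the function name is illustrative.

```python
import statistics


def standardize(values):
    """Rescale one feature across samples to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample sd; an assumption, see lead-in
    if sd == 0:
        raise ValueError("feature is constant; cannot standardize")
    return [(v - mu) / sd for v in values]


# Hypothetical log-transformed values of one feature in four samples.
z_scores = standardize([1.0, 2.0, 3.0, 4.0])
```

When scoring a new sample, the mean and standard deviation would presumably be those of the training cohort rather than values recomputed from the new sample alone.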

Lasso Logistic Regression Modelling
To be able to distinguish between those affected and not affected by BRCA1/BRCA2 deficiency, a lasso logistic regression model is used:

$$\min_{(\beta_0,\, \beta) \in \mathbb{R}^{p+1}}{\left(-\left[\frac{1}{N} \sum_{i=1}^{N} \left(y_{i} \cdot\left(\beta_{0}+x_{i}^{T} \beta\right)-\log \left(1+e^{\left(\beta_{0}+x_{i}^{T} \beta\right)}\right)\right)\right]+\lambda\|\beta\|_{1}\right)}$$

where:

$$y_{i}$$: BRCA status of the ith sample; $$y_{i}$$ = 1 for BRCA1/BRCA2-null samples and $$y_{i}$$ = 0 otherwise

$$\beta_{0}$$: Intercept, interpreted as the log of odds of $$y_{i}$$ = 1 when $$x_{i}^{T}$$ = 0

$$\beta$$: Vector of weights

$$p$$: Number of features characterizing each sample

$$N$$: Number of samples

$$x_{i}^{T}$$: Vector of features characterizing the ith sample

$$\lambda$$: Penalty promoting the sparseness of the weights

$$\|\beta\|_{1}$$: L1 norm of the vector of weights

The β weights are constrained to be nonnegative so that they reflect the presence of mutational processes arising from BRCA1/BRCA2 defects. This constraint ensures that all samples are scored on the basis of the presence of relevant mutational signatures associated with BRCA1/BRCA2 deficiency, irrespective of whether these signatures are the dominant mutational process in the cancer.
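
To make the objective and the nonnegativity constraint concrete, here is a small pure-Python sketch that minimizes the penalized negative log-likelihood with a projected proximal-gradient loop. This solver is swapped in for illustration and is not claimed to be the one used to build HRDetect; the function names, the toy data, and the values of `lam`, `step`, and `iters` are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_nonneg_lasso_logistic(X, y, lam=0.05, step=0.5, iters=2000):
    """Minimize mean logistic loss + lam * ||beta||_1 subject to beta >= 0."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals sigma(z_i) - y_i drive the gradient of the smooth loss.
        resid = [
            sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * grad0  # the intercept is not penalized
        # Gradient step, L1 shrinkage by step * lam, then projection onto beta >= 0.
        beta = [max(0.0, b - step * (g + lam)) for b, g in zip(beta, grad)]
    return b0, beta


# Toy standardized data: feature 0 is higher in y = 1 samples,
# feature 1 is lower in y = 1 samples (it would need a negative weight).
X_toy = [[1.2, -0.8], [0.9, -1.1], [0.4, -0.3], [-0.2, -0.5],
         [-0.5, 0.6], [-1.0, 0.9], [0.3, 0.4], [-1.1, 0.8]]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]
intercept, weights = fit_nonneg_lasso_logistic(X_toy, y_toy)
```

In this toy data the second feature could only help the fit with a negative weight, so the constraint holds its weight at zero while the first feature receives a positive weight. This mirrors the idea that only the presence of a signature, not its absence, contributes to the score.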

HRDetect Score
Lastly, the weights obtained from the lasso regression are used to give a new sample a probabilistic score, by applying the model parameters ($$\beta$$, $$\beta_{0}$$) to the normalized mutational data $$x_{i}^{T}$$:

$$ P\left(C_{i}=\mathrm{BRCA}\right)=\frac{1}{1+e^{-\left(\beta_{0}+x_{i}^{T} \beta\right)}} $$

where:

$$C_{i}$$ : variable encoding the status of the ith sample

$$\beta_{0}$$ : Intercept weight

$$x_{i}^{T}$$: Vector encoding features of the ith sample

$$\beta$$: Vector of weights
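
The scoring formula can be sketched as a short function. The weights and intercept below are hypothetical placeholders, not the published HRDetect parameters, and the function name is illustrative.

```python
import math


def hrdetect_score(features, weights, intercept):
    """Return 1 / (1 + exp(-(intercept + features . weights)))."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = intercept + sum(w * x for w, x in zip(weights, features))
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical parameters for six standardized features.
toy_weights = [2.0, 1.5, 1.0, 0.8, 0.5, 0.3]
toy_intercept = -3.0

low_score = hrdetect_score([0.0] * 6, toy_weights, toy_intercept)
high_score = hrdetect_score([1.0] * 6, toy_weights, toy_intercept)
```

With nonnegative weights, raising any standardized feature can only raise the score or leave it unchanged.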

Interpretation
The probability value quantifies the degree of BRCA1/BRCA2 defectiveness. A cut-off probability should be chosen so that sensitivity remains high; the cohort analyses described in this article use a cut-off of 0.7. The resulting scores can be used to guide therapy.
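
The effect of the cut-off on sensitivity can be illustrated with a small sketch. The scores and true statuses below are made up, and the function names are illustrative; sensitivity is computed as the fraction of truly deficient samples whose score exceeds the cut-off.

```python
def exceeds_cutoff(score, cutoff=0.7):
    """Call a sample deficient when its score is above the cut-off."""
    return score > cutoff


def sensitivity(scores, truly_deficient, cutoff=0.7):
    """Fraction of truly deficient samples that are called deficient."""
    calls = [exceeds_cutoff(s, cutoff) for s, d in zip(scores, truly_deficient) if d]
    if not calls:
        raise ValueError("no truly deficient samples to evaluate")
    return sum(calls) / len(calls)


# Hypothetical scores; the first four samples are truly deficient.
toy_scores = [0.95, 0.88, 0.72, 0.40, 0.10, 0.05]
toy_status = [True, True, True, True, False, False]

sens_at_07 = sensitivity(toy_scores, toy_status)               # 3 of 4 detected
sens_at_03 = sensitivity(toy_scores, toy_status, cutoff=0.3)   # 4 of 4 detected
```

Lowering the cut-off raises sensitivity in this toy example, at the cost of calling more samples deficient overall.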

Predicting Chemotherapeutic Outcomes
Mutations in genes responsible for HR are prevalent among human cancers. The BRCA1 and BRCA2 genes are centrally involved in HR, DNA damage repair, end resection, and checkpoint signaling. Mutational signatures of HRD have been identified in over 20% of breast cancers, as well as in pancreatic, ovarian, and gastric cancers. BRCA1/2 mutations confer sensitivity to platinum-based chemotherapies. HRDetect can be independently trained to predict BRCA1/2 status, and has the capacity to predict outcomes of platinum-based chemotherapies.

Breast Cancer
HRDetect was initially developed to detect tumors with BRCA1 and BRCA2 deficiency using whole-genome sequencing data from a cohort of 560 breast cancer samples. Within this cohort, 22 patients were known to carry germline BRCA1/BRCA2 mutations. BRCA1/BRCA2-deficiency mutational signatures were found in more breast cancer patients than previously known: the model identified 124 (22%) of the 560 patients as showing BRCA1/2 mutational signatures. Apart from the 22 known cases, an additional 33 patients showed deficiency with germline BRCA1/2 mutations, 22 patients had somatic mutations of BRCA1/2, and 47 showed a functional defect without a detected BRCA1/2 mutation. With a probability cut-off of 0.7, HRDetect demonstrated 98.7% sensitivity in recognizing BRCA1/2-deficient cases.

In contrast, germline mutations of BRCA1/2 are present in only 1 to 5% of breast cancer cases. These findings suggest that as many as 1 in 5 (20%) breast cancer patients may benefit from PARP inhibitors, rather than only the small percentage of patients currently given the treatment.

Figure: Cohort of 80 breast cancer patients. 6 out of 7 are above an HRDetect score of 0.7.

Cohort of 80 Breast Cancer Samples

HRDetect was tested on 80 breast cancer cases that were mainly ER-positive and HER2-negative. The tool identified cases with an HRDetect score above 0.7, including one germline BRCA1 mutation carrier, four germline BRCA2 mutation carriers, and one somatic BRCA2 mutation carrier. Its sensitivity in this cohort reached 86%.

Compatibility Across Cancers
HRDetect can be applied to other cancer types and yields adequate sensitivity.

Ovarian Cancer
In a cohort of 73 patients with ovarian cancer, 30 patients were known to carry BRCA1/BRCA2 mutations and 46 (63%) patients had an HRDetect score above 0.7. The sensitivity for detecting BRCA1/2-deficient cancer was almost 100%, with an additional 16 cases identified.

Pancreatic Cancer
In a cohort of 96 patients with pancreatic cancer, 6 cases were known to have a mutation or allele loss and 11 (11.5%) patients were identified by HRDetect as exceeding the cut-off of 0.7. Sensitivity similarly approached 100%, with five other cases identified.
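
The cohort figures quoted in this article can be cross-checked with a few lines of arithmetic. The sketch below only re-derives percentages from the counts stated in the text, assuming the additional cases are counted on top of the previously known ones; the dictionary layout and names are illustrative.

```python
# Counts as stated in the text: cohort size, samples above the 0.7 cut-off,
# previously known cases, and additional cases identified.
cohorts = {
    "breast": {"n": 560, "above_cutoff": 124, "known": 22, "additional": 33 + 22 + 47},
    "ovarian": {"n": 73, "above_cutoff": 46, "known": 30, "additional": 16},
    "pancreatic": {"n": 96, "above_cutoff": 11, "known": 6, "additional": 5},
}


def percent_above(cohort):
    """Percentage of the cohort scoring above the cut-off."""
    return 100 * cohort["above_cutoff"] / cohort["n"]


percentages = {name: round(percent_above(c), 1) for name, c in cohorts.items()}
# percentages == {"breast": 22.1, "ovarian": 63.0, "pancreatic": 11.5}

# Known plus additional cases should account for every sample above the cut-off.
consistent = all(c["known"] + c["additional"] == c["above_cutoff"] for c in cohorts.values())

# 6 of 7 cases above the cut-off in the 80-sample breast cohort.
sensitivity_80 = round(100 * 6 / 7)  # 86
```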

Advantages and Limitations
Advantages
 * Concordance in predictions is high between low-coverage and high-coverage sequencing.
 * It can be trained on whole-exome sequencing (WES) data.
 * It can be used with sequencing data from formalin-fixed paraffin-embedded (FFPE) samples.
 * It can distinguish BRCA1 from BRCA2 tumors.

Limitations

While it can be used with WES data, the sensitivity of detection falls considerably when the model has not been trained on such data. Sensitivity increases when training is performed with WES data; however, false positives are still identified.