User:Aa2021dna/Polygenic score

Application of polygenic scores in humans
As the number of genome-wide association studies has exploded and methods for calculating polygenic scores have advanced rapidly, the most obvious application of polygenic scores is in clinical settings for disease prediction or risk stratification. It is important not to over- or under-state the value of polygenic scores. A key advantage of quantifying polygenic contribution for each individual is that the genetic liability does not change over an individual's lifespan. However, while a disease may have strong genetic contributions, the risk arising from one's genetics has to be interpreted in the context of environmental factors. For example, even if an individual has a high genetic risk for alcoholism, that risk is irrelevant if that individual is never exposed to alcohol.

Clinical utility of polygenic scores
A landmark study examining the role of polygenic risk scores in cardiovascular disease invigorated interest in the clinical potential of polygenic scores. This study demonstrated that individuals with the highest polygenic risk scores (top 1%) had a lifetime cardiovascular risk >10%, which was comparable to that of individuals with rare genetic variants. This comparison is important because clinical practice can be influenced by knowing which individuals have this rare genetic cause of cardiovascular disease. Since this study, polygenic risk scores have shown promise for disease prediction across other traits. Polygenic risk scores have been studied heavily in obesity, coronary artery disease, diabetes, breast cancer, prostate cancer, Alzheimer's disease and psychiatric diseases.


 * 1) talk about clinical utility

Use of Polygenic Risk Scores in Research and Clinical Applications
Clinical utility

- This information could be useful in decisions about participation in screening programmes, lifestyle modifications, or preventive treatment, when available and appropriate. PRS may also be relevant at different points along disease diagnosis and course (Fig. 2).

- The widespread interest in PRS is illustrated by their use by direct-to-consumer genetic testing companies; for example, 23andMe now offers polygenic risk scores for T2D.

Furthermore, to translate PRS to clinical tools, relative risks that compare individuals across the PRS continuum with a baseline group will eventually need to be transformed to absolute risks for the disease [34, 35].
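The step from relative to absolute risk can be illustrated with a minimal sketch. It assumes that the absolute risk of the baseline group is already known, and it ignores age, follow-up time and competing risks, which a real clinical tool would have to model. The function names and the numbers are hypothetical and are not taken from the cited studies.

```python
def absolute_risk_from_relative_risk(baseline_risk, relative_risk):
    """Absolute risk implied by a relative risk (risk ratio) versus a baseline group.

    By definition of a risk ratio, absolute risk = relative risk x baseline risk.
    The result is capped at 1 because it is a probability.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    return min(1.0, baseline_risk * relative_risk)


def absolute_risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Absolute risk implied by an odds ratio versus a baseline group.

    Converts the baseline risk to odds, scales the odds, and converts back.
    """
    if not 0.0 <= baseline_risk < 1.0:
        raise ValueError("baseline_risk must be in the range [0, 1)")
    if odds_ratio < 0.0:
        raise ValueError("odds_ratio must be non-negative")
    odds = odds_ratio * baseline_risk / (1.0 - baseline_risk)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical numbers: 5% baseline risk, PRS group at twice the risk.
    print(absolute_risk_from_relative_risk(0.05, 2.0))
    print(absolute_risk_from_odds_ratio(0.05, 2.0))
```

The two functions differ because many genetic studies report odds ratios rather than risk ratios; the two measures are close only when the disease is rare in the baseline group.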

Risk prediction models, including a combination of clinical, biochemistry, lifestyle, and historical risk factors, are currently used to predict 10-year risk of cardiovascular disease and diabetes [36,37,38,39]. These models combining risk factors achieve a good prediction (AUCs of 80–85%) and are included in clinical guidelines for prevention and public health [40]. Polygenic risk scores have much lower AUCs, as expected from a single risk factor, and should not be considered as an alternative to these clinical risk models but as a possible addition. With the established polygenic architecture of complex disorders, the improvement of genetic and statistical methodology, and the increase of global genotyped samples, it is reasonable to anticipate that genetic prediction will improve. In the meantime, it may be timely to consider the use of PRS in specific cohorts where there is a higher prior probability of disease.

https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-00742-5#Sec4

Not all preventive strategies are so benign; pharmacological interventions or surgical procedures are more controversial. For example, it would be very difficult to consider prophylactic mastectomy for breast cancer prevention. Even simple decisions like increased screening may result in false positives with significant economic cost to society and unnecessary stress of the individual.

Current progress in different diseases

Psychiatric disease -> https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-00742-5#Sec4

In guiding treatment:

Pharmacogenetic studies test how genetic variants affect response to treatment, with the aim of assisting treatment choices to maximise efficacy and minimise side effects. Most progress has been made in identifying rare high-risk variants that increase risk of adverse drug events (for example, abacavir and HLA-B*57:01, carbamazepine and HLA-B*15:02), whilst prediction of treatment efficacy has largely evaded genetic dissection.

Currently, the strongest evidence for a role of PRS in treatment response is in statin use to reduce the risk of first coronary event, where studies have shown that the relative risk reduction is higher in those at high genetic risk for cardiovascular disease [66, 67]. These results are in line with the previous reporting of better efficacy of statins in high-risk samples, for example, due to diabetes, hypertension, or high CRP concentrations [68]. A recent study demonstrated a potential role of PRS for electrocardiogram parameters in predicting the cardiac electrical response to sodium channel blockade [69].

In psychiatric disorders, only weak evidence exists to suggest that the PRS for disorder susceptibility might be predictive of treatment response in depression [70, 71] or psychosis [72].

Direct to consumer testing

Direct-to-consumer (DTC) genetic testing companies give consumers easy access to their genetic data, specifically genotyping on genome-wide chips of up to 1 million variants. Estimates suggest that 26 million people had used online DTC companies such as Ancestry.com and 23andMe up to the end of 2018 ( https://www.technologyreview.com/s/612880/more-than-26-million-people-have-taken-an-at-home-ancestry-test/ ). Whilst many purchasers are initially interested in ancestry testing, customers may then move on to analyse their genetic data for health [85], downloading their raw genotype data to explore in third-party interpretation programmes. These programmes are unregulated and differ in the genetic risks provided, the explanatory information provided, and the cautions given over interpretation. Some sites allow users to calculate polygenic risk scores; for example, Impute.me ( https://www.impute.me/ ) shows users where their polygenic risk score lies against a population-specific distribution of scores. Allelica provides an online service calculating polygenic risk scores [86]. In direct-to-consumer genetic testing, MyHeritage ( https://www.myheritage.com/health/genetic-risk-reports ) provides polygenic risk scores on four traits, ‘for people who are of mainly European ancestry’. The most detailed assessment of PRS in a DTC setting is from 23andMe, whose white paper presents their epidemiological modelling and the challenges of deriving individual-level absolute disease risks from PRS [67]. 23andMe provides polygenic risk scores for type 2 diabetes; based on external validation, their models have AUC values of between 59 and 65%, similar to those obtained from research studies [87]. Their customer reports give an estimate of the remaining lifetime risk of T2D based on genetics, age, and ancestry, with additional information on how BMI, diet, and exercise habits affect T2D prevalence.
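Placing one person's score against a reference distribution, as described above for third-party tools, can be sketched as a simple percentile calculation. This is an illustrative assumption about how such a comparison could be made, not the method of any named service; `prs_percentile` and the example scores are hypothetical.

```python
from bisect import bisect_right


def prs_percentile(score, reference_scores):
    """Percentage of reference scores that are less than or equal to `score`.

    `reference_scores` should come from a population matched to the
    individual's ancestry, since score distributions differ between populations.
    """
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    ordered = sorted(reference_scores)
    # bisect_right counts how many reference scores are <= score.
    return 100.0 * bisect_right(ordered, score) / len(ordered)


if __name__ == "__main__":
    reference = [-1.3, -0.4, 0.0, 0.2, 0.9, 1.7]  # hypothetical standardised scores
    print(prs_percentile(0.5, reference))
```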

Validation methods
- Performance metrics such as ROC-AUC can be used for binary outcomes, and model fit (R2) can be assessed for continuous traits.

- In performing a PRS analysis using your own data, it is important to distinguish between base and target data to avoid overfitting. Overfitting can be defined as fitting a model too closely to one set of data, greatly limiting its predictive ability in external data. Often, an overfit model will reflect effects beyond true biological effects, such as random noise or population-specific effects.

- If only one dataset with relevant outcome information is readily available, it is proper practice to randomly split the dataset into base and target subsets.
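The points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full PRS pipeline: the effect-size estimation in the base subset is omitted, the 20% target fraction is an arbitrary choice, and all function names are hypothetical. ROC-AUC is computed directly as the probability that a randomly chosen case has a higher score than a randomly chosen control, and R2 as the squared Pearson correlation between score and phenotype.

```python
import random


def split_base_target(sample_ids, target_fraction=0.2, seed=0):
    """Randomly split sample IDs into non-overlapping base and target subsets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_target = int(round(len(ids) * target_fraction))
    return ids[n_target:], ids[:n_target]  # (base, target)


def roc_auc(scores, labels):
    """ROC-AUC for a binary outcome (1 = case, 0 = control); ties count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def r_squared(scores, phenotypes):
    """Squared Pearson correlation between score and a continuous phenotype."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, phenotypes))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        raise ValueError("scores and phenotypes must both vary")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    base, target = split_base_target(range(10))
    print(len(base), len(target))
    # Hypothetical target-set scores and outcomes.
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
    print(r_squared([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]))
```

Both metrics should be computed only on the target subset, because evaluating on individuals used to estimate the weights would reintroduce the overfitting described above.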

Broadly speaking there are two methods used for PRS validation.


 * 1) Test prediction quality in a new dataset containing individuals not used in the training of the predictor. This out-of-sample validation is now a standard requirement in peer review of new genomic predictors. Ideally these individuals would have experienced a different environment than the training set (e.g., were born and raised in a different part of the world, or in different decades). Examples of large scale out-of-sample validations include: CAD in French Canadians, breast cancer, blood and urine biomarkers, among many more.
 * 2) Perhaps the most rigorous validation method is to compare siblings who have grown up together. It has been shown that PRS can predict which of two brothers or which of two sisters has a specific condition, such as heart disease or breast cancer. The predictors work almost as well in predicting sibling disease status as when comparing two random individuals from the general population who did not share family environments while growing up. This is strong evidence for causal genetic effects. These results also suggest that embryo selection using PRS can reduce disease risk for children born through IVF.
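The sibling comparison in the second method can be summarised by a simple statistic: among sibling pairs in which exactly one sibling has the condition, the fraction of pairs in which the affected sibling has the higher score. The sketch below is one assumed way to compute this; the function name and the example pairs are hypothetical and do not reproduce the analysis of the cited work.

```python
def sibling_concordance(pairs):
    """Fraction of discordant sibling pairs ranked correctly by the score.

    `pairs` is an iterable of (score_of_affected_sibling,
    score_of_unaffected_sibling). Tied scores count as half a correct call.
    A value of 0.5 corresponds to chance.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = 0.0
    for affected_score, unaffected_score in pairs:
        if affected_score > unaffected_score:
            correct += 1.0
        elif affected_score == unaffected_score:
            correct += 0.5
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores for four discordant sibling pairs.
    example_pairs = [(1.2, 0.3), (0.1, 0.5), (0.7, 0.7), (2.0, 1.0)]
    print(sibling_concordance(example_pairs))
```

The same statistic computed on pairs of unrelated individuals (one case, one control) is the ROC-AUC, which makes the within-family and population-level results directly comparable.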

Limitations

Limited transferability across populations https://currentprotocols.onlinelibrary.wiley.com/doi/pdf/10.1002/cpz1.126

This is because upwards of 80% of studies including a GWAS or PRS component have been performed in populations of predominantly European ancestry (De La Vega & Bustamante, 2018; Duncan et al., 2019; Martin et al., 2019). Use of base data, including GWAS results, from European-ancestry populations typically leads to worse predictive ability in non-European ancestry populations due to the different effects that variants may have in different groups (Duncan et al., 2019; Márquez-Luna et al., 2017; Martin et al., 2017, 2019).

Application of polygenic scores in non-human species
The benefit of polygenic scores is that they can be used to predict the future for crops, animal breeding, and humans alike. Although the same basic concepts underlie these areas of prediction, they face different challenges that require different methodologies. The ability to produce very large family sizes in nonhuman species, accompanied by deliberate selection, leads to a smaller effective population size, higher degrees of linkage disequilibrium, and a higher average genetic relatedness among individuals within a population. For example, members of plant and animal breeds that humans have effectively created, such as modern maize or domestic cattle, are all technically "related". In human genomic prediction, by contrast, unrelated individuals in large populations are selected to estimate the effects of common SNPs. Because of the smaller effective population size in livestock, the mean coefficient of relationship between any two individuals is likely high, and common SNPs will tag causal variants at greater physical distance than for humans; this is the major reason for lower SNP-based heritability estimates for humans compared to livestock. In both cases, however, sample size is key for maximizing the accuracy of genomic prediction.

While modern genomic prediction scoring in humans is generally referred to as a "polygenic score" (PGS) or a "polygenic risk score" (PRS), in livestock the more common term is "genomic estimated breeding value", or GEBV (similar to the more familiar "EBV", but with genotypic data). Conceptually, a GEBV is the same as a PGS: a linear function of genetic variants that are each weighted by the apparent effect of the variant. Despite this, polygenic prediction in livestock is useful for a fundamentally different reason than for humans. In humans, a PRS is used for the prediction of individual phenotype, while in livestock a GEBV is typically used to predict the offspring's average value of a phenotype of interest in terms of the genetic material it inherited from a parent. In this way, a GEBV can be understood as the average of the offspring of an individual or pair of individual animals. GEBVs are also typically communicated in the units of the trait of interest. For example, the expected increase in milk production of the offspring of a specific parent compared to the offspring from a reference population might be a typical way of using a GEBV in dairy cow breeding and selection.
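The shared definition of a PGS and a GEBV, a linear function of genetic variants weighted by their apparent effects, can be written as a short sketch. The variant identifiers, weights and allele counts below are made up for illustration, and skipping variants that were not genotyped is an assumption of this sketch rather than a standard rule.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages.

    `weights` maps a variant ID to its estimated per-allele effect.
    `dosages` maps a variant ID to the individual's count of the effect
    allele (0, 1 or 2, or a fractional imputed dosage).
    Variants missing from `dosages` are skipped (an assumption of this sketch).
    """
    return sum(
        weight * dosages[variant]
        for variant, weight in weights.items()
        if variant in dosages
    )


if __name__ == "__main__":
    # Hypothetical variants and effect sizes.
    example_weights = {"variant_a": 0.20, "variant_b": -0.10, "variant_c": 0.05}
    example_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
    print(polygenic_score(example_dosages, example_weights))
```

For a GEBV the weights would be expressed in the units of the trait, such as litres of milk, so the resulting sum can be read directly as an expected difference from the reference population.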