Better Polygenic Risk Prediction


Researchers at Johns Hopkins Bloomberg School of Public Health, Harvard, the Broad Institute, the National Cancer Institute, and 23andMe have developed a method that significantly improves the performance of polygenic risk models for people of non-European ancestry.

Improving Polygenic Risk Scores

Over the last several years, these kinds of risk models, also known as polygenic risk scores, or PRS, have begun to offer meaningful risk information on many diseases, such as breast cancer, type 2 diabetes, and heart disease. But these models, which rely on very large genome-wide association studies and use thousands or even hundreds of thousands of variants to calculate risk, perform poorly for people of non-European ancestry. That’s in part because of the lack of genetic research that includes data from non-Europeans.

Outlined in a paper published in the journal Nature Genetics, this new method addresses that problem, improving both the performance of these risk models and the speed with which they can be trained and built.

The New Model

Dubbed CT-SLEB, this new approach substantially improved the performance of polygenic risk models in diverse populations, especially among people with African ancestry. And CT-SLEB models can be built much faster than models from other polygenic risk modeling methods.

“At 23andMe, we are committed to providing health value to everyone. With this collaborative study, we helped improve a method to make polygenic risk models perform better for underrepresented populations so that everyone can have a better understanding of their future health,” said Jianan Shan, a senior scientist with 23andMe’s Product R&D. “There’s great potential for using these models in clinical care, too, but to make that a reality we need to make sure they work for people of all ancestries.”

How it works

The scientists used three distinct steps in their approach. First, they used a technique called “clumping and thresholding.” This allowed them to select risk variants relevant to different populations: European, African, East Asian, South Asian, and Latino.
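To make the first step more concrete, here is a minimal Python sketch of generic clumping and thresholding. It is an illustration under stated assumptions, not the study's implementation: the function name, the cutoffs `r2_max` and `p_max`, and the toy data are all hypothetical, and the published method may select variants differently.

```python
import numpy as np

def clump_and_threshold(p_values, ld_r2, r2_max=0.1, p_max=5e-8):
    """Greedy clumping followed by p-value thresholding (illustrative only).

    p_values: association p-value for each variant.
    ld_r2:    square matrix of squared correlations (r^2) between variants.
    r2_max, p_max: hypothetical cutoffs chosen for this example.
    """
    p_values = np.asarray(p_values, dtype=float)
    ld_r2 = np.asarray(ld_r2, dtype=float)
    order = np.argsort(p_values)               # most significant variant first
    removed = np.zeros(len(p_values), dtype=bool)
    lead_variants = []
    for i in order:
        if removed[i]:
            continue                           # already clumped under a stronger variant
        lead_variants.append(int(i))
        # Clumping: drop every variant highly correlated with this lead variant
        removed |= ld_r2[i] > r2_max
    # Thresholding: keep only lead variants that pass the p-value cutoff
    return [i for i in lead_variants if p_values[i] <= p_max]

# Toy data (made up): variants 0 and 1 are strongly correlated with each other.
p_values = [1e-10, 1e-9, 0.2, 1e-8]
ld_r2 = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
selected = clump_and_threshold(p_values, ld_r2)
print(selected)
```

In this toy run, variant 1 is dropped because it is correlated with the more significant variant 0, and variant 2 is dropped because its p-value fails the cutoff, leaving variants 0 and 3.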

Then, they used a statistical method called “Empirical-Bayes.” This method borrows information across populations, using risk estimates from the whole dataset to adjust the estimates for each specific population. Finally, the team applied a third layer, a machine learning approach called “super learning,” which adjusts and improves the predictive accuracy of the polygenic risk model.
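The second and third steps can be sketched in the same spirit. The code below is a simplified illustration, not the authors' code. It assumes variant effects follow a zero-mean multivariate normal prior across populations, which is one common empirical-Bayes setup, and it uses ordinary least squares as a stand-in for the super learner. All function names and the simulated data are hypothetical.

```python
import numpy as np

def eb_posterior_means(beta_hat, se):
    """Shrink per-population effect estimates toward what the whole dataset supports.

    beta_hat, se: arrays of shape (n_variants, n_populations) holding the
    estimated effects and their standard errors.
    Assumption: true effects ~ N(0, prior_cov), shared across variants.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    var = np.asarray(se, dtype=float) ** 2
    n_variants = beta_hat.shape[0]
    # Method-of-moments estimate of the prior covariance across populations
    prior_cov = beta_hat.T @ beta_hat / n_variants - np.diag(var.mean(axis=0))
    posterior = np.empty_like(beta_hat)
    for j in range(n_variants):
        total_cov = prior_cov + np.diag(var[j])
        # Posterior mean: prior_cov @ inverse(prior_cov + sampling variance) @ estimate
        posterior[j] = prior_cov @ np.linalg.solve(total_cov, beta_hat[j])
    return posterior

def fit_stacking_weights(candidate_scores, outcome):
    """Fit an intercept plus one weight per candidate score on tuning data.

    Plain least squares is used here as a stand-in for a super learner.
    """
    X = np.column_stack([np.ones(len(outcome)), candidate_scores])
    weights, *_ = np.linalg.lstsq(X, np.asarray(outcome, dtype=float), rcond=None)
    return weights

def combine_scores(candidate_scores, weights):
    """Apply fitted stacking weights to a matrix of candidate scores."""
    return weights[0] + np.asarray(candidate_scores, dtype=float) @ weights[1:]

# Simulated example (made-up numbers): population 2 has much noisier estimates.
rng = np.random.default_rng(0)
true_cov = 0.01 * np.array([[1.0, 0.8], [0.8, 1.0]])
true_beta = rng.multivariate_normal(np.zeros(2), true_cov, size=2000)
se = np.tile([0.02, 0.2], (2000, 1))
beta_hat = true_beta + rng.normal(size=true_beta.shape) * se
shrunk = eb_posterior_means(beta_hat, se)

raw_error = np.mean((beta_hat[:, 1] - true_beta[:, 1]) ** 2)
shrunk_error = np.mean((shrunk[:, 1] - true_beta[:, 1]) ** 2)
print(raw_error, shrunk_error)
```

In this simulation, the shrunken estimates for the noisier population should land closer to the true effects than the raw estimates do, which is the intuition behind borrowing information across populations. In practice the stacking weights would be fitted on a tuning set and evaluated on separate validation data.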

Then they compared their model to nine other modeling methods across five different ancestry groups, testing the predictive value for seven complex traits. They used data from more than 3.7 million 23andMe customers of different ancestries who consented to participate in research. The 3.7 million includes more than 413,000 Latinos, 117,000 African Americans, 96,000 people of East Asian descent, and another 26,000 people of South Asian descent.

CT-SLEB proved to be much faster, highly scalable, and one of the most powerful methods for generating risk predictions in non-European populations, particularly among African Americans.

Some Caveats

While this new approach offers promise, the study authors noted some limitations. The method substantially improved the performance of polygenic risk models in many settings, but there is still room to improve results for non-Europeans.

In addition, the authors noted that the best approach for risk prediction might involve using multiple methods and combining the results. Even with the best methods, the disparity in polygenic risk model performance may remain unless there are larger sample sizes from understudied populations.
