Binghuang Cai1, Biao Li2, Nikki Kiga1, Janita Thusberg2, Timothy Bergquist1, Yun-Ching Chen3, Noushin Niknafs3, Hannah Carter4, Collin Tokheim3, Violeta Beleva-Guthrie3, Christopher Douville3, Rohit Bhattacharya5, Hui Ting Grace Yeo3, Jean Fan3, Sohini Sengupta3, Dewey Kim3, Melissa Cline6, Tychele Turner7, Mark Diekhans6, Jan Zaucha8,9, Lipika R Pal10, Chen Cao10,11, Chen-Hsin Yu10,11, Yizhou Yin10,11, Marco Carraro12, Manuel Giollo12,13, Carlo Ferrari13, Emanuela Leonardi14, Silvio C E Tosatto12,15, Jason Bobe16, Madeleine Ball16, Roger A Hoskins17, Susanna Repo18, George Church16, Steven E Brenner17, John Moult10,19, Julian Gough9, Mario Stanke20, Rachel Karchin3,21, Sean D Mooney1.
Abstract
The advent of next-generation sequencing has dramatically decreased the cost of whole-genome sequencing and increased the viability of its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under the receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult and relies on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits, including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features.
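As an illustration of the ROC AUC measure named in the abstract, the following is a minimal sketch in Python. It is not the assessors' actual pipeline: the function name `roc_auc`, the binary labels, and the example scores are all hypothetical. It computes AUC as the probability that a randomly chosen individual with the trait receives a higher predicted score than a randomly chosen individual without it, counting ties as one half.

```python
def roc_auc(labels, scores):
    """Return ROC AUC for binary labels (1 = has trait, 0 = does not).

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    positive/negative pairs in which the positive has the higher score,
    with tied scores counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: four individuals, predicted trait probabilities.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.3, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The pairwise loop is quadratic in the number of individuals, which is acceptable for small cohorts; a rank-based computation would scale better.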
Keywords: biomedical informatics; community challenge; critical assessment; genome; genome interpretation; open consent; personal genome project (PGP); phenotype
Year: 2017 PMID: 28544481 PMCID: PMC5645203 DOI: 10.1002/humu.23265
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878