
Personal phenotypes to go with personal genomes.

Michael Snyder, Sherman Weissman, Mark Gerstein.

Abstract


Year: 2009 PMID: 19455137 PMCID: PMC2694681 DOI: 10.1038/msb.2009.32

Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429


With the cost of DNA sequencing decreasing rapidly, it is likely that the genome sequences of many individuals will be determined. In fact, if half of the individuals in industrialized countries choose to have their genomes sequenced, then well over 500 million personal genome sequences will be determined. Currently, such genetic information is likely to be of limited value to the individual, as the number of loci that provide useful predictive information is quite small (probably fewer than 200). Indeed, recent analyses of common complex traits such as diabetes, body mass and height show that in each case the genetically identifiable contribution from multiple candidate loci (18 in the case of diabetes) is only a small percentage (less than 7%) of the total identifiable genetic load (Gaulton et al; Willer et al); thus, the interpretable genetic contributions that can be identified are quite minor. Presumably, either many low-frequency alleles at different loci contribute to the genetic load, or many phenotypes arise from other phenomena such as synergistic effects between variants at more than one locus or between different loci and factors in the environment, recurrent spontaneous mutations, or epigenetic defects. Regardless of which proves to be correct (likely a differing mixture of effects for different diseases), the ability to accurately correlate all bases with precise phenotypes is likely to be powerful only if a common set of phenotypes is scored. The power of 500 million sequences correlated with 500 million phenotypes can both reveal small contributions and help identify potential causative mutations. Indeed, a data set of this size would greatly exceed that of even the large genome-wide association studies, which typically analyze thousands to tens of thousands of individuals (Willer et al).
Although in the short term this information is not likely to be helpful for prediction of common diseases, it may provide a generic tool for interpretation and prevention of a large number of individually uncommon or rare recessive disorders. In the long term, it is likely to be of enormous value to scientists for understanding which types of genes and pathways are involved in a particular biological process and for determining the underlying nature of complex diseases. Furthermore, the entire community is expected to ultimately benefit from this information, which would help in the diagnosis and treatment of disease.
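The sample-size argument above can be made concrete with a rough calculation. The sketch below is illustrative only and is not part of the original analysis. It assumes a single variant whose effect on a quantitative trait is summarized as a fraction of variance explained, uses the standard Fisher z approximation for detecting a correlation, and takes 5e-8 as the significance threshold, a conventional genome-wide choice that the text does not specify. The function name `samples_needed` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def samples_needed(variance_explained, alpha=5e-8, power=0.8):
    """Approximate number of individuals needed to detect one variant.

    The variant's effect is treated as a correlation r between genotype and
    trait, with r**2 equal to the fraction of trait variance it explains.
    Uses the Fisher z approximation for a two-sided test at level alpha.
    """
    if not 0 < variance_explained < 1:
        raise ValueError("variance_explained must be between 0 and 1")
    r = math.sqrt(variance_explained)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    # Smaller effects need sharply larger cohorts.
    for fraction in (1e-2, 1e-3, 1e-4, 1e-5):
        print(f"variance explained {fraction:g}: n = {samples_needed(fraction):,}")
```

Under these assumptions, the required cohort grows roughly in inverse proportion to the variance explained, which is why very small contributions are out of reach for studies of thousands of individuals but not for data sets of hundreds of millions.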

The need for phenotypes to go with genotypes

Although the prospect of a large number of genome sequences might seem daunting, the biggest stumbling block for a genotype–phenotype correlation is not likely to be the acquisition of the DNA sequence, but rather the phenotypic information. Indeed, the phenotyping of large numbers of individuals might well prove to be more expensive, complex and difficult to implement than the genomic sequencing. However, without common and accurate phenotypes the power of the genome sequences will be extremely limited. Deciding exactly what to phenotype is not trivial. Some types of information, such as height, body mass, blood pressure, and many aspects of medical history (infectious and other diseases, etc.) are quite common and obvious. Furthermore, much of this information is already available, albeit not always consistently obtained or available in a useful manner (Table 1). Other types of information, such as anatomical features and skeletal information, could be digitized and converted into a useful format for morphometric analysis. Phenotypes that would be particularly powerful to analyze using large data sets are behavioral (e.g. anxiety, depression) and cognitive attributes (e.g. ‘intelligence tests’). Some of these data are likely to be controversial and raise issues regarding safeguarding the privacy of information. Nonetheless, the larger the collection of phenotypes, the more powerful the genetic information. In order to be useful, these phenotypes must be stored electronically and in a manner in which quantitative information can be obtained. In this respect, having all medical records and information stored in a digital format would be extremely valuable for sharing and analyzing data.

Table 1

Examples of data types to consider for collection

General	Anatomical, height, body mass; blood pressure; morphometric; medical history (disease conditions, e.g. asthma, infections, cancer, other diseases; medical treatment; medication, etc.)
Behavioural & Cognitive	Anxiety, depression, hyperactivity, sleep; cognitive attributes (learning and memory, ‘intelligence’)
Molecular^a	RNA expression; proteomics (mass spectrometry; antibody profiling); metabolomics; microbiome metagenomics

^a Types of samples to analyze: saliva, plasma, serum, urine, breath, skin (stem cells), feces (microbiology).

Perhaps even more important than the choice of phenotypes to measure is the implementation of common methods and standards for their collection. Phenotypic data are only likely to be useful if the same types of information are obtained, and only if the samples and measurements are obtained using the same methodology. Many parameters such as medical and psychiatric histories and physical examinations are not always collected under comparable conditions or with similar rigor. Although it may be difficult to have a common method used in all cases, ideally a prioritized set of standards could be prepared, and minimally it will be essential to record the types of methods used for each sample.
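The requirement that the collection method be recorded with every measurement can be illustrated with a small data structure. This is a minimal sketch rather than a proposed standard: the class `PhenotypeMeasurement`, its field names, the helper `comparable`, and the example values are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PhenotypeMeasurement:
    """One quantitative phenotype value plus how it was obtained."""
    subject_id: str         # de-identified subject label (hypothetical)
    trait: str              # e.g. "systolic_blood_pressure"
    value: float            # numeric value, so quantitative analysis is possible
    unit: str               # e.g. "mmHg"
    method: str             # collection method, recorded for every sample
    sample_type: str = ""   # e.g. "plasma"; empty for non-sample traits
    collected_at: str = ""  # ISO 8601 timestamp; time of day can matter


def comparable(a, b):
    """Treat two measurements as poolable only if trait, unit and method match."""
    return (a.trait, a.unit, a.method) == (b.trait, b.unit, b.method)


if __name__ == "__main__":
    seated = PhenotypeMeasurement(
        "S001", "systolic_blood_pressure", 118.0, "mmHg", "seated_cuff")
    ambulatory = PhenotypeMeasurement(
        "S002", "systolic_blood_pressure", 131.0, "mmHg", "ambulatory_24h")
    print(comparable(seated, ambulatory))
    # Records serialize to plain JSON for electronic storage and sharing.
    print(json.dumps(asdict(seated)))
```

Keeping the method as an explicit field means that data collected under different protocols can still be stored together, while analyses can restrict themselves to measurements that were obtained in the same way.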

Molecular omic phenotypes

One way to provide a larger quantity of phenotypic information, potentially in a more standardized format, is to shift from measuring macroscopic properties to analyzing molecules. In addition, the quantities of molecules are expected to be responsible for the observable bodily phenotypes, and molecules can be more directly related to the genomic sequence and its variations. Traditionally, only a limited number of molecular markers are monitored, typically during blood tests. However, it is likely that large-scale and precise measurements of gene expression or protein abundance in specific types of cells are more consistent indicators of a given organism's phenotype. One can accurately quantify the RNA levels of all genes and/or exons using DNA microarrays or RNA sequencing (Wang et al), and the levels of many thousands of proteins and their modifications can be followed using mass spectrometry (Aebersold and Mann, 2003) and ultimately might be quantified using affinity reagents. Hundreds of metabolites can also be monitored using mass spectrometry (e.g. see Sreekumar et al). These components can easily be measured in blood samples, and proteins and metabolites can be measured in urine. Other samples such as saliva and breath could possibly also be analyzed. In the future, one could even consider the analysis of patient-derived stem cells and microbiome samples from oral and fecal samples. The analysis of the microbiome using metagenomic sequencing could prove to be a useful indicator of both environment and phenotype. Although RNA and metabolites might be relatively straightforward, analysis of proteins in complex samples such as plasma, sera and urine may be particularly susceptible to how the samples are prepared, which can have a significant influence on the outcome. For example, a recent proteome analysis of human sera showed significant differences in outcome depending upon the buffers and inhibitors present in the collection samples (Omenn et al).
Robust analytical procedures therefore need to be established to ensure that the results can be reproduced in different laboratories. Furthermore, even if comprehensive monitoring of molecular markers is difficult, accurately quantifying even a large subset is likely to be extremely valuable. The collection of molecular phenotypes is expected to be extremely valuable for helping us understand the basic mechanisms involved in human disease. For example, activation of signaling pathways can be readily deduced from RNA and protein expression and post-translational modification data. This in turn can be related to the genome sequence. In addition, molecular information will greatly facilitate medical diagnostics. Currently, diagnostic tests are carried out on small numbers of proteins whose functions are usually, although not always, known. One can readily envision a future in which simple blood or urine tests involving profiling of thousands of protein and/or metabolic components will be much more valuable for both early and accurate diagnostics. Control experiments will obviously have to be carried out to account for parameters such as diet and the time of day at which the samples are collected. Nonetheless, such information is expected to be extremely useful in conjunction with genomic and epigenomic analyses.

Moving forward

Several large consortia have formed around global genome sequencing projects such as the 1000 Genomes Project and The Cancer Genome Atlas project. Although smaller advisory committees have discussed the collection of common phenotypes (see Church, 2005), a large consortium is needed to decide which common phenotypes and samples should be collected, and how; such a consortium could have an impact equal to that of the sequencing consortia. Arguably, the best way to accomplish this is in conjunction with the organization of the large genome sequencing projects. The effort involved in obtaining a standard set of phenotypes should be no less than that expended in developing a standard set of gene functions through the Gene Ontology consortium. There is no doubt that a large number of human genome sequences will be a valuable resource. However, it will only be valuable in the context of a large number of accurate phenotypes. With the first sequences now being determined, we need to aggressively develop guidelines for deciding what phenotypes should be collected and establish common standards for collecting those phenotypes.

References

1. Mass spectrometry-based proteomics.

Authors: Ruedi Aebersold; Matthias Mann
Journal: Nature Date: 2003-03-13 Impact factor: 49.962

2. Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database.

Authors: Gilbert S Omenn; David J States; Marcin Adamski; Thomas W Blackwell; Rajasree Menon; Henning Hermjakob; Rolf Apweiler; Brian B Haab; Richard J Simpson; James S Eddes; Eugene A Kapp; Robert L Moritz; Daniel W Chan; Alex J Rai; Arie Admon; Ruedi Aebersold; Jimmy Eng; William S Hancock; Stanley A Hefta; Helmut Meyer; Young-Ki Paik; Jong-Shin Yoo; Peipei Ping; Joel Pounds; Joshua Adkins; Xiaohong Qian; Rong Wang; Valerie Wasinger; Chi Yue Wu; Xiaohang Zhao; Rong Zeng; Alexander Archakov; Akira Tsugita; Ilan Beer; Akhilesh Pandey; Michael Pisano; Philip Andrews; Harald Tammen; David W Speicher; Samir M Hanash
Journal: Proteomics Date: 2005-08 Impact factor: 3.984

3. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression.

Authors: Arun Sreekumar; Laila M Poisson; Thekkelnaycke M Rajendiran; Amjad P Khan; Qi Cao; Jindan Yu; Bharathi Laxman; Rohit Mehra; Robert J Lonigro; Yong Li; Mukesh K Nyati; Aarif Ahsan; Shanker Kalyana-Sundaram; Bo Han; Xuhong Cao; Jaeman Byun; Gilbert S Omenn; Debashis Ghosh; Subramaniam Pennathur; Danny C Alexander; Alvin Berger; Jeffrey R Shuster; John T Wei; Sooryanarayana Varambally; Christopher Beecher; Arul M Chinnaiyan
Journal: Nature Date: 2009-02-12 Impact factor: 49.962

4. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

5. Comprehensive association study of type 2 diabetes and related quantitative traits with 222 candidate genes.

Authors: Kyle J Gaulton; Cristen J Willer; Yun Li; Laura J Scott; Karen N Conneely; Anne U Jackson; William L Duren; Peter S Chines; Narisu Narisu; Lori L Bonnycastle; Jingchun Luo; Maurine Tong; Andrew G Sprau; Elizabeth W Pugh; Kimberly F Doheny; Timo T Valle; Gonçalo R Abecasis; Jaakko Tuomilehto; Richard N Bergman; Francis S Collins; Michael Boehnke; Karen L Mohlke
Journal: Diabetes Date: 2008-08-04 Impact factor: 9.461

6. The personal genome project.

Authors: G M Church
Journal: Mol Syst Biol Date: 2005-12-13 Impact factor: 11.429

7. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation.

Authors: Cristen J Willer; Elizabeth K Speliotes; Ruth J F Loos; Shengxu Li; Cecilia M Lindgren; Iris M Heid; Sonja I Berndt; Amanda L Elliott; Anne U Jackson; Claudia Lamina; Guillaume Lettre; Noha Lim; Helen N Lyon; Steven A McCarroll; Konstantinos Papadakis; Lu Qi; Joshua C Randall; Rosa Maria Roccasecca; Serena Sanna; Paul Scheet; Michael N Weedon; Eleanor Wheeler; Jing Hua Zhao; Leonie C Jacobs; Inga Prokopenko; Nicole Soranzo; Toshiko Tanaka; Nicholas J Timpson; Peter Almgren; Amanda Bennett; Richard N Bergman; Sheila A Bingham; Lori L Bonnycastle; Morris Brown; Noël P Burtt; Peter Chines; Lachlan Coin; Francis S Collins; John M Connell; Cyrus Cooper; George Davey Smith; Elaine M Dennison; Parimal Deodhar; Paul Elliott; Michael R Erdos; Karol Estrada; David M Evans; Lauren Gianniny; Christian Gieger; Christopher J Gillson; Candace Guiducci; Rachel Hackett; David Hadley; Alistair S Hall; Aki S Havulinna; Johannes Hebebrand; Albert Hofman; Bo Isomaa; Kevin B Jacobs; Toby Johnson; Pekka Jousilahti; Zorica Jovanovic; Kay-Tee Khaw; Peter Kraft; Mikko Kuokkanen; Johanna Kuusisto; Jaana Laitinen; Edward G Lakatta; Jian'an Luan; Robert N Luben; Massimo Mangino; Wendy L McArdle; Thomas Meitinger; Antonella Mulas; Patricia B Munroe; Narisu Narisu; Andrew R Ness; Kate Northstone; Stephen O'Rahilly; Carolin Purmann; Matthew G Rees; Martin Ridderstråle; Susan M Ring; Fernando Rivadeneira; Aimo Ruokonen; Manjinder S Sandhu; Jouko Saramies; Laura J Scott; Angelo Scuteri; Kaisa Silander; Matthew A Sims; Kijoung Song; Jonathan Stephens; Suzanne Stevens; Heather M Stringham; Y C Loraine Tung; Timo T Valle; Cornelia M Van Duijn; Karani S Vimaleswaran; Peter Vollenweider; Gerard Waeber; Chris Wallace; Richard M Watanabe; Dawn M Waterworth; Nicholas Watkins; Jacqueline C M Witteman; Eleftheria Zeggini; Guangju Zhai; M Carola Zillikens; David Altshuler; Mark J Caulfield; Stephen J Chanock; I Sadaf Farooqi; Luigi Ferrucci; Jack M Guralnik; Andrew T Hattersley; Frank B Hu; 
Marjo-Riitta Jarvelin; Markku Laakso; Vincent Mooser; Ken K Ong; Willem H Ouwehand; Veikko Salomaa; Nilesh J Samani; Timothy D Spector; Tiinamaija Tuomi; Jaakko Tuomilehto; Manuela Uda; André G Uitterlinden; Nicholas J Wareham; Panagiotis Deloukas; Timothy M Frayling; Leif C Groop; Richard B Hayes; David J Hunter; Karen L Mohlke; Leena Peltonen; David Schlessinger; David P Strachan; H-Erich Wichmann; Mark I McCarthy; Michael Boehnke; Inês Barroso; Gonçalo R Abecasis; Joel N Hirschhorn
Journal: Nat Genet Date: 2008-12-14 Impact factor: 38.330
