Substantial efforts are being made to catalog the genetic variation in various populations, because a better knowledge of normal genetic variation, which is often population-specific, will be essential for interpreting genetic variation in patients with specific diseases. The Dutch government, for example, has funded the “Genome of the Netherlands (GoNL)” initiative (http://www.nlgenome.nl/, Francioli et al., 2014), in which the whole genomes of more than 750 subjects, mainly trios of two parents and one adult child, were sequenced. A similar endeavor is starting in Belgium and in other European countries under the aegis of the European Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) (http://bbmri-eric.eu/).
Scientists in the West and the Far East realized early on, before the emergence of next generation sequencing, that classifying even a single variant in a specific gene presents formidable challenges. What does a “p.M167R” variant in a particular gene mean for the patient if we cannot compare it with the same variant in other patients affected by the same disease or in unaffected controls? To meet this need, scientists in developed countries have established, and are still establishing, disease/locus-specific databases (LSDBs) that contain detailed information on variations in genes predisposing to specific diseases. Fortunately, most have open access policies for registered academics, so that researchers can easily compare the variants identified in their patients with those in the databases and make a clinical decision based on the findings (Fokkema et al., 2011).
Currently, variants in single genes are classified into categories such as pathogenic, likely pathogenic, pathogenicity uncertain, likely non-pathogenic and non-pathogenic. This classification is supported by additional genetic information from other family members, independent reports of the same variant in patients with a similar presentation, and the results of functional experiments that directly document causality (Thompson et al., 2013).
Difficulties in accurately interpreting genomic variation data that complicate classification were recently highlighted in a review published in this journal (http://www.sciencedirect.com/science/article/pii/S2212066114000155). Because most disease-specific databases are population-specific, they are less useful for a researcher based in the Middle East. Moreover, clinical interpretation is hampered by the need to search many separate LSDBs for a similar variant that may have been deposited in any one of them, a burden that grows when a single gene harbors many variants. In 2004, the Center for Arab Genomic Studies, based in the United Arab Emirates (http://www.cags.org.ae/index.html), initiated a project to construct a catalog of genetic mutations in Arabs. The effort listed 1580 Mendelian disorders and related genes mined from nearly 1140 published articles from the Arab region (Tadmouri et al., 2006). This literature-based effort, although commendable, suffers from significant limitations, notably its dependence on published, non-curated entries and our inability to populate it with rich unpublished data. For researchers and clinicians alike, it also adds yet another database that must be searched to establish the clinical significance of a variant in question.
To alleviate this, we recommend universal gene- or disease-based LSDBs that contain gene-variation data from around the world. A notable example of such an effort is the InSIGHT-group database (InSIGHT.org), which contains collected, curated and classified variations in 11 genes associated with hereditary non-polyposis colorectal cancer (HNPCC). Although there is no such database for Arab patients with HNPCC, we opted not to build a new LSDB for HNPCC in Arab subjects and instead submitted our Kuwaiti HNPCC data to the InSIGHT database (http://insight-group.org/).
If other researchers also contribute to InSIGHT, a truly international database will have been established that can benefit researchers wherever they are situated around the world.
Despite the strength of LSDBs, classification challenges will likely be magnified when whole genomes (about 3 billion nucleotides) or exomes (the coding regions of roughly 25,000 genes) are examined. In the absence of an Arab population-wide database to populate, decisions about where we deposit our growing volume of variant data are important not only for optimizing research and clinical utility but also for ensuring open access and data sharing.
Admittedly, Arab scientists began examining whole genome sequences using next generation sequencing (NGS) quite late. However, we realized early on the need to establish an Arab population-specific NGS database, a move similar to the 1000 Genomes Project, to at least construct a platform against which additional raw sequences can be aligned and compared. Our goal faces major challenges, particularly because we work in a less-resourced country. Although Kuwait and the other Gulf states are oil-rich, in reality only 0.1–2.5% of Gross Domestic Product (GDP) is allocated to Research and Development (Giles, 2006). Funding is therefore a major challenge to advancing translational genomic research, and open databases are crucial to our research progress. For us, the impact of decisions to open or close genome databases is great.
In Kuwait, we have sequenced a limited number of whole genomes and exomes, from which we estimate that each apparently normal individual carries at least 600–700 novel genetic variants classified as damaging (that is, expected to alter protein structure and function). Fig. 1 shows a summary of genetic variants obtained independently from Arab subjects using exome sequencing at a depth of 70×.
Our ambition to build on these efforts and establish an Arab population database is constrained, however, by a limited understanding of the importance of genetics research and by unjustified fears among policymakers, citizens and even clinicians in Arab countries about disseminating genetic sequences, which hamper our ability to secure sufficient funding (http://www.the-scientist.com/?articles.view/articleNo/30725/title/Another-Revolution-Needed-/, http://blogs.nature.com/tradesecrets/author/Fahd-Al-Mulla).
Fig. 1
Exome sequencing summary obtained from 8 Arab control subjects, including a trio family (cases 6–8).
Fortunately, the Qatar National Research Fund has funded part of our venture. Independently, however, Qatar is pursuing whole genome sequencing of 300,000 Qatari nationals (http://www.gulf-times.com/qatar/178/details/374345/qatar-launches-genome-project). Saudi Arabia has also initiated the sequencing of about 100,000 Arabs from around the Kingdom (http://rc.kfshrc.edu.sa/sgp/). It appears likely that the data generated by these two ambitious projects will remain unconnected, because science in a number of Arab countries is viewed not as a necessity but as a politically driven activity focused on national merit. Genome sequencing is thus viewed only as a national endeavor rather than as a humanitarian triumph or necessity. It is vital that policymakers understand that regional genome sequences have little translational value unless they are compared and contrasted with those of other populations worldwide.
We coined the term “Locked genomes” to describe unshared NGS data. In January 2014, under the umbrella of the National Institutes of Health (NIH), key global leaders in genomic medicine convened to discuss ways of accelerating the implementation of genomics in medical practice worldwide. At the meeting, we realized that most international participants, as expected, use NGS to obtain limited variant data to aid a specific diagnosis, while the rest of the potentially valuable data are discarded or stored locally. In other words, the genome sequences are locked. This is unfortunate, given that the most important lesson we have learned from the Human Genome Project and LSDBs is that sequence variants need to be compared to be clinically useful.
Arguably, accumulating, sharing and comparing genomic sequences is the best, if not the only, means at our disposal to improve our understanding of genotype–phenotype relationships. Locking genomes, as we see it, contradicts the goal of genomic research, namely that its benefits be shared globally.
The Exome Variant Server of the NHLBI Exome Sequencing Project, a pioneering effort to discover novel genes and mechanisms contributing to heart, lung and blood disorders, illustrates this point very well. Drawing on exome sequences from some of the largest well-phenotyped populations in the US, it puts at our disposal the allele frequencies of variants in the Caucasian and African American populations. This information allows clinicians to judge at a glance whether a variant could have pathogenic potential (rare) or is a neutral polymorphism (common).
Ideally, there would be a similar but more elaborate database in which international researchers involved in NGS could deposit their sequencing data. Such a resource should contain essential, but not necessarily exhaustive, phenotype data and, provided ethical guidelines are duly applied, would ensure that the benefits of genomic research are not available only to the wealthy. This roadmap may ease the burden on clinicians and unlock the wealth of genomes beyond our wildest dreams. We are at an important juncture. Genomes must be open, particularly for researchers in less-resourced countries, if research is to progress and its benefits are to be realized. Future policy initiatives at the global and national levels must support this goal.
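The frequency-based triage that the Exome Variant Server makes possible can be illustrated with a minimal Python sketch. It assumes a 1% allele-frequency cut-off between "rare" and "common" (a widely used convention, not a value given here), and the names `COMMON_AF_THRESHOLD`, `allele_frequency` and `triage`, as well as all counts, are hypothetical. Evaluating each population separately shows why locked genomes matter: a variant that looks rare in the Caucasian and African American data could be common in an Arab cohort whose data have never been shared.

```python
from typing import Dict, Tuple

# ASSUMPTION: a 1% allele frequency is used here as the rare/common cut-off.
# It is a widely used convention, not a value given in the text or by the EVS.
COMMON_AF_THRESHOLD = 0.01


def allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Fraction of observed alleles that carry the variant allele."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return alt_count / total_alleles


def triage(counts: Dict[str, Tuple[int, int]],
           threshold: float = COMMON_AF_THRESHOLD) -> Dict[str, str]:
    """Label one variant 'common' or 'rare' separately in each population.

    `counts` maps a population name to a pair
    (alternate-allele count, total alleles observed in that population).
    The labels are a first-pass filter only, not a pathogenicity call.
    """
    labels: Dict[str, str] = {}
    for population, (alt, total) in counts.items():
        af = allele_frequency(alt, total)
        labels[population] = "common" if af >= threshold else "rare"
    return labels


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not real EVS or Arab data.
    hypothetical_counts = {
        "Caucasian": (2, 8600),
        "African American": (1, 4400),
        "Arab (hypothetical local cohort)": (30, 1000),
    }
    print(triage(hypothetical_counts))
```

Running the script prints `{'Caucasian': 'rare', 'African American': 'rare', 'Arab (hypothetical local cohort)': 'common'}`: a variant that would be flagged as rare, and therefore potentially pathogenic, against the US reference populations turns out to be common in the local cohort. Frequency is only a first filter; final classification still rests on family data, independent observations and functional evidence.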
Authors: Ivo F A C Fokkema; Peter E M Taschner; Gerard C P Schaafsma; J Celli; Jeroen F J Laros; Johan T den Dunnen Journal: Hum Mutat Date: 2011-02-22
Authors: Bryony A Thompson; Amanda B Spurdle; John-Paul Plazzer; Marc S Greenblatt; Kiwamu Akagi; Fahd Al-Mulla; Bharati Bapat; Inge Bernstein; Gabriel Capellá; Johan T den Dunnen; Desiree du Sart; Aurelie Fabre; Michael P Farrell; Susan M Farrington; Ian M Frayling; Thierry Frebourg; David E Goldgar; Christopher D Heinen; Elke Holinski-Feder; Maija Kohonen-Corish; Kristina Lagerstedt Robinson; Suet Yi Leung; Alexandra Martins; Pal Moller; Monika Morak; Minna Nystrom; Paivi Peltomaki; Marta Pineda; Ming Qi; Rajkumar Ramesar; Lene Juel Rasmussen; Brigitte Royer-Pokora; Rodney J Scott; Rolf Sijmons; Sean V Tavtigian; Carli M Tops; Thomas Weber; Juul Wijnen; Michael O Woods; Finlay Macrae; Maurizio Genuardi Journal: Nat Genet Date: 2013-12-22