Revised eutherian gene collections.

Marko Premzl

Abstract

OBJECTIVES: Recent research projects in the field of eutherian comparative genomics announced the intention to sequence the genome of every extant eutherian species in the foreseeable future, so that future revisions and updates of eutherian gene data sets were expected. DATA DESCRIPTION: Using 35 public eutherian reference genomic sequence assemblies and freely available software, the eutherian comparative genomic analysis protocol RRID:SCR_014401 was published as guidance against potential genomic sequence errors. The protocol curated 14 eutherian third-party data gene data sets comprising, in aggregate, 2615 complete coding sequences that were deposited in the European Nucleotide Archive. The published eutherian gene collections were used in revisions and updates of eutherian gene data set classifications and nomenclatures that included gene annotations, phylogenetic analyses and protein molecular evolution analyses.
© 2022. The Author(s).

Keywords:  Comparative genomics; Eutheria; Gene data set; RRID:SCR_014401

Year:  2022        PMID: 35870891      PMCID: PMC9308196          DOI: 10.1186/s12863-022-01071-9

Source DB:  PubMed          Journal:  BMC Genom Data        ISSN: 2730-6844


Objective

Recent research projects in the field of eutherian comparative genomics announced the intention to sequence the genome of every extant eutherian species in the foreseeable future, so that future revisions and updates of eutherian gene data sets were expected [1-13]. For example, the census of human protein coding genes remained unfinished: contemporary estimates put the number of protein coding genes in the human genome at about 20,000–21,000 [14-27]. In addition, the proven utility of public eutherian reference genomic sequences could be compromised by potential genomic sequence errors, including analytical and bioinformatical errors as well as errors of the Sanger DNA sequencing method [28-33].

Data description

Using public eutherian reference genomic sequence assemblies and freely available software, the eutherian comparative genomic analysis protocol was published as guidance against potential genomic sequence errors [34-49]. The protocol included 3 major processing steps that were integrated into one framework of eutherian gene data set descriptions: gene annotations, phylogenetic analysis and protein molecular evolution analysis. The protocol also introduced 3 original genomics and protein molecular evolution tests. First, the test of reliability of public eutherian genomic sequences used the genomic sequence redundancies of public eutherian reference genomic sequence assemblies. Second, the test of contiguity of public eutherian genomic sequences used multiple pairwise genomic sequence alignments. Third, the test of protein molecular evolution used relative synonymous codon usage statistics. The protocol was made available on Protocol Exchange [44]. In aggregate, the eutherian comparative genomic analysis protocol curated 14 eutherian gene data sets implicated in major physiological and pathological processes, comprising 2615 published complete coding sequences that were made available in public biological databases as third-party data gene data sets [50-63] (Table 1). The curated gene data sets were deposited in the European Nucleotide Archive [7–9, 12, 13] in FASTA nucleotide sequence format. The published eutherian gene collections were used in revisions and updates of eutherian gene data set classifications and nomenclatures.
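The third test relies on relative synonymous codon usage (RSCU) statistics. The text does not spell out how the protocol computes or applies them, so the following is only a minimal sketch of the standard RSCU definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu`, the use of the standard genetic code, and the example sequence are assumptions made for illustration, not part of the published protocol.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# nuclear eutherian genes use the standard code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group synonymous codons by the amino acid they encode.
SYNONYMOUS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMOUS[aa].append(codon)


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence.

    Stop codons are skipped, as are amino acids absent from the sequence.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(
        cds[i:i + 3] for i in range(0, usable, 3) if cds[i:i + 3] in CODON_TABLE
    )
    values = {}
    for aa, family in SYNONYMOUS.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        for codon in family:
            # observed count / (family total / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Hypothetical toy sequence: Met-Ala-Ala-Ala
    for codon, value in sorted(rscu("ATGGCTGCTGCC").items()):
        print(codon, round(value, 3))
```

An RSCU value of 1 means a codon is used as often as expected under equal usage; values above or below 1 indicate over- or under-representation within its synonymous family.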
Table 1

Overview of eutherian third-party data gene data sets

Label | Name of data file/data set | File types (file extension) | Data repository and identifier (DOI or accession number)
--- | --- | --- | ---
Data set 1 | Interferon-γ-inducible GTPase genes (FR734011-FR734074) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:FR734011) [50]
Data set 2 | Adenohypophysis cystine-knot genes (HF564658-HF564785) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:HF564658) [51]
Data set 3 | Macrophage migration inhibitory factor genes (HF564786-HF564815) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:HF564786) [52]
Data set 4 | Ribonuclease A genes (HG328835-HG329089) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:HG328835) [53]
Data set 5 | Mas-related G protein-coupled receptor genes (HG426065-HG426183) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:HG426065) [54]
Data set 6 | Lysozyme genes (HG931734-HG931849) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:HG931734) [55]
Data set 7 | Growth hormone genes (LM644135-LM644234) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:LM644135) [56]
Data set 8 | Tumor necrosis factor ligand genes (LN874312-LN874522) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:LN874312) [57]
Data set 9 | Globin genes (LT548096-LT548244) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:LT548096) [58]
Data set 10 | Kallikrein genes (LT631550-LT631670) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:LT631550) [59]
Data set 11 | Adiponectin genes (LT962964-LT963174) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:LT962964) [60]
Data set 12 | Connexin genes (LT990249-LT990597) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:LT990249) [61]
Data set 13 | Fibroblast growth factor genes (LR130242-LR130508) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:LR130242) [62]
Data set 14 | Interferon genes (LR760818-LR761312) | FASTA (.fas) | European Nucleotide Archive (https://identifiers.org/ena.embl:LR760818) [63]
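The accession ranges in Table 1 can be cross-checked against the stated total of 2615 complete coding sequences. The sketch below assumes that every accession number inside a range corresponds to exactly one deposited coding sequence; the helper names `expand_range` and `identifiers_url` are illustrative, and the URL pattern is simply the one shown in the table.

```python
import re

# Accession ranges copied from Table 1.
DATA_SETS = {
    "Interferon-γ-inducible GTPase genes": "FR734011-FR734074",
    "Adenohypophysis cystine-knot genes": "HF564658-HF564785",
    "Macrophage migration inhibitory factor genes": "HF564786-HF564815",
    "Ribonuclease A genes": "HG328835-HG329089",
    "Mas-related G protein-coupled receptor genes": "HG426065-HG426183",
    "Lysozyme genes": "HG931734-HG931849",
    "Growth hormone genes": "LM644135-LM644234",
    "Tumor necrosis factor ligand genes": "LN874312-LN874522",
    "Globin genes": "LT548096-LT548244",
    "Kallikrein genes": "LT631550-LT631670",
    "Adiponectin genes": "LT962964-LT963174",
    "Connexin genes": "LT990249-LT990597",
    "Fibroblast growth factor genes": "LR130242-LR130508",
    "Interferon genes": "LR760818-LR761312",
}

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_range(accession_range):
    """Expand 'FR734011-FR734074' into the list of individual accessions."""
    first, last = (ACCESSION.fullmatch(part) for part in accession_range.split("-"))
    if first is None or last is None or first.group(1) != last.group(1):
        raise ValueError(f"unexpected accession range: {accession_range}")
    prefix, width = first.group(1), len(first.group(2))
    return [
        f"{prefix}{number:0{width}d}"
        for number in range(int(first.group(2)), int(last.group(2)) + 1)
    ]


def identifiers_url(accession):
    """Build a resolver URL following the pattern used in Table 1."""
    return f"https://identifiers.org/ena.embl:{accession}"


sizes = {name: len(expand_range(rng)) for name, rng in DATA_SETS.items()}
total = sum(sizes.values())

if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{size:4d}  {name}")
    print(f"{total:4d}  total")
```

Under that one-accession-per-sequence assumption, the 14 ranges sum to 2615, which agrees with the total reported in the text.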

Limitations

The revisions and updates of eutherian gene data sets were contingent on primary Sanger DNA sequencing information deposited in the National Center for Biotechnology Information (NCBI) Trace Archive [12, 13, 46, 64–66]. For example, a positive correlation was calculated between the genomic sequence redundancies of the 35 public eutherian reference genomic sequence assemblies and their respective numbers of curated complete coding sequences.
  49 in total

1.  Comparative genomic analysis of eutherian ribonuclease A genes.

Authors:  Marko Premzl
Journal:  Mol Genet Genomics       Date:  2013-12-15       Impact factor: 3.291

2.  Database resources of the National Center for Biotechnology Information.

Authors:  Eric W Sayers; Jeffrey Beck; Evan E Bolton; Devon Bourexis; James R Brister; Kathi Canese; Donald C Comeau; Kathryn Funk; Sunghwan Kim; William Klimke; Aron Marchler-Bauer; Melissa Landrum; Stacy Lathrop; Zhiyong Lu; Thomas L Madden; Nuala O'Leary; Lon Phan; Sanjida H Rangwala; Valerie A Schneider; Yuri Skripchenko; Jiyao Wang; Jian Ye; Barton W Trawick; Kim D Pruitt; Stephen T Sherry
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

3.  Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation.

Authors:  William J Murphy; Nicole M Foley; Kevin R Bredemeyer; John Gatesy; Mark S Springer
Journal:  Annu Rev Anim Biosci       Date:  2020-11-23       Impact factor: 8.923

4.  Earth BioGenome Project: Sequencing life for the future of life.

Authors:  Harris A Lewin; Gene E Robinson; W John Kress; William J Baker; Jonathan Coddington; Keith A Crandall; Richard Durbin; Scott V Edwards; Félix Forest; M Thomas P Gilbert; Melissa M Goldstein; Igor V Grigoriev; Kevin J Hackett; David Haussler; Erich D Jarvis; Warren E Johnson; Aristides Patrinos; Stephen Richards; Juan Carlos Castilla-Rubio; Marie-Anne van Sluys; Pamela S Soltis; Xun Xu; Huanming Yang; Guojie Zhang
Journal:  Proc Natl Acad Sci U S A       Date:  2018-04-24       Impact factor: 11.205

5.  An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing.

Authors:  Elliott H Margulies; Jade P Vinson; Webb Miller; David B Jaffe; Kerstin Lindblad-Toh; Jean L Chang; Eric D Green; Eric S Lander; James C Mullikin; Michele Clamp
Journal:  Proc Natl Acad Sci U S A       Date:  2005-03-18       Impact factor: 11.205

6.  The completion of the Mammalian Gene Collection (MGC).

Authors:  Gary Temple; Daniela S Gerhard; Rebekah Rasooly; Elise A Feingold; Peter J Good; Cristen Robinson; Allison Mandich; Jeffrey G Derge; Jeanne Lewis; Debonny Shoaf; Francis S Collins; Wonhee Jang; Lukas Wagner; Carolyn M Shenmen; Leonie Misquitta; Carl F Schaefer; Kenneth H Buetow; Tom I Bonner; Linda Yankie; Ming Ward; Lon Phan; Alex Astashyn; Garth Brown; Catherine Farrell; Jennifer Hart; Melissa Landrum; Bonnie L Maidak; Michael Murphy; Terence Murphy; Bhanu Rajput; Lillian Riddick; David Webb; Janet Weber; Wendy Wu; Kim D Pruitt; Donna Maglott; Adam Siepel; Brona Brejova; Mark Diekhans; Rachel Harte; Robert Baertsch; Jim Kent; David Haussler; Michael Brent; Laura Langton; Charles L G Comstock; Michael Stevens; Chaochun Wei; Marijke J van Baren; Kourosh Salehi-Ashtiani; Ryan R Murray; Lila Ghamsari; Elizabeth Mello; Chenwei Lin; Christa Pennacchio; Kirsten Schreiber; Nicole Shapiro; Amber Marsh; Elizabeth Pardes; Troy Moore; Anita Lebeau; Mike Muratet; Blake Simmons; David Kloske; Stephanie Sieja; James Hudson; Praveen Sethupathy; Michael Brownstein; Narayan Bhat; Joseph Lazar; Howard Jacob; Chris E Gruber; Mark R Smith; John McPherson; Angela M Garcia; Preethi H Gunaratne; Jiaqian Wu; Donna Muzny; Richard A Gibbs; Alice C Young; Gerard G Bouffard; Robert W Blakesley; Jim Mullikin; Eric D Green; Mark C Dickson; Alex C Rodriguez; Jane Grimwood; Jeremy Schmutz; Richard M Myers; Martin Hirst; Thomas Zeng; Kane Tse; Michelle Moksa; Merinda Deng; Kevin Ma; Diana Mah; Johnson Pang; Greg Taylor; Eric Chuah; Athena Deng; Keith Fichter; Anne Go; Stephanie Lee; Jing Wang; Malachi Griffith; Ryan Morin; Richard A Moore; Michael Mayo; Sarah Munro; Susan Wagner; Steven J M Jones; Robert A Holt; Marco A Marra; Sun Lu; Shuwei Yang; James Hartigan; Marcus Graf; Ralf Wagner; Stanley Letovksy; Jacqueline C Pulido; Keith Robison; Dominic Esposito; James Hartley; Vanessa E Wall; Ralph F Hopkins; Osamu Ohara; Stefan Wiemann
Journal:  Genome Res       Date:  2009-09-18       Impact factor: 9.043

7.  A high-resolution map of human evolutionary constraint using 29 mammals.

Authors:  Kerstin Lindblad-Toh; Manuel Garber; Or Zuk; Michael F Lin; Brian J Parker; Stefan Washietl; Pouya Kheradpour; Jason Ernst; Gregory Jordan; Evan Mauceli; Lucas D Ward; Craig B Lowe; Alisha K Holloway; Michele Clamp; Sante Gnerre; Jessica Alföldi; Kathryn Beal; Jean Chang; Hiram Clawson; James Cuff; Federica Di Palma; Stephen Fitzgerald; Paul Flicek; Mitchell Guttman; Melissa J Hubisz; David B Jaffe; Irwin Jungreis; W James Kent; Dennis Kostka; Marcia Lara; Andre L Martins; Tim Massingham; Ida Moltke; Brian J Raney; Matthew D Rasmussen; Jim Robinson; Alexander Stark; Albert J Vilella; Jiayu Wen; Xiaohui Xie; Michael C Zody; Jen Baldwin; Toby Bloom; Chee Whye Chin; Dave Heiman; Robert Nicol; Chad Nusbaum; Sarah Young; Jane Wilkinson; Kim C Worley; Christie L Kovar; Donna M Muzny; Richard A Gibbs; Andrew Cree; Huyen H Dihn; Gerald Fowler; Shalili Jhangiani; Vandita Joshi; Sandra Lee; Lora R Lewis; Lynne V Nazareth; Geoffrey Okwuonu; Jireh Santibanez; Wesley C Warren; Elaine R Mardis; George M Weinstock; Richard K Wilson; Kim Delehaunty; David Dooling; Catrina Fronik; Lucinda Fulton; Bob Fulton; Tina Graves; Patrick Minx; Erica Sodergren; Ewan Birney; Elliott H Margulies; Javier Herrero; Eric D Green; David Haussler; Adam Siepel; Nick Goldman; Katherine S Pollard; Jakob S Pedersen; Eric S Lander; Manolis Kellis
Journal:  Nature       Date:  2011-10-12       Impact factor: 49.962

8.  Third party annotation gene data set of eutherian lysozyme genes.

Authors:  Marko Premzl
Journal:  Genom Data       Date:  2014-08-20

9.  UniProt: the universal protein knowledgebase in 2021.

Authors: 
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

10.  Incomplete annotation has a disproportionate impact on our understanding of Mendelian and complex neurogenetic disorders.

Authors:  David Zhang; Sebastian Guelfi; Sonia Garcia-Ruiz; Beatrice Costa; Regina H Reynolds; Karishma D'Sa; Wenfei Liu; Thomas Courtin; Amy Peterson; Andrew E Jaffe; John Hardy; Juan A Botía; Leonardo Collado-Torres; Mina Ryten
Journal:  Sci Adv       Date:  2020-06-10       Impact factor: 14.136
