RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Wenjun Li, Kathleen R O'Neill, Daniel H Haft, Michael DiCuccio, Vyacheslav Chetvernin, Azat Badretdin, George Coulouris, Farideh Chitsaz, Myra K Derbyshire, A Scott Durkin, Noreen R Gonzales, Marc Gwadz, Christopher J Lanczycki, James S Song, Narmada Thanki, Jiyao Wang, Roxanne A Yamashita, Mingzhang Yang, Chanjuan Zheng, Aron Marchler-Bauer, Françoise Thibaud-Nissen.

Abstract

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. Published by Oxford University Press on behalf of Nucleic Acids Research 2020.
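Since the abstract notes that the PFMs are open to any user through an Entrez database, a text query against that database might be assembled as in the minimal sketch below. It only builds the request URL for NCBI's E-utilities ESearch endpoint and makes no network call. The database identifier `protfam`, the helper name `build_pfm_search_url` and the example search term are assumptions introduced for illustration and are not stated in the abstract; the identifier should be confirmed against the E-utilities database list before use.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pfm_search_url(term, db="protfam", retmax=20):
    """Return an ESearch URL for a text query against an Entrez database.

    The default db value "protfam" is an assumed identifier for the
    Protein Family Models database; it is not given in the abstract.
    """
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Hypothetical example term; any Entrez text query could be used instead.
url = build_pfm_search_url("nitrogenase")
print(url)
```

Fetching the printed URL with any HTTP client would return a JSON list of matching record identifiers, provided the assumed database identifier is correct.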

Year: 2020    PMID: 33270901    DOI: 10.1093/nar/gkaa1105

Source DB: PubMed    Journal: Nucleic Acids Res    ISSN: 0305-1048    Impact factor: 16.971

