RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation.

Wenjun Li¹, Kathleen R O'Neill¹, Daniel H Haft¹, Michael DiCuccio¹, Vyacheslav Chetvernin¹, Azat Badretdin¹, George Coulouris¹, Farideh Chitsaz¹, Myra K Derbyshire¹, A Scott Durkin¹, Noreen R Gonzales¹, Marc Gwadz¹, Christopher J Lanczycki¹, James S Song¹, Narmada Thanki¹, Jiyao Wang¹, Roxanne A Yamashita¹, Mingzhang Yang¹, Chanjuan Zheng¹, Aron Marchler-Bauer¹, Françoise Thibaud-Nissen¹.

Abstract

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. Published by Oxford University Press on behalf of Nucleic Acids Research 2020.

Year: 2020 PMID: 33270901 DOI: 10.1093/nar/gkaa1105

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

Cited

97 in total

1. EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure.

Authors: Marius Alfred Dieckmann; Sebastian Beyvers; Rudel Christian Nkouamedjo-Fankep; Patrick Harald Georg Hanel; Lukas Jelonek; Jochen Blom; Alexander Goesmann
Journal: Nucleic Acids Res Date: 2021-07-02 Impact factor: 16.971

2. Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes. (Review)

Authors: Erwin Tantoso; Birgit Eisenhaber; Frank Eisenhaber
Journal: Methods Mol Biol Date: 2022

3. ViroidDB: a database of viroids and viroid-like circular RNAs.

Authors: Benjamin D Lee; Uri Neri; Caleb J Oh; Peter Simmonds; Eugene V Koonin
Journal: Nucleic Acids Res Date: 2022-01-07 Impact factor: 16.971

4. Comparative Genomics Reveal the High Conservation and Scarce Distribution of Nitrogen Fixation nif Genes in the Plant-Associated Genus Herbaspirillum.

Authors: Ana Marina Pedrolo; Filipe Pereira Matteoli; Cláudio Roberto Fônseca Sousa Soares; Ana Carolina Maisonnave Arisi
Journal: Microb Ecol Date: 2022-08-06 Impact factor: 4.192

5. Corallococcus soli sp. Nov., a Soil Myxobacterium Isolated from Subtropical Climate, Chalus County, Iran, and Its Potential to Produce Secondary Metabolites.

Authors: Zahra Khosravi Babadi; Ronald Garcia; Gholam Hossein Ebrahimipour; Chandra Risdian; Peter Kämpfer; Michael Jarek; Rolf Müller; Joachim Wink
Journal: Microorganisms Date: 2022-06-21

6. The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource.

Authors: Klaas J van Wijk; Tami Leppert; Qi Sun; Sascha S Boguraev; Zhi Sun; Luis Mendoza; Eric W Deutsch
Journal: Plant Cell Date: 2021-11-04 Impact factor: 12.085

7. Genome Sequences of Neurotropic Lineage III Listeria monocytogenes Isolates UKVDL9 and 2010L-2198.

Authors: Taylor M Albrecht; Zuzana Kucerova; Sarah E F D'Orazio
Journal: Microbiol Resour Announc Date: 2021-05-06

8. Draft Genome Sequences of Four Bacterial Strains of Heterotrophic Alteromonas macleodii and Marinobacter, Isolated from a Nonaxenic Culture of Two Marine Synechococcus Strains.

Authors: Patricia Arias-Orozco; Yunhai Yi; Oscar P Kuipers
Journal: Microbiol Resour Announc Date: 2021-05-13

9. Complete Genome Sequence of Bacillus sp. Strain V3, Isolated from Mangrove Sediments in Wenzhou, China.

Authors: Ruihua Yan; Shija Vicent Michael; Guangya Zhang; Runying Zeng; Xialan Li
Journal: Microbiol Resour Announc Date: 2021-05-06

10. Analysis of the Taxonomy and Pathogenic Factors of Pectobacterium aroidearum L6 Using Whole-Genome Sequencing and Comparative Genomics.

Authors: Peidong Xu; Huanwei Wang; Chunxiu Qin; Zengping Li; Chunhua Lin; Wenbo Liu; Weiguo Miao
Journal: Front Microbiol Date: 2021-07-02 Impact factor: 5.640