Qingyu Chen, Ramona Britto, Ivan Erill, Constance J Jeffery, Arthur Liberzon, Michele Magrane, Jun-Ichi Onami, Marc Robinson-Rechavi, Jana Sponarova, Justin Zobel, Karin Verspoor.
Year: 2020 PMID: 32652120 PMCID: PMC7646089 DOI: 10.1016/j.gpb.2018.11.006
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. Biological analysis pipeline
Three stages of a biological analysis pipeline that relies heavily on biological databases are presented. Pre-DB: the data collection and submission stage, where entity duplicates often matter. Within-DB: the data curation and visualization stage, where near-identical duplicates often matter. Post-DB: the data downloading and usage stage, where the definition of duplicates depends on the use case. DB: database.
Figure 2. Characteristics of duplicate records
A. Duplicate types and the number of participants who selected each type. B. Distribution of participants according to the number of duplicate types they selected. There are 21 participants in total.
Figure 3. Impacts of duplicate records
A. The number of participants who believed that duplication does or does not have impacts. B. A more detailed breakdown by type of impact, for those who believed that duplication has impacts.
Figure 4. Solutions to duplicate records
The X-axis shows the options for addressing duplication; the Y-axis shows the number of participants who selected each option.
Representative software and resources used in expert curation
| Curation task | Software/resource | Purpose |
|---|---|---|
| Identify homologs | BLAST | Sequence alignment |
| Document inconsistencies | Ensembl | Phylogenetic resources |
| | T-Coffee | Sequence difference ( |
| | MUSCLE | |
| | ClustalW | |
| Predict topology | SignalP | Signal peptide prediction |
| | TMHMM | Transmembrane domain prediction |
| Predict PTMs | NetNGlyc | |
| | Sulfinator | Tyrosine sulfation site prediction |
| Identify domains | InterPro | Retrieval of motif matches |
| | REPEAT (REP tool) | Identification of repeats |
| Identify relevant literature | PubMed | Literature resources |
| | iHOP | |
| Extract named entities | PubAnnotation | Information extraction |
| | PubTator | |
| Assign GOs | GO | Gene ontology terms |
| | BLAST | Sequence alignment |
| | ECO | Evidence code ontology |
Note: The complete set of software, including the specific versions used, can be found in the UniProt manual curation standard operating procedure documentation (www.uniprot.org/docs/sop_manual_curation.pdf). PTM, post-translational modification.