Sushma Naithani1, Parul Gupta1, Justin Preece1, Priyanka Garg1, Valerie Fraser1,2, Lillian K Padgitt-Cobb3, Matthew Martin1, Kelly Vining4, Pankaj Jaiswal1.
Abstract
Biocuration plays a crucial role in building databases and complex systems-level platforms required for processing, annotating and analyzing 'Big Data' in biology. However, biocuration efforts cannot keep pace with a dramatic increase in the production of omics data; this presents one of the bottlenecks in genomics. In two pathway curation jamborees, Plant Reactome curators tested strategies for introducing researchers to pathway curation tools, harnessing biologists' expertise in curating plant pathways and developing a network of community biocurators. We summarize the strategy, workflow and outcomes of these exercises, and discuss the role of community biocuration in advancing databases and genomic resources.
Year: 2019 PMID: 30649295 PMCID: PMC6334007 DOI: 10.1093/database/bay146
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. A general summary of the genomics pipeline depicting ‘Big Data’ generation, handling, analysis and its potential translation. The development of cyber-infrastructure and analysis tools is ongoing for storage, analysis and visualization of the raw and processed data simultaneously to make it available for integration across various public platforms using the FAIR (findable, accessible, interoperable and reusable) principle. A genomic knowledgebase such as Plant Reactome represents the downstream end of this pipeline, which can directly support the generation of a data-driven hypothesis for translation of biological knowledge.
Strategy, workflow and outcomes of the 2018 pathway jamboree
| Activity (time) | Description | Outcome | Participant feedback |
|---|---|---|---|
| Literature review (3–4 hr per article) | Research articles on five pathways (arsenic transport, the HSFA7 gene regulatory network involved in drought response, heat response, cold response and endosperm development) were selected to identify (i) genes associated with a pathway or biological process and (ii) gene–gene or gene–protein interactions. | From 7 research articles, a list of 200 genes was extracted; 6 gene–gene interactions involved in heat stress were identified; and a network of the HSFA7 transcription factor consisting of 35 genes was deduced. | Graduate students found the critical review of literature useful in their own research. |
| Gene ID conversion (2–3 min per gene) | Participants were instructed to convert old gene IDs (e.g. MSU gene ID) to the standard RAP gene IDs using this tool: | 196 gene IDs were converted. 4 MSU genes do not have corresponding genes in RAP. | Provided insights into the value of consistency in gene nomenclature and synthesis of knowledge. |
| Transcript IDs (2 min per transcript) | Ensembl transcript IDs were used for depicting transcription events. | 36 transcript IDs were extracted. | Showed how to get transcript IDs. |
| UniProt IDs (5 min per gene) | To acquire protein IDs and annotation for a specific gene product from UniProt ( | 189 UniProt IDs were mapped to RAP genes. | Learned that genomic resources are not perfect and are a work in progress. |
| Subcellular location within a plant cell (5–15 min per protein) | The default location of genes and transcripts is the nucleus. To assign subcellular location to a protein, experimental evidence from the literature or prediction from the compendium of crop proteins with annotated locations (cropPAL) ( | 189 proteins were assigned subcellular location. This provides enrichment of annotation. | Students found these tools to be useful for their own research project. |
| Transmembrane domain(s) (5 min per protein) | TMHMM ( | 26 were found to be transmembrane proteins. | This is a useful exercise to enrich protein annotation. |
| GO (2–3 min per gene) | Assign GO terms | GO terms were assigned to 36 transcription events and in the case of proteins, were extracted from UniProt. | Learned about the utility of ontologies in curation. |
| Molecular interactions (15 min) | Summarized interaction data | Discussed a transcription network of 36 genes and another network of 6 genes, which emerged from literature review. | Learned how false-positive protein–protein interactions can be identified by considering their subcellular location. |
| Reaction (10 min) | Showed how to assign a reaction to a gene and protein. | Metabolic/transport/translocation/transcription/binding events were illustrated on a whiteboard. | Learned how different types of reactions can be depicted in pathways. |
| Pathway (10 min) | Showed how to assemble various reactions into a pathway and to associate them with complex biological processes (i.e. development of plant organs and tissues, plant’s response to abiotic stress). | As an example, the gene regulatory network of | How to integrate information from various sources to create a pathway. |
| Summary (15–30 min per summary) | Create a summary of genes and pathways with citations. | In total, summaries of 20 genes and 5 pathways were composed. | Critical review of data from the literature. |
| Discussed how various macromolecular interactions can be depicted in the form of a series of connected reactions (2 hr) | 15-min whiteboard presentation by each participant. | Rough outlines of five pathway diagrams were created. | This exercise was highly appreciated by all participants. Beyond the biocuration exercise, it helped them build data-driven hypotheses and refine their own research projects. |
| PathVisio tool (1 hr) | Showed curation of a gene–gene network involved in the rice response to biotic stress. | The curated pathway was deposited in WikiPathways. | Participants found this to be a useful resource. |
| Introduction to Plant Reactome analysis tools and the Reactome Curator Tool (6 hr) | Short introduction to Plant Reactome. Introduction to the Reactome Curator Tool by showing the step-by-step curation of an arsenic transport pathway. Participants curated pathways of interest with one-on-one assistance provided by organizers. | Curation of the arsenic transport pathway. Curation of one gene-regulatory pathway and partial curation of three additional pathways. | Participants found Plant Reactome to be a useful resource for the analysis of large-scale gene expression datasets, and the Reactome Curator Tool to be the most sophisticated pathway curation platform. |
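
Two of the bookkeeping steps in the table, gene ID conversion and the use of subcellular location to spot questionable interactions, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used at the jamboree: the function names, the mapping table and every identifier below are hypothetical placeholders, and actual curation would draw on the conversion tool, UniProt and cropPAL named in the table.

```python
from typing import Dict, Iterable, List, Set, Tuple


def convert_ids(
    old_ids: Iterable[str], old_to_new: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Map old gene IDs to new ones and keep track of IDs with no counterpart.

    `old_to_new` is a hypothetical lookup table supplied by the caller
    (for example, exported from an ID conversion tool).
    """
    converted: Dict[str, str] = {}
    unmapped: List[str] = []
    for gene_id in old_ids:
        new_id = old_to_new.get(gene_id)
        if new_id is None:
            unmapped.append(gene_id)  # no corresponding gene in the target set
        else:
            converted[gene_id] = new_id
    return converted, unmapped


def flag_interactions(
    pairs: Iterable[Tuple[str, str]], locations: Dict[str, Set[str]]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split interaction pairs into plausible ones and ones needing review.

    A pair is treated as plausible only if both proteins share at least one
    annotated compartment. Pairs with no shared compartment, or with a missing
    annotation, are flagged for manual review rather than discarded.
    """
    plausible: List[Tuple[str, str]] = []
    flagged: List[Tuple[str, str]] = []
    for a, b in pairs:
        shared = locations.get(a, set()) & locations.get(b, set())
        (plausible if shared else flagged).append((a, b))
    return plausible, flagged


if __name__ == "__main__":
    # All IDs and locations here are made up for illustration.
    mapping = {"OLD_1": "NEW_1", "OLD_2": "NEW_2"}
    converted, unmapped = convert_ids(["OLD_1", "OLD_2", "OLD_3"], mapping)
    print(len(converted), "converted;", len(unmapped), "without a counterpart")

    locs = {"P1": {"nucleus"}, "P2": {"nucleus", "cytosol"}, "P3": {"chloroplast"}}
    ok, review = flag_interactions([("P1", "P2"), ("P1", "P3")], locs)
    print("plausible:", ok, "needs review:", review)
```

Keeping the unmapped IDs and flagged pairs as explicit lists mirrors how the table reports them (for instance, the 4 MSU genes without a RAP counterpart), so a curator can follow up on each case by hand.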