Roberto Vera Alvarez, Newton Medeiros Vidal, Gina A Garzón-Martínez, Luz S Barrero, David Landsman, Leonardo Mariño-Ramírez.
Abstract
The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as the Transcriptome Shotgun Assembly Sequence Database (TSA) and the Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Year: 2017 PMID: 28605765 PMCID: PMC5467576 DOI: 10.1093/database/bax008
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Global workflow for annotating NCBI BioProject transcriptome data.
Tools used inside this workflow
| Tool | Version | Main use | URL |
|---|---|---|---|
| EUtils | 4.50 | Advanced method for accessing the NCBI set of interconnected databases from a UNIX terminal window | |
| SRA toolkit | 2.6.3 | Programmatically access data housed within SRA and convert it from the SRA format to different formats | |
| BioPython | 1.67 | Python tools for computational molecular biology and bioinformatics | |
| Bowtie 2 | 2.2.6 | An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences | |
| Samtools | 1.3.1 | A suite of programs for interacting with high-throughput sequencing data | |
| BLAST | 2.4.0 | The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences | |
| JBioWH | 6.1.3 | An open-source, platform-independent programming framework that allows a user to build a customized integrated database | |
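
To make the role of the command-line tools in the table more concrete, the following is a minimal sketch of how one run might be chained: download reads, align them to the transcripts, sort and index the alignments, and search the transcripts against a protein database. It is not the authors' pipeline. The function name `build_pipeline_commands`, the file names, the placeholder run accession, the choice of `blastx`, and the database name are all assumptions made for illustration; only widely documented options of each tool are used. The function only builds the argument lists and does not execute anything.

```python
from typing import List


def build_pipeline_commands(sra_run: str, transcripts_fasta: str,
                            protein_db: str) -> List[List[str]]:
    """Return the argument lists for one hypothetical annotation run.

    All file names below are illustrative assumptions, not values
    taken from the published workflow.
    """
    index_prefix = "transcripts_index"   # hypothetical Bowtie 2 index name
    reads = f"{sra_run}.fastq"           # fastq-dump names its output after the run
    sam = f"{sra_run}.sam"
    bam = f"{sra_run}.sorted.bam"
    return [
        # SRA toolkit: convert the SRA run to FASTQ
        ["fastq-dump", sra_run],
        # Bowtie 2: index the transcripts, then align the reads to them
        ["bowtie2-build", transcripts_fasta, index_prefix],
        ["bowtie2", "-x", index_prefix, "-U", reads, "-S", sam],
        # Samtools: sort and index the alignments for later viewing
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # BLAST: translated search of transcripts against proteins (assumed blastx),
        # written as XML (-outfmt 5) so that it can be parsed downstream
        ["blastx", "-query", transcripts_fasta, "-db", protein_db,
         "-outfmt", "5", "-out", "blastx.xml"],
    ]


if __name__ == "__main__":
    # "SRR0000000" is a placeholder accession, not a real run from the study.
    for cmd in build_pipeline_commands("SRR0000000", "transcripts.fa", "nr"):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed; keeping command construction separate from execution makes the sequence easy to inspect and test.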
Figure 2. Collage of the content view for the BioProject and transcript pages. (a) BioProject summary page, (b) transcript list page, (c) description and cross-referenced blocks, and (d) alignment view.
Figure 3. Basic Local Alignment Search Tool summary popup.
Figure 4. Cross-referenced Gene Ontology and Enzyme Commission lists (a and c) and full descriptions (b and d).
Figure 5. Statistics graphs for Gene Ontology namespace.
Figure 6. Backend database schema developed to store multiple BioProjects.
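
The actual schema is shown only in the figure and is not reproduced here. As a purely hypothetical illustration of how one backend can hold several BioProjects, the sketch below uses SQLite with three invented tables (`bioproject`, `transcript`, `blast_hit`); every table, column, and function name is an assumption and none is taken from the published schema.

```python
import sqlite3

# Hypothetical schema: each transcript belongs to one BioProject,
# and each BLAST hit belongs to one transcript.
SCHEMA = """
CREATE TABLE bioproject (
    id        INTEGER PRIMARY KEY,
    accession TEXT NOT NULL UNIQUE
);
CREATE TABLE transcript (
    id            INTEGER PRIMARY KEY,
    bioproject_id INTEGER NOT NULL REFERENCES bioproject(id),
    accession     TEXT NOT NULL UNIQUE
);
CREATE TABLE blast_hit (
    id                INTEGER PRIMARY KEY,
    transcript_id     INTEGER NOT NULL REFERENCES transcript(id),
    subject_accession TEXT NOT NULL,
    evalue            REAL NOT NULL,
    bit_score         REAL NOT NULL
);
"""


def create_db(path=":memory:"):
    """Open a database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_hit(conn, project, transcript, subject, evalue, bit_score):
    """Insert one hit, creating the project and transcript rows if needed."""
    conn.execute("INSERT OR IGNORE INTO bioproject(accession) VALUES (?)", (project,))
    pid = conn.execute(
        "SELECT id FROM bioproject WHERE accession = ?", (project,)).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO transcript(bioproject_id, accession) VALUES (?, ?)",
        (pid, transcript))
    tid = conn.execute(
        "SELECT id FROM transcript WHERE accession = ?", (transcript,)).fetchone()[0]
    conn.execute(
        "INSERT INTO blast_hit(transcript_id, subject_accession, evalue, bit_score) "
        "VALUES (?, ?, ?, ?)", (tid, subject, evalue, bit_score))


def hits_for_project(conn, project):
    """Return (transcript, subject, bit_score) rows for one BioProject, best first."""
    return conn.execute(
        "SELECT t.accession, h.subject_accession, h.bit_score "
        "FROM blast_hit h "
        "JOIN transcript t ON t.id = h.transcript_id "
        "JOIN bioproject p ON p.id = t.bioproject_id "
        "WHERE p.accession = ? ORDER BY h.bit_score DESC", (project,)).fetchall()


if __name__ == "__main__":
    conn = create_db()
    # Transcript JO140768 is assumed here to belong to the case-study BioProject 67621.
    add_hit(conn, "67621", "JO140768", "XP_009779970.1", 0.0, 635)
    print(hits_for_project(conn, "67621"))
```

Keying every transcript to a `bioproject` row is what lets additional projects share the same tables without schema changes.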
Top seven proteins aligned with transcript JO140768 via BLAST
| Accession | Title | Taxonomy | E-value | Bit score | Score | Length |
|---|---|---|---|---|---|---|
| XP_009779970.1 | PREDICTED: mitochondrial adenine nucleotide transporter ADNT1-like isoform X1 | | 0 | 635 | 1639 | 337 |
| XP_010326870.1 | PREDICTED: mitochondrial adenine nucleotide transporter ADNT1 isoform X1 | | 0 | 633 | 1633 | 337 |
| XP_010326872.1 | PREDICTED: mitochondrial adenine nucleotide transporter ADNT1 isoform X2 | | 0 | 626 | 1615 | 337 |
| NP_001275102.1 | Mitochondrial carrier-like protein | | 0 | 625 | 1613 | 337 |
| NP_001275102.1 | Mitochondrial carrier-like protein | | 0 | 625 | 1613 | 337 |
| XP_004248074.1 | PREDICTED: mitochondrial adenine nucleotide transporter ADNT1 | | 0 | 595 | 1534 | 338 |
| XP_004248074.1 | PREDICTED: mitochondrial adenine nucleotide transporter ADNT1 | | 0 | 595 | 1534 | 338 |
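
A table like this is typically produced by filtering hits on E-value and ordering them by bit score. The sketch below shows that ranking step on the distinct rows of the table; the `Hit` type, the `top_hits` function, and the default E-value cutoff of 1e-5 are assumptions for illustration and are not described in the source.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    accession: str
    evalue: float
    bit_score: float
    score: int
    length: int


# Distinct rows from the table above (taxonomy omitted because it is
# not available in this copy of the table).
HITS = [
    Hit("XP_004248074.1", 0.0, 595, 1534, 338),
    Hit("NP_001275102.1", 0.0, 625, 1613, 337),
    Hit("XP_010326872.1", 0.0, 626, 1615, 337),
    Hit("XP_009779970.1", 0.0, 635, 1639, 337),
    Hit("XP_010326870.1", 0.0, 633, 1633, 337),
]


def top_hits(hits: List[Hit], max_evalue: float = 1e-5, n: int = 3) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best first.

    Hits are ordered by ascending E-value, then by descending bit score.
    The cutoff is an illustrative default, not the authors' setting.
    """
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: (h.evalue, -h.bit_score))[:n]


if __name__ == "__main__":
    for hit in top_hits(HITS):
        print(hit.accession, hit.bit_score)
```

Sorting on E-value first and bit score second keeps the ordering stable when several hits share an E-value of 0, as they do here.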