Qiao Wen Tan, William Goh, Marek Mutwil.
Abstract
As genomes become increasingly available, gene function prediction remains one of the major hurdles to extracting meaningful information about the biological processes that genes participate in. To facilitate gene function prediction, we show how our user-friendly pipeline, the Large-Scale Transcriptomic Analysis Pipeline in Cloud (LSTrAP-Cloud), can help biologists shortlist genes involved in a biological process of interest by using a single gene of interest as bait. LSTrAP-Cloud is based on Google Colaboratory and provides user-friendly tools that process and quality-control RNA sequencing data streamed from the European Nucleotide Archive. LSTrAP-Cloud outputs a gene coexpression network that can be used to identify functionally related genes for any organism with a sequenced genome and publicly available RNA sequencing data. Here, we used the nicotine biosynthesis pathway of Nicotiana tabacum as a case study to demonstrate how enzymes, transporters, and transcription factors involved in the synthesis, transport, and regulation of nicotine can be identified using our pipeline.
Keywords: RNA; cloud; coexpression; metabolism; sequencing
Year: 2020; PMID: 32316247; PMCID: PMC7230309; DOI: 10.3390/genes11040428
Source DB: PubMed; Journal: Genes (Basel); ISSN: 2073-4425; Impact factor: 4.096
Figure 1. Schematic of the Large-Scale Transcriptomic Analysis Pipeline in Cloud (LSTrAP-Cloud), used for streaming and mapping RNA sequencing data and for generating a coexpression network based on the gene of interest.
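The last step in the schematic, building a coexpression neighbourhood around a gene of interest, can be illustrated with a minimal sketch. It assumes that coexpression is scored with the Pearson correlation coefficient, which this record does not state, and the gene names, expression values, and the helper names `pearson` and `top_coexpressed` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / math.sqrt(var_x * var_y)

def top_coexpressed(expression, bait, top_n):
    """Rank all other genes by their correlation with the bait gene."""
    bait_profile = expression[bait]
    scores = [
        (gene, pearson(bait_profile, profile))
        for gene, profile in expression.items()
        if gene != bait
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Hypothetical expression values (one column per experiment).
expression = {
    "bait":   [1.0, 2.0, 3.0, 4.0],
    "gene_x": [2.0, 4.0, 6.0, 8.0],   # rises with the bait
    "gene_y": [4.0, 3.0, 2.0, 1.0],   # falls as the bait rises
    "gene_z": [5.0, 5.0, 5.0, 5.0],   # flat
}
neighbours = top_coexpressed(expression, "bait", top_n=2)
```

Genes at the top of the ranking would be the candidates to shortlist; a real analysis would use many more experiments and genes than this toy table.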
Figure 2. Summary of tobacco experiments streamed and processed by LSTrAP-Cloud. (A) The violin plot shows the speed of simultaneous streaming and processing of RNA-seq files by kallisto. (B) Number of reads processed per gigabyte streamed. (C) Cumulative size (y-axis) of files streamed over time (x-axis). (D) The scatterplot shows the percentage of reads pseudoaligned to the coding sequences (CDS) (x-axis) against the total number of streamed reads (y-axis) for each experiment; each dot represents one experiment. (E) The percentage of reads pseudoaligned to the CDS (x-axis) versus the percentage of genes with nonzero Transcripts Per Kilobase Million (TPM) values (y-axis) for each experiment; each dot represents one experiment. Red lines indicate the cutoffs that were used to select the experiments for downstream analyses.
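The selection described for panel (E) can be sketched as a simple filter on two per-experiment percentages. This is an illustration only: the caption does not give the cutoff values, so the thresholds below are placeholders, and the record fields, run names, and the assumption that an experiment must pass both cutoffs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ExperimentStats:
    run_id: str
    n_processed: int       # reads streamed and processed
    n_pseudoaligned: int   # reads pseudoaligned to the CDS
    n_genes: int           # genes in the reference
    n_genes_nonzero: int   # genes with TPM > 0

def percentage(part, whole):
    """Return part/whole as a percentage, or 0.0 for an empty denominator."""
    return 100.0 * part / whole if whole else 0.0

def passes_qc(stats, min_pct_aligned, min_pct_nonzero):
    """Keep an experiment only if it meets both cutoffs (assumed rule)."""
    pct_aligned = percentage(stats.n_pseudoaligned, stats.n_processed)
    pct_nonzero = percentage(stats.n_genes_nonzero, stats.n_genes)
    return pct_aligned >= min_pct_aligned and pct_nonzero >= min_pct_nonzero

# Hypothetical experiments and placeholder cutoffs (not the paper's values).
experiments = [
    ExperimentStats("run_a", 1_000_000, 700_000, 1000, 600),
    ExperimentStats("run_b", 1_000_000, 100_000, 1000, 500),
    ExperimentStats("run_c", 500_000, 400_000, 1000, 50),
]
kept = [e.run_id for e in experiments if passes_qc(e, 20.0, 20.0)]
```

With these placeholder cutoffs, only the experiment that clears both thresholds is retained for downstream analysis.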
Figure 3. Nicotine biosynthesis. (A) Schematic of the nicotine biosynthesis pathway. Abbreviations for the enzymes are ODC2: ornithine decarboxylase 2, PMT: putrescine N-methyltransferase, MPO1: N-methylputrescine oxidase 1, BBL: berberine bridge enzyme-like proteins, AO: L-aspartate oxidase, QS: quinolinate synthase, QPT2: quinolinate phosphoribosyltransferase 2, A622: isoflavone-like oxidoreductase. (B) Coexpression network of A622 (Nitab4.5_0000884g0010.1). Abbreviations are AP2/ERF: APETALA 2/ethylene responsive factor, bHLH: basic helix-loop-helix, MATE2: multiantimicrobial extrusion family protein 2, OCT: organic cation transporter, NiaP: nicotinate transporter, PUP: purine uptake permease, and UmamiT: Usually Multiple Amino Acids Move In and Out Transporters. For brevity, only homologs of genes involved in nicotine biosynthesis as described in (A), transporters, and transcription factors are shown.
Figure 4. Relative expression of genes across major organs in Nicotiana tabacum. The expression values for each gene were scaled by dividing each row by the maximum value found in the row.
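The row scaling described in the Figure 4 caption can be written out in a few lines. This is a minimal sketch with made-up numbers: the gene names, organ labels, and the helper name `scale_rows_by_max` are hypothetical, and returning zeros for a gene with no expression is an added assumption to avoid dividing by zero.

```python
def scale_rows_by_max(expression):
    """Divide each gene's values by that gene's maximum value."""
    scaled = {}
    for gene, values in expression.items():
        row_max = max(values)
        # Assumption: a gene with no expression anywhere stays at zero.
        scaled[gene] = [v / row_max if row_max > 0 else 0.0 for v in values]
    return scaled

# Hypothetical expression values; columns follow the order in `organs`.
organs = ["root", "stem", "leaf", "flower"]
tpm = {
    "gene_a": [80.0, 20.0, 4.0, 0.0],
    "gene_b": [0.0, 0.0, 0.0, 0.0],
    "gene_c": [1.0, 2.0, 4.0, 2.0],
}
scaled = scale_rows_by_max(tpm)
```

After scaling, each expressed gene peaks at 1.0 in its most highly expressing organ, so genes with very different absolute expression levels can be compared on a single colour scale.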