Yueqiong Ni1, Jun Li1, Gianni Panagiotou2.
Abstract
BACKGROUND: Microbiota-oriented studies based on metagenomic or metatranscriptomic sequencing have revolutionised our understanding of microbial ecology and the roles of both clinical and environmental microbes. The analysis of massive metatranscriptomic data requires extensive computational resources, a collection of bioinformatics tools and expertise in programming.
Keywords: Computational biology; Metatranscriptomics; Microbial RNA-Seq; Microbial community; Web servers
Year: 2016 PMID: 27515514 PMCID: PMC4982211 DOI: 10.1186/s12864-016-2964-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The metatranscriptome analysis pipeline in COMAN. Example output figures from the analysis of a gut microbiome dataset are shown. Upper left: functional profiling; upper right: taxonomic contribution analysis (complete linkage as the clustering method and Euclidean distance as the dissimilarity metric); lower left: multi-dimensional scaling to illustrate sample clustering; lower right: co-expression network analysis with different inferred communities (for clarity, communities with fewer than 3 elements are merged together and not highlighted here)
Fig. 2 Performance evaluation of subsets of the combined database used in COMAN. The combined database was constructed by merging the non-coding RNAs of the NCBI bacterial reference genomes with eukaryotic ribosomal DNA (both large and small subunits) deposited in the SILVA database. Random subsets comprising 10 % and 5 % of the full combined database (indicated on the x-axis) were taken and their performance was compared with the BLASTN mapping results obtained using the full version. For each subset, the “Relative Accuracy” is defined as the number of reads identified by both the subset and the full database, divided by the total number of reads identified when only the subset is used for mapping. The “Relative Sensitivity” is defined as the number of reads identified by both the subset and the full database, divided by the total number of reads identified using the full database
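The two ratios defined in the Fig. 2 caption can be illustrated with a minimal sketch. It assumes that the reads identified by each mapping run are available as collections of read identifiers; the function name `relative_metrics` and the toy read IDs are hypothetical and do not come from COMAN itself.

```python
def relative_metrics(subset_hits, full_hits):
    """Return (relative accuracy, relative sensitivity) for one database subset.

    subset_hits: IDs of reads identified when mapping against the subset only.
    full_hits:   IDs of reads identified when mapping against the full database.
    Both arguments are hypothetical inputs used for illustration.
    """
    subset_hits = set(subset_hits)
    full_hits = set(full_hits)
    # Reads identified by both the subset and the full database.
    common = subset_hits & full_hits
    # Relative accuracy: common reads / all reads identified with the subset.
    accuracy = len(common) / len(subset_hits) if subset_hits else float("nan")
    # Relative sensitivity: common reads / all reads identified with the full database.
    sensitivity = len(common) / len(full_hits) if full_hits else float("nan")
    return accuracy, sensitivity


# Toy example with made-up read IDs.
full = {"r1", "r2", "r3", "r4"}
subset = {"r1", "r2", "r5"}
acc, sens = relative_metrics(subset, full)
print(f"relative accuracy = {acc:.3f}, relative sensitivity = {sens:.3f}")
```

In this toy case two reads are shared, so the relative accuracy is 2/3 and the relative sensitivity is 2/4. A subset that misses many reads lowers the sensitivity, while a subset that identifies reads the full database does not lowers the accuracy.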