Victor M Markowitz, I-Min A Chen, Krishna Palaniappan, Ken Chu, Ernest Szeto, Manoj Pillay, Anna Ratner, Jinghua Huang, Tanja Woyke, Marcel Huntemann, Iain Anderson, Konstantinos Billis, Neha Varghese, Konstantinos Mavromatis, Amrita Pati, Natalia N Ivanova, Nikos C Kyrpides.
Abstract
The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have increased continuously since its first version was released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA-Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).
Year: 2013 PMID: 24165883 PMCID: PMC3965111 DOI: 10.1093/nar/gkt963
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. RNA-Seq data organization. (i) ‘Omics’ datasets generated can be accessed from ‘IMG Statistics’ on IMG’s front page, following the Experiments link available on the ‘IMG Statistics’ page. (ii) An RNA-Seq study is associated with samples and the number of genes expressed across all samples. (iii) Each sample is associated with the number of expressed genes, the total number of reads and the average number of reads per gene. (iv) An expressed gene is associated with a read count (total number of reads divided by the size of the gene) and normalized coverage (coverage for a gene in the experiment divided by the total number of reads in that experiment).
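The per-gene quantities in (iii) and (iv) can be illustrated with a minimal sketch. It is not IMG's implementation: the `GeneExpression` record, the function names and the sample values are hypothetical, and it assumes that "size of the gene" means gene length in base pairs and that the per-gene numerator is the number of reads mapped to that gene.

```python
from dataclasses import dataclass


@dataclass
class GeneExpression:
    gene_id: str      # hypothetical gene identifier
    reads: int        # reads mapped to this gene (assumed meaning)
    gene_length: int  # gene size in base pairs (assumed unit)


def coverage(gene: GeneExpression) -> float:
    """Reads mapped to the gene divided by the size of the gene."""
    if gene.gene_length <= 0:
        raise ValueError("gene_length must be positive")
    return gene.reads / gene.gene_length


def normalized_coverage(gene: GeneExpression, total_reads: int) -> float:
    """Coverage of the gene divided by the total number of reads in the sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return coverage(gene) / total_reads


# Illustrative sample; the values are made up for the example.
sample = [
    GeneExpression("geneA", reads=200, gene_length=1000),
    GeneExpression("geneB", reads=300, gene_length=1500),
    GeneExpression("geneC", reads=500, gene_length=500),
]

# Per-sample statistics as listed in (iii).
expressed_genes = sum(1 for g in sample if g.reads > 0)
total_reads = sum(g.reads for g in sample)
average_reads_per_gene = total_reads / expressed_genes

for g in sample:
    print(g.gene_id, coverage(g), normalized_coverage(g, total_reads))
```

Dividing by the sample's total read count is what makes values comparable across samples sequenced to different depths.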
Figure 2. Biosynthetic clusters. (i) Genomes associated with biosynthetic clusters can be retrieved and examined using the ‘Genome Browser’. (ii) The number of biosynthetic clusters is provided in the ‘Genome Statistics’ section of the ‘Organism Detail’ page of a genome, together with a hyperlink to (iii) the list of biosynthetic clusters, whereby for each cluster the number of associated genes, the evidence type and the corresponding natural product are provided. (iv) A biosynthetic cluster can be examined using the ‘Biosynthetic Cluster Detail’ page, which includes information about the cluster. (v) ‘Natural Product List’ provides the list of the IMG genomes associated with natural products.
Figure 3. RNA-Seq data exploration. (i) The list of RNA-Seq studies associated with a genome can be accessed from its ‘Organism Details’, with each study associated with (ii) a list of RNA-Seq experiments (samples). Individual samples can be selected for further analysis, such as (iii) examining their expressed genes as a list or using the (iv) chromosome viewer. A sample can also be examined in the context of (v) pathways that have at least one enzyme associated with an expressed gene in the sample, whereby for each pathway (vi) enzymes are displayed with colors representing the level of expression for the associated genes; mousing over an enzyme shows the number of expressed genes associated with the enzyme.
Figure 4. RNA-Seq data comparison. (i) RNA-Seq sample comparison starts with the selection of samples of interest. (ii) ‘Pairwise Sample Analysis’ supports comparing samples in terms of up/downregulated genes, with (iii) a histogram preview that helps set the thresholds for comparison. (iv) The result of the comparison can be examined in terms of functions, whereby genes associated with KEGG pathways or COG functions are grouped together. (v) The strength of the association of gene expression between pairs of samples can be examined using ‘Spearman’s Rank Correlation’. (vi) ‘Linear Regression’ analysis helps estimate whether two samples are technical replicates. (vii) ‘Multiple Sample Analysis’ consists of clustering samples based on the abundance of expressed genes, using a variety of clustering methods. (viii) Clusters of samples can be examined in the context of pathways, whereby enzymes are displayed with colors representing the cluster.
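The rank correlation named in (v) can be sketched as follows. This is a generic textbook formulation (Pearson correlation computed on tie-averaged ranks), not code taken from IMG; the function names and the two expression vectors are hypothetical, and each vector is assumed to hold one value per gene in the same gene order.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene expression values for two samples.
sample_1 = [0.2, 1.5, 0.7, 3.1, 0.0]
sample_2 = [0.3, 1.1, 0.9, 2.8, 0.1]
rho = spearman(sample_1, sample_2)
print(f"Spearman rho = {rho:.3f}")
```

Because the statistic depends only on ranks, it captures any monotonic relationship between two samples and is less sensitive than a linear fit to a few very highly expressed genes.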