
MetaMap: an atlas of metatranscriptomic reads in human disease-related RNA-seq data.

L M Simon, S Karg, A J Westermann, M Engel, A H A Elbehery, B Hense, M Heinig, L Deng, F J Theis.

Abstract

Background: With the advent of the age of big data in bioinformatics, large volumes of data and high-performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts; however, its generic nature also enables the detection of microbial and viral transcripts.

Findings: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from six independent, controlled infection experiments of cell line models and compared them with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from more than 17,000 samples from more than 400 studies relevant to human disease using state-of-the-art high-performance computing systems. The resulting data from this large-scale re-analysis are made available in the presented MetaMap resource.

Conclusions: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation toward the role of the microbiome in human disease. Additionally, code to process new datasets and perform statistical analyses is made available.

Year:  2018        PMID: 29901703      PMCID: PMC6025204          DOI: 10.1093/gigascience/giy070

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data Description

Context

Recent studies have demonstrated the paramount importance of the microbiome for human health and disease [1]. For example, imbalance of the human gut microbiome was linked to noncommunicable diseases such as obesity [2, 3], diabetes [4], cardiovascular disease [5], chronic obstructive pulmonary disease [6], and colorectal carcinoma [7, 8], to name just a few. The advent of high-throughput sequencing technologies has revolutionized the life sciences. RNA sequencing (RNA-seq) technology produces one of the most frequent next-generation sequencing data types and has been applied to the study of a large number of biological samples relevant to human disease. The majority of the underlying raw data are freely accessible from data repositories such as the Gene Expression Omnibus (>1,700 human RNA-seq datasets as of January 2018) and the Sequence Read Archive (SRA) [9]. However, these data are typically exclusively used for single species (i.e., human) transcriptomics such as differential gene expression and alternative splicing analysis [9, 10]. Reads that do not map onto the human genome are considered noise or contamination and therefore are generally ignored [11, 12] (collectively about 9% of total reads, Fig. 1). Five years ago, it was postulated that interspecies interactions might be studied by simultaneous detection and quantification of RNA transcripts from a given host and a microbe via “dual” RNA-seq [13]. Meanwhile, this approach has been successfully applied to the interaction of mammalian cells with diverse bacterial [14] and viral pathogens [15-19].
Figure 1:

Schematic of the MetaMap pipeline. More than 400 projects from studies relevant to human disease were identified in the SRA database. More than 500 billion RNA-seq reads were downloaded and first filtered by mapping them onto the human genome. The remaining reads underwent metafeature classification. It is noted that 90.7% of all reads mapped to the human genome; 0.03%, 0.20%, and 0.39% of all reads were assigned to archaeal, bacterial, or viral metafeatures, respectively; and 8.6% of all reads remain nondiscriminative at the species level (“unclassified”).

Inspired by dual RNA-seq, in this study we hypothesize that reads in archived RNA-seq datasets derived from human primary cells or tissue samples that fail to map against the human reference genome may contain valuable information about the presence of certain microbes in the respective body niches and/or under defined disease conditions. To enable metatranscriptomic study of these data, we combined existing read alignment and metagenomic classification software into a two-step “omni” RNA-seq pipeline to comprehensively quantify archaeal, bacterial, and viral reads in human RNA-seq data (Fig. 1). In the first step of this so-called MetaMap pipeline, all reads are aligned against the human genome using the ultra-fast RNA-seq aligner Spliced Transcripts Alignment to a Reference software (STAR) [20]. Subsequently, only the fraction of unmapped reads is subjected to metatranscriptomic classification using CLARK-S [21] (see Methods for details). The combination of scalability and accuracy was the main motivation behind choosing these two software packages over competing methods [22, 23]. It is important to note that CLARK-S uses a set of uniquely discriminative short sequences at the species level to classify reads. Therefore, reads containing nondiscriminative sequences that fail to be uniquely assigned to a single species, e.g., reads originating from the bacterial 16S rRNA gene, will be considered “unclassified” (altogether 8.6% in Fig. 1). The output of CLARK-S is an operational taxonomic unit (OTU) count matrix, where rows correspond to viral, bacterial, and archaeal species and columns correspond to (human) samples. Each entry corresponds to the number of non-human reads classified to the respective species. For convenience, in the following, we refer to the set of microbial and viral species profiled using our approach as “metafeatures.”

By screening the study abstracts of the SRA for search terms prioritizing human clinical datasets derived from polyA-independent sequencing protocols (see Methods), we identified more than 400 studies relevant to human disease comprising more than 17,000 cDNA libraries (close to 150 terabytes of raw sequencing data). Raw sequencing reads from these studies were downloaded and analyzed using the high-performance computing system of the Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, which facilitated ultra-fast processing with median speeds of 25 and 21 million reads per hour per core per run for the STAR and CLARK-S steps, respectively. Of the more than 500 billion RNA-seq reads processed, around 91% could be mapped to the human genome. A fraction of 8.6% of all reads remained nondiscriminative at the species level and were defined as “unclassified.” In addition, 0.03%, 0.20%, and 0.39% of all reads were assigned to archaeal, bacterial, or viral metafeatures, respectively. Despite these relatively low percentages, the absolute numbers of reads classified were in the hundreds of millions to billions, enabling statistical analyses.
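As a rough check of the scale implied by these fractions, the following sketch converts the reported percentages into approximate read counts. It assumes exactly 500 billion total reads, whereas the text only states "more than 500 billion", so the values are illustrative lower bounds and not reported figures.

```python
# Approximate read counts implied by the reported fractions.
# TOTAL_READS is an assumption: the text only says "more than 500 billion".
TOTAL_READS = 500e9

fractions_percent = {
    "human": 90.7,
    "unclassified": 8.6,
    "archaeal": 0.03,
    "bacterial": 0.20,
    "viral": 0.39,
}


def implied_reads(total, percent):
    """Return the number of reads corresponding to a percentage of the total."""
    return total * percent / 100.0


counts = {name: implied_reads(TOTAL_READS, pct)
          for name, pct in fractions_percent.items()}

for name, n in counts.items():
    print(f"{name:>12}: {n:.3g} reads")
```

Under this assumption the archaeal, bacterial, and viral fractions correspond to roughly 150 million, 1 billion, and 2 billion reads, consistent with the "hundreds of millions to billions" stated above.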

Methods

High-performance computing environment

Project computations including download, alignment of reads onto the human genome, and metafeature quantification were made on the high-performance Linux Cluster at the LRZ [24].

RNA-seq data retrieval

Raw next-generation sequencing data were downloaded from the SRA. The R package SRAdb was downloaded on 23 May 2017 and used to query the SRA database. To identify SRA projects that contain transcriptomic analyses of human RNA-seq data, the SRA attributes “taxon_id,” “library_source,” “library_strategy,” and “platform” were searched for the terms “9606,” “TRANSCRIPT,” “RNA-seq,” and “ILLUMINA,” respectively. The restriction to runs annotated with “ILLUMINA” in the SRA attribute “platform” served to remove potential bias derived from different sequencing technologies. To exclude studies with insufficient sample size for statistical analysis, the query was restricted to SRA projects containing more than five runs. To avoid concentrating the analysis on a small number of large projects, the query was restricted to SRA projects with fewer than 500 runs. To identify studies focusing on phenotypes relevant to human disease, we restricted the query to runs containing at least one of the terms “disease,” “patient,” “primary,” and “clinical” in the SRA attribute “study_abstract.” To exclude in vitro (cell-culture) experiments and focus on primary (clinical) samples, SRA runs containing the terms “mutant” or “cell-line” were removed from our selection. Furthermore, SRA runs containing the terms “single cell” and “GTEx” were removed. Finally, samples with fewer than 1 million total reads or read lengths <50 bp were excluded. The described query resulted in 484 short read projects (SRPs) containing 21,659 RNA-seq runs. Due to technical problems (i.e., missing URLs, restricted access), we were unable to download 4,078 of these samples.
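The selection rules above can be sketched as a simple filter over run metadata. This is an illustration only: the original query was run with the R package SRAdb, and the dictionary layout, the field names `project`, `total_reads`, and `read_length`, the case-insensitive substring matching, and the choice to search the exclusion terms in the study abstract are all assumptions made here. The sketch also assumes the project-size limits apply to the number of runs per project before per-run filtering, which the text does not specify.

```python
from collections import defaultdict

# Terms taken from the text; matching them as lower-case substrings is an assumption.
INCLUDE_TERMS = ("disease", "patient", "primary", "clinical")
EXCLUDE_TERMS = ("mutant", "cell-line", "single cell", "gtex")


def run_passes(run):
    """Apply the per-run criteria to one (hypothetical) metadata record."""
    abstract = run["study_abstract"].lower()
    return (
        run["taxon_id"] == 9606
        and "transcript" in run["library_source"].lower()
        and run["library_strategy"].lower() == "rna-seq"
        and run["platform"].upper() == "ILLUMINA"
        and any(term in abstract for term in INCLUDE_TERMS)
        and not any(term in abstract for term in EXCLUDE_TERMS)
        and run["total_reads"] >= 1_000_000
        and run["read_length"] >= 50
    )


def select_runs(runs, min_runs=6, max_runs=499):
    """Keep passing runs from projects with more than five and fewer than 500 runs."""
    by_project = defaultdict(list)
    for run in runs:
        by_project[run["project"]].append(run)
    selected = []
    for project_runs in by_project.values():
        if min_runs <= len(project_runs) <= max_runs:
            selected.extend(r for r in project_runs if run_passes(r))
    return selected
```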

Human alignment

Alignment of reads against the human reference genome (hg38) and simultaneous human gene expression quantification was conducted with STAR (version 2.5.2). To increase mapping speed for a large number of samples, we used the --genomeLoad LoadAndKeep option to load the STAR index once and keep it in memory for subsequent alignments. The parameter --quantMode GeneCounts was used to generate the human gene expression count tables. Unmapped reads were saved with the --outReadsUnmapped Fastx parameter. To further increase mapping speed, multiple threads were used via the parameter --runThreadN 28. Runs with fewer than 30% of reads mapping to the human genome were excluded from downstream analysis. All human alignments were conducted on the LRZ “CoolMUC2” Linux-Cluster. This cluster contains 384 nodes with 64 GB random access memory (RAM) and 28 cores each.
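A minimal sketch of how these options might be combined into one STAR invocation, together with the 30% mapping filter, is shown below. The options --genomeDir, --readFilesIn, and --outFileNamePrefix are standard STAR options that the text does not mention and are added here only to make the command complete; all paths are hypothetical. The command is assembled as an argument list and not executed.

```python
def star_command(genome_dir, fastq_files, out_prefix, threads=28):
    """Assemble a STAR command line using the options named in the text."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # not in the text; hypothetical path
        "--genomeLoad", "LoadAndKeep",        # keep the index in shared memory
        "--readFilesIn", *fastq_files,        # not in the text; input FASTQ file(s)
        "--quantMode", "GeneCounts",          # per-gene human read counts
        "--outReadsUnmapped", "Fastx",        # write unmapped reads for CLARK-S
        "--outFileNamePrefix", out_prefix,    # not in the text; hypothetical prefix
    ]


def passes_mapping_filter(mapped_reads, total_reads, min_fraction=0.30):
    """Runs with fewer than 30% human-mapping reads are excluded."""
    return total_reads > 0 and mapped_reads / total_reads >= min_fraction


cmd = star_command("/ref/hg38_star_index", ["sample_1.fastq"], "out/sample_")
print(" ".join(cmd))
```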

Metafeature quantification

Metafeature quantification was conducted with CLARK-S (version 1.2.3). CLARK-S is a software method for fast and accurate sequence classification of metagenomic next-generation sequencing data, including RNA-seq data. One major issue during the classification of metagenomic data is the rising number of targets to align against. CLARK-S solves this issue by building a large index file consisting of discriminative k-mers. The metagenomic reference database was generated following the description on the CLARK website using the following two commands: set_targets.sh bacteria virus --species and buildSpacedDB.sh. This database contained 16,551 genome sequences corresponding to 6,979 unique species (Additional File 2). To allow uniform processing, the read files of paired-end sequencing experiments were analyzed independently. Each single unmapped read file was used as input for CLARK-S with the following parameters: classify_metagenome.sh --spaced -O <list of FASTQ files>. To increase classification speed, the CLARK-S express mode was selected and multiple threads were used with parameters -m 2 and -n 32, respectively. The output files of this step contain all input read identifiers with the corresponding metafeature classification. In the subsequent step, total counts are summarized for each feature with the estimate_abundance.sh command. To enable comparison across single-end and paired-end experiments, metafeature counts from paired-end experiments were averaged and subsequently rounded to conserve the count distribution. To account for varying sequencing depths, metafeature abundance was estimated as the number of reads per million total reads sequenced. Metafeature quantification was conducted on the LRZ “Teramem” Linux-Cluster. This cluster contains one node with 6,144 GB RAM and 96 cores.
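The two normalization steps described above (averaging paired-end counts and scaling to reads per million) can be sketched as follows. The function names and the dictionary representation of count tables are hypothetical. The text does not state which rounding rule was used; Python's built-in round, which rounds halves to the nearest even integer, is used here as an assumption.

```python
def average_mates(counts_mate1, counts_mate2):
    """Average per-metafeature counts of the two read files and round to integers.

    A metafeature missing from one file is treated as a count of zero.
    """
    features = set(counts_mate1) | set(counts_mate2)
    return {
        f: round((counts_mate1.get(f, 0) + counts_mate2.get(f, 0)) / 2)
        for f in features
    }


def reads_per_million(counts, total_reads):
    """Scale raw metafeature counts by the total number of reads sequenced."""
    return {f: c * 1e6 / total_reads for f, c in counts.items()}


# Toy example with made-up counts for one paired-end run of 2 million reads.
mate1 = {"Salmonella enterica": 10, "Rhinovirus A": 3}
mate2 = {"Salmonella enterica": 12}
averaged = average_mates(mate1, mate2)
rpm = reads_per_million(averaged, total_reads=2_000_000)
print(averaged, rpm)
```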

BLAST-based metafeature classification

To validate results generated by the MetaMap pipeline, the Basic Local Alignment Search Tool (BLAST) [25] was used as follows. A BLAST database was created from the same genome sequences used in the CLARK-S approach. Then, reads were aligned to this database using BLASTN with a threshold E-value of 1e-10. Counts from paired-end experiments were averaged. For each file, BLAST was run on chunks of approximately 10 kb (record separator “>”) in parallel using parallel [41] (28 jobs), each with eight threads, on one node of the LRZ “CoolMUC3” Linux Cluster. This cluster contains 148 nodes with 96 GB RAM and 64 cores each. Output was parsed to exclusively keep reads that could be assigned at the species level.
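One possible way to parse such output is sketched below. It rests on several assumptions not stated in the text: that BLASTN results are in the standard 12-column tabular format (E-value in the 11th column), that a hypothetical lookup table maps each subject sequence to a species, and that a read counts as "assigned at the species level" when all of its hits passing the E-value threshold belong to the same species.

```python
def species_level_assignments(tabular_lines, subject_to_species, max_evalue=1e-10):
    """Return {read_id: species} for reads whose passing hits agree on one species."""
    hits = {}
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        read_id, subject_id, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # hit does not pass the E-value threshold
        species = subject_to_species.get(subject_id)
        if species is not None:
            hits.setdefault(read_id, set()).add(species)
    # Keep only reads with exactly one candidate species.
    return {read: next(iter(sp)) for read, sp in hits.items() if len(sp) == 1}


def hit(read_id, subject_id, evalue):
    """Build one made-up tabular line; only columns 1, 2, and 11 matter here."""
    return f"{read_id}\t{subject_id}\t100\t50\t0\t0\t1\t50\t1\t50\t{evalue}\t90"


# Toy input with made-up read and genome identifiers.
mapping = {"g1": "Salmonella enterica", "g2": "Salmonella enterica",
           "g3": "Salmonella bongori"}
lines = [
    hit("r1", "g1", "1e-20"),
    hit("r1", "g2", "1e-20"),
    hit("r2", "g1", "1e-05"),
    hit("r3", "g1", "1e-30"),
    hit("r3", "g3", "1e-30"),
]
assigned = species_level_assignments(lines, mapping)
print(assigned)
```

In this toy input, read r2 is dropped by the E-value threshold and read r3 is dropped because its hits span two species.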

Differential metafeature abundance

Differential metafeature abundance analysis was performed using the R package DESeq2 [26]. DESeq2 models differential gene expression by fitting a negative binomial distribution to the raw counts underlying RNA-seq data. This framework can account for confounding variables such as sequencing depth. Therefore, the data need not be normalized prior to statistical inter-sample comparisons. For each of the four published bona fide dual RNA-seq studies, we classified samples into the following two groups based on the provided annotations: samples expected to contain the known pathogen, such as human papillomavirus-positive tumors in the Zhang et al. study [28], and pathogen-free controls, such as mock-treated cells in the Westermann et al. [27] study. Using this binary outcome, we performed differential expression analysis across all detected metafeatures. To account for sequencing depth, library size factors were estimated from the total number of sequenced reads. The dispersion for the negative binomial distribution was estimated using a local linear regression as implemented in the DESeq() function via the fitType parameter “local.”
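To illustrate what depth-based size factors mean in practice, the sketch below derives one factor per sample from its total read count and uses it to put metafeature counts on a common scale. This is not the DESeq2 implementation: scaling the factors so that their geometric mean equals 1 is a convention chosen here, and the function names are hypothetical.

```python
import math


def size_factors_from_depth(total_reads):
    """Size factors proportional to sequencing depth, scaled to a geometric mean of 1."""
    log_mean = sum(math.log(n) for n in total_reads) / len(total_reads)
    geo_mean = math.exp(log_mean)
    return [n / geo_mean for n in total_reads]


def normalize(counts, factors):
    """Divide each sample's raw count by its size factor."""
    return [c / s for c, s in zip(counts, factors)]


# Two made-up samples: the second was sequenced four times as deeply.
factors = size_factors_from_depth([1_000_000, 4_000_000])
normalized = normalize([10, 40], factors)
print(factors, normalized)
```

In this toy case both samples end up with the same normalized count, showing that a fourfold difference in raw counts can be explained entirely by sequencing depth.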

Data validation and quality control

We validated our approach by recovering the ground truth in bona fide dual RNA-seq experiments performed with human cell lines and samples from patients with well-known infection status. Of the four selected studies, one analyzed an infection model based on a bacterial pathogen (Salmonella enterica serovar Typhimurium) and three were based on distinct viral pathogens (human papillomavirus, herpes simplex virus, rhinovirus). As expected, MetaMap detected the known pathogen at higher levels in the respective study compared to the other studies and pathogens (Table 1). However, comparisons across studies and metafeatures may be biased by technical confounders (discussed in detail in the Re-use potential section). Therefore, we focused our analysis on the comparison of a single metafeature across subjects within a study. Using the annotation provided in the respective study, we performed differential metafeature abundance analysis to identify those metafeatures that show the largest relative difference in abundance levels between the infected and control samples (see Methods for details). The correct infection agent showed the most significant difference across all metafeatures between infected and control samples for each study (Fig. 2). For example, Westermann et al. [27] generated dual RNA-seq data from HeLa cells infected with the enteric bacterial pathogen S. enterica serovar Typhimurium and compared them to mock-treated control samples. Accordingly, we observed S. enterica as the most differentially abundant metafeature between the infected and the control samples (P <1e-75, Fig. 2A). Likewise, we recovered alphapapillomavirus 9, human alphaherpesvirus 1 (also known as herpes simplex virus 1), and rhinovirus A as the most differentially abundant metafeatures in the data from Zhang et al. [28], Rutkowski et al. [29], and Bai et al. [30], respectively. In the Westermann et al. [27] and Rutkowski et al. [29] studies, several additional metafeatures showed a strong differential abundance effect (Fig. 2A and 2C). These metafeatures were closely related to the true infection agent, i.e., Salmonella bongori (P <1e-67) and Panine alphaherpesvirus 3 (P <1e-9) for the Westermann et al. [27] and Rutkowski et al. [29] studies, respectively. These findings confirm that our MetaMap pipeline recapitulates results from dedicated dual RNA-seq studies, i.e., studies based on known infectious agents. Therefore, MetaMap may be equally suited to detect previously unknown microbial and viral species in human primary samples.
Table 1:

Overview of four dual RNA-seq studies used to validate the MetaMap pipeline.

| Study | Infection agent | Total reads | Salmonella enterica | Alphapapillomavirus 9 | Human alphaherpesvirus 1 | Rhinovirus A |
|---|---|---|---|---|---|---|
| Westermann et al. [27] | Salmonella enterica serovar Typhimurium | 1.0e+07 | 6.3e+03* | 1.2e-01 | 1.5e-01 | 1.2e-01 |
| Zhang et al. [28] | Human papillomavirus | 4.6e+07 | 3.0e-02 | 5.1e+01* | 2.2e-02 | 2.2e-02 |
| Rutkowski et al. [29] | Herpes simplex virus | 3.5e+07 | 1.1e+00 | 3.1e-02 | 3.1e+04* | 3.0e-02 |
| Bai et al. [30] | Rhinovirus | 6.6e+06 | 2.0e-01 | 1.5e-01 | 1.5e-01 | 4.4e+01* |

The Total reads column shows the average read depth per sample for each study. Average metafeature abundances for Salmonella enterica, alphapapillomavirus 9, human alphaherpesvirus 1, and rhinovirus A are shown in reads per million. The correct infection agent for the respective study is marked with an asterisk (bold font in the original).

Figure 2:

Differential metafeature abundance analysis of controlled infection experiments recovers ground truth. “Volcano” plots show fold change and inverted P value on the x and y axes, respectively. Each dot represents a metafeature. The most significant metafeature is colored in red. Insets display box plots of the abundance levels in reads per million of the top hit metafeature across conditions for each study. For all box plots, the box represents the interquartile range, the horizontal line in the box is the median, and the whiskers represent 1.5 times the interquartile range.

As an additional control, we re-analyzed two projects contained in our data collection that are derived from the B lymphoblast cell line under noninfectious conditions. However, since Epstein-Barr virus (EBV) is used for transfection and transformation of lymphocytes to lymphoblasts, we expected to detect reads from this virus in these projects [31], but no further viral or microbial reads [32]. Indeed, the most abundant metafeatures in each project were dominated by reads classified to gammaherpesvirus 4 (also known as EBV) and Enterobacteria phage phiX174 sensu lato (phiX), commonly used as spike-in in Illumina sequencing runs [33] (Fig. 3A and 3B). On average, 95% and 97% of all metafeature reads were classified as phiX or EBV for projects SRP041338 and SRP091453, respectively (Fig. 3C). Conversely, the abundance of reads mapping to bacterial species for these two projects corresponds to the bottom percentile as compared to all other projects in the MetaMap database, supporting sterility of this cell line (Fig. 3D). This demonstrates that MetaMap not only is capable of rediscovering known pathogenic species (true positives) in controlled infection experiments (Fig. 2) but also minimizes the detection of false positives or, at least, provides measures such as abundance and significance, allowing the user to identify and counterselect those species.
Figure 3:

Analysis of lymphoblast cell line experiments further supports the MetaMap pipeline. (A and B) Mean abundance levels across all samples of the top five metafeatures for projects SRP041338 and SRP091453, respectively. (C) Relative proportion of reads mapping to EBV, phiX, and all other metafeatures across RNA-seq samples. (D) Cumulative distribution plot of the average proportion of bacterial metafeature reads across all projects. Purple and pink vertical lines highlight projects SRP041338 and SRP091453, respectively.

As a technical validation, we compared our approach to an alternative metatranscriptomic classification strategy for the Westermann et al. [34] study. All non-human reads were aligned using BLASTN to a BLAST database consisting of the same genomic sequences used by CLARK-S (see Methods for details). The average metafeature abundances across all 42 samples derived from the BLAST-based approach and CLARK-S correlated significantly (Spearman correlation, Rho: 0.16, P: 3.1e-10) (Fig. 4A). BLAST showed higher sensitivity and detected more metafeatures compared to CLARK-S (indicated by the accumulation of dots at value 0 on the x axis in Fig. 4A). This is mostly observed for low-abundance metafeatures that could represent low counts derived from sequencing and/or mapping errors. Most importantly, however, the true pathogen metafeature “Salmonella enterica” showed very high correlation across samples between the BLAST- and CLARK-based abundance estimates (Fig. 4B). Notably, the MetaMap pipeline processed reads more than three orders of magnitude faster than BLAST, demonstrating a significant speed advantage while generating comparable results (Fig. 4C).
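For readers unfamiliar with the statistic, the sketch below computes a Spearman rank correlation in plain Python, using average ranks for ties. Tie handling matters in a comparison like this one, because many metafeatures have an abundance of exactly 0 in one of the two methods. The input values are made up and do not reproduce the reported result; the authors' actual analysis code is not shown in the text.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up abundances: two metafeatures are undetected (0) by the first method.
clark_rpm = [0.0, 0.0, 1.0, 2.0]
blast_rpm = [1.0, 2.0, 3.0, 4.0]
print(spearman_rho(clark_rpm, blast_rpm))
```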
Figure 4:

Alternative BLAST-based classification method validates metafeature abundance estimates by MetaMap. (A) Average metafeature reads per million levels derived using the CLARK-S software, as implemented in the MetaMap pipeline, and a BLAST-based alternative approach on the x and y axes, respectively. (B) Correlation in S. enterica abundance levels between the two classification approaches. (C) Difference in classification speed between the BLAST and CLARK-S metatranscriptomic classification. The y axis shows the number of reads processed per hour per thread in log10 space.


Re-use potential

Microbial and viral contamination in next-generation sequencing data has been observed. It can be caused by mapping errors due to genome sequence similarity between different species [35, 36]. In addition, technical confounders can obstruct the analysis and potentially generate artificial differences if not considered properly. For example, different types of human samples may contain different amounts of non-human material due to varying sterility of the tissues. Furthermore, sequencing depth may introduce a detection floor for metafeatures that are not abundant. Therefore, comparisons across different tissues and sequencing depths may generate artificial differences. Additionally, given that only uniquely discriminative sequences are counted, the absolute abundance levels may not be comparable across metafeatures. Finally, the MetaMap pipeline captures metafeature abundance at the RNA level, which may not necessarily correspond to genomic abundance levels. Metafeatures may not be abundant at the DNA level but highly transcriptionally active and thus abundantly detected at the RNA level, or the inverse. These potential challenges need to be taken into consideration when comparing across metafeatures. To minimize these effects, we encourage focusing on studies that include intraproject comparisons that test one metafeature at a time, as exemplified in the differential metafeature abundance analysis. Our rationale is that technical confounders, in contrast to biologically meaningful changes, should affect all runs within a project to the same extent and therefore not show condition-specific effects. For example, in the Westermann et al. study [34], we detected substantial levels of phiX in both conditions (infected samples and mock-treated controls), but only the “Salmonella” metafeature showed a condition-specific effect. We aim to address the challenges inherent to interproject and intermetafeature comparisons in future work. 
All the raw data described in the present study were publicly available, yet have been very cumbersome to extract individually. The presented MetaMap database makes these data easily accessible for a very broad community, thereby allowing for global comparisons over hundreds of individual studies and thousands of sampled conditions. While we attempted to minimize the risk of detecting false positives (Fig. 3), it should be noted that not all metafeatures classified by MetaMap will necessarily refer to true biological factors. Notably, our approach reveals a correlation between metafeatures and disease, not causality, and cannot discriminate disease-associated effects from potential treatment effects. However, our pipeline provides the user with a scientific starting ground to validate the presence/absence of defined microbial and viral species under defined conditions and explore the underlying biology and significance in greater detail. As a potential use case of these data, users can test for associations of microbial or viral metafeatures with a plethora of human diseases or between themselves. In addition, users with interest in a specific bacterial or viral species can easily identify studies and, consequently, disease contexts in which reads from this organism were detected. This could give an important first hint to assess whether the respective species might be implicated in a given human disease etiology. Furthermore, this resource provides the opportunity to support findings derived from standard microbiome profiling technologies, such as 16S rRNA gene-based sequencing or shotgun metagenomics [37]. Finally, metafeature detection in human clinical RNA-seq samples may provide a diagnostic advantage when studying microbes or viruses that are challenging to isolate. The composite metafeature OTU count table, derived from 17,278 cDNA libraries from 436 SRA projects, including annotations, is provided for download [38].

Availability of source code and requirements

Project name: MetaMap
Project home: https://github.com/theislab/MetaMap
Operating system(s): Platform-independent
Programming language: Unix command line, R
Other requirements: STAR and CLARK-S may require large amounts of memory (>100 GB)
License: GNU GPL

Availability of supporting data

The datasets supporting the results presented here are available in the GigaScience Database repository [38]. The protocols are also available at [40].

Additional file

Additional File1.csv

Abbreviations

BLAST: basic local alignment search tool; EBV: Epstein-Barr virus; LRZ: Leibniz Supercomputing Centre; OTU: operational taxonomic unit; phiX: Enterobacteria phage phiX174 sensu lato; RNA-seq: RNA sequencing; SRA: Sequence Read Archive; SRP: short read project; STAR: Spliced Transcripts Alignment to a Reference software.

Competing interests

The authors declare that they have no competing interests.

Funding

L.S. acknowledges funding from the European Union's Horizon 2020 Research and Innovation Programme under the Marie Sklodowska-Curie grant agreement (753039). The operation of the LRZ Linux Cluster is funded via the Bavarian State Ministry of Education, Science, and the Arts.

Author contributions

Conceptualization: L.S., M.E., L.D., and B.H.; formal analysis: L.S., M.H., S.K., and A.E.; investigation: L.S., A.J.W., and M.E.; methodology: L.S., S.K., and M.H.; writing the original draft: L.S. and A.J.W.; writing, reviewing, and editing: L.S., A.J.W., M.E., A.E., L.D., M.H., and F.T.; supervision: L.D., M.H., and F.T.
References (10 of 35 shown)

1.  Genomic analysis identifies association of Fusobacterium with colorectal carcinoma.

Authors:  Aleksandar D Kostic; Dirk Gevers; Chandra Sekhar Pedamallu; Monia Michaud; Fujiko Duke; Ashlee M Earl; Akinyemi I Ojesina; Joonil Jung; Adam J Bass; Josep Tabernero; José Baselga; Chen Liu; Ramesh A Shivdasani; Shuji Ogino; Bruce W Birren; Curtis Huttenhower; Wendy S Garrett; Matthew Meyerson
Journal:  Genome Res       Date:  2011-10-18       Impact factor: 9.043

2.  STAR: ultrafast universal RNA-seq aligner.

Authors:  Alexander Dobin; Carrie A Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha; Philippe Batut; Mark Chaisson; Thomas R Gingeras
Journal:  Bioinformatics       Date:  2012-10-25       Impact factor: 6.937

3.  Higher classification sensitivity of short metagenomic reads with CLARK-S.

Authors:  Rachid Ounit; Stefano Lonardi
Journal:  Bioinformatics       Date:  2016-08-18       Impact factor: 6.937

4.  An obesity-associated gut microbiome with increased capacity for energy harvest.

Authors:  Peter J Turnbaugh; Ruth E Ley; Michael A Mahowald; Vincent Magrini; Elaine R Mardis; Jeffrey I Gordon
Journal:  Nature       Date:  2006-12-21       Impact factor: 49.962

5.  Dual analysis of the murine cytomegalovirus and host cell transcriptomes reveal new aspects of the virus-host cell interface.

Authors:  Vanda Juranic Lisnic; Marina Babic Cac; Berislav Lisnic; Tihana Trsan; Adam Mefferd; Chitrangada Das Mukhopadhyay; Charles H Cook; Stipan Jonjic; Joanne Trgovcich
Journal:  PLoS Pathog       Date:  2013-09-26       Impact factor: 6.823

6.  Mining RNA-seq data for infections and contaminations.

Authors:  Thomas Bonfert; Gergely Csaba; Ralf Zimmer; Caroline C Friedel
Journal:  PLoS One       Date:  2013-09-03       Impact factor: 3.240

7.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Authors:  Michael I Love; Wolfgang Huber; Simon Anders
Journal:  Genome Biol       Date:  2014       Impact factor: 13.583

8.  Dual RNA-seq reveals viral infections in asthmatic children without respiratory illness which are associated with changes in the airway transcriptome.

Authors:  Agata Wesolowska-Andersen; Jamie L Everman; Rebecca Davidson; Cydney Rios; Rachelle Herrin; Celeste Eng; William J Janssen; Andrew H Liu; Sam S Oh; Rajesh Kumar; Tasha E Fingerlin; Jose Rodriguez-Santana; Esteban G Burchard; Max A Seibold
Journal:  Genome Biol       Date:  2017-01-19       Impact factor: 13.583

9.  Resolving host-pathogen interactions by dual RNA-seq.

Authors:  Alexander J Westermann; Lars Barquist; Jörg Vogel
Journal:  PLoS Pathog       Date:  2017-02-16       Impact factor: 6.823

10.  An evaluation of the accuracy and speed of metagenome analysis tools.

Authors:  Stinus Lindgreen; Karen L Adair; Paul P Gardner
Journal:  Sci Rep       Date:  2016-01-18       Impact factor: 4.379
