
RRAP: RPKM Recruitment Analysis Pipeline.

Conner Y Kojima, Eric W Getz, J Cameron Thrash.

Abstract

A common method for quantifying microbial abundances in situ is through metagenomic read recruitment to genomes and normalizing read counts as reads per kilobase (of genome) per million (bases of recruited sequences) (RPKM). We created RRAP (RPKM Recruitment Analysis Pipeline), a wrapper that automates this process using Bowtie2 and SAMtools.


Year:  2022        PMID: 35993706      PMCID: PMC9476942          DOI: 10.1128/mra.00644-22

Source DB:  PubMed          Journal:  Microbiol Resour Announc        ISSN: 2576-098X


ANNOUNCEMENT

Quantifying the relative abundance of microorganisms in a sample is a critical component of microbial ecology research. Whole-community metagenomic sequencing can be used to calculate relative abundance after recruiting reads to genomes generated from isolates, metagenomes, or single cells (1–4). Because genomes differ in size and samples differ in their numbers of reads, both variables need to be normalized. This can be accomplished with the RPKM (reads per kilobase [of genome] per million [bases of recruited sequences]) method, which was originally developed to quantify relative transcript abundance (5).

To automate read recruitment and RPKM normalization when recruiting hundreds or thousands of samples to similarly large numbers of genomes, we developed RRAP (RPKM Recruitment Analysis Pipeline). RRAP is a wrapper for other established tools that takes paired-end metagenomic sequences and reference genome sequences as the input and generates both read alignment data and RPKM values. The pipeline streamlines the read recruitment process by automatically handling the preprocessing steps of merging contigs, concatenating reference genomes, and indexing reference sequences. RRAP installs the most recent versions of Bowtie2 and SAMtools that are compatible with the other dependencies (6, 7). After performing read recruitment with Bowtie2, the pipeline sorts and indexes sequence alignment data before counting the numbers of mapped and unmapped metagenomic reads per reference sequence with SAMtools. From the output, RRAP calculates both unadjusted and log10-adjusted RPKM values for each reference genome in each metagenomic sample.

Other bioinformatics tools are similar to RRAP but serve different purposes. The Enveomics Collection is a compilation of scripts that analyze metagenomes (8). The scripts BlastTab.catbj.pl and BlastTab.recplot2.R in particular use BLAST results to generate a recruitment plot for visualization purposes. The script anir.rb estimates the average nucleotide identity of reads against a genome using existing alignment data. Anvi’o also provides a metagenomics workflow that assembles reads and maps them to contigs, but this is a much more comprehensive software package than RRAP and serves numerous purposes (9, 10). Other existing pipelines perform read recruitment but do not calculate RPKM values; Sunbeam and ngs_backbone, for example, recruit reads with bwa instead of Bowtie2 to produce alignment data (11–13). RRAP is therefore a unique, lightweight, and standalone pipeline for both recruitment and RPKM calculation.
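
The recruitment and normalization steps described in this announcement can be illustrated with a minimal Python sketch. It is not RRAP's own code, and several details are assumptions: the function names (recruitment_commands, parse_idxstats, rpkm_table) and all file names are hypothetical, the per-reference read counts are assumed to come from `samtools idxstats`, the "per million" denominator is assumed to be the total number of reads in the sample (mapped plus unmapped) as in the standard RPKM definition, and a reference with zero recruited reads is given a log10 value of negative infinity purely for illustration.

```python
import math


def recruitment_commands(reference_fasta, reads_1, reads_2, prefix):
    """Return the command lines (as argv lists) for one sample; nothing is executed.

    All paths and the output prefix are hypothetical placeholders.
    """
    return [
        ["bowtie2-build", reference_fasta, prefix],            # index the references
        ["bowtie2", "-x", prefix, "-1", reads_1, "-2", reads_2,
         "-S", prefix + ".sam"],                                # recruit paired-end reads
        ["samtools", "sort", "-o", prefix + ".sorted.bam", prefix + ".sam"],
        ["samtools", "index", prefix + ".sorted.bam"],
        ["samtools", "idxstats", prefix + ".sorted.bam"],      # per-reference counts
    ]


def parse_idxstats(text):
    """Parse idxstats text: name, length (bp), mapped count, unmapped count per line."""
    stats = {}
    for line in text.strip().splitlines():
        name, length, mapped, unmapped = line.split("\t")
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats


def rpkm_table(stats):
    """Return {reference: (rpkm, log10_rpkm)} under the assumptions stated above."""
    # Assumption: total reads in the sample = all mapped + all unmapped counts.
    total_reads = sum(mapped + unmapped for _, mapped, unmapped in stats.values())
    table = {}
    if total_reads == 0:
        return table
    for name, (length, mapped, _) in stats.items():
        if name == "*" or length == 0:
            continue  # the "*" row only carries reads with no reference assigned
        rpkm = mapped / (length / 1e3) / (total_reads / 1e6)
        # Illustrative choice: log10 of zero is reported as negative infinity.
        log_rpkm = math.log10(rpkm) if rpkm > 0 else float("-inf")
        table[name] = (rpkm, log_rpkm)
    return table


# Made-up counts for two hypothetical reference genomes in one sample.
EXAMPLE_IDXSTATS = (
    "genomeA\t2000000\t1000\t0\n"
    "genomeB\t1000000\t0\t0\n"
    "*\t0\t0\t999000\n"
)

if __name__ == "__main__":
    for argv in recruitment_commands("refs.fna", "s_R1.fastq", "s_R2.fastq", "sample"):
        print(" ".join(argv))
    for name, (rpkm, log_rpkm) in rpkm_table(parse_idxstats(EXAMPLE_IDXSTATS)).items():
        print(name, rpkm, log_rpkm)
```

With the made-up counts above, genomeA recruits 1,000 of 1,000,000 reads to a 2,000-kb genome, giving an RPKM of 0.5. If a reference genome consists of several contigs, the contig lengths and mapped counts would need to be summed per genome before applying the same formula.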

Data availability.

The code, detailed usage instructions, and sample data files for installing and test-running RRAP are available on GitHub (https://github.com/thrash-lab/rrap). Because the pipeline has dependencies, we recommend installation through the Conda package manager (14). Upon installation, RRAP can be accessed from the command line with a single command. Sample metagenomes and reference genomes to allow quick testing of read recruitment and RPKM calculations were obtained from previous studies (15–19).

1.  Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors:  Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal:  Nat Methods       Date:  2008-05-30       Impact factor: 28.547

2.  Snakemake--a scalable bioinformatics workflow engine.

Authors:  Johannes Köster; Sven Rahmann
Journal:  Bioinformatics       Date:  2012-08-20       Impact factor: 6.937

3.  Genome streamlining in a cosmopolitan oceanic bacterium.

Authors:  Stephen J Giovannoni; H James Tripp; Scott Givan; Mircea Podar; Kevin L Vergin; Damon Baptista; Lisa Bibbs; Jonathan Eads; Toby H Richardson; Michiel Noordewier; Michael S Rappé; Jay M Short; James C Carrington; Eric J Mathur
Journal:  Science       Date:  2005-08-19       Impact factor: 47.728

4.  Interaction dynamics and virus-host range for estuarine actinophages captured by epicPCR.

Authors:  Eric G Sakowski; Keith Arora-Williams; Funing Tian; Ahmed A Zayed; Olivier Zablocki; Matthew B Sullivan; Sarah P Preheim
Journal:  Nat Microbiol       Date:  2021-02-25       Impact factor: 17.745

5.  Community-led, integrated, reproducible multi-omics with anvi'o.

Authors:  A Murat Eren; Evan Kiefl; Alon Shaiber; Iva Veseli; Samuel E Miller; Matthew S Schechter; Isaac Fink; Jessica N Pan; Mahmoud Yousef; Emily C Fogarty; Florian Trigodet; Andrea R Watson; Özcan C Esen; Ryan M Moore; Quentin Clayssen; Michael D Lee; Veronika Kivenson; Elaina D Graham; Bryan D Merrill; Antti Karkman; Daniel Blankenberg; John M Eppley; Andreas Sjödin; Jarrod J Scott; Xabier Vázquez-Campos; Luke J McKay; Elizabeth A McDaniel; Sarah L R Stevens; Rika E Anderson; Jessika Fuessel; Antonio Fernandez-Guerra; Lois Maignien; Tom O Delmont; Amy D Willis
Journal:  Nat Microbiol       Date:  2021-01       Impact factor: 17.745

6.  Ecophysiology of the Cosmopolitan OM252 Bacterioplankton (Gammaproteobacteria).

Authors:  Emily R Savoie; V Celeste Lanclos; Michael W Henson; Chuankai Cheng; Eric W Getz; Shelby J Barnes; Douglas E LaRowe; Michael S Rappé; J Cameron Thrash
Journal:  mSystems       Date:  2021-06-29       Impact factor: 6.496

7.  Microbial Gene Abundance and Expression Patterns across a River to Ocean Salinity Gradient.

Authors:  Caroline S Fortunato; Byron C Crump
Journal:  PLoS One       Date:  2015-11-04       Impact factor: 3.240

8.  Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments.

Authors:  Erik L Clarke; Louis J Taylor; Chunyu Zhao; Andrew Connell; Jung-Jin Lee; Bryton Fett; Frederic D Bushman; Kyle Bittinger
Journal:  Microbiome       Date:  2019-03-22       Impact factor: 14.650

9.  Metabolic diversity within the globally abundant Marine Group II Euryarchaea offers insight into ecological patterns.

Authors:  Benjamin J Tully
Journal:  Nat Commun       Date:  2019-01-17       Impact factor: 14.919

10.  Twelve years of SAMtools and BCFtools.

Authors:  Petr Danecek; James K Bonfield; Jennifer Liddle; John Marshall; Valeriu Ohan; Martin O Pollard; Andrew Whitwham; Thomas Keane; Shane A McCarthy; Robert M Davies; Heng Li
Journal:  Gigascience       Date:  2021-02-16       Impact factor: 6.524

