Tim Carver, Simon R Harris, Thomas D Otto, Matthew Berriman, Julian Parkhill, Jacqueline A McQuillan.
Abstract
So-called next-generation sequencing (NGS) has provided the ability to sequence on a massive scale at low cost, enabling biologists to perform powerful experiments and gain insight into biological processes. BamView has been developed to visualize and analyse sequence reads from NGS platforms that have been aligned to a reference sequence. It is a desktop application for browsing the aligned or mapped reads [Ruffalo, M, LaFramboise, T, Koyutürk, M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 2011;27:2790-6] at different levels of magnification, from the nucleotide level, where the base qualities can be seen, to the genome or chromosome level, where overall coverage is shown. To enable in-depth investigation of NGS data, various views are provided that can be configured to highlight interesting aspects of the data. Multiple read alignment files can be overlaid to compare results from different experiments, and filters can be applied to facilitate the interpretation of the aligned reads. As well as being a standalone application, it can be used as an integrated part of the Artemis genome browser, where BamView allows the user to study NGS data in the context of the sequence and annotation of the reference genome. Single nucleotide polymorphism (SNP) density and candidate SNP sites can be highlighted and investigated, and read-pair information can be used to discover large structural insertions and deletions. The application also performs simple analyses of the read mapping, including reporting the read counts and reads per kilobase per million mapped reads (RPKM) for genes selected by the user.
AVAILABILITY: BamView and Artemis are freely available software. They can be downloaded from their home pages: http://bamview.sourceforge.net/; http://www.sanger.ac.uk/resources/software/artemis/. Requirements: Java 1.6 or higher.
Year: 2012 PMID: 22253280 PMCID: PMC3603209 DOI: 10.1093/bib/bbr073
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1: Schematic of the workflow from an NGS experiment to the visualization of the results produced. The BamView window shown here displays two panels produced using the clone option. The ‘Strand Stack’ view in the top panel shows the reads mapped to the forward and reverse strands above and below the sequence line. The bottom panel shows the ‘Stack’ view of the reads. The (red) vertical lines in this panel show the differences between the read sequence and the reference. The SNP density plot is shown, as well as the pop-up menu with the available views and options.
Summary of the BamView standalone command line options
| Option | Parameter type | Description |
|---|---|---|
| -h | | Prints the command line options. |
| -a | File | BAM/SAM file to display. |
| -r | File | Path to the reference sequence file (FASTA, EMBL, GenBank). |
| -n | Integer | Number of bases to display in the view. |
| -c | String | Chromosome name. |
| -v | IS, S, PS, ST or C | View used on opening: IS (inferred size), S (stack, default), PS (paired stack), ST (strand), C (coverage). |
| -b | Integer | Base position to open at. |
| -o | | Show the read orientation. |
| -pc | | Plot the average coverage. |
| -ps | | Plot the SNP density (only with -r). |
If no options are specified then a prompt for the BAM file is shown.
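As an illustration of how the options in the table combine, the following is a minimal sketch that assembles a BamView command line in Python. The flags are those listed above; the jar name `BamView.jar`, the `java -jar` launch form and the helper name `build_bamview_command` are assumptions made for this example, not details given in the source.

```python
from typing import List, Optional

# Views accepted by -v, as listed in the options table.
VALID_VIEWS = ("IS", "S", "PS", "ST", "C")


def build_bamview_command(
    bam: str,
    reference: Optional[str] = None,
    chromosome: Optional[str] = None,
    base: Optional[int] = None,
    n_bases: Optional[int] = None,
    view: str = "S",
    show_orientation: bool = False,
    plot_coverage: bool = False,
    plot_snp_density: bool = False,
    jar: str = "BamView.jar",  # assumption: name/path of the downloaded jar
) -> List[str]:
    """Return an argument list using the flags from the options table."""
    if view not in VALID_VIEWS:
        raise ValueError(f"unknown view {view!r}; expected one of {VALID_VIEWS}")
    if plot_snp_density and reference is None:
        # The table states that -ps only works together with -r.
        raise ValueError("-ps requires a reference sequence (-r)")

    cmd = ["java", "-jar", jar, "-a", bam]  # assumption: launched via java -jar
    if reference is not None:
        cmd += ["-r", reference]
    if chromosome is not None:
        cmd += ["-c", chromosome]
    if base is not None:
        cmd += ["-b", str(base)]
    if n_bases is not None:
        cmd += ["-n", str(n_bases)]
    cmd += ["-v", view]
    if show_orientation:
        cmd.append("-o")
    if plot_coverage:
        cmd.append("-pc")
    if plot_snp_density:
        cmd.append("-ps")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the jar path has been adjusted to the local installation.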
Summary of the features and functionality in BamView
| Feature | Description |
|---|---|
| Read views | Stack: reads are piled up along the y-axis. Strand Stack: reads are split by their directionality. Paired Stack: read pairs are stacked and connected. Inferred Size: reads are plotted along the y-axis by their inferred size. Coverage: plot of average coverage over a given window. Read Bases: shown when zoomed in to the nucleotide level. |
| Read filtering | Using mapping quality. Using the read flag (e.g. proper pair). |
| Zoom | From large sequence regions to the nucleotide level. |
| Clone panels | The read panel can be cloned and each clone configured independently. |
| Highlight | Base regions by clicking and dragging. Read pairs by clicking on them. |
| Display | SNPs are shown as vertical red lines on the reads. |
| Read summary | Placing the mouse over a read displays its details. Clicking on a read and then right clicking gives the option to show the complete SAMtools mapping information. |
| Base colour | At nucleotide resolution, bases can be coloured by their mapping quality. |
| Graphs | SNP density plot. Coverage plot of the reads for each BAM file loaded. The plots are configurable by right clicking on the plot. |
| Analyses | Read count for selected features. RPKM for selected features, used to investigate expression levels. |
| File format | BAM, sorted and indexed. It can be on a local or remote (HTTP/FTP) file system. When reference sequences are concatenated together (e.g. in a multiple FASTA), BamView will offset the read positions correctly by matching the names (e.g. locus_tag or label) of the features to the sequence name in the BAM. |
| Use cases | Expression studies. Gene boundary identification. SNP confirmation. Identification of large structural variants (such as deletions and insertions). Assisting sequence assembly, e.g. identifying breakpoints and duplicated repeats. |
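The RPKM value reported for selected features follows the standard definition: reads mapped to the feature, divided by the feature length in kilobases and by the total number of mapped reads in millions. The sketch below shows that arithmetic only; the function name `rpkm` is illustrative, and how BamView itself counts reads that overlap a feature boundary is not specified in the source.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count / (feature_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0:
        raise ValueError("feature length must be positive")
    if total_mapped_reads <= 0:
        raise ValueError("total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Example: 500 reads on a 2 kb gene in a library of 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
```

Normalizing by both length and library size is what makes values comparable between genes of different lengths and between BAM files of different sequencing depth.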
Figure 2: The filter options for sequence reads. A mapping quality cut-off (MAPQ) can be applied and reads can be explicitly filtered in or out based on their flag. The filter shown will hide all unmapped reads and only show the reads that are defined as a proper pair.
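The filter described in this caption can be expressed as a test on the SAM FLAG bit field and the MAPQ value. The following is a minimal sketch of that logic, not BamView's own code: the bit values 0x2 (proper pair) and 0x4 (unmapped) come from the SAM format, while the function and parameter names are hypothetical.

```python
# SAM FLAG bits used in this example (values defined by the SAM format).
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4


def passes_filter(flag: int, mapq: int, min_mapq: int = 0,
                  require: int = 0, exclude: int = 0) -> bool:
    """Return True if a read should be shown.

    require: every bit set here must also be set in the read's flag.
    exclude: no bit set here may be set in the read's flag.
    """
    if mapq < min_mapq:
        return False
    if (flag & require) != require:
        return False
    if (flag & exclude) != 0:
        return False
    return True


def figure2_filter(flag: int, mapq: int, min_mapq: int = 0) -> bool:
    """Hide unmapped reads and show only proper pairs, as in the caption."""
    return passes_filter(flag, mapq, min_mapq,
                         require=FLAG_PROPER_PAIR, exclude=FLAG_UNMAPPED)
```

For example, a read with flag 99 (paired, proper pair, mate on the reverse strand, first in pair) is kept, whereas a read with flag 4 (unmapped) or flag 65 (paired but not a proper pair) is hidden.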
Figure 3: Chlamydia trachomatis genome and plasmid sequences in Artemis at different levels of resolution. (A) Coverage is plotted in the BamView panel, showing an increase at the boundary between the genome (orange) and plasmid (brown) sequence. The plasmid has a higher copy number and so has a higher read coverage. (B) A smaller region (∼18 kb) of the plasmid, with the reads plotted in the BamView panel along the y-axis by the log of the read pair's inferred size. Vertical (red) lines on the reads indicate differences to the reference sequence. This view reveals the presence of an indel by the increase in the inferred size and drop in coverage. (C) A region at the nucleotide level, highlighting a position where there appears to be a SNP. (D) One of the reads has been highlighted and the detailed SAM information is opened in a separate window.
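The coverage plot in panel (A) is described elsewhere in this record as an average over a given window. A minimal sketch of such a calculation is given below, assuming reads are supplied as 1-based inclusive `(start, end)` intervals; the function name and the non-overlapping window scheme are assumptions for illustration and may differ from what BamView does internally.

```python
from typing import List, Tuple


def windowed_coverage(reads: List[Tuple[int, int]], region_start: int,
                      region_end: int, window: int) -> List[float]:
    """Mean per-base read depth in consecutive, non-overlapping windows."""
    if window <= 0 or region_end < region_start:
        raise ValueError("window must be positive and region non-empty")
    length = region_end - region_start + 1

    # Difference array: +1 where a read starts, -1 just after it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        s = max(start, region_start)
        e = min(end, region_end)
        if s > e:
            continue  # read lies entirely outside the region
        diff[s - region_start] += 1
        diff[e - region_start + 1] -= 1

    # A running sum of the difference array gives the depth at each base.
    depth = []
    running = 0
    for i in range(length):
        running += diff[i]
        depth.append(running)

    # Average the depth within each window (the last one may be shorter).
    means = []
    for i in range(0, length, window):
        chunk = depth[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means
```

A step change between adjacent windows, such as the one at the genome/plasmid boundary in panel (A), then appears as a jump in the returned list.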
Figure 4: Correction of annotation with RNA-Seq data in Artemis. (A) The gene has two exons and the splice site can be identified by split reads in the top BamView panel. The gray lines represent split reads and show splice sites. The split reads provide evidence for a different splice site (black arrows), so the gene model was corrected [4]. (B) The middle panel shows the coverage plots for three different time points. This gene is expressed mostly at time point 0 h and the coverage plots also pinpoint the position of the splice site. (C) Two different representations of gene PF10_0022 of Plasmodium falciparum are shown: an incorrect old version and a corrected version.
Figure 5: ACT example of two Plasmodium berghei genome assemblies. A BamView panel is loaded for each assembly to identify a misassembly in the top sequence (A). The bottom sequence (C) is the de novo assembly. The vertical gray bars (B) are BLAST matches (with a 99% identity cut-off) that indicate synteny, and the black bars indicate inversions. The white boxes on the forward strand indicate sequencing gaps; the other boxes in the top genome are gene models. In each BamView panel the view is cloned to show a coverage plot of mapped Illumina reads and the insert size view of mapped 454 reads. In the corrected genome (C) the coverage is more even and does not drop except in the gap regions. The outer BamView panels show mapped 454 reads with an insert size of 20 kb. The read filter was used to show only proper pairs. Far fewer read pairs confirm the top assembly. As the new assembly is better covered by 454 mate pairs, it appears to be correct.