Martin Hunt1, Astrid Gall1, Swee Hoe Ong1, Jacqui Brener2, Bridget Ferns3, Philip Goulder2, Eleni Nastouli4, Jacqueline A Keane1, Paul Kellam5, Thomas D Otto1.
Abstract
MOTIVATION: An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused by amplification bias in the inevitable reverse transcription and polymerase chain reaction amplification process of current methods.
Year: 2015 PMID: 25725497 PMCID: PMC4495290 DOI: 10.1093/bioinformatics/btv120
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Example HIV-1 assemblies. Plots show the proportion of single base differences per mapped read compared to the IVA contig, the read depth, and contigs from PRICE, Trinity and VICUNA aligned to the single IVA contig. The minimum read depth is 63.
Summary of assembly QC results
| | IVA (HIV-1) | PRICE (HIV-1) | Trinity (HIV-1) | VICUNA (HIV-1) | IVA (Influenza) | PRICE (Influenza) | Trinity (Influenza) | VICUNA (Influenza) |
|---|---|---|---|---|---|---|---|---|
| Ideal assemblies (%) | 57.1 | 11.9 | 14.3 | 2.4 | 21.4 | 0.0 | 1.0 | 0.0 |
| Mean reference bases assembled (%) | 97.9 | 97.2 | 89.8 | 98.3 | 98.8 | 89.8 | 97.6 | 94.3 |
| Mean % annotation transferred | 99.0 | 90.0 | 86.2 | 97.3 | 99.0 | 92.1 | 96.1 | 95.3 |
| Total assembly errors | 1 | 4 | 0 | 1 | 0 | 6 | 0 | 0 |
a. HIV-1: the entire genome must be assembled into a unique contig. Influenza: each segment must be assembled into a unique contig.
b. An error is an inversion, relocation or translocation reported by GAGE. Numbers reported are the total across all assemblies. Supplementary Tables S1 and S2 expand on this table.
Fig. 2. Comparison of assembly success. (a) For each segment of the reference, the longest matching contig was found. This plot shows the total length of these contigs for each assembly, as a percentage of the reference length. (b) Total assembly lengths, excluding contamination by only counting contigs that match the reference, as a percentage of the reference length.
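
The two quantities described in the Fig. 2 caption can be illustrated with a minimal Python sketch. This is not the authors' code: the `Contig` record, the function names, and the toy segment lengths are all hypothetical, and the sketch assumes each contig has already been assigned to at most one reference segment by some alignment step that is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Contig:
    """Hypothetical record for one assembled contig."""
    length: int
    segment: Optional[str]  # reference segment it matches; None = no match (contamination)


def longest_match_percent(contigs: List[Contig], ref_lengths: Dict[str, int]) -> float:
    """Panel (a) style: sum the longest matching contig per reference segment,
    as a percentage of the total reference length."""
    longest: Dict[str, int] = {}
    for c in contigs:
        if c.segment in ref_lengths:
            longest[c.segment] = max(longest.get(c.segment, 0), c.length)
    return 100.0 * sum(longest.values()) / sum(ref_lengths.values())


def matching_total_percent(contigs: List[Contig], ref_lengths: Dict[str, int]) -> float:
    """Panel (b) style: sum every contig that matches the reference,
    as a percentage of the total reference length."""
    total = sum(c.length for c in contigs if c.segment in ref_lengths)
    return 100.0 * total / sum(ref_lengths.values())


# Toy data (assumed values, not taken from the paper).
ref_lengths = {"segA": 1500, "segB": 500}
contigs = [
    Contig(1200, "segA"),
    Contig(300, "segA"),   # second, shorter contig on the same segment
    Contig(500, "segB"),
    Contig(400, None),     # unmatched contig, ignored by both metrics
]

print(longest_match_percent(contigs, ref_lengths))   # 85.0
print(matching_total_percent(contigs, ref_lengths))  # 100.0
```

In this toy case the two values differ because segA is split across two contigs: the first metric counts only the longest contig per segment, while the second counts every matching contig, so it can exceed 100% when an assembly contains redundant sequence.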