Nina Luhmann, Daniel Doerr, Cedric Chauve.
Abstract
Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing has made it possible to sequence whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method, AGapEs (ancestral gap estimation), to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseille outbreaks and obtained highly improved assemblies for both genomes, comprising, respectively, five and six scaffolds, with 95% of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains.
Keywords: Yersinia pestis; ancestral reconstruction; assembly; comparative genomics
Year: 2017 PMID: 29114402 PMCID: PMC5643016 DOI: 10.1099/mgen.0.000123
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Result of gap filling for both data sets. Note that if a gap is conflicting and IS-annotated, we assigned it to the conflicting group. We differentiated between gaps of length 0 (i.e. both markers are directly adjacent), completely and partially filled gaps, and not filled gaps (Tables S6 and S7).
Fig. 2. Comparison between the de novo assembly of the London strain (blue) and the Marseille strain (red) with the reference Y. pestis CO92. The inner links connect corresponding CARs in the reconstructions and the reference. Note that there is only a small inversion, marked in black among the grey links. The positions in both reconstructions covered by markers are indicated in green. All gaps that have IS annotations in the extant genomes are shown in orange. For CO92, all IS annotations are shown as well. In addition, gaps that are only partially filled or have poorly conserved extant gap lengths are indicated in red. Finally, the outermost ring shows the mean read coverage in windows of length 200 bp in log scale. The figure was made with Circos [57].
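The outermost ring of Fig. 2 summarises read depth as a mean per 200 bp window on a log scale. A minimal sketch of that kind of summary is given below, assuming a plain list of per-base depths as input. The function name `windowed_log_coverage`, the use of non-overlapping windows, the base-10 logarithm and the pseudocount of 1 (added so that zero-coverage windows stay defined) are illustrative assumptions, not details taken from the paper.

```python
import math


def windowed_log_coverage(depth, window=200, pseudocount=1.0):
    """Mean depth per non-overlapping window, reported as log10(mean + pseudocount).

    `depth` is a list of per-base read depths. The pseudocount and the choice
    of log10 are assumptions made for this sketch.
    """
    values = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]  # last window may be shorter
        mean = sum(chunk) / len(chunk)
        values.append(math.log10(mean + pseudocount))
    return values


# Toy input: a well-covered window, an uncovered window, and a short final window.
toy_depth = [9] * 200 + [0] * 200 + [99] * 100
track = windowed_log_coverage(toy_depth)
```

A track like this could then be written out as one value per window for plotting; the exact input format expected by a plotting tool is not covered here.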
Assembly statistics for both data sets, based on contigs with a minimal length of 500 bp. All program parameters are given in Table S9.
The lap and cgal likelihoods have been computed based on all reads mapping to any of the reference sequences. Ragout and MeDuSa depend on the quality of the initial assembly in terms of assembled sequence length; hence, we omit results for the Minia assembly here and refer to Table S10.
| Strain | Assembly | No. of contigs | Total length (bp) | No. of Ns | N50 (bp) | LAP | CGAL |
|---|---|---|---|---|---|---|---|
| London | SPAdes | 2555 | 3 792 691 | 0 | 1888 | −11.01048 | −6.90196e+08 |
| | Minia | 4183 | 2 631 422 | 0 | 930 | −15.69016 | −7.98656e+08 |
| | SPAdes-Ragout | 1 | 4 068 385 | 776 139 | – | −12.52232 | −4.8192e+08 |
| | SPAdes-MeDuSa | 77 | 4 333 801 | 1917 | 700 415 | −7.97066 | −5.00106e+08 |
| | Minia-AGapEs | 5 | 4 441 104 | 0 (313 628) | 3 511 710 | −7.26576 | −3.55155e+08 |
| Marseille | SPAdes | 3201 | 6 072 375 | 0 | 4592 | −11.03336 | −6.0411e+08 |
| | Minia | 3089 | 3 636 663 | 0 | 1368 | −15.05058 | −8.71446e+08 |
| | SPAdes-Ragout | 2 | 4 564 323 | 542 013 | 4 530 296 | −13.34526 | −5.84186e+08 |
| | SPAdes-MeDuSa | 2155 | 6 052 372 | 618 | 1 643 585 | −10.88342 | −6.12532e+08 |
| | Minia-AGapEs | 6 | 4 350 872 | 0 (184 003) | 3 459 919 | −8.05526 | −4.32647e+08 |
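The N50 values in the table above are computed on contigs of at least 500 bp. As a rough illustration, the sketch below uses the common definition of N50: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembled length. The function name `n50` and the toy lengths are hypothetical, and the assembly evaluation tool used by the authors may differ in detail.

```python
def n50(lengths, min_length=500):
    """N50 of the contigs that are at least `min_length` bp long.

    Returns 0 when no contig passes the length filter.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    half_total = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half_total:
            return n
    return 0


# Toy contig lengths (bp); the 400 bp contig is dropped by the 500 bp filter.
toy_lengths = [1000, 800, 700, 500, 400]
toy_n50 = n50(toy_lengths)
```

With the toy input, the retained contigs sum to 3000 bp, and the running total passes 1500 bp at the second-longest contig, so the result is 800.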
IS annotations in the London data set identified by ISseeker in either the draft assembly, the AGapEs reconstruction, or both.
| | SPAdes only | AGapEs only | SPAdes and AGapEs | Minia only | AGapEs only | Minia and AGapEs |
|---|---|---|---|---|---|---|
| IS gap | 7 | 55 | 23 | 0 | 78 | 0 |
Fig. 3. Length of reconstructed sequence defined by markers, completely filled gaps, partially filled gaps (only the covered parts are considered) and unfilled gaps. The simulation parameters vary in terms of read coverage (cov) and simulated bacterial contamination (cont). Above each bar, the percentage of the reconstructed sequence supported by the aDNA reads is given.