Weihong Qi, Andrea Colarusso, Miriam Olombrada, Ermenegilda Parrilli, Andrea Patrignani, Maria Luisa Tutino, Macarena Toll-Riera.
Abstract
Pseudoalteromonas haloplanktis TAC125 is among the most commonly studied bacteria adapted to cold environments. Aside from its ecological relevance, P. haloplanktis has potential for biotechnological applications. Due to its importance, we decided to take advantage of next generation sequencing (Illumina) and third generation sequencing (PacBio and Oxford Nanopore) technologies to resequence its genome. The availability of a reference genome, obtained using whole genome shotgun sequencing, allowed us to study and compare the results obtained by the different technologies and draw useful conclusions for future de novo genome assembly projects. We found that assembly polishing using Illumina reads is needed to achieve a consensus accuracy over 99.9% when using Oxford Nanopore sequencing, but not when using PacBio sequencing. However, the dependency of consensus accuracy on coverage is lower in Oxford Nanopore than in PacBio, suggesting that a cost-effective solution might be the use of low coverage Oxford Nanopore sequencing together with Illumina reads. Despite the differences in consensus accuracy, all sequencing technologies revealed the presence of a large plasmid, pMEGA, which was undiscovered until now. Among the most interesting features of pMEGA is the presence of a putative error-prone polymerase regulated through the SOS response. Aside from the characterization of the newly discovered plasmid, we confirmed the sequence of the small plasmid pMtBL and uncovered the presence of a potential partitioning system. Crucially, this study shows that the combination of next and third generation sequencing technologies gives us an unprecedented opportunity to characterize our bacterial model organisms at a very detailed level.
Year: 2019 PMID: 31712730 PMCID: PMC6848147 DOI: 10.1038/s41598-019-52832-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Sequencing output metrics.
| | ONT | PacBio | Illumina |
|---|---|---|---|
| Input DNA | HMW DNA, without shearing | HMW DNA, sheared and size selected for fragments longer than 10 kb | DNA was isolated using the DNeasy Blood & Tissue kit (QIAGEN) |
| Library preparation kit | ONT 1D ligation sequencing kit | PacBio P6 DNA/Polymerase binding kit 2.0 | Illumina TruSeq |
| Sequencer | GridION X5 | PacBio RSII | HiSeq 4000 |
| Run time | 24 hours | 6 hours | 3.5 days |
| Num. reads | 194,538 | 92,873 | 3,452,040 |
| Num. bases (bp) | 2,293,338,560 | 779,603,216 | 1,035,612,000 |
| Read N50 (bp) | 23,927 | 12,153 | 2 × 150 |
| Longest read (bp) | 183,036 | 69,046 | 2 × 150 |
| Mean read length (bp) | 11,789 | 8,394 | 2 × 150 |
| Estimated coverage (a) | 573 X | 195 X | 259 X |
| Average Phred quality | 9.9 | 8.2 | 39 |
| General alignment error rate (b) | 11.19% | 17.72% | 0.2% |
| Insertions | 40,665,761 | 46,593,977 | 2,014 |
| Mapped reads with at least one insertion | 97.89% | 93.35% | 0.03% |
| Deletions | 51,868,727 | 19,348,548 | 3,757 |
| Mapped reads with at least one deletion | 97.95% | 93.01% | 0.06% |
| Mapped reads | 88.65% | 95.17% | 97.76% |
| Clipped mapped reads | 86.86% | 83.36% | 0.39% |
(a) Assuming a genome size of 4 Mb.
(b) Computed as the ratio of the total collected edit distance to the number of mapped bases.
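The coverage and error-rate rows follow from simple arithmetic on the other rows. The sketch below is illustrative only: the function names are hypothetical and not taken from the paper, the genome size is the 4 Mb assumed in the first footnote, and the Phred conversion is the standard definition Q = -10 log10(p), which may differ from how the authors averaged quality values.

```python
GENOME_SIZE_BP = 4_000_000  # assumption stated in the table footnote: 4 Mb genome


def estimated_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Mean sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def phred_to_error_probability(q):
    """Standard Phred definition, Q = -10 * log10(p), solved for p."""
    return 10 ** (-q / 10)


def alignment_error_rate(total_edit_distance, mapped_bases):
    """Ratio of total edit distance to mapped bases, as in the second footnote."""
    return total_edit_distance / mapped_bases


# "Num. bases (bp)" row of the table
bases = {"ONT": 2_293_338_560, "PacBio": 779_603_216, "Illumina": 1_035_612_000}
coverage = {name: round(estimated_coverage(n)) for name, n in bases.items()}
print(coverage)  # {'ONT': 573, 'PacBio': 195, 'Illumina': 259}
```

Under the same definition, a mean Phred quality of 9.9 corresponds to a per-base error probability of roughly 10%, and 8.2 to roughly 15%, which is the same order as the alignment error rates reported for ONT and PacBio.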
Assembly statistics of circularized, trimmed and polished genome drafts.
| | ONT | PacBio | PacBio | Illumina |
|---|---|---|---|---|
| Assembler | Canu | HGAP3 | Canu | SPAdes |
| Circularizing and trimming | amos, minimus2 | amos, minimus2 | amos, minimus2 | NA |
| Aligner | bwa | blasr | blasr | NA |
| Sequencer-specific consensus polishing | Nanopolish | Quiver | Quiver | NA |
| Polishing using Illumina reads | Pilon | Pilon | Pilon | NA |
| Num. Contigs | 3 | 3 | 3 | 109 |
| Total Length (bp) | 3,996,798 | 3,940,687 | 3,913,837 | 3,883,161 |
| N50 (bp) | 3,295,052 | 3,240,603 | 3,213,753 | 414,366 |
| GC% | 40.07 | 40.07 | 40.07 | 39.97 |
| Num. substitution errors corrected using Illumina reads | 2,069 | 0 | 0 | NA |
| Num. InDel errors corrected using Illumina reads | 3,253 | 376 | 386 | NA |
| Reference genome coverage (%) | 100 | 100 | 100 | 98.89 |
| Average identity to the reference genome (%) | 99.98 | 99.99 | 99.99 | 99.99 |
| Num. residual SNPs | 53 | 40 | 40 | 34 |
| Num. residual InDels | 87 | 24 | 24 | 28 |
| Misassemblies* | 2 | 1 | 1 | 0 |
| Num. Ns | 82 | 0 | 0 | 0 |
*One reported misassembly is actually due to an assembly error in the reference genome (Supplementary Fig. S3).
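For readers unfamiliar with the N50 row, a minimal sketch of the usual definition follows: N50 is the length of the contig at which the running sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The contig lengths below are hypothetical placeholders, not values from the paper, which reports only totals and N50.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical three-contig assembly (placeholder lengths, not from the paper)
hypothetical_contigs = [3_000_000, 600_000, 60_000]
print(n50(hypothetical_contigs))
```

When one contig holds more than half of the assembly, as in the three-contig long-read drafts above, N50 equals the length of that largest contig. For the fragmented 109-contig Illumina draft, N50 instead reflects how evenly the sequence is spread across many contigs.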
Figure 1. Effects of sequencing coverage on the consensus accuracy of Canu assemblies of ONT and PacBio reads.
Figure 2. Schematic representation of pMEGA. Genes are depicted as arrows in the outermost circle; the arrowhead indicates the direction of transcription. Red arrows indicate genes involved in plasmid housekeeping functions (replication, partition, stability), black arrows genes involved in DNA rearrangements, orange arrows genes involved in metabolic functions, navy blue arrows defence genes, green arrows genes involved in mutagenesis, purple arrows genes involved in proteolysis, and grey arrows genes of unknown function. The second circle from the outside depicts homology to the Pseudoalteromonas arctica plasmid (>50% nucleotide identity); the third circle indicates homology to the Pseudoalteromonas nigrifaciens plasmid (>50% nucleotide identity). The fourth and fifth circles indicate homology to P. haloplanktis TAC125 chromosomes I and II, respectively (>50% nucleotide identity). The intensity of the colour indicates the percentage of nucleotide identity: the more intense the colour, the higher the identity. The innermost circle represents the GC content.
Figure 3. ORF analysis of pMtBL. (a) pMtBL map. The OriR is highlighted in black. Manually analysed putative ORFs are represented as thick arrows. (b) orf2 expression analysis using end-point RT-PCR. After total RNA extraction, cDNA was synthesized using the primer pMtBL_B7_rv, which is specific for orf2. PCR reactions with primers pMtBL_A4_fw and pMtBL_B7_rv were then performed on the cDNA obtained from total P. haloplanktis TAC125 RNA after growth in GG (lane 2) and TYP media (lane 3). The PCR reaction was also carried out directly on RNA extracted after growth in GG (lane 4) and TYP (lane 5) and on total P. haloplanktis TAC125 DNA (lane 6). The expected amplicon of <100 bp was obtained only in the reactions where either the cDNA (lanes 2 and 3) or the total bacterial DNA (lane 6) was used as template. Total RNA templates did not lead to any amplification, demonstrating the absence of DNA cross-contamination (lanes 4 and 5). Lane 1, 1 kb NEB marker. The full-length gel is presented in Supplementary Fig. S7.
Figure 4. Schematic diagram of pMtBL derivative shuttle vectors and their segregational stability. (a) Overview of the extent of the pMtBL regions included in each vector series. pGEM-T-MtBL encompasses the entire pMtBL plasmid; MAV was developed by introducing only the psychrophilic OriR [4]. (b) Retention, without antibiotic selection, of plasmids representative of each family derived from pMtBL. Each experiment was carried out in biological duplicate.