Florence Maurier1, Delphine Beury1, Léa Fléchon2, Jean-Stéphane Varré2, Hélène Touzet2, Anne Goffard1, David Hot1, Ségolène Caboche3.
Abstract
Viral genome sequencing has become a useful tool for understanding virus pathogenicity and for epidemiological surveillance. Obtaining a viral genome sequence directly from clinical samples remains challenging because of the low load of viral genetic material relative to host DNA and the difficulty of obtaining an accurate genome assembly. Here we introduce a complete sequencing and analysis protocol called V-ASAP (Virus Amplicon Sequencing Assembly Pipeline). The protocol generates the dominant viral genome sequence starting from clinical samples. It is based on multiplex PCR amplicon sequencing coupled with a reference-free analytical pipeline. The protocol was applied to 11 clinical samples infected with coronavirus OC43 (HCoV-OC43) and yielded seven complete and two nearly complete genome assemblies. The protocol is shown to be robust and to produce a reliable sequence, and it could be applied to other viruses.
Keywords: Bioinformatics; Complete genome; Coronavirus; High-throughput sequencing
Year: 2019 PMID: 30878524 PMCID: PMC7112119 DOI: 10.1016/j.virol.2019.03.006
Source DB: PubMed Journal: Virology ISSN: 0042-6822 Impact factor: 3.616
Fig. 1 Schematic representation of the complete protocol for viral sequencing directly from clinical samples.
Fig. 2 Difference observed between sequences generated from alignment combined with consensus extraction and full-amplicon based extraction.
Output comparison of V-ASAP and the alignment-based approach using different reference sequences.
| Method | V-ASAP | Alignment | Alignment | Alignment | Alignment |
|---|---|---|---|---|---|
| Reference sequence | None | ||||
| SRR5121076 ( | |||||
| # contigs | 2 | 2 | 2 | 32 | 33 |
| Total length | 10162 | 10450 | 9546 | 2722 | 2889 |
| N50 | 5254 | 5275 | 4923 | 115 | 111 |
| # unaligned contigs | 0 | 0 | 0 | 13 | 14 |
| Genome fraction (%) | 95.796 | 98.482 | 89.989 | 21.380 | 20.249 |
| # Ns | 0 | 3 | 10 | 3 | 8 |
| # mismatches | 0 | 1 | 0 | 0 | 0 |
| # indels | 0 | 0 | 0 | 0 | 0 |
| SRR5121078 ( | |||||
| # contigs | 3 | 3 | 7 | 28 | 29 |
| Total length | 9965 | 10061 | 8876 | 2340 | 2449 |
| N50 | 5254 | 5255 | 4584 | 115 | 111 |
| # unaligned contigs | 0 | 0 | 1 | 12 | 11 |
| Genome fraction (%) | 95.919 | 96.843 | 85.388 | 18.433 | 18.760 |
| # Ns | 0 | 0 | 0 | 1 | 23 |
| # mismatches | 0 | 0 | 0 | 0 | 0 |
| # indels | 0 | 0 | 0 | 0 | 0 |
| SRR5121079 ( | |||||
| # contigs | 2 | 2 | 4 | 29 | 33 |
| Total length | 10162 | 10209 | 9588 | 2569 | 2773 |
| N50 | 5254 | 2255 | 4924 | 111 | 107 |
| # unaligned contigs | 0 | 0 | 0 | 11 | 13 |
| Genome fraction (%) | 97.132 | 97.582 | 91.617 | 19.853 | 20.340 |
| # Ns | 0 | 0 | 13 | 27 | 9 |
| # mismatches | 0 | 0 | 1 | 0 | 0 |
| # indels | 0 | 0 | 0 | 0 | 0 |
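The N50 rows in the table above can be read with the conventional definition in mind: the contig length L such that contigs of length L or longer cover at least half of the total assembly length. The sketch below is a minimal illustration of that definition, not the tool used in the study. The function name `n50` is hypothetical, and the example contig lengths are an assumption chosen only to be consistent with one column of the table (2 contigs, total length 10162, N50 5254), since individual contig lengths are not reported here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs


# Assumed contig lengths (illustrative only): two contigs summing to 10162.
example_contigs = [5254, 4908]
print("Total length:", sum(example_contigs))
print("N50:", n50(example_contigs))
```

With only two or three contigs, N50 is simply the length of the largest contig whenever that contig spans at least half of the assembly, which is why N50 stays near 5254 in several columns despite different total lengths.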
Fig. 3 Coronavirus genome amplification trials by RT-PCR for different fragment lengths. A: Capillary electrophoresis profiles of the RT-PCR products obtained for the different fragments. B: Agarose gel-like profiles. L: ladder with the sizes alongside (in bp); a to j: RT-PCR products from ~500 bp to ~5000 bp in increments of ~500 bp.
Fig. 4 Study of the cluster size distribution for the 99 amplicons (PCR01 to PCR99) from the MDS4 sample. The top panel shows the number of reads for each amplicon (in black) and the number of reads in the biggest cluster (in gray). The bottom panel shows the proportion of each cluster for each amplicon. The proportion of the biggest cluster is in black; the proportions of clusters containing less than 1% of reads were summed and are shown in gray.
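The quantities plotted in the cluster size figure (reads per amplicon, share of the biggest cluster, and the summed share of clusters below 1% of reads) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `summarize_clusters`, its return keys, and the example cluster sizes are hypothetical and are not taken from the V-ASAP pipeline or from the MDS4 data.

```python
def summarize_clusters(cluster_sizes, minor_threshold=0.01):
    """Summarize read clusters for one amplicon.

    cluster_sizes: list of read counts, one per cluster.
    minor_threshold: clusters holding a smaller fraction of reads than
        this are pooled into a single "minor" share (1% by default).
    """
    total = sum(cluster_sizes)
    if total == 0:
        return {"total_reads": 0, "largest_fraction": 0.0, "minor_fraction": 0.0}
    fractions = [size / total for size in cluster_sizes]
    minor = sum(f for f in fractions if f < minor_threshold)
    return {
        "total_reads": total,
        "largest_fraction": max(fractions),
        "minor_fraction": minor,
    }


# Hypothetical cluster sizes for a single amplicon (illustrative only).
summary = summarize_clusters([950, 45, 3, 2])
print(summary)
```

A largest-cluster share close to 1 for an amplicon suggests that a single dominant sequence accounts for nearly all of its reads.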
Results and metrics from the sequencing of the 11 clinical samples.
| Sample | Viral load | Freeze/thaw cycles | Library concentration (nM) | Mapped reads (%) | Merged reads | Amplicons <10 reads | Amplicons <100 reads | Contigs | Assembly size (bp) |
|---|---|---|---|---|---|---|---|---|---|
| MDS1 | + | 3 | ND (7.9) | 92.03 | 2,270,961 | 25 | 41 | 8 | 28,915 |
| MDS2 | + | 3 | 4.76 | 98.87 | 1,972,807 | 0 | 2 | 1 | 30,665 |
| MDS4 | + | 2 | 6.97 | 99.02 | 1,811,280 | 0 | 3 | 1 | 30,664 |
| MDS5 | + | 6 | 0.46 (2.10) | 11.30 | 2,309,880 | 70 | 95 | 17 (+19) | 16,516 (+8,738) |
| MDS6 | + | 3 | 3.49 | 97.45 | 1,971,524 | 1 | 7 | 1 | 30,664 |
| MDS11 | +++ | 2 | 5.77 | 99.03 | 1,973,800 | 0 | 2 | 1 | 30,665 |
| MDS12 | ++ | 3 | 0.63 (14.10) | 98.79 | 2,130,557 | 9 | 17 | 1 | 30,419 |
| MDS14 | ++ | 4 | 5.34 | 92.51 | 2,164,740 | 0 | 7 | 1 | 30,664 |
| MDS15 | ++ | 4 | 2.98 (45.20) | 98.47 | 2,118,341 | 4 | 10 | 2 | 30,471 |
| MDS16 | ++ | 3 | 2.43 (37) | 95.81 | 2,180,964 | 4 | 10 | 1 | 30,668 |
| PR2 | + | 3 | 0.29 (1.90) | 9.39 | 2,182,806 | 68 | 81 | 17 (+18) | 19,045 (+8,062) |
Library concentration: ND, not detected. Numbers in brackets correspond to the concentration obtained after library amplification.
Mapped reads: reads mapped against the KX344031 sequence after filtering out PhiX control reads, reported as a percentage.
Amplicons <10 reads: number of amplicons with a sequencing depth lower than 10 reads after merging.
Amplicons <100 reads: number of amplicons with a sequencing depth lower than 100 reads after merging.
Contigs: number of contigs produced with CAP3. The number of singlets is given in brackets.
Assembly size: size of the final assembly. Numbers in brackets correspond to the cumulative size of the singlets.
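The two low-depth columns described in the footnotes above amount to counting amplicons whose merged-read depth falls below a threshold. A minimal sketch of that count is given below. The function name `count_low_depth`, the dictionary layout, and the example depths are assumptions made for illustration; they do not come from the study's data or code.

```python
def count_low_depth(depth_by_amplicon, thresholds=(10, 100)):
    """Count amplicons whose merged-read depth is below each threshold.

    depth_by_amplicon: dict mapping amplicon name -> number of merged reads.
    Returns a dict mapping threshold -> number of amplicons below it.
    """
    return {
        threshold: sum(1 for depth in depth_by_amplicon.values() if depth < threshold)
        for threshold in thresholds
    }


# Hypothetical per-amplicon depths (illustrative only).
depths = {"PCR01": 5, "PCR02": 50, "PCR03": 500, "PCR04": 20000}
print(count_low_depth(depths))
```

Because every amplicon below 10 reads is also below 100 reads, the second count is always at least as large as the first, as seen in each row of the table.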