Zhaozhao Dai1, Tong Li1, Jiadong Li1, Zhifei Han1, Yonglong Pan1, Sha Tang2, Xianmin Diao2, Meizhong Luo1.
Abstract
BACKGROUND: Large insert paired-end sequencing technologies are important tools for assembling genomes, delineating associated breakpoints and detecting structural rearrangements. To facilitate the comprehensive detection of inter- and intra-chromosomal structural rearrangements or variants (SVs) and complex genome assembly with long repeats and segmental duplications, we developed a new method based on single-molecule real-time synthesis sequencing technology for generating long paired-end sequences of large insert DNA libraries.
Keywords: Ampicillin resistance gene tag; Assembly error; De novo assembly; Fosmid; Long paired-end; Mate-pair; PacBio; Structural rearrangement
Year: 2019 PMID: 31788019 PMCID: PMC6878638 DOI: 10.1186/s13007-019-0525-6
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Fig. 1 The pipeline of Fosmid-size long paired-end library construction. The red area represents the vector, the blue area represents the large inserted genomic fragment, and the yellow area represents the ampicillin resistance gene tag. The Fosmid clones were pooled, and DNA was extracted for paired-end library construction. Pooled Fosmid plasmid DNA was sheared into ~15 kb fragments with a g-TUBE (Covaris), which generated three kinds of fragments: insert only, vector with a single end, and vector with paired ends. These DNA fragments were then end repaired and gel purified for ligation with the ampicillin resistance gene tag. Although all fragments could be ligated to the tag, only those containing the chloramphenicol resistance gene and oriV in addition to the Amp tag were selected by double resistance to chloramphenicol and ampicillin after transformation. Finally, the vector was removed with I-SceI, and the paired-end fragments carrying the Amp tag were sequenced on PacBio.
Fig. 2 The maps of the vectors pcc2FOS and pHZAUFOS3. A is the map of pcc2FOS. NotI was used to release the insert, and the lacZ fragment was outside of CmR and oriV. B is the map of pHZAUFOS3. The lacZ fragment was moved between CmR and oriV; the two I-SceI sites adjacent to lacZ were used to release the insert, and another two I-SceI sites were used to break the vector skeleton into small fragments (2–3 kb).
Summarized statistics for the four Fosmid-size paired-end libraries
| Library | Sample | FES number | FES-1 N50 (bp) | FES-1 average length (bp) | FES-1 total bases (bp) | FES-2 N50 (bp) | FES-2 average length (bp) | FES-2 total bases (bp) |
|---|---|---|---|---|---|---|---|---|
| Y1 | S288C_1 | 35,510 | 3066 | 2004 | 71,170,214 | 3112 | 2014 | 71,513,294 |
| Y2 | S288C_2 | 17,844 | 2742 | 1884 | 33,626,713 | 2709 | 1845 | 32,925,281 |
| | Yugu1_1 | 20,119 | 2466 | 1656 | 33,311,652 | 2435 | 1642 | 33,039,852 |
| | Yugu1_2 | 5476 | 2316 | 1663 | 9,104,650 | 2327 | 1618 | 8,862,453 |
| S1 | Yugu1_3 | 295 | 2381 | 1725 | 508,797 | 2509 | 1889 | 557,384 |
| | Yugu1_4 | 21,657 | 3484 | 2220 | 48,077,465 | 3345 | 2180 | 47,212,605 |
| | Yugu1_5 | 4546 | 3391 | 2419 | 10,995,363 | 3455 | 2449 | 11,133,474 |
| | Yugu1_6 | 15,127 | 2556 | 1642 | 24,838,345 | 2496 | 1613 | 24,405,221 |
| S2 | Yugu1_t | 75,047 | 2853 | 2060 | 154,610,364 | 2850 | 2057 | 154,381,829 |
FES: Fosmid end sequence; FES-1: Fosmid left-end sequence; FES-2: Fosmid right-end sequence
Fig. 3 Length distribution of the genomic distance spanned by Fosmid-size paired-end sequences. Smoothed histograms of the spacing between unique read pairs in the Fosmid-size paired-end libraries are shown for the S. cerevisiae S288C libraries Y1 (grey) and Y2 (black) (A) and the S. italica Yugu1 libraries S1 (grey) and S2 (black) (B), each aligned against its respective reference genome. The y-axis represents the percentage of all unique read pairs that fall in each 1-kb bin. The x-axis represents the distance between read pairs.
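The binning described in this caption can be illustrated with a minimal Python sketch. It assumes the spans between unique read pairs have already been computed from alignments; the function name `span_histogram` and the example values are hypothetical and are not taken from the study. The sketch produces a plain (unsmoothed) histogram, whereas the figure shows smoothed curves.

```python
from collections import Counter


def span_histogram(spans_bp, bin_size=1000):
    """Return {bin_start_bp: percent_of_pairs} for a list of pair spans.

    Each span is assigned to the bin that starts at the nearest lower
    multiple of bin_size (1 kb by default, as in the caption).
    """
    if not spans_bp:
        return {}
    counts = Counter((span // bin_size) * bin_size for span in spans_bp)
    total = len(spans_bp)
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}


# Hypothetical spans in bp, for illustration only (not data from the paper).
example_spans = [34_200, 34_900, 35_100, 36_050]
hist = span_histogram(example_spans)
for bin_start, percent in hist.items():
    print(f"{bin_start}-{bin_start + 999} bp: {percent:.1f}%")
```

Plotting the percentages against the bin starts gives the raw version of the distribution; any smoothing method would be an additional choice on top of this.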
Fig. 4 Genome alignments between scaffolds and the reference. A compares with the reference the assembly built from a simulated sequencing depth of 30× together with the Y1 Fosmid long paired ends, which cover the S. cerevisiae S288C physical genome tenfold. B compares with the reference the assembly built from a simulated sequencing depth of 30× together with the Y2 Fosmid long paired ends, which cover the S. cerevisiae S288C physical genome fivefold. Each plot shows the best (1-to-1) alignments between the reference (x-axis) and the assembly (y-axis). Red lines indicate forward-strand matches, while blue lines indicate reverse-complement matches. Dashed vertical lines delineate chromosome ends, while dashed horizontal lines delineate contigs. A diagonal indicates concordant matches, while off-diagonal matches indicate assembly errors or differences versus the reference.
Summarized statistics for the assembly of Setaria italica Yugu18
| Name | num_seqs | sum_len | avg_len | max_len | N50 | < 30 kb |
|---|---|---|---|---|---|---|
| Yugu18_contigs | 383 | 407,498,629 | 1,063,965 | 12,402,311 | 3,758,082 | 165 |
| Yugu18_scaffold | 330 | 407,887,709 | 1,236,023 | 14,943,871 | 5,196,440 | 164 |
Yugu18_contigs: assembly of the whole-genome sequences from PacBio only
Yugu18_scaffold: assembly of the whole-genome sequences from PacBio and the long paired ends of S1 and S2
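For readers unfamiliar with the N50 column, the following minimal Python sketch shows the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative assumptions; the source does not state which tool produced the reported values.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the cumulative sum first reaches at least
    half of the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (not data from the paper): total = 20, half = 10.
# Longest first: 6, then 6 + 5 = 11 >= 10, so the N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```

A higher N50 at a similar total length, as between the two rows of the table, indicates that the same bases are contained in fewer, longer sequences.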
Examples of rearrangements in the Yugu1 genome identified by long paired ends
| Support read number | Support read number | SV type | SV length (bp) | Coordinate (bp) |
|---|---|---|---|---|
| 12 | 81 | Deletion | 58,435 | Chr: NC_028457.1 39834375–39892810 |
| 11 | 32 | Duplication | 33,248 | Chr: NC_028455.1 29621517–29654765 |
| 10 | 47 | Duplication | 49,573 | Chr: NC_028452.1 16482798–16532371 |
| 10 | 43 | Translocation | Non | NC_028452.1 28454982 NC_028451.1 596035 |
| 9 | 59 | Translocation | Non | NC_028453.1 23754472 NC_028452.1 6538935 |
SV: structural rearrangement; Coordinate: the location of SVs; NC_: chromosome; NW_: scaffold
Examples of rearrangements in the Yugu1 genome identified by long single ends
| Support read number | Support read number | SV type | SV length (bp) | Coordinate (bp) |
|---|---|---|---|---|
| 15 | 45 | Inversion | 8091 | Chr: NC_028458.1 49998519–50006610 |
| 12 | 17 | Deletion | 1596 | Chr: NC_028451.1 33653189–33654785 |
| 9 | 47 | Duplication | 1883 | NW_014576740.1 62365–64247 |
| 7 | 37 | Deletion | 359 | Chr: NC_028450.1 25710225–25710584 |
| 7 | 11 | Duplication | 2400 | Chr: NC_028455.1 4360170–4362570 |
SV: structural rearrangement; Coordinate: the location of SVs; NC_: chromosome; NW_: scaffold