Gareth Linsmith, Stephane Rombauts, Sara Montanari, Cecilia H Deng, Jean-Marc Celton, Philippe Guérif, Chang Liu, Rolf Lohaus, Jason D Zurn, Alessandro Cestaro, Nahla V Bassil, Linda V Bakker, Elio Schijlen, Susan E Gardiner, Yves Lespinasse, Charles-Eric Durel, Riccardo Velasco, David B Neale, David Chagné, Yves Van de Peer, Michela Troggio, Luca Bianco.
Abstract
BACKGROUND: We report an improved assembly and scaffolding of the European pear (Pyrus communis L.) genome (referred to as BartlettDHv2.0), obtained using a combination of Pacific Biosciences RSII long-read sequencing, Bionano optical mapping, chromatin interaction capture (Hi-C), and genetic mapping. The sample selected for sequencing is a double haploid derived from the same "Bartlett" reference pear that was previously sequenced. Sequencing of di-haploid plants makes assembly more tractable in highly heterozygous species such as P. communis.
Keywords: Hi-C; Pac-Bio sequencing; Pyrus communis L; chromosome-scale assembly
Year: 2019 PMID: 31816089 PMCID: PMC6901071 DOI: 10.1093/gigascience/giz138
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: (a) 17-mer frequency distribution of diploid P. × bretschneideri. Using KAT [13] v2.3.4, 17-mers were counted in all whole-genome shotgun PE reads. The density plot of the number of unique k-mer species (y axis) for each k-mer frequency (x axis) is plotted. The homozygous peak is observed at a multiplicity (k-mer coverage) of 86×, while the heterozygous peak is observed at 43×. (b) 17-mer frequency distribution of DH P. communis (BartlettDHv2.0). Using KAT [13] v2.3.4, 17-mers were counted in all whole-genome shotgun PE reads. The density plot of the number of unique k-mer species (y axis) for each k-mer frequency (x axis) is plotted. The homozygous peak is observed at a multiplicity (k-mer coverage) of 86×, while no heterozygous peak is observed.
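To make the quantity plotted in Figure 1 concrete, the following is a minimal Python sketch of a k-mer frequency spectrum. It is not the KAT implementation: the function names `canonical` and `kmer_spectrum` are hypothetical, reads are assumed to be plain strings, and merging each k-mer with its reverse complement is an assumption about how strands are handled. The sketch returns, for each multiplicity, the number of distinct k-mers observed that many times, which is the histogram whose peaks are read off in the caption.

```python
from collections import Counter

# Translation table for reverse-complementing DNA (assumes A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k=17):
    """Map each multiplicity to the number of distinct k-mers seen that often.

    `reads` is any iterable of sequence strings. K-mers containing
    ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    # Histogram over multiplicities: x axis = frequency, y axis = distinct k-mers.
    return Counter(counts.values())


if __name__ == "__main__":
    # Toy input with k=3 so the result is easy to follow by hand.
    spectrum = kmer_spectrum(["ACGTAC"], k=3)
    for multiplicity in sorted(spectrum):
        print(multiplicity, spectrum[multiplicity])
```

In a heterozygous diploid, k-mers present on only one haplotype accumulate at about half the homozygous coverage, which matches the 43× and 86× peaks in panel (a). An in-memory counter like this one is only practical for toy inputs; whole-genome read sets need a dedicated tool such as KAT.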
Genome assembly metrics
| Metric | Total assembled (Mb) | % Genome | N50 (Mb) | No. of sequences |
|---|---|---|---|---|
| Contigs | 501 | 94.8 | 5.3 | 620 |
| Scaffolds | 496.9 | 94.0 | 6.5 | 592 |
| Anchored into chromosomes | 445.1 | 84.2 | 26.2 | 17 |
| LG0 | 51.8 | 9.8 | 0.19 | 477 |
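For readers unfamiliar with the N50 column, the sketch below shows one common way to compute it, together with the genome size implied by the "Total assembled" and "% Genome" columns. Both function names are hypothetical, and the implied genome size is back-calculated from the table rather than stated in the source, so it should be read as an approximation.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


def implied_genome_size_mb(assembled_mb, percent_genome):
    """Back-calculate the genome size estimate used for the '% Genome' column."""
    return assembled_mb / (percent_genome / 100)


if __name__ == "__main__":
    # Toy contig lengths: total 30, so the half-way point of 15 is reached at the 8.
    print(n50([10, 8, 6, 4, 2]))
    # Contig row of the table: 501 Mb assembled, reported as 94.8% of the genome.
    print(round(implied_genome_size_mb(501, 94.8), 1))
```

Applying the second function to each row of the table gives values close to 528.5 Mb, which suggests a single genome size estimate underlies the whole "% Genome" column.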
Figure 2: Marey plot of Chr1 with heat map of dispersed repeats and genes in bins of 200 kb. The lighter the colour, the more elements are present. Genetic positions refer to the high-density map of Bartlett. Dots represent the genetic and physical positions (on BartlettDHv2.0) of 11,474 SNPs.
Summary statistics of best Canu and best Falcon contig assemblies
| Assembly | Total assembled (Mb) | % Genome | N50 (Mb) | No. of contigs | In contigs >140 kb (Mb) |
|---|---|---|---|---|---|
| Canu | 501 | 94.8 | 5.3 | 620 | 479.6 |
| Falcon | 515 | 97.5 | 2.4 | 1,282 | 483.6 |
Summary statistics of the Canu and Falcon hybrid assemblies combined with the Bionano optical mapping data
| Hybrid assembly | Bionano incorporated (Mb) | % Genome | N50 (Mb) | No. of scaffolds | No. of conflicts with optical map |
|---|---|---|---|---|---|
| Canu + Bionano | 459.2 | 87.0 | 8.1 | 123 | 13 |
| Falcon + Bionano | 451.4 | 85.4 | 3.5 | 214 | 38 |
Summary statistics of gene annotations from selected Rosaceae species, Pyrus communis (BartlettDHv2.0, Bartlettv1.0 assembly [9]), Pyrus×bretschneideri [8], Malus×domestica (GDDH13) [2], Fragaria vesca [17]
| Statistic | | | | | |
|---|---|---|---|---|---|
| Predicted genes | 37,445 | 45,217 | 42,812 | 28,588 | 44,105 |
| Mean CDS length (nt) | 1,120 | 1,209 | 1,346 | 1,177 | 1,167 |
| Mean exon length (CDS only) (nt) | 222 | 236 | 285 | 297 | 282 |
| Mean intron length (nt) | 296 | 352 | 346 | 407 | 689 |
| Mean exons per gene | 5 | 5 | 5 | 5 | 5 |
| Single-exon genes | 6,789 | 11,268 | 12,309 | 5,004 | 6,350 |
| Genes per 100 kb | 7.6 | 9.1 | 8.5 | 62.3 | 6.7 |
Figure 3: Plot of protein clusters shared by 9 species, P.×bretschneideri [8], P. communis (BartlettDHv2.0), M.×domestica [2], F. vesca [17], P. persica [18], R. chinensis [19], R. occidentalis [1], V. vinifera [20], and A. thaliana [21].
Figure 4: (a, b, c) Paralog KS distributions of P. communis BartlettDHv2.0, P. × bretschneideri [8], and M. × domestica GDDH13 [2] (grey histograms and line, left-hand y-axes; a peak represents a WGD event) and 1-to-1 ortholog KS distributions between indicated species (blue and red filled curves of kernel-density estimates, right-hand y-axes; a peak represents a species divergence event).
Figure 5: Chromosome 1 alignment dot plots. Dot plots are produced using the DGENIE software [30] and alignments with minimap2 (v2.16). (a) Dot plot of Chromosome 1 P.×bretschneideri to P. communis BartlettDHv2.0 (top left). (b) Dot plot of Chromosome 1 P. communis BartlettDHv2.0 to M.×domestica GDDH13 (top right). (c) Dot plot of Chromosome 1 P.×bretschneideri to M.×domestica GDDH13 (bottom left). (d) Dot plot of Chromosome 1 P. communis Bartlettv1.0 to P. communis BartlettDHv2.0 (bottom right).
Figure 6: Self-collinearity of P. communis (BartlettDHv2). The coloured lines link collinearity blocks representing syntenic regions that were identified by MCScanX.
Figure 7: Duplication depth of F. vesca gene homologs in P. communis (BartlettDHv2). Inter-species collinearity between F. vesca [17] and P. communis was interrogated using MCScanX, and at each gene locus of the F. vesca assembly the number of P. communis–F. vesca inter-species collinear blocks (duplication depth) was counted. The number of F. vesca gene homologs having each copy number in the P. communis (BartlettDHv2) assembly is then plotted. Most gene loci from F. vesca occur twice in P. communis.
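The counting step described for Figure 7 can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it does not parse MCScanX output, and the input structure (each collinear block as a list of `(reference_gene, target_gene)` pairs) and the function names `duplication_depth` and `depth_histogram` are hypothetical.

```python
from collections import Counter


def duplication_depth(blocks, reference_genes):
    """Count, for each reference gene, how many collinear blocks cover it.

    `blocks` is an iterable of collinear blocks, each a list of
    (reference_gene, target_gene) pairs. Genes covered by no block
    keep a depth of 0.
    """
    depth = {gene: 0 for gene in reference_genes}
    for block in blocks:
        # A block contributes at most once to each reference gene it contains.
        for gene in {reference for reference, _ in block}:
            if gene in depth:
                depth[gene] += 1
    return depth


def depth_histogram(depth):
    """Map each duplication depth to the number of reference genes having it."""
    return Counter(depth.values())


if __name__ == "__main__":
    # Toy data: gene identifiers are invented for illustration only.
    blocks = [
        [("fv1", "pc1"), ("fv2", "pc2")],
        [("fv1", "pc9"), ("fv2", "pc10")],
        [("fv3", "pc5")],
    ]
    depth = duplication_depth(blocks, ["fv1", "fv2", "fv3", "fv4"])
    histogram = depth_histogram(depth)
    for copies in sorted(histogram):
        print(copies, histogram[copies])
```

Plotting the histogram values against depth gives a chart of the kind described in the caption, where a mode at depth 2 corresponds to most reference loci occurring twice in the target genome.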