Kristian A Stevens1, Keith Woeste2, Sandeep Chakraborty3, Marc W Crepeau4, Charles A Leslie3, Pedro J Martínez-García3, Daniela Puiu5, Jeanne Romero-Severson6, Mark Coggeshall2, Abhaya M Dandekar3, Daniel Kluepfel7, David B Neale3, Steven L Salzberg5,8, Charles H Langley4.
Abstract
Genomic analysis in Juglans (walnuts) is expected to transform the breeding and agricultural production of both nuts and lumber. To that end, we report here the determination of reference sequences for six additional relatives of Juglans regia: Juglans sigillata (also from section Dioscaryon), Juglans nigra, Juglans microcarpa, Juglans hindsii (from section Rhysocaryon), Juglans cathayensis (from section Cardiocaryon), and the closely related Pterocarya stenoptera. While these are 'draft' genomes, ranging in size between 640 Mbp and 990 Mbp, their contiguities and accuracies can support powerful annotations of genomic variation that are often the foundation of new avenues of research and breeding. We annotated nucleotide divergence and synteny by creating complete pairwise alignments of each reference genome to the remaining six. In addition, we have re-sequenced a sample of accessions from four Juglans species (including regia). The variation discovered in these surveys comprises a critical resource for experimentation and breeding, as well as a solid complementary annotation. To demonstrate the potential of these resources, the structural and sequence variation in and around the polyphenol oxidase loci, PPO1 and PPO2, was investigated. As reported for other seed crops, variation in this gene is implicated in the domestication of walnuts. The apparently Juglandaceae-specific PPO1 duplicate shows accelerated divergence and an excess of amino acid replacement on the lineage leading to accessions of the domesticated nut crop species, Juglans regia and sigillata.
Keywords: Juglans; genomic variation; polyphenol oxidase; reference genomes; walnut
Year: 2018 PMID: 29792315 PMCID: PMC6027890 DOI: 10.1534/g3.118.200030
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Selected targets and the basic statistics for the unassembled genomes. The genome sequences assembled in this study all have 2n = 32 chromosomes. The Illumina sequence reads and the corresponding genome size estimates from the 31-mer analysis of the paired-end reads are given. Qualitative levels of heterozygosity (1 = highest, 7 = lowest) are based on quantitatively ranking the 31-mer distributions by the relative proportion of the two peaks.
| Taxonomy Properties | |||||||
|---|---|---|---|---|---|---|---|
| (2n) | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
| name | ‘Rawlins’ | ‘Sparrow’ | ‘Wild Walnut’ | ’83-129’ | ‘Chandler’ | ’83-13’ | |
| accession | DJUG105 | DJUG11.03 | DJUG29.11 | DJUG951.04 | 64-172 | DPTE1.09 | |
| source | NCGR | MU | NCGR | NCGR | NCGR | UCD | NCGR |
| Paired end reads | 264,112,180 | 846,241,271 | 249,382,312 | 260,534,438 | 787,524,840 | 219,992,493 | 260,634,420 |
| Mate-pairs | 71,229,807 | 57,101,723 | 75,354,980 | 78,329,874 | 54,720,606 | 63,339,005 | 82,902,639 |
| Total 31-mers | 5.77×10¹⁰ | 5.54×10¹⁰ | 5.42×10¹⁰ | 5.71×10¹⁰ | 5.65×10¹⁰ | 5.71×10¹⁰ | 5.58×10¹⁰ |
| Haploid 31-mer depth | n/a | 24 | 23 | 23 | 24 | 25 | 23 |
| Diploid 31-mer depth | 50 | 47 | 47 | 47 | 47 | 50 | 47 |
| Genome size estimate | 5.77×10⁸ | 5.83×10⁸ | 5.82×10⁸ | 5.71×10⁸ | 5.94×10⁸ | 5.71×10⁸ | 6.00×10⁸ |
| Relative heterozygosity | 6 | 3 | 5 | 2 | 3 | 4 | 1 |
The genome size estimates for these genomes are derived from the paired-end sequence using 31-mer histograms, as described in Methods.
Figure 1. The 31-mer histograms of our paired-end sequence data. Each histogram shows a bimodal distribution typical of a heterozygous diploid genome. The relative fraction of the distribution under the left (haploid) peak is proportional to the genome heterozygosity. Using the relative proportions of the two peaks, the genomes can be ranked by their heterozygosity (Table 1).
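The ranking described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the histogram is a hypothetical mapping from k-mer depth to k-mer count, the two peaks are split at the midpoint between the haploid and diploid depths, and k-mers below half the haploid depth are treated as sequencing errors. None of these cutoffs come from the source, and the function names are invented here.

```python
def haploid_peak_fraction(hist, haploid_depth, diploid_depth):
    """Fraction of the k-mer distribution lying under the left (haploid) peak.

    hist maps k-mer depth -> number of k-mers observed at that depth.
    The boundary and error cutoff below are assumptions of this sketch.
    """
    boundary = (haploid_depth + diploid_depth) / 2   # assumed split between peaks
    error_cutoff = haploid_depth / 2                 # assumed low-depth error filter
    left = sum(c for d, c in hist.items() if error_cutoff <= d < boundary)
    right = sum(c for d, c in hist.items() if d >= boundary)
    return left / (left + right)


def rank_by_heterozygosity(fractions):
    """Rank genomes so that 1 = largest haploid-peak fraction (most heterozygous)."""
    ordered = sorted(fractions, key=fractions.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Toy histogram with peaks at depths 24 and 47 (values are illustrative only).
toy_hist = {2: 5000, 24: 100, 47: 300}
toy_fraction = haploid_peak_fraction(toy_hist, haploid_depth=24, diploid_depth=47)
toy_ranks = rank_by_heterozygosity({"a": 0.25, "b": 0.10, "c": 0.40})
```

A real histogram would also carry a long high-depth tail from repeats, which this sketch counts under the diploid peak; an upper depth cutoff would be a reasonable refinement.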
Assembly statistics for our six genomes. The original v1.0 J. regia assembly, constructed using similar methods, is included for comparison. As an additional resource and for validation purposes, we also included a v1.5 J. regia assembly, which incorporates light coverage of PacBio sequences for improved contiguity.
| Species | Assembly size | Scaffolds | N50 Scaffold |
|---|---|---|---|
| | 640,895,151 | 232,579 | 244,921 |
| | 643,318,433 | 273,094 | 470,924 |
| | 797,890,490 | 332,634 | 145,095 |
| | 941,867,385 | 329,873 | 135,837 |
| | 668,759,554 | 282,224 | 200,575 |
| | 991,966,387 | 396,056 | 148,559 |
| | 712,759,961 | 186,636 | 241,714 |
| | 651,682,552 | 4,402 | 639,948 |
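For readers unfamiliar with the N50 column, the statistic can be computed from a list of scaffold lengths as follows. This is a minimal sketch of the standard definition (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size). The source does not say which tool produced the reported values, and the function name is chosen here for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy example: total length 32, and the two longest scaffolds (8 + 8) reach half.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because N50 depends on the whole length distribution, an assembly with far fewer scaffolds (such as the last row) can have a much higher N50 at a similar total size.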
CEGMA and BUSCO core gene results for the genome assemblies of all six Juglans species and the outgroup P. stenoptera. CEGMA: Complete and Partial record the number and fraction of all 248 ultra-conserved CEGs present in the assembly as a complete or partial annotation, respectively. Partial annotations use a more liberal cutoff that includes all complete annotations. BUSCO: The number and percentage of 1440 single-copy Embryophyte genes present in the assembly. These results are further broken down into single-copy and duplicated genes. *Summary results for the v1.0 and v1.5 assemblies were the same for both analyses.
| CEGMA | |||||||
|---|---|---|---|---|---|---|---|
| Complete | 207 | 201 | 206 | 203 | 201 | 201 | 205 |
| % | 83.47 | 81.05 | 83.06 | 81.85 | 81.05 | 81.05 | 82.66 |
| Partial | 235 | 232 | 235 | 239 | 238 | 238 | 234 |
| % | 94.76 | 93.55 | 94.76 | 96.37 | 95.97 | 95.97 | 94.35 |
| BUSCO | |||||||
| Complete | 1330 | 1346 | 1370 | 1357 | 1343 | 1320 | 1323 |
| % | 92% | 93% | 95% | 94% | 93% | 92% | 92% |
| Single-copy | 1005 | 1198 | 1071 | 1187 | 1185 | 780 | 743 |
| % | 70% | 83% | 74% | 82% | 82% | 54% | 52% |
| Duplicated | 325 | 148 | 299 | 170 | 158 | 540 | 580 |
| % | 23% | 10% | 21% | 12% | 11% | 38% | 40% |
| Fragmented | 32 | 26 | 14 | 28 | 28 | 37 | 41 |
| % | 2% | 2% | 1% | 2% | 2% | 3% | 3% |
| Missing | 78 | 68 | 56 | 55 | 69 | 83 | 76 |
| % | 5% | 5% | 4% | 4% | 5% | 6% | 5% |
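The percentages in this table follow directly from the counts and the totals given in the caption (248 CEGs, 1440 BUSCO genes). The short Python check below recomputes the CEGMA "Complete" percentages and verifies that the BUSCO categories add up. Counts are copied from the table in column order, since the species labels are not present in this excerpt. The helper names are illustrative.

```python
CEGMA_TOTAL = 248    # ultra-conserved CEGs, from the caption
BUSCO_TOTAL = 1440   # single-copy Embryophyte genes, from the caption

# Counts copied from the table, one entry per column.
cegma_complete = [207, 201, 206, 203, 201, 201, 205]
busco = {
    "complete":   [1330, 1346, 1370, 1357, 1343, 1320, 1323],
    "single":     [1005, 1198, 1071, 1187, 1185, 780, 743],
    "duplicated": [325, 148, 299, 170, 158, 540, 580],
    "fragmented": [32, 26, 14, 28, 28, 37, 41],
    "missing":    [78, 68, 56, 55, 69, 83, 76],
}


def percent(count, total, digits=2):
    """Percentage of total, rounded to the given number of decimals."""
    return round(100 * count / total, digits)


def busco_columns_consistent(table, total):
    """Check single + duplicated = complete, and complete + fragmented + missing = total."""
    for i, complete in enumerate(table["complete"]):
        if table["single"][i] + table["duplicated"][i] != complete:
            return False
        if complete + table["fragmented"][i] + table["missing"][i] != total:
            return False
    return True


cegma_complete_pct = [percent(c, CEGMA_TOTAL) for c in cegma_complete]
busco_ok = busco_columns_consistent(busco, BUSCO_TOTAL)
```

The same `percent` helper with `digits=0` reproduces the whole-number BUSCO percentages, for example 1330 of 1440 rounds to 92.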
Pairwise genome alignment statistics. (Top) The percent coverage is calculated for each ordered pair as the percentage of the reference genome covered by the aligned query genome. (Bottom) Divergence is calculated for each ordered pair of aligned query to reference genomes. For both metrics, the highest values belonged to pairs of genomes within the same Juglans section.
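The two metrics in this caption can be illustrated with a small Python sketch. It assumes, hypothetically, that alignments are available as half-open (start, end) intervals on the reference and as gapped sequence pairs, and that divergence is the fraction of mismatched bases among aligned, ungapped columns. The source does not specify the exact computation, so both functions are illustrations rather than the authors' method.

```python
def percent_coverage(intervals, reference_length):
    """Percentage of the reference covered by aligned query blocks.

    intervals: iterable of half-open (start, end) positions on the reference.
    Overlapping blocks are counted once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)   # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return 100 * covered / reference_length


def divergence(ref_aligned, query_aligned):
    """Mismatches per aligned site, ignoring columns with a gap ('-')."""
    pairs = [(r, q) for r, q in zip(ref_aligned, query_aligned)
             if r != "-" and q != "-"]
    mismatches = sum(r != q for r, q in pairs)
    return mismatches / len(pairs)


toy_coverage = percent_coverage([(0, 10), (5, 15), (20, 30), (2, 5)], 40)
toy_divergence = divergence("ACGT-A", "ACCTTA")
```

Coverage is defined per ordered pair because the denominator is the reference length, so swapping reference and query generally changes the value.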
Figure 2. Genome-wide phylogenetic trees. (a) An unrooted neighbor-joining tree reconstructed from genome-wide pairwise divergence estimates. The tree is drawn to scale, with the bar representing 0.1 nucleotide substitutions per site. (b) Rooted maximum likelihood trees constructed from the curated nucleotide alignments of single-copy BUSCO orthologs appearing in all seven genomes. The scale bar represents 0.005 nucleotide substitutions per site. Nucleotide distances and the number of bootstrap replicates supporting the split are noted on each edge. (c) Juglans chronogram calibrated from (b), estimating section-level divergence times (MYA).
The count of single nucleotide polymorphisms and a corresponding estimate of nucleotide diversity from re-sequenced population samples from four Juglans species. The individual accessions are described in Supplementary Table S2.
| Species | Number of individuals | Re-sequenced depth | Filtered single nucleotide polymorphisms | Nucleotide diversity π |
|---|---|---|---|---|
| | 10 | 90.8X | 942,379 | |
| | 12 | 87.2X | 4,427,957 | |
| | 13 | 1525X | 11,003,383 | |
| | 27 | 1620X | 9,619,940 | |
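Nucleotide diversity π is commonly estimated as the average per-site heterozygosity across the sample. The Python sketch below shows one standard estimator for biallelic SNPs, with the usual n/(n-1) sample-size correction. It is an assumption that this matches the estimator used in the study. The inputs (alternate-allele counts per SNP and the number of callable sites) are hypothetical and are not available in this excerpt, so no values from the table are reproduced.

```python
def nucleotide_diversity(alt_counts, n_chromosomes, n_sites):
    """Per-site nucleotide diversity from biallelic SNP allele counts.

    alt_counts: alternate-allele count at each SNP (0..n_chromosomes).
    n_chromosomes: sampled chromosomes (2 x individuals for diploids).
    n_sites: number of callable sites, including monomorphic ones.
    """
    n = n_chromosomes
    total = 0.0
    for k in alt_counts:
        p = k / n
        # Unbiased expected heterozygosity at this site.
        total += 2 * p * (1 - p) * n / (n - 1)
    return total / n_sites


# Two chromosomes that differ at the single site examined.
toy_pi_single = nucleotide_diversity([1], n_chromosomes=2, n_sites=1)
# One SNP at frequency 0.5 among four chromosomes, across ten callable sites.
toy_pi_sparse = nucleotide_diversity([2], n_chromosomes=4, n_sites=10)
```

The SNP counts in the table are therefore not sufficient on their own to recover π: the allele frequencies and the number of callable sites are also needed.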
Figure 3. Orthologs, paralogs, and alleles of PPO1 and PPO2 in the six Juglans species and the outgroup P. stenoptera. The figure illustrates the location, order, and orientation of the PPO gene family in each assembly. Copies identified as haploid alleles are gray. An allelic copy of PPO1 interrupted by an insertion was also noted in J. microcarpa. Lineages with positive K are marked in red on the dendrogram to the right. [a,b] PPO1 K 0.006/0.002 PPO2 K 0.001/0 [b,c] PPO1 K 0.03/0.01 [c,] PPO1 K 0.002/0. Inset: Comparing J. regia v1.5 (top) and v1.0 (bottom). In J. regia v1.5 the two genes are in tandem, and the contiguous interval between them reveals a novel repetitive sequence with homology to FAR1, the potential cause of the original assembly issue.
Figure 4. Desktop genome browser sessions using JBrowse. The PPO1 and PPO2 region of scaffold896 in J. regia v1.5. The gene regions for PPO1 and PPO2 are aligned to the same scaffold in assemblies as divergent as the outgroup P. stenoptera. An apparent excess of divergence in J. regia coincides with a lineage-specific insertion of a 10 kbp FAR1-domain-containing repeat. At this scale only SNP density is visible. Zooming in would reveal the 8 sites overlapping PPO1 and the 20 sites overlapping PPO2.