Carolina Bernhardsson, Amaryllis Vidalis, Xi Wang, Douglas G Scofield, Bastian Schiffthaler, John Baison, Nathaniel R Street, M Rosario García-Gil, Pär K Ingvarsson.
Abstract
Norway spruce (Picea abies (L.) Karst.) is a conifer species of substantial economic and ecological importance. In common with most conifers, the P. abies genome is very large (∼20 Gbp) and contains a high fraction of repetitive DNA. The current P. abies genome assembly (v1.0) covers approximately 60% of the total genome size but is highly fragmented, consisting of >10 million scaffolds. The genome annotation contains 66,632 gene models that are at least partially validated (www.congenie.org); however, the fragmented nature of the assembly means that there is currently little information available on how these genes are physically distributed over the 12 P. abies chromosomes. By creating an ultra-dense genetic linkage map, we anchored and ordered scaffolds into linkage groups, which complements the fine-scale information available in assembly contigs. Our ultra-dense haploid consensus genetic map consists of 21,056 markers derived from 14,336 scaffolds that contain 17,079 gene models (25.6% of the validated gene models) that we have anchored to the 12 linkage groups. We used data from three independent component maps, as well as comparisons with previously published Picea maps, to evaluate the accuracy and marker ordering of the linkage groups. We demonstrate that approximately 3.8% of the anchored scaffolds and 1.6% of the gene models covered by the consensus map have likely assembly errors, as they contain genetic markers that map to different regions within or between linkage groups. We further evaluate the utility of the genetic map for the conifer research community by using an independent data set of unrelated individuals to assess genome-wide variation in genetic diversity using the genomic regions anchored to linkage groups. The results show that our map is sufficiently dense to enable detailed evolutionary analyses across the P. abies genome.
Keywords: Norway spruce; Picea abies; genetic map; genome assembly; sequence capture
Year: 2019 PMID: 30898899 PMCID: PMC6505157 DOI: 10.1534/g3.118.200840
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Overview of the three component maps and the total number of probe-markers available in the consensus map. Cluster: Name of each putative maternal family that was identified in the principal component analysis. Samples: Number of megagametophytes in each cluster. Markers: Number of probe-markers in each component map with number of unique segregating bins within brackets (one marker for each bin was used to anchor the bin markers to the genetic map). Scaffolds: Number of scaffolds represented in each component map
| Cluster | Samples | Markers | Scaffolds |
|---|---|---|---|
| Cluster 1 | 314 | 9,073 (3,924) | 7,101 |
| Cluster 2 | 270 | 11,647 (5,311) | 8,738 |
| Cluster 3 | 842 | 19,006 (11,479) | 13,301 |
| Total | 1,426 | 21,056 | 14,336 |
Marker density and size of each component genetic map created from the three clusters as well as for the consensus map. LG: Linkage group. Cluster 1-3: Component maps for cluster 1-3 with number of probe-markers (marker-bins) assigned, map size (in cM) and maximum gap in map (in cM) for each of the LGs. Consensus: Number of markers and map size of the LGs in the consensus map
| LG | Cluster 1: Markers | Cluster 1: Length (cM) | Cluster 1: Max gap (cM) | Cluster 2: Markers | Cluster 2: Length (cM) | Cluster 2: Max gap (cM) | Cluster 3: Markers | Cluster 3: Length (cM) | Cluster 3: Max gap (cM) | Consensus: Markers | Consensus: Length (cM) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I | 975 (421) | 385.5 | 8.0 | 1,159 (553) | 439.9 | 21.1 | 1,967 (1,185) | 414.1 | 8.8 | 2,172 | 414.1 |
| II | 701 (305) | 249.2 | 9.6 | 863 (366) | 289.0 | 9.4 | 1,456 (864) | 289.8 | 10.9 | 1,608 | 250.3 |
| III | 859 (394) | 324.0 | 4.6 | 1,069 (479) | 381.1 | 7.1 | 1,738 (1,075) | 346.4 | 5.2 | 1,940 | 342.5 |
| IV | 771 (323) | 298.7 | 14.5 | 970 (452) | 350.9 | 8.6 | 1,531 (916) | 303.0 | 27.0 | 1,704 | 303.0 |
| V | 761 (311) | 273.2 | 8.9 | 1,116 (499) | 395.6 | 9.5 | 1,649 (1,032) | 342.6 | 15.1 | 1,865 | 275.0 |
| VI | 648 (292) | 241.0 | 8.4 | 915 (399) | 270.7 | 4.6 | 1,456 (894) | 269.5 | 8.4 | 1,622 | 240.2 |
| VII | 682 (331) | 314.0 | 8.4 | 923 (443) | 380.8 | 13.4 | 1,625 (1,013) | 321.9 | 7.9 | 1,769 | 321.0 |
| VIII | 775 (339) | 307.0 | 5.6 | 943 (454) | 367.26 | 9.8 | 1,465 (904) | 315.6 | 6.6 | 1,609 | 305.9 |
| IX | 792 (332) | 283.3 | 5.4 | 786 (364) | 295.6 | 5.9 | 1,589 (911) | 285.1 | 7.4 | 1,738 | 285.0 |
| X | 648 (289) | 231.6 | 7.0 | 960 (454) | 342.7 | 6.9 | 1,564 (917) | 272.7 | 7.1 | 1,709 | 273.1 |
| XI | 677 (253) | 200.6 | 3.7 | 1,025 (411) | 269.2 | 4.0 | 1,440 (818) | 233.6 | 3.0 | 1,608 | 233.4 |
| XII | 784 (334) | 281.6 | 9.3 | 919 (437) | 360.7 | 11.1 | 1,526 (950) | 312.3 | 14.3 | 1,712 | 312.3 |
| Total | 9,073 (3,924) | 3,389.4 | 14.5 | 11,648 (5,311) | 4,143.4 | 21.1 | 19,006 (11,479) | 3,706.7 | 27.0 | 21,056 | 3,555.8 |
Figure 4. Marker order comparison between linkage groups (LGs) from the haploid consensus map presented here and the Picea abies map from Lind et al. Consensus LG I - LG XII are located on the x-axis from left to right. Lind et al. LG 1 - LG 12 are located on the y-axis from top to bottom. Each dot represents a marker comparison from the same scaffold, where black coloration represents the LG where the majority of marker comparisons are mapped. Gray coloration represents markers mapping to a different LG compared to the majority of markers. Turquoise coloration represents markers located on split scaffolds, which are indicative of assembly errors.
Figure 5. Marker order comparison of linkage groups (LGs) between the Picea abies haploid consensus map presented here and the Picea glauca map from Pavy et al. Consensus LG I - LG XII are located on the x-axis from left to right. Pavy et al. LG 1 - LG 12 are located on the y-axis from top to bottom. Each dot represents a marker comparison from the same scaffold, where black coloration represents markers mapping to the same LG in the two species and gray coloration represents markers mapping to different LGs. Turquoise coloration represents markers located on split scaffolds, indicating an assembly error.
Figure 1. Circos plot of the consensus map. A) Marker distribution over the 12 linkage groups (LG I-LG XII). Each black vertical line represents a marker (21,056 in total) in the map and is displayed according to the marker positions in cM. Tracks B-C visualize multi-marker scaffolds, where each line is a pairwise position comparison of probe-markers from the same scaffold. B) Position comparisons of probe-markers from the same scaffold that are located on the same LG. Light gray lines indicate probe-markers that are located <5 cM from each other, dark gray lines indicate probe-markers located 5-10 cM apart, and red lines indicate probe-markers >10 cM apart. C) Position comparisons of probe-markers from the same scaffold that map to different LGs. Orange lines indicate probe-markers from the same scaffold split over 2 LGs, while dark blue lines indicate probe-markers split over 3 LGs.
Estimated genetic length of each Linkage Group (LG) in the three component maps. LG: linkage group in the consensus map; Observed genetic length (cM): The genetic length of the LG calculated from all probe-marker bins (same as in Table 2); Mean estimated genetic length (cM): the average length of the LG when using 100 random probe-marker bins in 100 map calculations; SD (cM): Standard deviation of the estimated length; Inflation/Marker bin: The difference between observed genetic length and the estimated length divided by the number of probe-marker bins in the linkage group
| LG | Cluster 1: Observed genetic length (cM) | Cluster 1: Mean estimated genetic length (cM) | Cluster 1: SD (cM) | Cluster 1: Inflation / Marker bin (cM) | Cluster 2: Observed genetic length (cM) | Cluster 2: Mean estimated genetic length (cM) | Cluster 2: SD (cM) | Cluster 2: Inflation / Marker bin (cM) | Cluster 3: Observed genetic length (cM) | Cluster 3: Mean estimated genetic length (cM) | Cluster 3: SD (cM) | Cluster 3: Inflation / Marker bin (cM) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | 385.5 | 245.5 | 5.2 | 0.33 | 439.9 | 252.3 | 6.1 | 0.34 | 414.2 | 204.8 | 7.3 | 0.18 |
| II | 249.2 | 168.8 | 4.7 | 0.26 | 289.0 | 192.9 | 4.4 | 0.26 | 289.8 | 166.4 | 2.8 | 0.14 |
| III | 324.0 | 195.8 | 5.8 | 0.33 | 381.1 | 218.6 | 5.8 | 0.34 | 346.4 | 168.5 | 3.9 | 0.17 |
| IV | 298.7 | 204.7 | 5.0 | 0.29 | 350.9 | 215.6 | 5.7 | 0.30 | 303.0 | 167.0 | 3.5 | 0.15 |
| V | 273.2 | 195.7 | 4.6 | 0.25 | 395.6 | 218.4 | 9.0 | 0.36 | 342.6 | 180.0 | 5.1 | 0.16 |
| VI | 241.0 | 161.8 | 4.7 | 0.27 | 270.7 | 170.0 | 4.7 | 0.25 | 269.5 | 142.2 | 2.9 | 0.14 |
| VII | 314.0 | 223.6 | 5.3 | 0.27 | 380.8 | 248.7 | 6.3 | 0.30 | 321.9 | 175.9 | 3.7 | 0.14 |
| VIII | 307.0 | 203.6 | 4.9 | 0.31 | 367.26 | 226.7 | 5.7 | 0.31 | 315.6 | 179.2 | 4.3 | 0.15 |
| IX | 283.3 | 194.0 | 4.9 | 0.27 | 295.6 | 185.3 | 6.8 | 0.30 | 285.1 | 157.2 | 3.0 | 0.14 |
| X | 231.6 | 164.4 | 3.6 | 0.23 | 342.7 | 193.6 | 4.5 | 0.33 | 272.7 | 141.5 | 2.7 | 0.14 |
| XI | 200.6 | 141.8 | 4.7 | 0.23 | 269.2 | 147.0 | 4.8 | 0.30 | 233.6 | 119.6 | 3.0 | 0.14 |
| XII | 281.6 | 194.4 | 4.9 | 0.26 | 360.7 | 209.6 | 9.1 | 0.35 | 312.3 | 168.7 | 3.1 | 0.15 |
| Total | 3,389.4 | 2,294.2 | — | 0.28 | 4,143.4 | 2,478.3 | — | 0.31 | 3,706.7 | 1,971.0 | — | 0.15 |
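The inflation column can be recomputed from the caption's definition: observed length minus mean estimated length, divided by the number of probe-marker bins. The sketch below is a minimal Python check, not the authors' code. The function name is illustrative, and the bin counts (421 and 1,185) are taken from the marker density table.

```python
def inflation_per_bin(observed_cm: float, estimated_cm: float, n_bins: int) -> float:
    """Return (observed length - mean estimated length) / number of marker bins."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    return (observed_cm - estimated_cm) / n_bins


# Cluster 1, LG I: observed 385.5 cM, mean estimated 245.5 cM, 421 marker bins.
cluster1_lg1 = inflation_per_bin(385.5, 245.5, 421)
# Cluster 3, LG I: observed 414.2 cM, mean estimated 204.8 cM, 1,185 marker bins.
cluster3_lg1 = inflation_per_bin(414.2, 204.8, 1185)

print(round(cluster1_lg1, 2))  # 0.33
print(round(cluster3_lg1, 2))  # 0.18
```

Both values agree with the tabulated inflation for LG I.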
Figure 2. Fraction of scaffolds that are represented by 1-11 unique markers in the consensus map. Insert: Fraction of scaffolds that have multiple probe-markers (2-11) that are distributed over 1-3 linkage groups (inter-split scaffolds). The red dot indicates the fraction of scaffolds with multiple probe-markers that are positioned >5 cM apart on the same linkage group (intra-split scaffolds).
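The inter-split and intra-split categories used in this caption can be expressed as a small classification rule. The sketch below assumes each scaffold is given as a list of (linkage group, position in cM) pairs. The function name, the labels "single-marker" and "consistent", and the example scaffolds are hypothetical; only the 5 cM threshold comes from the caption.

```python
INTRA_SPLIT_CM = 5.0  # threshold stated in the figure caption


def classify_scaffold(markers):
    """Classify one scaffold from its (linkage_group, position_cM) markers."""
    if len(markers) < 2:
        return "single-marker"  # a split cannot be detected with one marker
    linkage_groups = {lg for lg, _ in markers}
    if len(linkage_groups) > 1:
        return "inter-split"  # markers map to different LGs
    positions = [pos for _, pos in markers]
    if max(positions) - min(positions) > INTRA_SPLIT_CM:
        return "intra-split"  # same LG, but more than 5 cM apart
    return "consistent"


# Hypothetical scaffolds for illustration only.
examples = {
    "scaffold_A": [("I", 10.2), ("I", 11.0)],
    "scaffold_B": [("I", 10.2), ("I", 48.7)],
    "scaffold_C": [("I", 10.2), ("VII", 3.1)],
}
for name, markers in examples.items():
    print(name, classify_scaffold(markers))
```

With this rule, scaffold_A is consistent, scaffold_B is intra-split, and scaffold_C is inter-split.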
Figure 3. Box plot of scaffold lengths for all multi-marker scaffolds (dark gray box) and for scaffolds showing a split within or across LGs (light gray box). The split scaffolds are significantly longer than the multi-marker scaffolds in general (t = -7.70, df = 193.39, p-value = 7.00e-13).
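The caption does not name the test, but non-integer degrees of freedom such as 193.39 are what an unequal-variance (Welch) t-test produces, so that is assumed here. The sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom on made-up numbers; it does not reproduce the reported values, which would require the actual scaffold lengths.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Made-up lengths (arbitrary units): the second group is longer on average.
all_multi_marker = [1.0, 2.0, 3.0, 4.0, 5.0]
split_scaffolds = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(all_multi_marker, split_scaffolds)
print(round(t_stat, 3), round(dof, 3))
```

A negative t, as in the caption, means the first group has the smaller mean.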
Overview of annotated gene models anchored to the genetic map. Gene models: Annotated protein-coding gene models with high, medium, and low confidence levels (Nystedt et al.). Mapped scaffolds: Number of gene models positioned on scaffolds that are anchored to the genetic map (percentage of total number of gene models for each confidence level). Multi-marker scaffolds: Number of gene models positioned on scaffolds with multiple markers in the genetic map (percentage of gene models on mapped scaffolds). Inter-split scaffolds: Number of gene models positioned on the 164 scaffolds that are split between LGs in the genetic map (percentage of gene models on mapped scaffolds / percentage of gene models on multi-marker scaffolds). Intra-split scaffolds: Number of gene models positioned on the 22 scaffolds that are split between different regions of the same LG (percentage of gene models on mapped scaffolds / percentage of gene models on multi-marker scaffolds). Split within gene models: Number of gene models that have an internal split (percentage of gene models on mapped scaffolds / percentage of gene models on multi-marker scaffolds)
| Gene models | Mapped scaffolds | Multi-marker scaffolds | Inter-split scaffolds | Intra-split scaffolds | Split within gene models |
|---|---|---|---|---|---|
| High confidence | 8,379 (31.7%) | 3,122 (37.3%) | 145 (1.7% / 4.6%) | 15 (0.18% / 0.48%) | 58 (0.69% / 1.9%) |
| Medium confidence | 6,624 (20.6%) | 2,215 (33.4%) | 114 (1.7% / 5.1%) | 16 (0.23% / 0.68%) | 29 (0.44% / 1.3%) |
| Low confidence | 2,076 (25.8%) | 762 (36.7%) | 35 (1.7% / 4.6%) | 5 (0.29% / 0.79%) | 13 (0.63% / 1.7%) |
| Total | 17,079 (25.6%) | 6,099 (35.7%) | 294 (1.7% / 4.8%) | 36 (0.21% / 0.59%) | 100 (0.59% / 1.6%) |
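The percentages in the Total row follow directly from the counts and the denominators named in the caption. The sketch below recomputes them as a consistency check; the variable names and the `pct` helper are illustrative, and the total of 66,632 validated gene models is taken from the abstract.

```python
total_gene_models = 66_632  # validated gene models, from the abstract
mapped = 17_079             # gene models on mapped scaffolds
multi_marker = 6_099        # gene models on multi-marker scaffolds
inter_split = 294
intra_split = 36
split_within = 100


def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `whole` made up by `part`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


print(pct(mapped, total_gene_models))                                 # 25.6
print(pct(multi_marker, mapped))                                      # 35.7
print(pct(inter_split, mapped), pct(inter_split, multi_marker))       # 1.7 4.8
print(pct(intra_split, mapped, 2), pct(intra_split, multi_marker, 2)) # 0.21 0.59
print(pct(split_within, mapped, 2), pct(split_within, multi_marker))  # 0.59 1.6
```

Every recomputed value agrees with the Total row.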
Figure 6. Sliding window analysis of neutrality statistics. Analyses were performed using 10 cM windows with 1 cM incremental steps along the consensus map linkage groups and visualized with coloring that alternates between adjacent LGs. A) Number of segregating sites. Dashed horizontal line indicates the overall average of 1017. B) Pairwise nucleotide diversity (π). Dashed horizontal line indicates the overall average of 0.005. C) Tajima’s D. Dashed horizontal line indicates the overall average of -0.852. D) Linkage disequilibrium Zn scores. Dashed horizontal line indicates the overall average of 0.040.
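The windowing scheme in this caption (10 cM windows advanced in 1 cM steps along each linkage group) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the handling of windows at the end of a linkage group is not described in the caption, and a simple marker count stands in for the per-window statistics. All names and the example positions are hypothetical.

```python
def sliding_windows(lg_length_cm, window_cm=10.0, step_cm=1.0):
    """Return (start, end) pairs for windows that fit fully inside the LG."""
    if lg_length_cm < window_cm:
        return []
    n_windows = int((lg_length_cm - window_cm) // step_cm) + 1
    # Index-based starts avoid accumulating floating-point error.
    return [(i * step_cm, i * step_cm + window_cm) for i in range(n_windows)]


def count_per_window(positions_cm, lg_length_cm, window_cm=10.0, step_cm=1.0):
    """Count markers with start <= position < end for each window."""
    return [
        (start, end, sum(start <= p < end for p in positions_cm))
        for start, end in sliding_windows(lg_length_cm, window_cm, step_cm)
    ]


# Hypothetical marker positions (cM) on a 30 cM linkage group.
positions = [0.5, 3.0, 9.9, 12.0, 25.0]
windows = count_per_window(positions, lg_length_cm=30.0)
print(len(windows))  # 21
print(windows[0])    # (0.0, 10.0, 3)
print(windows[-1])   # (20.0, 30.0, 1)
```

Replacing the count with a per-window estimate of segregating sites, π, Tajima's D, or Zn would give tracks of the kind shown in panels A-D.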