Justin E Anderson, Michael B Kantar, Thomas Y Kono, Fengli Fu, Adrian O Stec, Qijian Song, Perry B Cregan, James E Specht, Brian W Diers, Steven B Cannon, Leah K McHale, Robert M Stupar.
Abstract
Gene structural variation (SV) has recently emerged as a key genetic mechanism underlying several important phenotypic traits in crop species. We screened a panel of 41 soybean (Glycine max) accessions serving as parents in a soybean nested association mapping population for deletions and duplications in more than 53,000 gene models. Array hybridization and whole genome resequencing methods were used as complementary technologies to identify SV in 1528 genes, or approximately 2.8%, of the soybean gene models. Although SV occurs throughout the genome, SV enrichment was noted in families of biotic defense response genes. Among accessions, SV was nearly eightfold less frequent for gene models that have retained paralogs since the last whole genome duplication event, compared with genes that have not retained paralogs. Increases in gene copy number, similar to that described at the Rhg1 resistance locus, account for approximately one-fourth of the genic SV events. This assessment of soybean SV occurrence presents a target list of genes potentially responsible for rapidly evolving and/or adaptive traits.
Keywords: CNV; Glycine max; nested association mapping; soybean; structural variation
Year: 2014 PMID: 24855315 PMCID: PMC4455779 DOI: 10.1534/g3.114.011551
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Genome-wide view of copy number variation found in the soybean NAM parents. Data points are the log2 ratio of each genotype vs. the Williams82-ISU-01 reference for each probe. Colored spots denote probes within segments that exceed threshold: blue for UpCNV and red for DownCNV.
Figure 2. Classification system for CNVs that were associated with gene models. (A) Presence–absence and copy number status for a hypothetical gene in each of the six classes. Genes are found in one of three states: single copy, absent (white gap), or multiple copies (two or more arrows). (B) Gene representatives for each of the six classes showing allelic clusters. Each gene shows one data point for each of the 41 genotypes. The estimated copy number from sequence depth and CGH are shown on the X and Y axes, respectively.
The number of gene models identified within six structural variation categories
| | Gene Models Evaluated | DownCNV/DownPAV | UpPAV | UpCNV and UpPAV | UpCNV and DownCNV/PAV | UpCNV | Multi-Allelic UpCNV |
|---|---|---|---|---|---|---|---|
| Wm82-ISU-01 copy number | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| NAM parent copy number | — | 0 | 1 or >1 | >1 and (1 or >>1) | >1 and 0 | >1 | >1 and >>1 |
| Genes with syntenic paralog | 32,464 | 149 | 4 | 1 | 10 | 71 | 9 |
| Genes without syntenic paralog | 21,369 | 951 | 96 | 15 | 79 | 122 | 21 |
| Total genes assessed | 53,833 | 1100 | 100 | 16 | 89 | 193 | 30 |
The first two rows indicate the definition of each category based on the observed presence and copy number differences between Wm82-ISU-01 and at least one of the 41 NAM parent lines. The next two rows indicate the number of genes exhibiting each category among the subsets of genes that have maintained a syntenic paralog or have not maintained a syntenic paralog.
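The headline figures in the abstract can be checked against the counts in this table. The following is a minimal sketch, assuming that the "approximately 2.8%" and "nearly eightfold" values are obtained by pooling all six SV categories; the variable and function names are illustrative and do not come from the original analysis.

```python
# Gene counts copied from the table above. Category order: DownCNV/DownPAV,
# UpPAV, UpCNV and UpPAV, UpCNV and DownCNV/PAV, UpCNV, Multi-Allelic UpCNV.
evaluated = {"with_paralog": 32464, "without_paralog": 21369}
sv_counts = {
    "with_paralog": [149, 4, 1, 10, 71, 9],
    "without_paralog": [951, 96, 15, 79, 122, 21],
}


def sv_fraction(group):
    """Fraction of evaluated genes in a group that show any SV category."""
    return sum(sv_counts[group]) / evaluated[group]


# Pool both groups to get the genome-wide totals.
total_sv = sum(sum(counts) for counts in sv_counts.values())
total_genes = sum(evaluated.values())
overall_fraction = total_sv / total_genes

# Assumption: the fold difference compares pooled SV frequencies per group.
fold_difference = sv_fraction("without_paralog") / sv_fraction("with_paralog")

print(f"SV genes: {total_sv} of {total_genes} ({overall_fraction:.1%})")
print(f"SV frequency ratio (without vs. with paralog): {fold_difference:.1f}")
```

Under this pooling assumption, the category totals sum to the 1528 SV genes reported in the abstract, and the ratio of SV frequencies between the two groups comes out close to eight.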
Figure 3. Copy number variation at the soybean cyst nematode locus Rhg1. (A) The copy number variant (arrow) is clearly visible from a full view of the chromosome 18 CGH results, overlaying data from all 41 genotypes. (B) The view from (A) is zoomed-in on the 31-kb UpCNV segment that overlaps five gene models (Cook ). (C) Viewing only one genotype from each allele class confirms a clear separation between three different copy number states. (D) Cross-validation of the CNV for Glyma18g02590 using both CGH (y-axis) and sequence depth (x-axis) analyses.
Figure 4. Copy number variation at Glyma13g04670. (A) The copy number variant (arrow) is visible from a full view of the chromosome 13 CGH results, overlaying data from all 41 genotypes. (B) The view from (A) is zoomed in on the approximately 10-kb UpCNV segment that overlaps with Glyma13g04670, revealing multiple CNV classes. (C) Viewing one genotype from each predicted class confirms distinct copy number states. (D) Cross-validation of the CNV for Glyma13g04670 using both CGH (y-axis) and sequence depth (x-axis) analyses, revealing at least four copy number classes.
Gene models with specific Pfam domains that are enriched for associations with SV
| Pfam ID | Description | Total in Soybean Genome | DownCNV/PAV OBS | DownCNV/PAV EXP | UpPAV OBS | UpPAV EXP | UpCNV and UpPAV OBS | UpCNV and UpPAV EXP | UpCNV and DownCNV/PAV OBS | UpCNV and DownCNV/PAV EXP | UpCNV OBS | UpCNV EXP | Multi-Allelic UpCNV OBS | Multi-Allelic UpCNV EXP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CL0022 | Leucine-rich repeat | 1110 | 168** | 23 | 7 | 2 | 3 | 0 | 17** | 2 | 2** | 4 | 6 | 1 |
| PF07714 | Protein tyrosine kinase | 786 | 38* | 16 | 0 | 1 | 1 | 0 | 4 | 1 | 3 | 3 | 3 | 0 |
| PF08263 | Leucine-rich repeat N-terminal domain | 550 | 74** | 11 | 1 | 1 | 0 | 0 | 9** | 1 | 10 | 2 | 3 | 0 |
| PF00931 | NB-ARC domain | 454 | 112** | 9 | 6 | 1 | 6** | 0 | 13** | 1 | 9 | 2 | 2 | 0 |
| PF01582 | Toll-interleukin receptor | 196 | 30** | 4 | 3 | 0 | 3 | 0 | 2 | 0 | 0 | 1 | 0 | 0 |
| PF14368 | Probable lipid transfer | 104 | 14** | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PF12819 | Carbohydrate-binding protein of the ER | 95 | 14** | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 |
| PF14111 | Domain of unknown function (DUF4283) | 82 | 10* | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 |
| PF13947 | Wall-associated receptor kinase galacturonan-binding | 71 | 10* | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 |
| PF14380 | Wall-associated receptor kinase C-terminal | 33 | 10** | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PF05686 | Glycosyl transferase family 90 | 20 | 7** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PF05018 | Domain of unknown function (DUF667) | 7 | 5** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PF00499 | NADH-ubiquinone/plastoquinone oxidoreductase chain 6 | 2 | 0 | 0 | 2* | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The number of gene models expected to be associated with SV is shown, compared with the number of gene models observed to be associated with SV for each category. Significance of enrichment was determined by the Fisher exact test with a resampling approach to correct for multiple hypotheses as implemented by the FuncAssociate 2.0 (Berriz ) program using 10,000 simulations (*P < 0.01, **P < 0.001). Only Pfam domains significantly enriched (P < 0.01) in at least one SV category were listed.
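The relationship between the OBS and EXP columns can be illustrated with a short calculation. This is a minimal sketch, not the FuncAssociate 2.0 procedure: it assumes the background is the 53,833 gene models assessed and that each SV category total (for example, 1100 DownCNV/PAV genes) is the sample size, and it computes an uncorrected one-sided Fisher exact (hypergeometric tail) P value without the resampling correction described above. The function names are hypothetical.

```python
from math import comb

# Assumption: the background is all gene models assessed in the SV category table.
TOTAL_GENES = 53833


def expected_count(domain_total, category_total, universe=TOTAL_GENES):
    """Expected number of domain genes in an SV category under independence."""
    return domain_total * category_total / universe


def hypergeom_upper_tail(observed, domain_total, category_total, universe=TOTAL_GENES):
    """Uncorrected one-sided Fisher exact P value, P(X >= observed)."""
    upper = min(domain_total, category_total)
    # Sum exact integer counts of favourable draws, then divide once.
    numerator = sum(
        comb(domain_total, k) * comb(universe - domain_total, category_total - k)
        for k in range(observed, upper + 1)
    )
    return numerator / comb(universe, category_total)


# Leucine-rich repeat (CL0022): 1110 genes genome-wide, 168 observed among
# the 1100 DownCNV/PAV genes.
exp_lrr = expected_count(1110, 1100)
p_lrr = hypergeom_upper_tail(168, 1110, 1100)
print(f"expected = {exp_lrr:.1f}, uncorrected P = {p_lrr:.3g}")
```

Under these assumptions, the expected counts round to the EXP values in the table for the first rows (for example, 23 for CL0022 and 16 for PF07714 in the DownCNV/PAV category). The P values reported in the table come from the resampling-corrected test, so they need not equal the uncorrected tail probability computed here.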