Clément Mabire, Jorge Duarte, Aude Darracq, Ali Pirani, Hélène Rimbert, Delphine Madur, Valérie Combes, Clémentine Vitte, Sébastien Praud, Nathalie Rivière, Johann Joets, Jean-Philippe Pichon, Stéphane D Nicolas.
Abstract
BACKGROUND: Insertions/deletions (InDels), and more specifically presence/absence variations (PAVs), are pervasive in several species and have strong functional and phenotypic effects because they remove or drastically modify genes. Genotyping such variants on large panels remains poorly addressed, although it is necessary for approaches such as association mapping or genomic selection.
Keywords: Array; Breakpoint; Chromosomal rearrangements; Copy number variation; Genome assembly; Genotyping; Present absent variation; Structural variation; Zea mays
Year: 2019 PMID: 31722668 PMCID: PMC6854671 DOI: 10.1186/s12864-019-6136-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
F2, PH207, and C103 de novo assembly metrics
| Maize line | Number of scaffolds | Min size | Max size | Average size | N50 | Total (Mb) | % of Ns | Complete BUSCOs (C) | Fragmented BUSCOs (F) | Missing BUSCOs (M) |
|---|---|---|---|---|---|---|---|---|---|---|
| F2 | 76,563 | 892 | 112,956 | 16,900 | 14,042 | 646.3 | 9.48% | 89.3% | 4.9% | 5.8% |
| PH207 | 81,688 | 884 | 2,024,489 | 29,557 | 16,860 | 797.5 | 8.90% | 91.8% | 2.7% | 5.5% |
| C103 | 84,990 | 886 | 120,582 | 19,305 | 16,146 | 793 | 8.21% | 90.6% | 4.2% | 5.2% |
Number of scaffolds: the number of scaffold sequences assembled; Min size: the length of the shortest scaffold; Max size: the length of the longest scaffold; Average size: the average size of the scaffolds; N50: the N50 of the assembly; Total: the total number of bases included in the assembly; % of Ns: the percentage of Ns present in the assembly. BUSCO statistics give the percentage of complete (C), fragmented (F), and missing (M) BUSCO genes out of a total of 1440 BUSCO genes.
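As an illustration of how the metrics in this table are defined, the following minimal Python sketch computes them from a list of scaffold lengths. It assumes the usual definition of N50 (the length of the scaffold at which the cumulative length, taken from the longest scaffold downwards, first reaches half of the assembly total). The function names and the toy lengths are hypothetical and are not taken from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (assumed definition)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running sum to half the total is the N50.
        if running >= half_total:
            return length


def assembly_summary(lengths):
    """Collect the length-based metrics listed in the table legend."""
    return {
        "number_of_scaffolds": len(lengths),
        "min_size": min(lengths),
        "max_size": max(lengths),
        "average_size": sum(lengths) / len(lengths),
        "n50": n50(lengths),
        "total": sum(lengths),
    }


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical scaffold lengths
    print(assembly_summary(toy_lengths))
```

The percentage of Ns and the BUSCO scores need the sequences themselves and an external tool, so they are left out of this sketch.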
Fig. 1 Genotyping of InDel CNVMAIZE_DEL_12661 using three probe types on 445 individuals. a Schematic distribution of the 9 probes along the sequence of InDel CNVMAIZE_DEL_12661 (green line) and the flanking sequence shared by all individuals (blue line) genotyped by the array. Double, dotted, and full arrows represent the probes designed on the forward and reverse flanking sequences of the breakpoint sites (BP), at non-polymorphic sites (MONO), and at polymorphic sites (OTV) within the internal sequence of the InDel, respectively. b Schematic distribution of the 8 probes that passed Affymetrix® quality control and were called by the Affymetrix® pipeline. c Clustering produced by the Affymetrix® algorithm for an OTV, a MONO, and a BP probe from the InDel, based on both fluorescence contrast (x-axis) and intensity (y-axis) of the 445 inbred lines. Red, blue, and yellow dots indicate the presence of the sequence (genotype "present"), either homozygous for allele A (AA), homozygous for allele B (BB), or heterozygous (AB), respectively. Cyan and green dots indicate that the sequence is absent in the individual (OO) or present in only one copy, i.e. hemizygous for presence/absence (OB or OA), respectively. Black dots indicate individuals to which no genotype could be assigned (missing data). d Haplotypes displayed by the genotyping using 8 probes (columns) on the 445 inbred lines (rows). Colors correspond to the genotypes of individuals produced by the clustering in c
Fig. 2 Distribution of the 105,927 InDels genotyped by the array according to their size and the cumulative length of presence/absence regions (PARs) in their internal sequence. a Distribution of the number of InDels according to the proportion of presence/absence regions (sequence not present elsewhere in the genome) within their internal sequence. b Distribution of the number of InDels according to their size (kbp) and the percentage of the internal sequence of the InDel covered by PAR(s). Colors indicate the proportion of InDels with (red) or without (blue) PARs for the 7 InDel size classes
Number of probes and targeted InDels, by probe type, before selection, after selection for the array design, and after passing the Affymetrix® quality control. Percentages are indicated in brackets
| Probe type | Before selection: Probes | Before selection: InDels (a) | On array: Probes | On array: InDels (a) | Called by Affymetrix® pipeline: Probes | Called by Affymetrix® pipeline: InDels (a) |
|---|---|---|---|---|---|---|
| BP Type1 | 6,648 (0.02%) | 3,324 (2.82%) | 4,691 (0.71%) | 2,751 (2.6%) | 2,092 (0.44%) | 1,482 (1.66%) |
| BP Type2 | 51,770 (0.2%) | 25,885 (21.98%) | 38,790 (5.85%) | 22,662 (21.39%) | 20,540 (4.29%) | 14,407 (16.12%) |
| BP Type3 | 71,820 (0.27%) | 35,910 (30.5%) | 41,272 (6.23%) | 27,897 (26.34%) | 23,631 (4.93%) | 18,485 (20.68%) |
| BP Type4 | 312 (0.001%) | 156 (0.13%) | 241 (0.04%) | 146 (0.14%) | 119 (0.02%) | 93 (0.1%) |
| OTV | 872,324 (3.26%) | 21,390 (18.16%) | 163,278 (24.64%) | 18,558 (17.52%) | 96,867 (20.22%) | 15,064 (16.85%) |
| MONO | 25,735,797 (96.25%) | 68,573 (58.23%) | 414,500 (62.54%) | 65,796 (62.11%) | 335,778 (70.1%) | 63,597 (71.14%) |
| ALL | 26,738,671 | 117,756 | 662,772 | 105,927 | 479,027 | 89,393 |
(a) Note that the same InDel can be genotyped by several probe types, which is why the InDel percentages sum to more than 100%
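The note can be checked with simple arithmetic on the "Before selection" InDel column. The short Python sketch below sums the per-probe-type InDel counts copied from the table and compares the sum with the "ALL" row; the variable names are chosen for this example only.

```python
# InDel counts per probe type before selection, copied from the table above.
indels_by_probe_type = {
    "BP Type1": 3324,
    "BP Type2": 25885,
    "BP Type3": 35910,
    "BP Type4": 156,
    "OTV": 21390,
    "MONO": 68573,
}
total_distinct_indels = 117756  # "ALL" row, before selection

# An InDel targeted by several probe types is counted once per type here,
# so the sum exceeds the number of distinct InDels.
summed = sum(indels_by_probe_type.values())
ratio = summed / total_distinct_indels

print(summed, round(ratio * 100, 1))  # prints: 155238 131.8
```

The per-type counts add up to about 132% of the distinct InDels, which matches the sum of the percentages given in brackets in that column.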
Fig. 3 Number of InDels interrogated by each probe type or combination of probe types, for which a probe could be designed (a) and for which a probe was finally selected for inclusion in the final array (b). Vertical bars indicate the number of InDels interrogated by each probe type or combination of probe types. The black points and connecting lines below the vertical bars indicate the probe types, or combination of probe types, used to interrogate that subset of InDels. Horizontal bars indicate the number of InDels interrogated by each probe type (OTV, BP, MONO)
Comparison between the clustering expected for BP, MONO, and OTV probe types and the clustering produced by Affymetrix® pipelines based on the fluorescent intensity and contrast of 445 inbred lines for 479,027 probes
| Expected probe type | Classification based on the clustering produced by Affymetrix® pipelines and genotyping assignment | Number (%) | Description |
|---|---|---|---|
| BP | BP | 20,370 (43.9%) | Two homoz. clusters |
| BP | OTV | 26,012 (56.1%) | Two homoz. and one OT clusters |
| OTV | OTV | 78,799 (81.3%) | Two homoz. and one OT clusters |
| OTV | MONO | 502 (0.5%) | One homoz. and one OT clusters |
| OTV | SNP | 17,562 (18.1%) | Two homoz. clusters |
| OTV | monomorphic | 4 (0.0%) | One cluster |
| MONO | MONO | 212,434 (63.3%) | One homoz. and one OT clusters |
| MONO | OTV | 15,690 (4.7%) | Two homoz. and one OT clusters |
| MONO | Unexpected MONO 1 | 68,562 (20.4%) | One homoz., one OT and one het. clusters |
| MONO | SNP | 1,981 (0.6%) | Two homoz. clusters |
| MONO | Unexpected MONO 2 | 9,525 (2.8%) | One homoz. and one het. clusters |
| MONO | monomorphic | 27,586 (8.29%) | One cluster |

Clustering examples: images not reproduced.
"Clustering example": typical example of clustering based on fluorescence intensity (y-axis) and contrast (x-axis); colors indicate the genotype assigned to the individuals based on this clustering. "Number (%)": number (percentage) of probes displaying the corresponding clustering. "Description": brief characterization of each class based on the clustering of individuals (homoz. = homozygous, het. = heterozygous, OT = off-target)
Consistency rate between genotyping by sequencing and by array for the 4 individuals used to discover the InDels, for the three probe types and for the two different genotypes observed from sequencing: presence (P) or absence (A)
| Probe type | Genotype by sequencing | B73 | F2 | C103 | PH207 | All lines |
|---|---|---|---|---|---|---|
| BP (a) | A | 0.98 | 0.98 | 0.98 | 0.97 | 0.98 |
| BP (a) | P | 0.97 | 0.97 | 0.97 | 0.96 | 0.97 |
| BP (a) | ALL (a) | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
| OTV | A | 0.85 | 0.89 | 0.80 | 0.78 | 0.83 |
| OTV | P | 0.93 | 0.97 | 0.96 | 0.96 | 0.96 |
| OTV | ALL | 0.90 | 0.95 | 0.91 | 0.90 | 0.92 |
| MONO | A | 0.77 | 0.81 | 0.82 | 0.81 | 0.80 |
| MONO | P | 0.90 | 0.98 | 0.94 | 0.94 | 0.95 |
| MONO | ALL | 0.82 | 0.94 | 0.89 | 0.88 | 0.88 |
| ALL | A | 0.80 | 0.86 | 0.84 | 0.82 | 0.82 |
| ALL | P | 0.92 | 0.97 | 0.94 | 0.95 | 0.95 |
| ALL | ALL | 0.85 | 0.95 | 0.90 | 0.89 | 0.90 |
(a) Note that the consistency rates of hemizygous genotypes (heterozygous for presence/absence) are not displayed in the table for BP probes but were included when estimating the global consistency rate (ALL). For BP probes, the absence of the probe sequence, due either to absence of hybridization or to no alignment on the draft sequence, was treated as missing data. Missing data were excluded from the comparison for all probes
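A consistency rate of the kind reported in this table can be sketched as the fraction of non-missing comparisons in which the array call agrees with the sequencing call, computed separately for absence (A), presence (P), and both together (ALL). The Python sketch below is one possible reading of that definition, not the authors' code: the function name, the input encoding ("P", "A", or `None` for missing), and the toy data are assumptions, and hemizygous calls are not modelled.

```python
def consistency_rates(seq_calls, array_calls):
    """Agreement between sequencing and array calls, per sequencing genotype.

    Both inputs are equal-length sequences of "P" (presence), "A" (absence),
    or None (missing). Pairs with a missing value on either side are skipped.
    """
    agree = {"A": 0, "P": 0}
    total = {"A": 0, "P": 0}
    for seq, arr in zip(seq_calls, array_calls):
        if seq is None or arr is None:
            continue  # missing data are excluded from the comparison
        total[seq] += 1
        if seq == arr:
            agree[seq] += 1
    rates = {g: (agree[g] / total[g] if total[g] else None) for g in ("A", "P")}
    compared = total["A"] + total["P"]
    rates["ALL"] = (agree["A"] + agree["P"]) / compared if compared else None
    return rates


if __name__ == "__main__":
    # Hypothetical calls for six InDels in one inbred line.
    sequencing = ["P", "P", "A", "A", "P", None]
    array = ["P", "A", "A", None, "P", "P"]
    print(consistency_rates(sequencing, array))
```

Returning `None` for a class with no usable comparisons avoids reporting a misleading rate of zero.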
Fig. 4 Consistency among probes within 50,648 InDels with at least two probes genotyped in 362 inbred lines. a Distribution of the average allelic frequencies of present calls over all probes. b Variation of the proportion of genotypes not fully consistent across all probes (FreqDiff01). The black and gray curves with triangle points represent the variation of the median and average FreqDiff01 across InDels, respectively. Colored curves with circle points represent the expected variation of this proportion for different error rates (1%: red, 3%: green, 5%: light blue, 10%: dark blue). Frequencies of 1 (presence) and 0 (absence) indicate that all probes had consistent genotypes for the corresponding inbred line. Intermediate frequencies indicate that at least one probe was not consistent with the other probes for the same InDel in that inbred line
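The caption does not give a formula for FreqDiff01, so the following Python sketch shows only one plausible reading: for a given InDel, compute the frequency of "present" calls across its probes in each inbred line, then report the proportion of lines whose frequency is strictly between 0 and 1. The function names, the 1/0/`None` encoding, and the toy data are assumptions made for this illustration.

```python
def present_frequency(probe_calls):
    """Fraction of non-missing probe calls that are 'present' for one line.

    probe_calls holds 1 (present), 0 (absent), or None (missing) per probe.
    """
    observed = [call for call in probe_calls if call is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


def freq_diff01(calls_by_line):
    """Proportion of lines whose probes disagree for one InDel (assumed definition)."""
    freqs = [present_frequency(calls) for calls in calls_by_line]
    freqs = [f for f in freqs if f is not None]
    if not freqs:
        return None
    # A frequency of exactly 0 or 1 means all probes agree for that line.
    inconsistent = sum(1 for f in freqs if 0 < f < 1)
    return inconsistent / len(freqs)


if __name__ == "__main__":
    # Hypothetical calls for one InDel with three probes in five inbred lines.
    calls = [
        [1, 1, 1],           # all probes call presence
        [0, 0, 0],           # all probes call absence
        [1, 0, 1],           # one probe disagrees
        [None, 1, 1],        # one missing call, the others agree
        [None, None, None],  # no usable call, line is skipped
    ]
    print(freq_diff01(calls))
```

In this toy case one of the four usable lines is inconsistent, giving a value of 0.25.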
Fig. 5 Principal coordinate analysis of the genetic distance between 362 inbred lines from an association panel, estimated from a 57,824 InDels and b 28,143 SNPs. Colors represent the assignment of the inbred lines to the 5 genetic groups defined by admixture using pre-fixed Panzea SNPs from the 50 K Illumina array, when the probability of assignment to a group (membership) was greater than 60%. Inbred lines not assigned to a group were considered admixed and are colored gray. The common names of maize accessions typical of each genetic group were used