Linkage disequilibrium and haplotype block patterns in popcorn populations.

Andréa Carla Bastos Andrade, José Marcelo Soriano Viana, Helcio Duarte Pereira, Vitor Batista Pinto, Fabyano Fonseca E Silva

Abstract

Linkage disequilibrium (LD) analysis provides information on the evolutionary aspects of populations. Recently, haplotype blocks have been used to increase the power of quantitative trait loci detection in genome-wide association studies and the prediction accuracy of genomic selection. Our objectives were as follows: to compare the degree of LD, LD decay, and LD decay extent in popcorn populations; to characterize the number and length of haplotype blocks in the populations; and to determine whether maize chromosomes also have a pattern of interspaced regions of high and low rates of recombination. We used a biparental population, a synthetic, and a breeding population, genotyped for approximately 75,000 single nucleotide polymorphisms (SNPs). The sample size ranged from 190 to 192 plants. For the whole-genome LD and haplotype block analyses, we assumed a window of 500 kb. To characterize the block and step patterns of LD in the populations, we constructed LD maps by chromosome, defining a cold spot as a chromosome segment including SNPs with the same LDU position. The LD and haplotype block analyses were also performed at the intragenic level, selecting 12 genes related to zein, starch, cellulose, and fatty acid biosynthesis. The populations with the higher and lower frequencies of |D'| values greater than 0.75 were the biparental (65-74%) and the breeding population (26-58%), respectively. There were slight differences between the populations regarding the average distance for SNPs with |D'| values greater than 0.75 (in the range of approximately 207 to 229 kb). The level of LD expressed by the r2 values was low in the populations (0.02, 0.04, and 0.04, on average) but comparable to some non-isolated human populations. The frequency of r2 values greater than 0.75 was lower in the biparental population (0.2-0.5%) and higher in the other populations (0.2-1.6%). 
The average distance for SNPs with r2 values greater than 0.75 was much higher in the biparental population (approximately 80 to 126 kb). In the other populations, the ranges were approximately 6 to 19 and 6 to 35 kb. The heatmaps for the regions covered by the first 100 SNPs in each chromosome, in each population (1 to 3.3 Mb, approximately), provided evidence that the comparatively few high r2 values (close to 1.0) occurred only for SNPs in close proximity, especially in the synthetic and breeding populations. Because of the reduced number of SNPs in the haplotype blocks (2 to 3) in the populations, no advantage is expected from a haplotype-based association study or from haplotype-based genomic selection along generations. The results concerning LD decay (rapid decay after 5-10 kb) and LD decay extent (along up to 300 kb) are in the range observed with maize inbred line panels. The LD maps indicate that maize chromosomes had a pattern of regions of extensive LD interspaced with regions of low LD. However, our simulated LD map provides evidence that this pattern can reflect regions with differences in allele frequencies and LD levels (expressed by |D'|) and not regions with high and low rates of recombination.

Year:  2019        PMID: 31553737      PMCID: PMC6760792          DOI: 10.1371/journal.pone.0219417

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Linkage disequilibrium (LD) analysis is important to human, animal, and plant geneticists because the results can be used for positional cloning, provide information on the rate of recombination, gene conversion, and evolutionary aspects of populations, including recombination history, mutation, selection, genetic drift, and admixture, and allow for the selection of populations and single nucleotide polymorphisms (SNPs) for association studies [1]. The most common LD measures are D' and r2. The statistic D' is the ratio between D (the difference between the products of haplotype frequencies, D = P(AB)·P(ab) − P(Ab)·P(aB)) and its maximum possible value given the allele frequencies [2]. The statistic r2 is the square of the correlation between the values of alleles at two loci in the same gamete, where D is the covariance [3]. Additional information on historical recombination is provided by analysis of the haplotype block pattern in populations. A haplotype block is a chromosome region in which there are few haplotypes (2–4 per block; a haplotype is a combination of alleles of multiple SNPs within the block), and for which the LD analysis provides evidence of a low rate of recombination [1]. Recently, haplotype blocks have been used to increase the power of QTL (quantitative trait loci) detection in genome-wide association studies (GWAS) and the prediction accuracy of genomic selection. Based on a panel including 183 maize inbred lines genotyped for 38,000 SNPs, Maldonado et al. [4] confirmed the advantage of haplotype-based GWAS for ear and plant height, the ear height/plant height ratio, and leaf angle relative to single SNP analysis. Hess et al. [5] observed an increase of up to 5.5% in the accuracy of genomic prediction in an admixed dairy cattle population using fixed-length haplotypes relative to the single SNP approach. Although there are several methods for defining a haplotype block, the most common procedure was proposed by Gabriel et al. [6]. 
Their criterion is that the one-sided upper 95% confidence bound on D' is > 0.98 and the lower bound is > 0.70. Characterization of the LD and haplotype block patterns in human, domesticated animal, and plant populations has provided variable results concerning the degree of LD, LD decay, LD decay extent, and number and length of the haplotype blocks. Most maize LD studies have been done with inbred line panels. Thirunavukkarasu et al. [7] and Truntzler et al. [8] observed an overall average r2 between 0.23 and 0.61, LD decay after 5–10 kb, and LD extent along 200–300 kb. Faster LD decay and shorter LD extent (less than 4 kb) were observed by Maldonado et al. [4]. Higher LD and slower LD decay were observed in biparental and multiparental maize populations [9]. The number and length of haplotype blocks are also highly variable [4, 7]. In several investigations in human populations, the structure of LD was described based on LD maps. In an LD map, each SNP has an LD position in LD units (LDUs). One LDU is the distance in kilobases at which disequilibrium (expressed as Malecot's prediction of association, ρ) declines to approximately 0.37 of its starting value. Assuming unrelated individuals, ρ equates to the absolute value of D'. The difference between the LD positions of two SNPs divided by the distance in kilobases (d) is the exponential decline of disequilibrium (ε). LD has an inverse relationship with the recombination rate, whereas LDU distances increase with it. Thus, regions with extensive LD have few LDUs (plateaus or blocks), and regions with many LDUs have high recombination rates (steps). Holes in the LD maps are regions where greater marker density is required to provide a full characterization of the block and step patterns of the LD. Holes are identified by an LD map interval of 3, which is an arbitrary value because disequilibrium is indeterminate for εd > 3 and of doubtful reliability for εd > 2 [10, 11]. 
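As a rough illustration of the LDU definition above, the sketch below assumes the simplest form of exponential decline, ρ(d) = ρ(0)·exp(−εd), and ignores any intercept or asymptote terms that a fitted model may include. The function names and the ε value are hypothetical and are not taken from LDMAP.

```python
import math


def ldu_interval(epsilon_per_kb, distance_kb):
    """LD map length (in LDUs) of an interval: the product epsilon * d."""
    return epsilon_per_kb * distance_kb


def relative_association(ldu):
    """Fraction of the starting association left after `ldu` LD units,
    assuming a pure exponential decline (a simplifying assumption)."""
    return math.exp(-ldu)


def is_hole(ldu, threshold=3.0):
    """Flag an interval whose LDU length reaches the hole threshold of 3."""
    return ldu >= threshold


# Hypothetical interval: epsilon = 0.02 per kb over 50 kb gives one LDU.
one_ldu = ldu_interval(0.02, 50.0)
print(one_ldu, round(relative_association(one_ldu), 2))
```

With εd = 1 the association falls to exp(−1), about 0.37 of its starting value, which is the one-LDU case described in the text.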
Because there is no information on LD and the structure of haplotype blocks in popcorn populations and no LD maps for maize, the objectives of this study were: (1) to compare the degree of LD, the LD decay, and the LD decay extent in popcorn populations; (2) to characterize the number and length of haplotype blocks in the populations; and (3) to construct the first LD map for maize, to determine whether maize chromosomes also have a pattern of interspaced regions of high and low rates of recombination.
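The LD measures defined in this Introduction can be sketched for a single pair of biallelic SNPs as follows. This is a minimal illustration that starts from known haplotype frequencies, not the EM-based estimation needed for unphased genotypes; the function name `ld_stats` and the example frequencies are assumptions made here.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r2) for two biallelic loci from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first locus
    p_B = p_AB + p_aB  # frequency of allele B at the second locus
    d = p_AB * p_ab - p_Ab * p_aB
    # Largest |D| attainable with these allele frequencies (depends on the sign of D).
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    variance_product = p_A * (1 - p_A) * p_B * (1 - p_B)
    d_prime = d / d_max if d_max > 0 else float("nan")
    r2 = d * d / variance_product if variance_product > 0 else float("nan")
    return d, d_prime, r2


# Hypothetical example: allele frequencies differ strongly between the two loci.
d, d_prime, r2 = ld_stats(0.5, 0.0, 0.4, 0.1)
print(f"D={d:.3f}  D'={d_prime:.2f}  r2={r2:.3f}")
```

In this example one haplotype is absent, so |D'| is 1 while r2 is only about 0.11. This is the kind of contrast between the two measures that arises when allele frequencies differ between loci.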

Materials and methods

Populations

We used a biparental (F2 generation) temperate population, a tropical synthetic (Synthetic UFV), and a tropical breeding population (Beija-Flor cycle 4). A biparental population is the most used maize population for deriving doubled haploids and inbred lines in hybrid breeding. Maize synthetic varieties are used as germplasm sources in breeding programs or as improved populations in developing countries. Theoretically, a biparental population shows LD only for linked genes and molecular markers. In a synthetic there is LD for genes and molecular markers with independent assortment. Because selection can change the LD degree, we also included a breeding population. The biparental population was derived from the single cross AP4502, developed by the Agricultural Alumni Seed Improvement Association, Romney, IN, USA. Synthetic UFV and Beija-Flor cycle 4 (BFc4) were developed by the Federal University of Viçosa (UFV), Minas Gerais, Brazil. The synthetic was derived by random crossings involving 20 elite inbred lines from the tropical population Viçosa and 20 elite inbred lines from the tropical population Beija-Flor. The inbred lines were selected based on expansion volume (a measure of popcorn quality). Beija-Flor cycle 4 was developed after four cycles of half-sib selection based on expansion volume.

DNA extraction, genotyping-by-sequencing (GBS), SNP calling, data quality control, and imputation

Leaf samples of young plants were collected for DNA extraction. The DNA extraction was performed using the CTAB (cetyl trimethylammonium bromide) protocol with modifications. After quantification, the DNA samples of 574 plants (190 or 192 from each population) were sent to the Institute of Biotechnology at Cornell University (two plates of 95 samples from the biparental population) and Institut de Recherche en Immunologie et en Cancérologie/IRIC at University of Montreal (four plates of 96 samples from the tropical populations) for GBS services based on HiSeq 2500 (paired-end reads of 125 bp) and NextSeq500 (single-end reads of 85 bp), respectively. The SNP variant call services were provided by the Institute of Biotechnology and Omega Bioservices, Norcross, GA, respectively, using B73 version 4 (current version) as the reference genome [12]. After reading the data using the R package vcfR [13], we filtered by missing allele and chromosome. Then, we computed the SNP and genotype call rates and the minor allele frequency (MAF), employing the R package HapEstXXR [14]. After filtering by MAF > 0.01, we imputed based on Beagle [15] using the R package synbreed [16]. The number of SNPs after data quality control and imputation were 145,420, 74,773, and 76,055 for the biparental population, Synthetic UFV, and Beija-Flor c4, respectively. To maintain a similar number of SNPs for the populations, we finally performed a random sampling of 75,000 SNPs from the biparental population.
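The MAF filter and the random down-sampling of SNPs described above can be sketched as follows. The study used R packages for these steps; this Python version is only an illustration, and the genotype coding (0/1/2 copies of the alternative allele, `None` for missing), the function names, and the toy data are assumptions.

```python
import random


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded 0/1/2; missing calls (None) are ignored."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_and_sample(snps, maf_min=0.01, n_keep=None, seed=0):
    """Keep SNPs with MAF above maf_min, then optionally draw a random subset."""
    kept = [name for name, g in snps.items() if minor_allele_frequency(g) > maf_min]
    if n_keep is not None and len(kept) > n_keep:
        kept = sorted(random.Random(seed).sample(kept, n_keep))
    return kept


# Toy data: three SNPs scored in four plants.
toy = {"s1": [0, 0, 0, 0], "s2": [0, 1, 2, None], "s3": [2, 2, 2, 1]}
print(filter_and_sample(toy))
```

The monomorphic SNP `s1` is dropped by the MAF threshold. Passing `n_keep` mimics the random sampling used to bring the biparental population down to a SNP count similar to the other populations.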

LD and haplotype block analyses

For Hardy-Weinberg equilibrium analysis by population and chromosome, the Bonferroni criterion was adopted to keep a global level of significance of 1%. To characterize the block and step patterns of LD in the populations, we constructed LD maps by chromosome using the interval method [17]. We defined a cold spot region as a chromosome segment including SNPs with the same LDU position. To evaluate whether the LD maps allow inference of the overall degree of LD by chromosome in the populations, we also processed a simulated data set, generated with REALbreeding software (available by request). This software has been recently used in studies on population structure [18], QTL mapping [19], genomic selection [20], and genome-wide association studies [21]. We simulated the genotyping of 200 individuals in a population (generation 0) and 200 individuals in the same population after 10 generations of random crossings (generation 10), for 287 SNPs spanning 298 cM (density of approximately one SNP per cM) of a single chromosome. We then evaluated the degree of LD by chromosome in the populations concerning SNPs separated by up to 500 kb, using a two-marker expectation-maximization (EM) algorithm [22]. For the whole-genome LD decay and LD decay extent analyses, we computed the average |D'| and r2 values, defining intervals of 50 kb (0–50 to 451–500). To define a haplotype block, we adopted the criterion proposed by Gabriel et al. [6]. The haplotypes were estimated using an accelerated EM algorithm with a partition-ligation approach [23] to generate phased haplotypes for population frequency [24]. The LD and haplotype block analyses were also performed at the intragenic level. We chose 12 genes related to zein (one), starch (four), cellulose (five), and fatty acid biosynthesis (two) (S1 Table). With two exceptions, the selected genes had at least five SNPs in each population (maximum of 21). 
For the intragenic LD decay and LD decay extent analyses, we computed the average |D'| and r2 values defining intervals of 1 kb (0–1 to 10.1–11 kb). All analyses were performed using LDMAP [17] and Haploview [22]. Heatmaps were generated using the R package pheatmap. To assess the haplotype blocks information, the haplotype files for each population and chromosome were read by a program (Haplotype blocks summary) developed in REALbasic 2009 by Prof. José Marcelo Soriano Viana.
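The averaging of LD values within distance intervals, as used for the LD decay analyses, can be sketched as below. This is a minimal illustration and not the output format of Haploview or LDMAP; the function name, the input layout (pairs of distance and LD value), and the example values are assumptions.

```python
from collections import defaultdict


def mean_ld_by_interval(pairs, width_kb=50.0, max_kb=500.0):
    """Average an LD measure within consecutive distance intervals.

    pairs: iterable of (distance_kb, ld_value) for SNP pairs.
    Returns {(lower_kb, upper_kb): mean LD}, skipping empty intervals.
    """
    n_bins = int(max_kb // width_kb)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for distance_kb, value in pairs:
        if not 0 <= distance_kb <= max_kb:
            continue  # outside the analysis window
        index = min(int(distance_kb // width_kb), n_bins - 1)
        totals[index] += value
        counts[index] += 1
    return {
        (i * width_kb, (i + 1) * width_kb): totals[i] / counts[i]
        for i in sorted(totals)
    }


# Hypothetical (distance in kb, r2) pairs; the last one lies outside the window.
example = [(10, 0.8), (40, 0.4), (60, 0.2), (499, 0.1), (600, 0.9)]
decay = mean_ld_by_interval(example)
for interval, mean_r2 in decay.items():
    print(interval, round(mean_r2, 3))
```

Setting `width_kb=1.0` and `max_kb=11.0` would give the 1 kb intervals used for the intragenic analysis.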

Results

With the exception of chromosome 10 in the breeding population, the number of SNPs was generally in proportion to the chromosome length, providing an SNP density in the range of 23.5 to 44.3 kb (one SNP per 30.0 kb on average) (Table 1). The average MAF was approximately 0.1 regardless of chromosome and population, but the populations differed in their MAF distribution. The biparental population had a bimodal distribution and showed a higher number of SNPs with frequencies close to 0.01 and greater than 0.45 (S1 Fig). The synthetic and breeding populations had similar MAF distributions. The analysis of Hardy-Weinberg equilibrium provided evidence that most of the SNPs in the biparental population had a nonsignificant deviation, whereas most of the SNPs in the other populations showed a significant deviation. We retained SNPs with significant deviation from Hardy-Weinberg equilibrium in the synthetic and breeding populations to keep a similar number of SNPs for the LD and haplotype block analyses. To maintain a similar number of SNPs for constructing the LD maps by chromosome, we used the SNPs in Hardy-Weinberg equilibrium in the synthetic and breeding populations as well as a sample of SNPs with no significant deviation from Hardy-Weinberg equilibrium from the biparental population.
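The Hardy-Weinberg screening with a Bonferroni-adjusted threshold can be illustrated as follows. The text does not state which test statistic was used, so the chi-square goodness-of-fit test with one degree of freedom is an assumption here, as are the function names and the genotype counts.

```python
import math


def hwe_chi2_pvalue(n_AA, n_Aa, n_aa):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for one SNP."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 0.0, 1.0  # monomorphic SNP: nothing to test
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_significant(p_values, alpha=0.01):
    """Flag tests that stay significant at a global level alpha."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical genotype counts for two SNPs scored in 100 plants.
results = [hwe_chi2_pvalue(25, 50, 25), hwe_chi2_pvalue(50, 0, 50)]
flags = bonferroni_significant([p for _, p in results])
print(results, flags)
```

The first SNP matches the expected proportions exactly, while the second has no heterozygotes and deviates strongly, which is the kind of excess homozygosity that departs from Hardy-Weinberg proportions.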
Table 1

Number of SNPs, SNP coverage (kb), average SNP interval (bp) and MAF, and minimum, average, and maximum LD measures by chromosome in each population.

                                                            |D'|              r2
Population  Chr.  SNPs    SNP coverage  SNP interval  MAF   Min.  Av.   Max.  Min.  Av.    Max.
Biparental  1     11,816  307,039.27    25,982.75     0.09  0.00  0.78  1.0   0.00  0.023  1.0
Biparental  2     8,710   244,412.25    28,059.68     0.11  0.00  0.77  1.0   0.00  0.026  1.0
Biparental  3     8,205   235,520.19    28,693.18     0.11  0.00  0.75  1.0   0.00  0.032  1.0
Biparental  4     8,081   246,827.22    30,525.85     0.07  0.00  0.81  1.0   0.00  0.015  1.0
Biparental  5     8,697   223,657.67    25,708.94     0.09  0.00  0.79  1.0   0.00  0.019  1.0
Biparental  6     5,883   173,906.61    29,537.18     0.10  0.00  0.78  1.0   0.00  0.027  1.0
Biparental  7     6,401   182,200.48    28,440.64     0.11  0.00  0.77  1.0   0.00  0.025  1.0
Biparental  8     6,528   181,042.64    27,725.54     0.10  0.00  0.78  1.0   0.00  0.023  1.0
Biparental  9     5,625   159,429.26    28,336.11     0.11  0.00  0.76  1.0   0.00  0.027  1.0
Biparental  10    5,054   150,832.73    29,824.39     0.10  0.00  0.76  1.0   0.00  0.025  1.0
Synthetic   1     11,224  306,909.66    27,341.76     0.10  0.00  0.75  1.0   0.02  0.046  1.0
Synthetic   2     9,712   244,369.34    25,159.97     0.10  0.00  0.75  1.0   0.02  0.041  1.0
Synthetic   3     9,374   235,478.72    25,083.00     0.10  0.00  0.76  1.0   0.02  0.042  1.0
Synthetic   4     5,840   246,943.47    42,170.02     0.10  0.00  0.74  1.0   0.02  0.052  1.0
Synthetic   5     9,460   223,706.51    23,589.54     0.10  0.00  0.74  1.0   0.02  0.040  1.0
Synthetic   6     5,294   173,221.42    32,692.62     0.10  0.00  0.74  1.0   0.02  0.050  1.0
Synthetic   7     6,299   182,159.80    28,857.92     0.11  0.00  0.74  1.0   0.02  0.042  1.0
Synthetic   8     6,248   180,660.38    28,850.52     0.10  0.00  0.76  1.0   0.02  0.044  1.0
Synthetic   9     5,161   159,553.33    30,909.31     0.11  0.00  0.75  1.0   0.02  0.045  1.0
Synthetic   10    6,161   150,828.61    24,464.31     0.09  0.00  0.75  1.0   0.02  0.034  1.0
BFc4        1     10,182  306,774.01    30,126.80     0.11  0.20  0.71  1.0   0.02  0.047  1.0
BFc4        2     8,481   244,407.97    28,816.88     0.11  0.21  0.69  1.0   0.02  0.042  1.0
BFc4        3     8,005   235,478.74    29,373.18     0.11  0.20  0.70  1.0   0.02  0.040  1.0
BFc4        4     5,558   246,840.44    44,379.59     0.11  0.20  0.69  1.0   0.02  0.054  1.0
BFc4        5     7,674   223,706.51    29,080.32     0.11  0.20  0.70  1.0   0.02  0.039  1.0
BFc4        6     4,547   173,351.50    38,093.29     0.11  0.19  0.68  1.0   0.02  0.044  1.0
BFc4        7     5,602   182,155.19    32,448.24     0.11  0.20  0.69  1.0   0.02  0.040  1.0
BFc4        8     5,020   180,660.38    35,943.93     0.12  0.20  0.70  1.0   0.02  0.048  1.0
BFc4        9     5,353   159,489.87    29,788.60     0.11  0.20  0.69  1.0   0.02  0.042  1.0
BFc4        10    15,633  150,926.35    9,653.39      0.13  0.20  0.52  1.0   0.02  0.021  1.0

The LD map from the simulated data provided evidence that the LD units were lower for the generation with lower LD (generation 10) (Fig 1). Thus, the LD maps by chromosome revealed that the higher global LD (in LDUs) was observed in the synthetic but only for chromosomes 1 to 7 (S2 Fig). The higher global LD for chromosomes 8 and 9 was observed in the biparental population. The higher global LD for chromosome 10 was seen in the breeding population. The lowest global LD was observed in chromosome 6, and the highest global LD was observed in chromosome 10 of the breeding population. Because of the much higher number of SNPs in Hardy-Weinberg equilibrium in the biparental population, we only used this population for analysis of the number and length of the hot (high recombination rate) and cold (low recombination rate) spot regions of the chromosomes, as well as the number and length of the holes (Table 2). Except for chromosome 10, where the average lengths of the hot and cold spot regions were approximately 37 and 38 kb, respectively, the average lengths of the hot and cold spots regions for the other chromosomes ranged between approximately 45–55 and 83–110 kb, respectively. The number of hot spots ranged between 1,788 and 3,897, and the number of cold spots ranged from 608 to 1,507. The holes represented only 0.4 to 2.7% of the chromosomal genomes.
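The classification of an LD map into cold spots, hot spots, and holes can be sketched as below. It follows the definition of a cold spot as a run of SNPs sharing the same LDU position and the hole threshold of 3 LDUs. Treating every interval between consecutive SNPs with an LDU increase as a separate hot spot is an assumption of this sketch, as are the function name and the toy map.

```python
def segment_ld_map(kb, ldu, hole_ldu=3.0, tol=1e-9):
    """Split an LD map into cold spots, hot spots, and holes.

    kb, ldu: SNP positions in kb and in LDUs, both sorted along the chromosome.
    Returns three lists of (start_kb, end_kb) intervals.
    """
    cold, hot, holes = [], [], []
    run_start = 0  # index of the first SNP of the current plateau
    for i in range(1, len(kb)):
        step = ldu[i] - ldu[i - 1]
        if step <= tol:
            continue  # same LDU position: still on the plateau
        if i - 1 > run_start:
            cold.append((kb[run_start], kb[i - 1]))
        target = holes if step >= hole_ldu else hot
        target.append((kb[i - 1], kb[i]))
        run_start = i
    if len(kb) - 1 > run_start:
        cold.append((kb[run_start], kb[-1]))
    return cold, hot, holes


# Toy map with six SNPs.
positions_kb = [0, 10, 20, 30, 40, 50]
positions_ldu = [0.0, 0.0, 0.0, 0.5, 0.5, 3.6]
cold, hot, holes = segment_ld_map(positions_kb, positions_ldu)
print(cold, hot, holes)
```

Lengths of the regions then follow directly as `end_kb - start_kb` for each interval, from which minimum, average, and maximum values per chromosome can be computed.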
Fig 1

LD maps for generations 0 and 10.

Table 2

Number and minimum, average, and maximum length (kb) of the hot spots (steps), holes, and cold spots (plateaus) by chromosome in the biparental population.

                                    Hot spot length          Hole length               Cold spot length
Chr.  Hot spots  Holes  Cold spots  Min.   Av.     Max.      Min.    Av.      Max.     Min.   Av.      Max.
1     3897       6      1507        0.001  45.839  1759.096  54.917  194.277  326.226  0.001  84.309   1632.212
2     2691       15     1008        0.001  51.727  1616.834  0.200   204.446  427.934  0.001  101.295  1745.439
3     2541       7      1024        0.001  52.602  2163.519  0.120   185.774  499.444  0.001  98.081   2130.732
4     2868       13     1096        0.001  52.626  1873.436  0.860   241.479  480.843  0.001  84.467   2388.138
5     2822       13     1132        0.001  45.136  2642.230  33.762  189.326  421.771  0.001  82.798   2015.799
6     1892       10     766         0.001  54.869  2872.957  0.053   217.741  433.719  0.001  88.443   1845.273
7     1908       25     749         0.001  50.956  1983.409  0.100   193.875  492.714  0.001  106.740  1014.346
8     1987       14     785         0.001  46.554  1040.453  0.097   162.792  492.395  0.001  109.786  1516.706
9     1788       4      687         0.001  50.341  1362.155  86.562  305.480  498.082  0.001  99.168   1664.406
10    3360       18     608         0.001  37.165  3159.567  3.594   152.031  483.615  0.001  38.306   360.908

Concerning SNPs separated by up to 500 kb, the biparental population and the synthetic had similar average |D'| values (0.77 and 0.75). The values were approximately 10–14% greater than the average value in the breeding population (Table 1). Interestingly, the average r2 value in the biparental population was approximately half of the corresponding average values observed in the other populations (0.02 versus 0.04, and 0.04). Regardless of the chromosome, the populations with the higher and lower frequencies of |D'| values greater than 0.75 were the biparental population (65–74%) and the breeding population (26–58%), respectively. However, the frequency of r2 values greater than 0.75 was lower in the biparental population (0.2–0.5%) and higher in the other populations (0.2–1.6%) (S2 Table). Furthermore, the average distance for SNPs with r2 values greater than 0.75 was much higher in the biparental population (approximately 80 to 126 kb). In the other populations, the ranges were approximately 6 to 19 and 6 to 35 kb. There were slight differences between the populations regarding the average distance for SNPs with |D'| values greater than 0.75 (in the range of approximately 207 to 229 kb). The heatmaps for the regions covered by the first 100 SNPs in each chromosome, in each population (1 to 3.3 Mb, approximately), provided evidence that the comparatively few high r2 values (close to 1.0) occurred only for SNPs in close proximity, especially in the synthetic and breeding populations (S3 Fig). Although these regions do not represent the pattern of LD along the chromosomes (see the LD pattern for five segments of 100 SNPs along chromosome 4 in the biparental population in S4 Fig) there are some regions with blocks of intermediate r2 values for distant SNPs, especially in the biparental population. Regardless of the chromosome, population, and LD measurement, the LD decreased as the between-SNP distance increased from 0–50 to 451–500 kb (S5 and S6 Figs). 
In general, there was an initially higher LD decrease for SNPs separated by 51–100 kb (3 to 7% for |D'| and 28 to 66% for r2, on average) and then a gradual decrease to the minimum LD value for SNPs separated by 451–500 kb. Because there were no significant differences between chromosomes, we can state that following an initial higher decrease after 50 kb, the |D'| and r2 in the biparental population extended with similar magnitude for an interval of 450 kb (Fig 2A and 2B). In this interval, the average |D'| values decreased from 0.69–0.77 to 0.64–0.77 in the three populations, and the average r2 values in the biparental population decreased from 0.025 to 0.020. However, in the other two populations, the average r2 value decreased by approximately 50%. The r2 decay from its maximum average value reached 36 to 73% after 5–10 kb (Fig 2C).
Fig 2

Overall average |D'| (a) and r2 (b and c) values by distance interval (kb) in the biparental population (Bip), in the synthetic (Syn), and in the breeding population (BFc4).

The biparental population also differed from the other populations concerning the pattern of haplotype blocks (Table 3). The biparental population presented a lower average number of haplotype blocks per chromosome (approximately 225 versus 700 and 730 on average), a lower block length (approximately 1 versus 11 kb on average), and a lower number of SNPs per block (approximately 2 versus 3 on average). Most of the haplotype blocks in the three populations included two SNPs, but the number of haplotype blocks with three or more SNPs was greater in the synthetic and breeding populations (S7 Fig). It is important to highlight that the total length of the haplotype blocks represents only 0.01 to 5.13% of the chromosome genomes.
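The per-chromosome block statistics discussed here (number of blocks, block size, SNPs per block, and share of the chromosome covered) can be computed along the following lines. The authors used their own REALbasic program for this step; the Python sketch below is only an illustration, with hypothetical function names and an assumed input layout of one list of SNP positions (kb) per block.

```python
def summarize_blocks(blocks):
    """Summarize haplotype blocks given as lists of SNP positions in kb."""
    if not blocks:
        return {"blocks": 0}
    lengths = [max(b) - min(b) for b in blocks]
    snps = [len(b) for b in blocks]
    return {
        "blocks": len(blocks),
        "size_total": sum(lengths),
        "size_mean": sum(lengths) / len(blocks),
        "size_min": min(lengths),
        "size_max": max(lengths),
        "snps_total": sum(snps),
        "snps_mean": sum(snps) / len(blocks),
        "snps_min": min(snps),
        "snps_max": max(snps),
    }


def percent_of_chromosome(size_total_kb, chromosome_kb):
    """Share of the chromosome (in %) covered by haplotype blocks."""
    return 100.0 * size_total_kb / chromosome_kb


# Two hypothetical blocks: one with two SNPs, one with three.
summary = summarize_blocks([[100.0, 100.5], [200.0, 201.0, 204.0]])
print(summary)
```

For instance, a total block length of 9,272.30 kb over a SNP coverage of 180,660.38 kb (the chromosome 8 values reported for BFc4) gives about 5.13%, the upper end of the range quoted above.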
Table 3

Haplotype blocks structure of the populations.

                          Block size (kb)                 SNPs
Population  Chr.  Blocks  Total     Mean   Min.   Max.    Total  Mean  Min.  Max.
Biparental  1     336     58.60     0.17   0.001  10.30   727    2.2   2     5
Biparental  2     294     588.31    2.00   0.001  298.90  647    2.2   2     6
Biparental  3     273     307.66    1.13   0.001  101.90  622    2.3   2     5
Biparental  4     193     35.80     0.19   0.001  23.15   430    2.2   2     6
Biparental  5     218     47.49     0.22   0.001  20.39   484    2.2   2     4
Biparental  6     169     419.24    2.48   0.001  292.35  387    2.3   2     5
Biparental  7     215     45.60     0.21   0.001  11.68   479    2.2   2     5
Biparental  8     186     511.79    2.75   0.001  423.79  409    2.2   2     5
Biparental  9     195     58.19     0.29   0.001  15.58   432    2.2   2     5
Biparental  10    170     314.88    1.85   0.001  307.49  370    2.2   2     4
Synthetic   1     1126    11935.23  10.60  0.001  494.94  3093   2.7   2     10
Synthetic   2     935     8501.15   9.09   0.001  451.74  2565   2.7   2     11
Synthetic   3     810     9065.75   11.19  0.001  457.30  2257   2.8   2     11
Synthetic   4     525     6615.63   12.60  0.001  423.71  1409   2.7   2     12
Synthetic   5     933     6428.48   6.89   0.001  395.79  2527   2.7   2     11
Synthetic   6     496     5051.01   10.18  0.001  492.95  1354   2.7   2     11
Synthetic   7     569     5169.26   9.09   0.001  317.07  1594   2.8   2     15
Synthetic   8     583     8927.76   15.31  0.001  476.37  1574   2.7   2     10
Synthetic   9     486     6553.37   13.48  0.001  398.72  1375   2.8   2     9
Synthetic   10    534     3905.24   7.31   0.001  434.32  1477   2.8   2     10
BFc4        1     1019    14352.62  14.09  0.001  499.04  2818   2.8   2     12
BFc4        2     861     7904.79   9.18   0.001  415.28  2432   2.8   2     11
BFc4        3     796     8682.69   10.91  0.001  418.18  2153   2.7   2     16
BFc4        4     539     6605.65   12.26  0.001  442.01  1492   2.8   2     12
BFc4        5     776     10870.44  14.01  0.001  479.50  2201   2.8   2     15
BFc4        6     476     5833.85   12.26  0.001  466.82  1278   2.7   2     7
BFc4        7     570     4471.35   7.84   0.001  479.70  1612   2.8   2     13
BFc4        8     491     9272.30   18.89  0.001  495.26  1390   2.8   2     12
BFc4        9     541     5188.65   9.59   0.001  449.77  1478   2.7   2     8
BFc4        10    1236    6619.87   5.36   0.001  471.30  3371   2.7   2     12

The intragenic LD analysis also revealed higher average |D'| values in the biparental population and synthetic relative to the average value observed in the breeding population (0.74 and 0.88 versus 0.67). The biparental population presented an average r2 value that was much lower than the average values observed in the other two populations (0.02 versus 0.13 and 0.14) (Table 4). Regardless of the population, the maximum intragenic |D'| (1.0) was observed for SNPs separated by up to 10.6 kb, while most of the higher intragenic r2 values (0.7 or greater) were only observed for the closest SNPs (S8 and S9 Figs). The intragenic heatmaps provided evidence of distinct LD patterns between genes and populations (S9 Fig). With regard to the intragenic LD decay, there was evidence of |D'| and r2 decay in the breeding population and r2 decay in the synthetic (Fig 3). Concerning the intragenic haplotype block structure, there was general evidence of a single block of variable size (0.03 to 8.72 kb) with two SNPs (Table 5). Genes Zm00001d018033 and Zm00001d041972 showed population differences in terms of block size and number of SNPs.
Table 4

Intragenic minimum, average, and maximum LD values in each population.

                            |D'|               r2
Gene            Population  Min.   Av.   Max.  Min.   Av.    Max.
Zm00001d002654  Biparental  0.176  0.96  1.0   0.000  0.005  0.19
Zm00001d002654  Synthetic   0.003  0.60  1.0   0.000  0.159  1.00
Zm00001d002654  BFc4        0.042  0.44  1.0   0.000  0.258  1.00
Zm00001d004817  Biparental  0.028  0.81  1.0   0.000  0.004  0.06
Zm00001d004817  Synthetic   0.059  0.62  1.0   0.000  0.089  1.00
Zm00001d004817  BFc4        1.000  1.00  1.0   0.002  0.310  0.93
Zm00001d005451  Biparental  0.148  0.91  1.0   0.000  0.003  0.01
Zm00001d005451  Synthetic   0.407  0.89  1.0   0.000  0.106  1.00
Zm00001d005451  BFc4        0.057  0.51  1.0   0.000  0.211  0.97
Zm00001d041972  Biparental  0.132  0.89  1.0   0.000  0.004  0.06
Zm00001d041972  Synthetic   0.263  0.79  1.0   0.000  0.191  1.00
Zm00001d041972  BFc4        0.193  0.88  1.0   0.000  0.280  1.00
Zm00001d052263  Biparental  0.236  0.85  1.0   0.000  0.011  0.06
Zm00001d052263  Synthetic   0.217  0.93  1.0   0.000  0.116  1.00
Zm00001d052263  BFc4        0.323  0.87  1.0   0.000  0.085  1.00
Zm00001d018033  Biparental  0.000  0.83  1.0   0.000  0.031  0.87
Zm00001d018033  Synthetic   0.488  0.97  1.0   0.000  0.025  0.21
Zm00001d018033  BFc4        0.137  0.77  1.0   0.001  0.070  0.46
Zm00001d035760  Biparental  0.187  0.84  1.0   0.000  0.007  0.06
Zm00001d035760  Synthetic   1.000  1.00  1.0   0.007  0.007  0.01
Zm00001d035760  BFc4        0.721  0.72  0.7   0.027  0.027  0.03
Zm00001d036900  Biparental  0.000  0.76  1.0   0.000  0.093  0.88
Zm00001d036900  Synthetic   0.005  0.77  1.0   0.000  0.026  1.00
Zm00001d036900  BFc4        0.031  0.60  1.0   0.000  0.019  0.24
Zm00001d021731  Biparental  0.094  0.59  1.0   0.000  0.037  0.68
Zm00001d021731  Synthetic   0.019  0.58  1.0   0.000  0.282  1.00
Zm00001d021731  BFc4        0.193  0.57  1.0   0.001  0.248  1.00
Zm00001d023810  Biparental  1.000  1.00  1.0   0.000  0.000  0.00
Zm00001d023810  Synthetic   0.026  0.76  1.0   0.000  0.093  1.00
Zm00001d023810  BFc4        0.004  0.48  1.0   0.000  0.066  0.97
Zm00001d025201  Biparental  0.097  0.84  1.0   0.000  0.004  0.06
Zm00001d025201  Synthetic   0.059  0.59  1.0   0.000  0.368  0.87
Zm00001d025201  BFc4        0.006  0.68  1.0   0.000  0.061  1.00
Zm00001d026113  Biparental  0.002  0.82  1.0   0.000  0.026  1.00
Zm00001d026113  Synthetic   0.105  0.81  1.0   0.000  0.057  0.90
Zm00001d026113  BFc4        0.015  0.52  1.0   0.000  0.073  1.00

Fig 3

Intragenic LD decay and LD extent concerning SNPs separated by up to 10.6 kb (|D'| and r2 average values in intervals of 1 kb).

Table 5

Intragenic haplotype blocks structure in each population.

                                          Block size (kb)          SNPs
Population  Gene            Chr.  Blocks  Total  Mean  Min.  Max.  Total  Mean  Min.  Max.
Biparental  Zm00001d018033  5     1       8.72   8.72  8.72  8.72  2      2     2     2
Biparental  Zm00001d026113  10    1       0.03   0.03  0.03  0.03  2      2     2     2
Synthetic   Zm00001d002654  2     1       0.05   0.05  0.05  0.05  3      3     3     3
Synthetic   Zm00001d004817  2     2       0.22   0.11  0.02  0.21  4      2     2     2
Synthetic   Zm00001d005451  2     1       0.03   0.03  0.03  0.03  2      2     2     2
Synthetic   Zm00001d036900  3     1       0.06   0.06  0.06  0.06  2      2     2     2
Synthetic   Zm00001d041972  3     1       0.02   0.02  0.02  0.02  2      2     2     2
BFc4        Zm00001d041972  3     1       2.22   2.22  2.22  2.22  6      6     6     6
BFc4        Zm00001d018033  5     1       0.26   0.26  0.26  0.26  2      2     2     2

Discussion

It is difficult to characterize the LD and haplotype block patterns in two or more unrelated random cross populations based on an LD map and two measures of linkage disequilibrium. Based on studies of the LD pattern in human populations, LD maps demonstrated that the human chromosomes have a pattern of regions of extensive LD (plateaus or cold spots), interspaced with regions of high recombination rate (steps or hot spots) [25, 26]. Both regions are variable in number and length, and cold spots show equal (as assumed in this study) or similar LD in LDUs. The hot spots present distinct LDUs. The same pattern was seen in the LD maps of the chromosomes of the biparental population, elaborated under high density as recommended by Pengelly et al. [25]. To better understand the level of LD in the hot and cold spots, we analyzed two extreme segments of the chromosome 1 LD map, including 30 SNPs. Both segments have similar lengths in LDUs (4.1 and 3.6) and kb (970 and 828). The average |D'| was much greater for the SNPs in the seven cold spots (including three to 12 SNPs) relative to the average value for the SNPs in the 21 hot spots (including two to three SNPs) (0.89 versus 0.29). However, this was not verified via the r2 statistic (0.004 versus 0.038). When comparing populations that share a common origin, have a similar effective population size, and did not face an extreme reduction in size (population bottleneck), the statistics D, D', and r2 should provide a comparable characterization of the LD pattern if there are similar allele frequencies. If the populations have distinct distributions of allelic frequencies, D' can be used for analyzing the recombination history, and r2 should be the choice if recombination and mutation are important factors affecting the LD [1]. However, in the last two decades, most studies on LD in human populations have aimed to select populations and SNPs (tagging SNPs) for association studies [26, 27]. 
In general, both |D'| and r2 have been used [27, 28], and because of their high level of LD, isolated populations have been recommended for association studies [29]. The statistic r2 is the most relevant for association mapping because it has a simple inverse relationship with the sample size required to detect association [1]. The use of LD maps and two measures of LD for comparing the popcorn populations provided some contrasting results, but the general evidence is that the synthetic is the population with the higher LD. As expected, the lower average |D'| value in the breeding population reflects its recombination history. The synthetic and the biparental populations presented greater average |D'| and higher frequency of SNPs with elevated |D'| values because they have no recombination history. Because of the differences regarding molecular marker type and density, sample size, and genome coverage, comparison of LD values of human, domesticated animal, and plant populations should be made with caution, even when the studies involve the same species. We were surprised by the low average r2 values and the reduced frequency of SNPs with r2 values greater than 0.25 (defined as useful LD in some studies) in the popcorn populations. In the study of Yan et al. [30], involving 632 maize inbred lines and 943 SNPs (density of one SNP each 2,121 kb), the average r2 was only 0.009. However, for SNPs separated by up to 100 kb, the average was 0.2 (0.03, 0.09, and 0.10 for the biparental, synthetic, and breeding populations, respectively). Even higher LD values were reported in the maize NAM (nested association mapping) population [31] and in two biparental and four FPM (four parent maize) populations studied by Anderson et al. [9]. In general, the average r2 values observed in the popcorn populations are also lower than the values observed in cattle and chicken populations (0.1 to 0.8 for SNPs separated by up to 100 kb) [32-34]. 
The density ranged from 27.8 to 112.3 kb in these three studies. Using a 600K SNP chip (density of one SNP per 6.3 kb), Pardo et al. [28] observed a median pairwise r2 averaged across all chromosomes of 0.015 and 0.016 for the Dutch and HapMap-CEU populations, respectively. The absence of a uniform criterion for defining the LD decay and the LD extent also makes comparison of the results with human, domesticated animal, and plant populations difficult. Angius et al. [26] used LD decay as the distance over which the average LD decreases to half of its maximum value (half-length). They defined LD extent as the distance over which the average LD declines to an asymptotic value. Anderson et al. [9] used LD decay as the distance over which the average r2 dropped below 0.8, and LD extent as the distance over which the average r2 fell below 0.2. Concerning LD decay, our results showed differences between LD measures and populations. There were slight differences between chromosomes, but the higher r2 decay occurred after 5–10 kb (36 to 73%). Yan et al. [30] observed an LD decay of 64% after 5–10 kb in an inbred line panel, and the LD reached an approximate asymptotic r2 value of 0.01 in the interval of 1–5 Mb (LD extent of 5 Mb). A similar LD extent (5 Mb) was observed in eight breeds of cattle, but a comparable LD decay (62%) occurred along 100 kb [35]. From the analysis of segments of 1 Mb in all chromosomes in Ashkenazi Jewish, Caucasian, and African American populations, Shifman et al. [36] observed LD decays of 17, 21, and 42% along 10 kb, respectively. A similar LD extent of 300 kb occurred in the populations (reaching an approximate asymptotic r2 value of 0.05). If there is a higher LD between QTLs and haplotypes than with individual SNPs, haplotype blocks can provide substantial statistical power in association studies [6] and increased accuracy of genomic prediction of complex traits [37]. 
Surprisingly, our results showed that the number and length of the haplotype blocks and the number of SNPs per haplotype block were proportional to the average r2. The criterion of Gabriel et al. [6] appears to provide a reduced number of SNPs per haplotype block. In a study with 235 soybean varieties genotyped for 5,361 SNPs (density of one SNP per 208 kb), Ma et al. [38] observed six SNPs per haplotype block on average. This is not surprising because the group of varieties corresponded to a pure line panel (high LD). In studies with German Holstein cattle and four chicken populations, the average number of SNPs per haplotype block ranged from approximately four to 10, and the mean block length ranged from approximately 146 to 799 kb [32, 33]. Low average numbers of SNPs per haplotype block (approximately 4–5) and reduced average haplotype block lengths (approximately 5–7 kb) were also observed in human populations [6, 28]. However, the size of each block varied dramatically in the study of Gabriel et al. [6], from less than 1 kb to 173 kb. Concerning the low intragenic LD and the minimum size of the haplotype blocks observed in the three populations, we believe that the lower LD for the biparental population is due to crossing two genetically similar high-quality inbred lines. Because there is no information on the LD and haplotype block patterns in the base populations Viçosa and Beija-Flor, we cannot infer that the higher average intragenic r2 values observed in the synthetic and breeding populations (for 11 of the 12 genes) are due to selection for quality. Characterization of the LD and haplotype block patterns regarding specific chromosomal regions has only been made by human geneticists, generally aimed at SNP tagging. From the analysis of SNPs within the HLA region on chromosome 6, Evseeva et al. [39] observed 18 haplotype blocks in European populations, based on the criterion of Gabriel et al. [6].
Furthermore, in that study the LD was slightly lower in southern than in northern European populations. Using the same criterion, Nuchnoi et al. [40] observed six and four haplotype blocks across a 472-kb region on chromosome 5q31-33 in Southeast (Thai) and Northeast Asian (Chinese and Japanese) populations, respectively. Akesaka et al. [41] identified two to six blocks in Korean and Japanese populations, depending on the criterion of an LD block, spanning approximately 3 to 47 kb. The median r2 value for the five genes in the region ranged from 0.03 to 0.89. In conclusion, the level of LD expressed by the r2 values in the three popcorn populations with different genetic structures (a biparental population, a synthetic, and a breeding population) is low but comparable to some non-isolated human populations. This finding does not imply that these populations cannot be used for GWAS because there is a fraction of high r2 values for SNPs separated by less than 5 kb. The populations are also not excluded for genomic selection because the most important factor affecting this selection process is the relatedness between individuals in the training and validation sets. However, we do not expect a significant advantage from haplotype-based GWAS and genomic selection along generations due to the reduced number of SNPs in the haplotype blocks (2 to 3). The results on LD decay (rapid decay after 5–10 kb) and LD decay extent (along up to 300 kb) are in the range observed with maize inbred line panels. Our most important result is that, similar to human chromosomes, maize chromosomes (popcorn is also Zea mays, but ssp. everta) have a pattern of regions with extensive LD (plateaus or cold spots), interspaced with regions of low LD (steps or hot spots).
It should be highlighted, however, that our simulated LD map provides evidence that this pattern can reflect regions with differences in allele frequencies and LD level (expressed by D') and not regions with high and low rates of recombination as evidenced by Jeffreys et al. [42], since the simulation process assumes a rate of recombination that is proportional to the distance in cM.
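The plateau and step pattern described above follows from the definition of a cold spot adopted in this study: a chromosome segment including SNPs with the same LDU position. A minimal sketch of how such segments could be extracted from an LD map is given below. The function name, the input layout (physical position in kb paired with LDU position), the rounding tolerance, and the example values are all assumptions made for illustration; this is not the software used to build the LD maps.

```python
from itertools import groupby


def cold_spots(snps, decimals=3):
    """Return (start_kb, end_kb, n_snps) for each run of two or more
    consecutive SNPs sharing the same LDU position (a plateau).

    snps: list of (position_kb, ldu) tuples sorted by physical position.
    decimals: LDU positions are compared after rounding to this precision.
    """
    segments = []
    for _, run in groupby(snps, key=lambda snp: round(snp[1], decimals)):
        run = list(run)
        if len(run) >= 2:
            segments.append((run[0][0], run[-1][0], len(run)))
    return segments


# Invented LD map: two plateaus separated by steps where the LDU position increases.
toy_map = [(10, 0.0), (25, 0.0), (40, 0.0), (60, 0.8), (75, 1.5), (90, 1.5), (120, 2.1)]
print(cold_spots(toy_map))
```

Note that a plateau identified this way only indicates that the LDU position did not increase between the SNPs; as stated above, this can reflect allele frequencies and LD level rather than a low rate of recombination.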

Gene name, annotation, and chromosome localization, and the number of intragenic SNPs in each population.


Minimum and maximum LD values, average distance (kb), and frequency observed in chromosomes by population, concerning SNPs with |D'| and r2 values higher than 0.75, in the interval 0.25–0.75, and lower than 0.25.

MAF distribution in the biparental population (a), in the synthetic (b), and in the breeding population (c).

LD maps of the populations, by chromosome.


LD heatmaps by populations and chromosome regarding the first 100 SNPs; the regions covered ranged from approximately 1.0 to 3.3 Mb; the r2 and |D'| values are above and below the diagonal, respectively.


LD heatmaps for five segments of 100 SNPs along chromosome 4 in the biparental population; the regions covered ranged from approximately 1.4 to 6.0 Mb; the r2 and |D'| values are above and below the diagonal, respectively.

Average |D'| values by chromosome and by distance interval (kb) in the biparental population (a), in the synthetic (b), and in the breeding population (c).

Average r2 values by chromosome and by distance interval (kb) in the biparental population (a), in the synthetic (b), and in the breeding population (c).

Distribution of the haplotype blocks based on the number of SNPs in the biparental population (Bip), in the synthetic (Syn), and in the breeding population (BFc4).

Overall intragenic |D'| (a, b, c) and r2 (d, e, f) by distance interval (bp) in the biparental population (a and d), in the synthetic (b and e), and in the breeding population (c and f).

Intragenic LD heatmaps by population; the r2 and |D'| values are above and below the diagonal, respectively.

  39 in total

1.  The first linkage disequilibrium (LD) maps: delineation of hot and cold blocks by diplotype analysis.

Authors:  N Maniatis; A Collins; C F Xu; L C McCarthy; D R Hewett; W Tapper; S Ennis; X Ke; N E Morton
Journal:  Proc Natl Acad Sci U S A       Date:  2002-02-12       Impact factor: 11.205

2.  Properties of linkage disequilibrium (LD) maps.

Authors:  Weilhua Zhang; Andrew Collins; Nikolas Maniatis; William Tapper; Newton E Morton
Journal:  Proc Natl Acad Sci U S A       Date:  2002-12-16       Impact factor: 11.205

3.  Linkage disequilibrium patterns of the human genome across populations.

Authors:  Sagiv Shifman; Jane Kuypers; Mark Kokoris; Benjamin Yakir; Ariel Darvasi
Journal:  Hum Mol Genet       Date:  2003-04-01       Impact factor: 6.150

4.  The Interaction of Selection and Linkage. I. General Considerations; Heterotic Models.

Authors:  R C Lewontin
Journal:  Genetics       Date:  1964-01       Impact factor: 4.562

Review 5.  Allelic association: linkage disequilibrium structure and gene mapping.

Authors:  Andrew Collins
Journal:  Mol Biotechnol       Date:  2008-10-08       Impact factor: 2.695

Review 6.  Linkage disequilibrium and association mapping.

Authors:  B S Weir
Journal:  Annu Rev Genomics Hum Genet       Date:  2008       Impact factor: 8.929

7.  A first-generation haplotype map of maize.

Authors:  Michael A Gore; Jer-Ming Chia; Robert J Elshire; Qi Sun; Elhan S Ersoz; Bonnie L Hurwitz; Jason A Peiffer; Michael D McMullen; George S Grills; Jeffrey Ross-Ibarra; Doreen H Ware; Edward S Buckler
Journal:  Science       Date:  2009-11-20       Impact factor: 47.728

8.  Genome-wide haplotype-based association analysis of key traits of plant lodging and architecture of maize identifies major determinants for leaf angle: hapLA4.

Authors:  Carlos Maldonado; Freddy Mora; Carlos A Scapim; Marlon Coan
Journal:  PLoS One       Date:  2019-03-06       Impact factor: 3.240

9.  Improved maize reference genome with single-molecule technologies.

Authors:  Yinping Jiao; Paul Peluso; Jinghua Shi; Tiffany Liang; Michelle C Stitzer; Bo Wang; Michael S Campbell; Joshua C Stein; Xuehong Wei; Chen-Shan Chin; Katherine Guill; Michael Regulski; Sunita Kumari; Andrew Olson; Jonathan Gent; Kevin L Schneider; Thomas K Wolfgruber; Michael R May; Nathan M Springer; Eric Antoniou; W Richard McCombie; Gernot G Presting; Michael McMullen; Jeffrey Ross-Ibarra; R Kelly Dawe; Alex Hastie; David R Rank; Doreen Ware
Journal:  Nature       Date:  2017-06-12       Impact factor: 49.962

10.  Fixed-length haplotypes can improve genomic prediction accuracy in an admixed dairy cattle population.

Authors:  Melanie Hess; Tom Druet; Andrew Hess; Dorian Garrick
Journal:  Genet Sel Evol       Date:  2017-07-03       Impact factor: 4.297

