Developing a common bean core collection suitable for association mapping studies.

Juliana Morini Küpper Cardoso Perseguini¹, Gliciane Micaele Borges Silva², João Ricardo Bachega Feijó Rosa³, Rodrigo Gazaffi⁴, Jéssica Fernanda Marçal², Sérgio Augusto Morais Carbonell⁵, Alisson Fernando Chiorato⁵, Maria Imaculada Zucchi⁶, Antonio Augusto Franco Garcia³, Luciana Lasry Benchimol-Reis¹.

Abstract

Because of the continuous introduction of germplasm from abroad, some collections have a high number of accessions, making it difficult to explore the genetic variability present in a germplasm bank for conservation and breeding purposes. Therefore, the aim of this study was to quantify and analyze the structure of genetic variability among 500 common bean accessions to construct a core collection. A total of 58 SSRs were used for this purpose. The polymorphism information content (PIC) in the 180 common bean accessions selected to compose the core collection ranged from 0.17 to 0.86, and the discriminatory power (DP) ranged from 0.21 to 0.90. The 500 accessions were clustered into 15 distinct groups and the 180 accessions into four distinct groups in the Structure analysis. According to analysis of molecular variance, the most divergent accessions comprised 97.2% of the observed genetic variability present within the base collection, confirming the efficiency of the selection criterion. The 180 selected accessions will be used for association mapping in future studies and could be potentially used by breeders to direct new crosses and generate elite cultivars that meet current and future global market needs.

Keywords: Phaseolus vulgaris L; genetic diversity; genetic structure; microsatellites; molecular markers

Year: 2014 PMID: 25983627 PMCID: PMC4415564 DOI: 10.1590/S1415-475738120140126

Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771

Introduction

Common bean (Phaseolus vulgaris L.) is a species of great agronomic interest, as it is an important grain legume for human consumption worldwide (Angioi ). This species was domesticated by Middle American and South American Andean cultures (Gepts ; Gepts, 1998) and has progressively dispersed worldwide (Angioi ; Asfaw ). Bitocchi suggested a Mesoamerican origin of the common bean. Burle pointed out Brazil as a secondary center of common bean diversity. In Brazil, the common bean most likely came from at least two different routes, as indicated by the occurrence of both small and large beans (Gepts, 1998). Nonetheless, beans of Mesoamerican genetic origin are preferred by most of the population, and this preference is shown by the dominance of carioca and black bean types in their diets. The narrow genetic base of modern crop cultivars is a serious obstacle to sustaining and improving crop productivity due to the vulnerability of genetically uniform cultivars to potentially new biotic and abiotic stresses (Abdurakhmonov and Abdukarimov, 2008). Plant germplasm resources worldwide, including wild plant species, modern cultivars, and their wild crop relatives, are important reservoirs of natural genetic variations. The Common Bean Germplasm Bank of the Agronomic Institute (IAC, Campinas, S.P. Brazil) holds more than 1800 accessions representing the two principal centers of origin (Andean and Mesoamerican) and includes ecotypes from different South American countries and a large number of lines from both Brazilian and international genetic improvement programs (Chiorato ). Association mapping, also known as linkage disequilibrium (LD)-based association mapping (Mackay and Powell, 2007; Zhu ; Myles ), has been proposed as an alternative to quantitative trait locus (QTL)-mapping. The LD associates single DNA sequence changes with traits of interest using collections of unrelated individuals. 
It is rapid and cost effective as many alleles may be assessed simultaneously, resulting in higher resolution mapping. It uses most of the recombination events that occur over time, while avoiding the need to expensively conduct crossing of populations. Field evaluation and use of large germplasm collections for associative mapping are mostly constrained by problems related to accession redundancy, economic cost, and time. Assessment of genetic resources, thus, could be more rational if focused on a subset of accessions, or the so-called core collection, which includes the maximum variability of the base collection with the minimal possible size (Frankel and Brown, 1984; Spagnoletti-Zeuli and Qualset, 1993; van Hintum ). A core collection is formed by selecting a small percentage of the original collection that will represent most of the total genetic variation with minimum redundancy (Brown, 1995). The principal steps to establish a core collection are as follows: (a) determine the size of the core subset; (b) divide the collection into distinct groups; and (c) select entries in each group to form the core collection. The complexity of establishing a core subset is a function of the available data and applied sampling procedure (Brown, 1989a,b; Brown and Spillane, 1999). The established core collection must be validated to ensure its adequacy and usefulness by assessing whether the characteristics and variability of the entire collection have been maintained. Comparison of the entire and the core collection properties is accomplished using mean, variance, frequency, and distribution data of several morphological traits or molecular markers. Understanding the genetic diversity and population structure of a core collection is also an important step since unaccounted population structure can lead to spurious associations (Pritchard ,b). 
Logozzo developed a core collection for European common bean germplasm with 544 accessions by using sampling methods based on the information available in the GenBank database and phaseolin pattern. Accessions with similar phenotypes may not necessarily have close genetic relationships (Marita ) because of the polygenic properties of most traits and the effect of the environment on the expression of the analyzed trait. Hence, applying molecular marker information reflecting the DNA polymorphism pattern is a powerful tool in core collection development. Microsatellites (simple sequence repeats, SSRs; Tautz, 1989) have a high level of polymorphism, which allows the discrimination of cultivars and closely related common bean breeding lines, providing a reliable and efficient tool for germplasm characterization, conservation, and management (Blair , 2007, 2009; Benchimol ; Perseguini ). Blair and McClean assessed the genetic diversity of common bean core collections by using SSRs and found a significant population structure that can be used for association studies. The aim of the present study was to assess the diversity level and genetic structure of 500 accessions from the IAC Common Bean Germplasm Bank and to select 180 accessions that represent most of the variability, so that this core collection can be used in association mapping studies.

Materials and Methods

Plant material and DNA extraction

Five hundred genotypes from the IAC Common Bean Germplasm Bank (Campinas, S.P., Brazil) were used (Table S1). These 500 genotypes were selected from among the more than 1800 accessions in the genebank because information on important agronomic traits was already available for them. The agronomic traits considered were resistance to anthracnose, angular leaf spot, rust, Fusarium wilt, bacterial blight, and golden mosaic virus; tolerance to water deficit; grain size; and tegument color. Total genomic DNA for all recombinant inbred lines was isolated from bulked young leaves of 10 plants per genotype using the CTAB extraction method as described in Hoisington .

SSR analysis

A total of 58 microsatellites (Table 1) were selected for their broad genomic distribution and high polymorphism information content. Of these, 43 were EST-SSRs (Hanai ) and 15 were previously mapped genomic-SSRs (Campos ). The PCR amplifications were performed in a 25 μL final volume containing 50 ng DNA, 1x buffer, 0.2 μM of each forward and reverse primer, 100 μM of each dNTP, 2.0 mM MgCl₂, 10 mM Tris-HCl (pH 8.0), 50 mM KCl, and 0.5 U of Taq DNA polymerase. The following conditions were used for amplification: 1 min at 94 °C, followed by 30 cycles of 1 min at 94 °C, 1 min at the annealing temperature specific for each SSR, and 1 min at 72 °C, with a final extension of 5 min at 72 °C. The PCR products were checked on a 3% agarose gel. Amplicons were then separated by 6% denaturing polyacrylamide gel electrophoresis and silver stained (Creste ) (Figure S1). SSR bands were scored manually.

Table 1

Information for the 58 microsatellites that were used to assess the 500 common bean accessions of the base collection and the 180 accessions of the core collection. The annealing temperature (Ta), fragment size, number of alleles, polymorphism information content (PIC), and discriminatory power (DP) are given for each marker. The first 15 are genomic-SSR loci, and the other 43 are EST-SSR loci.

SSRs used in genotyping analysis	Motif	Annealing temperature (Ta, °C)	Fragment size (bp)	Number of alleles (500)	Number of alleles (180)	PIC (500)	DP (500)	PIC (180)	DP (180)
SSR-IAC20	(GA)₇ AA (GA)₂	56	190–194	2	2	0.37	0.49	0.35	0.81
SSR-IAC24	(AC)₇ (AT)₆	56	172–176	3	3	0.26	0.28	0.26	0.74
SSR-IAC52	(GA)₁₁	56	160–216	7	6	0.64	0.69	0.61	0.88
SSR-IAC58	(TG)₁₀	56	192–200	3	3	0.41	0.48	0.44	0.83
SSR-IAC62	(AG)₁₄	45.3	202–216	8	8	0.81	0.84	0.82	0.95
SSR-IAC63	(AC)₆	59.8	202–208	2	2	0.37	0.56	0.37	0.85
SSR-IAC66	(GA)₁₀	56	260–300	10	10	0.86	0.87	0.87	0.96
SSR-IAC68	(CT)₈	56	252–276	3	3	0.57	0.72	0.51	0.90
SSR-IAC127	(TA)₃ T (TGA)₃ G (TA)₃	63.3	198–200	2	2	0.35	0.46	0.35	0.81
SSR-IAC136	(CA)₇ (AT)₅	56.7	232–270	6	6	0.72	0.64	0.73	0.88
SSR-IAC156	(TC)₃ TG (GC)₂	56.7	238–240	2	2	0.35	0.5	0.31	0.80
SSR-IAC160	(TG)₂ (TA)₂ (TG)₅	56.7	180–184	2	2	0.36	0.49	0.37	0.83
SSR-IAC167	(TG)₇ (CG)₃	56.7	150–180	2	2	0.35	0.45	0.32	0.78
SSR-IAC179	(AC)₆	50	100–104	2	2	0.36	0.50	0.34	0.81
SSR-IAC181	(AT)₂ AC (AT)₃(AG)₅	58.4	204–208	2	2	0.37	0.57	0.37	0.85
PvM01	(TTC)₇	65	240–300	6	6	0.61	0.68	0.61	0.89
PvM02	(CTT)₆	65	148–198	4	4	0.60	0.65	0.61	0.88
PvM03	(TTC)₆	65	180–198	3	3	0.47	0.58	0.49	0.85
PvM04	(TTC)₁₀	55	198–298	7	7	0.77	0.79	0.76	0.92
PvM07	(ATG)₆	55	200–210	4	4	0.56	0.63	0.58	0.88
PvM11	(AGA)₆	55	154–156	2	2	0.21	0.32	0.20	0.75
PvM13	(GAA)₆	55	248–260	5	5	0.68	0.72	0.67	0.89
PvM14	(AATC)₅	55	148–152	3	3	0.44	0.55	0.44	0.83
PvM15	(TGCA)₅	55	182–310	3	3	0.29	0.34	0.26	0.75
PvM17	(ATGA)₅	55	202–208	8	7	0.51	0.58	0.50	0.85
PvM21	(AT)₁₄	55	226–320	12	12	0.86	0.90	0.86	0.97
PvM22	(TC)₅	55	220–226	3	3	0.37	0.47	0.38	0.82
PvM28	(TCA)₅	55	180–190	2	2	0.31	0.38	0.33	0.79
PvM36	(ATC)₅	55	204–208	2	2	0.36	0.47	0.37	0.82
PvM40	(CTG)₆	55	130–222	6	5	0.58	0.68	0.57	0.88
PvM45	(CT)₂₄	55	174–200	5	5	0.56	0.62	0.55	0.86
PvM52	(CA)₇	45	500–600	5	5	0.57	0.65	0.60	0.88
PvM53	(TC)₆	55	200–204	2	2	0.36	0.50	0.37	0.83
PvM56	(TA)₆	55	262–264	2	2	0.34	0.45	0.30	0.78
PvM58	(TC)₅	55	230–234	2	2	0.32	0.47	0.30	0.80
PvM61	(AC)₆	55	196–200	2	2	0.37	0.5	0.37	0.82
PvM62	(TC)₅	55	176–178	2	2	0.35	0.49	0.32	0.79
PvM66	(AT)₅	65	182–200	5	5	0.46	0.57	0.48	0.86
PvM68	(AT)₆	55	174–178	3	3	0.17	0.21	0.16	0.71
PvM73	(AAG)₆	45	600–598	2	2	0.37	0.5	0.37	0.82
PvM75	(GAT)₅	45	224–226	2	2	0.36	0.49	0.35	0.81
PvM79	(GAA)₅	65	128–130	2	2	0.36	0.49	0.36	0.81
PvM93	(GA)₅	65	206–216	2	2	0.18	0.24	0.17	0.72
PvM95	(AC)₉	65	390–400	2	2	0.32	0.41	0.34	0.80
PvM97	(TA)₅	65	168–170	2	2	0.33	0.46	0.33	0.81
PvM98	(TC)₅	65	108–120	3	3	0.18	0.21	0.17	0.71
PvM100	(AT)₇	65	198–304	4	3	0.25	0.33	0.22	0.75
PvM118	(TC)₈	65	200–210	3	3	0.56	0.65	0.59	0.88
PvM120	(TC)₆	65	200–204	3	2	0.37	0.49	0.36	0.81
PvM123	(CT)₉	55	198–220	2	2	0.26	0.31	0.22	0.74
PvM124	(TA)₅	55	500–598	2	2	0.37	0.5	0.37	0.82
PvM126	(TC)₇	65	130–140	3	3	0.20	0.22	0.19	0.72
PvM127	(TC)₅	55	260–270	3	3	0.43	0.50	0.43	0.82
PvM132	(AG)₅	55	290–296	2	2	0.27	0.32	0.27	0.76
PvM148	(CAA)₇	65	180–196	2	2	0.32	0.41	0.33	0.80
PvM150	(TCT)₅	65	190–200	3	3	0.47	0.58	0.45	0.85
PvM151	(TTA)₆	65	202–208	2	2	0.31	0.38	0.33	0.79
PvM153	(AGA)₅	65	272–274	2	2	0.22	0.27	0.21	0.91
-	-	-	-	-	-	Mean 0.29	Mean 0.37	Mean 0.29	Mean 0.865

Data analysis

Allele sizes were scored in base pairs (bp) by visual comparison with a 100-bp DNA ladder, and the values were converted to allele and genotype frequencies. After binary scoring of each allele (1 for presence, 0 for absence), genotypes were coded by numbering the alleles in decreasing order of size; that is, the largest allele received the highest number and the smallest allele the lowest. Because common bean is diploid, an allele was scored twice when the genotype was homozygous, and both alleles were scored when the genotype was heterozygous. The resulting matrix was used to obtain genetic distances in the Tools for Population Genetic Analyses (TFPGA) software, version 1.3 (Miller, 1997). The percentage of polymorphisms obtained with each primer was calculated from this matrix. The genetic distances (GDs) were calculated from the genomic-SSR and EST-SSR data for all possible inbred pairs using the modified Rogers' genetic distance (MRD; Goodman and Stuber, 1983) implemented in TFPGA. Cluster analyses were performed using UPGMA in the NTSYS-pc package, version 2.1 (Rohlf, 2000). Clustering stability was tested using a bootstrap procedure based on 10,000 resamplings with the BooD program (Coelho, 2002). The polymorphism information content (PIC) values for SSRs were calculated using the following equation:
$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} f_i^{2} - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 f_i^{2} f_j^{2}$$
where n is the number of alleles and f_i and f_j are the frequencies of the i th and j th allele, respectively (Lynch and Walsh, 1998). The discrimination power (DP) values for the k th primer were calculated using the formula:
$$\mathrm{DP}_k = 1 - \sum_{j} p_j \, \frac{N p_j - 1}{N - 1}$$
where N is the number of individuals, and p_j is the frequency of the j th pattern (Tessier ). The PIC was used to measure the information of a given marker locus for the pool of genotypes, while DP was used to measure the efficiency of the SSRs in identifying varieties by taking into account the probability that two randomly chosen individuals will have different patterns.
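As an illustration of the two marker statistics described above, the following minimal Python sketch computes PIC from a list of allele frequencies and DP from a list of observed banding patterns. It assumes the commonly used forms PIC = 1 - Σ f_i² - ΣΣ 2 f_i² f_j² (for i < j) and DP = 1 - Σ p_j (N p_j - 1)/(N - 1); the function names `pic` and `discrimination_power` are hypothetical and are not part of TFPGA or any other program cited here.

```python
from collections import Counter
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one locus.

    Assumed form: 1 - sum(f_i^2) - sum over i<j of 2 * f_i^2 * f_j^2.
    """
    homozygosity = sum(f * f for f in allele_freqs)
    cross_term = sum(
        2 * (fi ** 2) * (fj ** 2) for fi, fj in combinations(allele_freqs, 2)
    )
    return 1.0 - homozygosity - cross_term


def discrimination_power(patterns):
    """Discrimination power for one primer.

    `patterns` holds one banding pattern (any hashable label) per individual.
    Assumed form: 1 - sum(p_j * (N * p_j - 1) / (N - 1)), where N * p_j is
    simply the count of individuals showing pattern j.
    """
    n = len(patterns)
    counts = Counter(patterns)
    confusion = sum((c / n) * (c - 1) / (n - 1) for c in counts.values())
    return 1.0 - confusion
```

Under this form, a locus with two alleles at equal frequency gives PIC = 0.375, which is the upper bound for a biallelic marker.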
Wright's F statistics for SSRs were estimated using the GDA program (Lewis and Zaykin, 2000). This analysis was used to compare the structure of genetic diversity of the base collection with that of the core collection. Analysis of molecular variance (AMOVA) was used for estimating population differentiation directly from molecular data and testing hypotheses about such differentiation. The analyses were carried out using Arlequin 3.5 software (Excoffier and Lischer, 2010). The significance of the fixation indices was tested by a permutation procedure with 10,000 iterations. The Arlequin 3.5 software was also used to estimate the fixation index (FST) from the SSR data. AMOVA was performed with the base collection and the core collection as the two populations: “among populations” compares the base collection with the core collection, and “within population” indicates the variability within each collection. Bootstrapping (Efron and Tibshirani, 1993) was used to determine whether the number of polymorphic SSRs was adequate for a precise estimation of genetic similarity among the 500 genotypes (Tivang ). The polymorphic markers were sampled with replacement to create new samples of different sizes from the original data. The genetic similarities for each of these subsets were calculated from 1000 bootstrap estimates of the SSRs for each of these combinations. The coefficients of variation (CV) were used to construct box plots for each sample size. These analyses were carried out with R software (R Development Core Team, 2014). An exponential function was fitted to estimate the number of loci needed to obtain a 10% CV. The median and maximum CV values were used to evaluate the accuracy of the genetic distance estimates. Although the mean CV is often used in the literature, caution is needed when dealing with molecular marker data for which there is no assurance that the CV values are distributed symmetrically.
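The marker-number bootstrap described above can be sketched as follows. This is a simplified illustration in Python rather than the R code used in the study, and it assumes that the distance between two genotypes is the mean of per-locus distances; the function name `bootstrap_cv` and the input layout are hypothetical.

```python
import random
import statistics


def bootstrap_cv(per_locus_dist, sample_size, n_boot=1000, seed=0):
    """CV (%) of bootstrapped distance estimates for each genotype pair.

    per_locus_dist: one row per genotype pair, each row holding the
    per-locus distances for that pair (assumed input layout).
    sample_size: number of loci drawn with replacement in each replicate.
    """
    rng = random.Random(seed)
    n_loci = len(per_locus_dist[0])
    estimates = [[] for _ in per_locus_dist]
    for _ in range(n_boot):
        # Resample loci with replacement, using the same draw for all pairs.
        loci = [rng.randrange(n_loci) for _ in range(sample_size)]
        for k, row in enumerate(per_locus_dist):
            estimates[k].append(sum(row[i] for i in loci) / sample_size)
    cvs = []
    for est in estimates:
        mean = statistics.fmean(est)
        sd = statistics.stdev(est)
        cvs.append(100.0 * sd / mean if mean > 0 else float("nan"))
    return cvs
```

Calling `bootstrap_cv` for increasing values of `sample_size` and summarizing each result with `statistics.median` and `max` gives the kind of median and maximum CV values that the text uses to judge how many loci are needed.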
The genetic structure of the sample was investigated using the Bayesian clustering algorithm implemented in STRUCTURE v.2.2 (Pritchard ). The admixture model was used for the base dataset with no prior population information and with the “no-correlated allele frequencies between populations” option. Ten runs were performed for each number of clusters, from K = 2 to K = 20, using a burn-in period of 200,000 iterations followed by 500,000 Markov chain Monte Carlo (MCMC) iterations. The ad hoc statistic ΔK defined by Evanno was used to determine the most probable number of clusters: the mean of the absolute values of L''(K) was divided by the standard deviation of L(K), where L(K) stands for the log-likelihood obtained over the 10 runs for each K and L''(K) for its second-order rate of change. A hierarchical analysis of variance was carried out to test the significance of the differentiation among populations and clusters as defined by Structure software.
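To make the ΔK criterion concrete, the sketch below computes it from replicate log-likelihoods. It assumes the usual simplification in which the second-order difference is taken on the mean log-likelihoods, ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)); the function name `evanno_delta_k` and the input dictionary are hypothetical and do not reproduce any STRUCTURE output format.

```python
import statistics


def evanno_delta_k(lnp):
    """Return {K: delta_K} for every K that has both neighbours K-1 and K+1.

    lnp: dict mapping K to the list of log-likelihoods from replicate runs.
    """
    means = {k: statistics.fmean(v) for k, v in lnp.items()}
    sds = {k: statistics.stdev(v) for k, v in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        # Skip edge values of K and any K with zero spread between runs.
        if k - 1 in lnp and k + 1 in lnp and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / sds[k]
    return delta
```

The most probable number of clusters is then the K with the largest value, for example `max(delta, key=delta.get)`. Note that ΔK cannot be evaluated at the smallest and largest K tested.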

Construction of the core collection

In order to select the 180 accessions for a common bean core subset (Table S2), the following sampling criteria were applied: (i) the same percentage of each Structure group was selected to be integrated into the core collection; (ii) 105 accessions were selected equally from each structure group on the basis of the greatest genetic distance between accessions within each group and according to the genetic distance matrix and dendrogram (Figure S2); (iii) maintenance of 75 carioca tegument cultivars, widely cultivated in the State of São Paulo (Brazil) under the leadership of the Agronomic Institute (IAC).
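Criteria (i) and (ii) can be illustrated with a small Python sketch. The study selected accessions by inspecting the distance matrix and dendrogram; the greedy max-min rule below is only one possible way to automate "greatest genetic distance within each group" and is an assumption, as are the function name `select_core` and its arguments.

```python
def select_core(groups, dist, fraction):
    """Pick the same fraction of each group, favouring divergent accessions.

    groups: dict mapping a group label to a list of accession identifiers.
    dist: function (a, b) -> genetic distance between two accessions.
    fraction: share of each group to keep (e.g. 0.36).
    """
    core = {}
    for name, members in groups.items():
        quota = max(1, round(fraction * len(members)))
        # Seed with the accession that is, in total, farthest from the others.
        first = max(
            members, key=lambda a: sum(dist(a, b) for b in members if b != a)
        )
        chosen = [first]
        while len(chosen) < quota:
            remaining = [a for a in members if a not in chosen]
            # Add the accession whose nearest chosen neighbour is farthest away.
            nxt = max(remaining, key=lambda a: min(dist(a, c) for c in chosen))
            chosen.append(nxt)
        core[name] = chosen
    return core
```

Criterion (iii), the retention of a fixed set of carioca cultivars, would be applied on top of this selection and is not modelled here.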

Results and Discussion

Molecular marker polymorphism of the base collection and genetic analyses

Genetic diversity among the 500 common bean accessions was assessed from a total of 200 alleles detected at the 58 SSR loci. The average number of alleles per locus of genomic-SSRs was 3.73, ranging from 2 to 10 alleles, and for EST-SSRs, it was 3.35. The highest numbers of observed alleles were found for SSR-IAC66 and PvM21 (Table 1). Our study showed an average of 2.8 alleles per locus, and found only three alleles for SSR-IAC66, corroborating the previous evaluation by Hanai of 40 genomic-SSRs and 40 EST-SSRs in the Andean and Mesoamerican genotypes. Of the total number of markers in our study, 26 genomic-SSRs and 31 EST-SSRs exhibited a polymorphic pattern, with 2–7 alleles per locus and PvM21 showing 12 alleles. Hanai evaluated the genetic diversity of an additional set of 100 EST-SSRs in 24 common bean genotypes, of which 54 were polymorphic, with an average of 2.7 alleles per locus. The polymorphism information content (PIC) ranged from 0.26 to 0.86 for genomic markers and 0.17 to 0.86 for genic markers, and SSR-IAC66 and EST-SSR PvM21 were the loci with the highest PIC values (Table 1). The DP values ranged from 0.28 (SSR-IAC24) to 0.87 (SSR-IAC66) for genomic-SSRs and 0.21 (PvM68 and PvM98) to 0.90 (PvM21) for EST-SSRs (Table 1). The high PIC and DP values obtained for the SSR-IAC66 and PvM21 markers suggest their potential for assessing the genetic diversity in common beans. Benchimol assessed the genetic diversity of 20 common bean genotypes belonging to the Andean and Mesoamerican gene pools with genomic-SSRs and found PIC values varying from 0.05 to 0.83. Perseguini obtained lower PIC values (0.03 to 0.70) for a set of 60 carioca common beans, suggesting that this estimator is strongly influenced by the number and diversity of the genotypes under evaluation.
The boxplot chart (Figure 1) revealed that 10 CV% was obtained for approximately 33 markers, indicating that the number of microsatellites used in this study was sufficient to explain the genetic diversity content with good genome coverage. The number of markers is an important parameter to be considered in genetic diversity studies. Clustering analyses, which use a pairwise diversity matrix as input, require that the number of markers accurately estimates the diversity values. In the SSR diversity studies of cultivated genotypes, the number of markers varied considerably. In common beans, the number of SSRs that were used to evaluate the genetic diversity within core collections ranged from 36 (Blair ) to 58 markers (McClean ).

Figure 1

Boxplot graph obtained by Bootstrap analysis of the data generated by genotyping 500 common bean accessions with 58 microsatellites.

The UPGMA dendrogram generated for the base collection revealed several groups, structured mostly in accordance to the grain morphology and genotype origin (Figure S2). To better understand the genetic organization of the 500 genotypes, Structure analyses were performed and found that the most appropriate number of groups (K) was 15 according to Evanno . Comparison of the clustering pattern determined by Structure with the UPGMA dendrogram indicated a strong correlation between the groups resolved in both analyses (Figure S3). The organization pattern of groups was inferred from the breeding institution (Groups 2, 3, 4, 6, 7, 9, 11, 12, 14, and 15). In fact, there are examples of crop species where breeding selection had resulted in domesticated populations displaying higher interpopulation differentiation than that by the wild populations (Doebley, 1989). This phenomenon and subsequent admixture (including crossing between cultivars) may maintain a high level of genetic diversity in breeding populations of domesticated species (Hernandez-Verdugo ). Perseguini reported that carioca tegument genotypes clustered according to their breeding program. Such tendency may be attributed to a different artificial selection pressure in each breeding program that may render genetic differentiation. There is evidence that selection can be detected from patterns of polymorphism, and these signatures of artificial selection acting on alleles may be captured starting with p < 0.2 with reasonably high probability (Innan and Kim, 2004).

Analysis of genetic diversity of core collection

After evaluating the genetic structure of the base collection, we reduced the number of genotypes to form a core collection suitable for associative mapping purposes. The reduction was performed to remove possible redundant genotypes. Therefore, the number of individuals in each group of the base collection was reduced to 36% of its original size to obtain a representative core subset. The choice of the most appropriate method for determining the core collections for association studies is an open issue requiring further investigation. In a comparison of current state-of-the-art methods used to construct core subsets suitable for associative mapping of cultivated olive (Olea europaea L.), El Bakkali found that a sample size of 94 entries captures the total diversity and is suitable for field assessments with many replicates for association mapping. Linkage disequilibrium observed in that study was mainly explained by a genetic structure effect estimated by Structure analyses. In our study, the Bayesian method performed by Structure proved especially efficient for developing a core collection that can capture the allele diversity from a broad, diverse Brazilian germplasm collection, which comprises accessions with different agronomic features, such as disease resistance (anthracnose, angular leaf spot, and Fusarium wilt) and drought tolerance. A study by Burle of the genetic structure of 279 common bean genotypes, using 67 microsatellite markers and four sequence characterized amplified regions (SCARs), supported the efficiency of the Bayesian approach for germplasm analysis of genetic diversity and population structure. The strategy used to establish the core collection (Table S2) in this study resembles the approach by Blair .
Similar to a core collection formation that is generated by selecting a small percentage of the base collection to represent most of the total genetic variation with a minimum of redundancy (Oliveira ), the accessions chosen to integrate the diversity panel should also preserve as much of genetic variability as possible. Therefore, to ensure the adequacy and usefulness of the chosen accessions for associative mapping, it is necessary to assess whether the characteristics and variability of the base collection have been maintained. Similarly to the base collection, the number of alleles present in the core collection varied between 2 and 10 alleles for the genomic-SSRs and from 2 to 12 alleles for the EST-SSRs. The average number of alleles per locus was slightly reduced (from 3.73 to 3.66 and from 3.35 to 3.26 for genomic-SSRs and EST-SSRs, respectively) suggesting that the allele richness was preserved in the reduced sample. The highest PIC and DP values were 0.87 and 0.96 for SSR-IAC66 and 0.86 and 0.97 for EST-SSR PvM21, respectively, indicating a high discriminatory power of these markers (Table 1). McClean evaluated a common bean core collection using 58 SSRs, and showed that the number of alleles varied between 2 and 8 alleles per locus. Blair evaluated 604 genotypes from the CIAT germplasm collection and reported PIC values ranging from 0.007 to 0.97. The number of alleles per locus and PIC in our core collection were in agreement with those in previous studies. The core collection dendrogram divided the accessions into clusters similar to those observed in the base dendrogram. The genetic distances varied at a similar magnitude from 0.13 to 0.88 (Figure S4), suggesting that the genetic variability was maintained and was still quite extensive within the core subset. 
The best K value obtained by the Bayesian analysis (Figure 2) divided the core accessions into four different groups (Figure 3), congruent with the Andean and Mesoamerican gene pools and the breeding program institution from which they were derived. Some accessions were grouped by grain size.

Figure 2

Graphical representation of the optimal number of groups in the program Structure inferred using the criterion of Evanno . The analysis was based on data obtained from 58 microsatellite loci in core collection evaluated for genetic diversity.

Figure 3

Representation of the core collection according to the Bayesian analysis of the program Structure. The accessions evaluated were divided into four groups (K = 4). The names of the genotypes are given in Table S2 (the numbers correspond to the names of the genotypes). Red corresponds to Group 1, green to Group 2, blue to Group 3, and yellow to Group 4.

Group 1 of the Structure analysis (Table 2, Figure 3) was composed predominantly of Andean large-seeded genotypes directed for export driven by market demand, such as Feijão Suíço, Chileno/Branco, Branco Argentino, Amendoim, Bagajo, Jalo, and Jalo-110. Another feature observed in this group was the reddish color of the tegument that characterizes the Red Kidney and Vermelhinho cultivars and most of the lines derived from the CAL-143 x IAC-UNA (C x U) and IAC-UNA x CAL-143 (U x C) crosses used for the UC map (Campos ; Oblessuc , 2014).

Table 2

The 180 accessions clustered into the four groups generated by the Structure analysis and their respective traits.

Structure colors	Group	Accessions	Principal characteristics
Red	1	Feijão Suíço, Chileno/Branco, Vermelhinho, Bagajo, Jalo-110, Jalo, Amendoim, Gen05C6-4-5-1-2, Branco Argentino, UxC-1.1, UxC-2.20, UxC-1.2, UxC-1.19, UxC-1.5, UxC-3.9, UxC-4.17, UxC-9.2, UxC-9.16, CxU-1.3, CxU-1.5, CxU-1.7, CxU-1.19, CxU-2.11, CxU-2.16, CxU-7.8, UxC-1.8, UxC-1.10, UxC-6.13, UxC-2.18, UxC-3.3, CAL-143, Red Kidney	Grain size (typically Andean)
Green	2	Flor de Mayo, 2-Mar, Michelite, DOR-390, DOR-391, DOR-476, Turrialba-1, AND-279, RAZ-56, RAZ-49, Carioca Comum, Carioca Lustroso, Carioca MG, Carioca Precoce, H96A28-P4-1-1-1-1, H96A102-1-1-152, H96A31-P2-1-1-1-1, IAC-Alvorada, IAC-Apuã, IAC-AYSÓ, IAC-Carioca, IAC-Carioca Akytã, IAC-Carioca Aruã, IAC-Carioca Pyatã, IAC-Carioca Tybatã, IAC-Votuporanga, IAC-Ybaté, IAPAR-81, IAPAR-31, Pérola, Gen05C5-2-5-1-2, Gen05C5-2-10-1-1, Gen05C6-3-5-2-1, Gen05C6-5-2-2-1, Gen05C6-5-7-1-2, Gen05C7-4-1-1-1, VAX1, A0774, BAT447, SEA-5, IAC-UNA, Sanilac, FEB-176, FEB-177, J/39-2-3-1, J/61-5-3-1, J/43-5-1, J/43-1-1-1, J/39-1-3-2, M/100-4-3-1, F/19-6, F/19-3-1, E/20-2-1, D/15-3-1, C/11-2-2, (1108xHarmonia)x(1108xBoreal/Brese), 29/24-6-1-1, 22/16-1-3-2	Most have ‘carioca grain type’ with great importance to IAC and IAPAR breeding programs
Blue	3	TO, Gen05P3-1-6-1, Gen05P4-2-6-2, Gen05P5-3-8-1, Gen05P5-3-8-2, Gen05P5-4-8-2, Gen05Pr11-1-2-2, Gen05Pr11-1-7-1, Gen05Pr11-2-3-1, Gen05Pr11-2-13-1, Gen05Pr11-2-14-2, Gen05Pr11-3-5-1, Gen05Pr11-6-5-1, Gen05Pr11-6-12-2, Gen05PR12-2-5-1-2, Gen05PR12-2-2-1-1, Gen05PR12-2-4-1-2, Gen05PR13-1-8-1-2, Gen05PR13-1-8-1-1, Gen05PR13-1-6-1-2, Gen05PR13-2-2-1-2, Gen05PR13-2-1-1-2, Gen05C1-3-2-1-1, Gen05C1-3-3-1-1, Gen05C2-1-1-2-1, Gen05C2-1-6-1-1, Gen05C2-1-1-1-3, Gen05C2-1-1-1-1, Gen05C3-2-4-1-1, Gen05C3-2-4-1-7, Gen05C4-3-1-1-2, Gen05C4-3-1-1-1, Gen05C4-4-3-1-2, Gen05C4-6-2-1-2, Gen05C5-1-2-2-2, Gen05C5-1-2-1-1	Recent crossings performed at the IAC breeding program
Yellow	4	Frijol Negro, ECU-311, México-115, Baetão (30273), Preto-208, Preto-184, Honduras-32, Guatemala-479, Jamapa (CNF-1671), Mulatinho (VP-102), Tupi, Rosinha G2, Preto do Pocrone, Porrillo-1, México-498, Small White 59 Preto, Perry Marron, Mortiño, Rosado-13, Porrillo Sintético, Puebla-152 (CIAT), ARA-1, Caeté (preta), IAC-Maravilha, FEB179, Jamapa (CIAT), Puebla-152 (CNF-1807), EMP-81, ARC-3, ARC-4, LP-90-91R.Bac., EMP-407, FEB180, Oito e Nove, Alemão, Bat-93, Pinto-114, G2333, PI-165426, RAZ-55, Batista Brilhante (CB), 82 PVBZ-1783, A-449, Aporé, Branquinho, BRS-Cometa, BRS-Horizonte, BRS-Pontal, BRS-Requinte, BRSMG-Talismã, Campeão II, Caneludo, Gen05C7-3-2-2-2, J/54-5-1	Most are from CIAT and EMBRAPA

The accessions clustered in the remaining three groups (2, 3, and 4; Table 2, Figure 3) had smaller seeds, they were of the Mesoamerican type, and were distributed according to the breeding institution. The genotypes allocated to group 2 showed carioca grain tegument with economic importance in the Brazilian market and had been extensively exploited by the IAC and the IAPAR (Agronomic Institute of Paraná, Brazil) breeding programs until the late 1990s, when common bean improvement in Brazil moved toward the development of cultivars that were more resistant to biotic and abiotic stress. Group 3 (Table 2, Figure 3) included genotypes obtained from recent crosses conducted by the Agronomic Institute between 2000 and 2007, which were designed for the introgression of resistance genes to major diseases for carioca and black tegument cultivars. It was possible to observe changes in the genetic basis of these accessions compared to those clustered in groups 2 and 4, as the IAC breeding program has begun to focus on the maintenance of tegument and grain features in these cultivars as its main goal, in addition to high grain yield and nutritional quality. The uppermost hierarchical level of the population structure that was identified using the ΔK (Evanno ) suggested that the 180 genotypes were divided into four groups; however, when K = 2 was considered (Figure S5), the samples were divided into two main genetic groups. A shared profile of alleles between the Andean and Mesoamerican genotypes was observed, most likely because some of the genotypes present in both parental crosses have both Andean and Mesoamerican origin (Figure S5). This mix is a result of the breeding process of common bean adopted by the institutions in Brazil. The two main clusters observed with the Structure analysis reflect our previous knowledge of the occurrence of two major wild gene pools of P. vulgaris (Blair ; Rossi ). 
Morphological and molecular markers showed that derived landraces are also generally organized into two gene pools and contain a subset of the wild-type genetic diversity (Gepts and Bliss 1986; Gepts ,b; Beebe ; Debouck ; McClean ; McClean and Lee 2007). The AMOVA comparing the base and core collections attributed only 2.75% of the total variation to differences between the two collections and 97.2% to variation within the collections; in other words, most of the genetic variability of the base collection was retained in the core collection (Table 3).

Table 3

Analysis of molecular variance (AMOVA) considering the base collection of 500 accessions and the core collection of 180 accessions (Group 1: base collection; Group 2: core collection).

Source of variation	Sum of squares	Variance components	Percentage of variation	p-value	Fixation index (FST), averaged over all loci
Among groups	213.355	0.40356	2.75377	< 0.001	
Within Group 1 and Group 2	18416.941	14.25113	97.24623	< 0.001	
Total	18630.296	14.65469			0.02754
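The percentages and the fixation index in Table 3 follow from the variance components. A minimal Python sketch of that arithmetic is given below, assuming that FST is taken as the among-group share of the total variance; the function name `amova_summary` is hypothetical, and small differences in the last decimals relative to the table are expected because the printed components are rounded.

```python
def amova_summary(var_among, var_within):
    """Percentage of variation per source and FST from two variance components."""
    var_total = var_among + var_within
    return {
        "total": var_total,
        "pct_among": 100 * var_among / var_total,
        "pct_within": 100 * var_within / var_total,
        "fst": var_among / var_total,  # among-group share of total variance
    }

# Variance components as printed in Table 3
summary = amova_summary(var_among=0.40356, var_within=14.25113)

for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```

The among-group share comes out at roughly 2.75% and the within-group share at roughly 97.25%, in line with the values reported in the table.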

According to the GDA analyses, the average expected heterozygosity (He) and observed heterozygosity (Ho) were both 0.031 in the base collection and both 0.034 in the core collection. The frequency of private alleles in the two collections indicated that there was no loss of genetic variability when the base collection (500 genotypes) was reduced to the core collection (180 genotypes); it is worth noting, however, that three private alleles were found in the base collection (at loci PVM40, PVM73, and SSR-IAC181) and two in the core collection (at loci PVM04 and PVM40), so the private allele at PVM40 was preserved. Brown (1989a) proposed that a core collection should contain about 10% of the base collection. This sampling procedure should conserve about 80% of the alleles that occur in the base collection. Miklas reported that a sample size of 10% is adequate to represent the genetic diversity of a base collection in common beans. The AMOVA and GDA results demonstrated that the methodology used to establish the core collection was appropriate because it maintained the genetic diversity present in the base collection. A core collection for association mapping should include samples of mixed and/or admixed individuals from the most diverse genetic backgrounds. However, the presence of several genetic origins within a panel in different and unknown proportions induces linkage disequilibrium between unlinked loci and may increase the rate of false positives, that is, markers that are statistically associated with the analyzed trait without actually being causally involved in its phenotypic variation (Mezmouk ). For proper use of the genetic resources of a germplasm bank, it is essential to know the genetic diversity among the available accessions. 
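The diversity statistics discussed above can be illustrated for a single locus with a short Python sketch. The genotypes, allele sizes, and function names are hypothetical, and the expected heterozygosity uses the simple estimator 1 - Σp², which is not necessarily the exact estimator that GDA reports.

```python
from collections import Counter

def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2) over allele frequencies at one locus."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    return 1 - sum((count / total) ** 2 for count in Counter(alleles).values())

def observed_heterozygosity(genotypes):
    """Ho = fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

def private_alleles(group_a, group_b):
    """Alleles found only in group_a and only in group_b, respectively."""
    alleles_a = {allele for genotype in group_a for allele in genotype}
    alleles_b = {allele for genotype in group_b for allele in genotype}
    return alleles_a - alleles_b, alleles_b - alleles_a

# Hypothetical SSR genotypes (allele sizes in bp) for two groups of accessions
collection_a = [(150, 150), (150, 152), (152, 152), (154, 154)]
collection_b = [(150, 150), (152, 152), (156, 156)]

he_a = expected_heterozygosity(collection_a)
ho_a = observed_heterozygosity(collection_a)
only_a, only_b = private_alleles(collection_a, collection_b)
```

In a mostly self-pollinating species such as common bean, Ho is expected to be low, since most individuals are homozygous at a given SSR locus.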
Knowledge of genetic diversity also allows selection of the appropriate genotypes and selection methods, depending on the available resources and the genetic distance between recombinant genotypes, and according to the objectives of the breeding program (Singh, 2001). This study represents an efficient approach to developing a core collection suitable for association mapping studies through proper sampling of the core collection entries and assessment of the structure and relatedness within the samples. Importantly, the 180 selected genotypes are highly variable for agronomic traits such as resistance to major common bean diseases (anthracnose, angular leaf spot, and bacterial blight) and drought tolerance. The proposed core collection should be periodically updated by including additional common bean germplasm in the base collection and by adding novel molecular markers such as SNPs. In its current state, the core collection will be useful for conducting field assessments, and it is suitable for developing a long-term strategy for genome-wide association studies in common beans.

26 in total

1. Association mapping in structured populations.

Authors: J K Pritchard; M Stephens; N A Rosenberg; P Donnelly
Journal: Am J Hum Genet Date: 2000-05-26 Impact factor: 11.025

2. Inference of population structure using multilocus genotype data.

Authors: J K Pritchard; M Stephens; P Donnelly
Journal: Genetics Date: 2000-06 Impact factor: 4.562

3. Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data.

Authors: Elena Bitocchi; Laura Nanni; Elisa Bellucci; Monica Rossi; Alessandro Giardini; Pierluigi Spagnoletti Zeuli; Giuseppina Logozzo; Jens Stougaard; Phillip McClean; Giovanna Attene; Roberto Papa
Journal: Proc Natl Acad Sci U S A Date: 2012-03-05 Impact factor: 11.205

4. QTL analysis of yield traits in an advanced backcross population derived from a cultivated Andean x wild common bean (Phaseolus vulgaris L.) cross.

Authors: M W Blair; G Iriarte; S Beebe
Journal: Theor Appl Genet Date: 2006-01-24 Impact factor: 5.699

5. Genetic architecture of chalcone isomerase non-coding regions in common bean (Phaseolus vulgaris L.).

Authors: Phillip E McClean; Rian K Lee
Journal: Genome Date: 2007-02 Impact factor: 2.166

6. Evaluation of five strategies for obtaining a core subset from a large genetic resource collection of durum wheat.

Authors: P L Zeuli; C O Qualset
Journal: Theor Appl Genet Date: 1993-11 Impact factor: 5.699

7. Association mapping: critical considerations shift from genotyping to experimental design.

Authors: Sean Myles; Jason Peiffer; Patrick J Brown; Elhan S Ersoz; Zhiwu Zhang; Denise E Costich; Edward S Buckler
Journal: Plant Cell Date: 2009-08-04 Impact factor: 11.277

8. Effect of population structure corrections on the results of association mapping tests in complex maize diversity panels.

Authors: Sofiane Mezmouk; Pierre Dubreuil; Mickaël Bosio; Laurent Décousset; Alain Charcosset; Sébastien Praud; Brigitte Mangin
Journal: Theor Appl Genet Date: 2011-01-11 Impact factor: 5.699

9. Genetic diversity in cultivated carioca common beans based on molecular marker analysis.

Authors: Juliana Morini Küpper Cardoso Perseguini; Alisson Fernando Chioratto; Maria Imaculada Zucchi; Carlos Augusto Colombo; Sérgio Augusto Moraes Carbonell; Jorge Mauricio Costa Mondego; Rodrigo Gazaffi; Antonio Augusto Franco Garcia; Tatiana de Campos; Anete Pereira de Souza; Luciana Benchimol Rubiano
Journal: Genet Mol Biol Date: 2011-03-01 Impact factor: 1.771

10. Construction of core collections suitable for association mapping to optimize use of Mediterranean olive (Olea europaea L.) genetic resources.

Authors: Ahmed El Bakkali; Hicham Haouane; Abdelmajid Moukhli; Evelyne Costes; Patrick Van Damme; Bouchaib Khadari
Journal: PLoS One Date: 2013-05-07 Impact factor: 3.240
