Literature DB >> 32140235

QTL mapping and identification of SNP-haplotypes affecting yield components of Theobroma cacao L.

Luciel Dos Santos Fernandes1, Fábio M Correa2, Keith T Ingram3, Alex-Alan Furtado de Almeida2, Stefan Royaert3.   

Abstract

Cacao is a crop of global relevance that faces constant demands for improved bean yield. However, little is known about the genomic regions controlling the crop yield and genes involved in cacao bean filling. Hence, to identify the quantitative trait loci (QTL) associated with cacao yield and bean filling, we performed a QTL mapping in a segregating mapping population comprising 459 trees of a cross between 'TSH 1188' and 'CCN 51'. All variables showed considerable phenotypic variation and had moderate to high heritability values. We identified 24 QTLs using a genetic linkage map that contains 3526 single nucleotide polymorphism (SNP) markers. Haplotype analysis at the significant QTL region on chromosome IV pointed to the alleles from the maternal parent, 'TSH 1188', as the ones that affect the cacao yield components the most. The recombination events identified within these QTL regions allowed us to identify candidate genes that may take part in the different steps of pod growth and bean filling. Such candidate genes seem to play a significant role in the source-to-sink transport of sugars and amino acids, and lipid metabolism, such as fatty acid production. The SNP markers mapped in our study are now being used to select potential high-yielding cacao varieties through marker-assisted selection in our existing cacao-breeding experiments.
© The Author(s) 2020.

Entities:  

Keywords:  Genetic markers; Plant breeding; Plant physiology

Year:  2020        PMID: 32140235      PMCID: PMC7049306          DOI: 10.1038/s41438-020-0250-3

Source DB:  PubMed          Journal:  Hortic Res        ISSN: 2052-7276            Impact factor:   6.793


Introduction

Breeding programs for many crops focus on the improvement of crop yield components. Cacao yield components refer to the tree’s organs that are harvested and converted into final crop production, such as the total number of healthy pods, the fresh and dry bean weight, and the final yield. Besides resistance to biotic and abiotic stress and bean quality, yield (measured as dry bean weight in kilograms per hectare (kg/ha)) is also the most economically important crop trait, and it depends on a complex interaction between different factors that include genetics, environment, crop management, and growth and development processes[1]. An essential step toward the selection of uniformly high-yielding cacao varieties is the identification of genomic regions and genes that control those yield components. A primary objective is to identify reliable molecular markers flanking the regions that control such traits. To date, few molecular markers have been associated with cacao yield components, such as pod traits and tree vigor[2], and bean size and bean weight[3], however, these studies did not report any candidate genes that control these traits. The association of candidate gene models with the genome-wide association and the mapping of QTL became possible after the publication of two cacao genomes, the Criollo genome[4] and the Amelonado genome[5]. The expectation is to integrate the information from QTL regions to identify gene sequences controlling the phenotypic variation of traits of interest. The genes that underpin many yield components exert regulatory control over biomass production and accumulation by acting in metabolic pathways that supply storage forms of nitrogen[6], synthesize and transport carbon reserve compounds[7,8] and lipids[9] required during crop yield formation. Little is known about the effects of genes that are involved in the transport of carbohydrates and in the synthesis of lipids during cacao bean filling. These genes may act in crucial stages of the reproductive and growth phases, such as the initial fertilization, pod set and further maintenance of pods, bean filling and seed germination. Just as they might be involved in the defense mechanisms against pathogens[10], such genes might also be crucial to the processes involved in photoassimilate fluxes from source-to-sink organs. Although the physiological mechanisms have been explored in other crops, the genomic regions controlling those mechanisms were not studied in T. cacao. To date, SNP-based QTL mapping of genomic regions associated with yield components has not been reported for T. cacao. Therefore, focusing on QTL mapping and on the identification of candidate genes affecting cacao yield components will provide an initial framework to understand the cacao yield formation. Moreover, this approach will allow us to investigate the inherent patterns of bean filling of each cacao genotype, and the identification of the regions that control such filling patterns is thereby an essential step in the direction of improving cacao yield. The main objectives of this study were (1) to map QTL regions associated with cacao yield components (number of pods, pod index, dry bean weight and yield), (2) to identify candidate genes that have a higher probability of affecting the phenotypic variation of those traits, and (3) to provide reliable SNP markers to support the selection of high-yielding cacao genotypes via marker-assisted selection. In addition, we discuss the effect of significant recombination events affecting the phenotypic variation of the yield components and the candidate genes involved in the synthesis and the source-to-sink transport and accumulation of carbohydrates and amino acids, as well as in the breakdown of lipids to produce fatty acids in the cacao beans.

Results

Phenotypic correlations and multivariate analysis of cacao yield components

We computed the Spearman correlation among variables evaluated for 459 trees of MP01. Figure 1 shows the correlation coefficients (r) estimated for each pair of variables evaluated, as well as their relationships (scatterplots) and frequency distributions. Correlations for all variables evaluated were significant (p-value < 0.001), those among the strong correlations were between NPH and NHPH (r = 0.85), NPH and Yield (r = 0.66), NHPH and Yield (r = 0.64), and between DBW and PI (r = −0.64). Table 1 includes a summary of the phenotypic data and broad-sense heritability for the variables evaluated. In general, the MP01 population has high broad-sense heritability for cacao yield components. The pod-related variables, NPH and NHPH, had the highest heritability values of 0.73 and 0.75, respectively. Heritability values for PI and DBW were 0.62 and 0.56, respectively, while the average yield had a low heritability of 0.10.
Fig. 1

Spearman rank correlations (upper right), scatterplots (lower left), and histograms (diagonal) and for the yield components evaluated in the MP01 mapping population.

Significance levels are ***p < 0.001; **p < 0.01; *p < 0.05. NPH is the total number of pods harvested; NHPH is the total number of healthy pods harvested; DBW is the dry bean weight of a single bean; PI is the pod index; Yield is the yield average

Table 1

Description, summary statistics, and broad-sense heritability of cacao yield components evaluated in the MP01 mapping population

VariablesDescriptionUnitMeanSDMinMaxH2
NPHTotal number of pods harvestednumber/tree3.083.260400.73
NHPHNumber of healthy pods harvestednumber/tree1.772.410300.75
DBWDry bean weight of a single beang1.470.350.373.260.56
PIPod indexnumber/tree21.2512.4362410.62
YieldYield average (dry bean weight)kg/ha144.83143.550.001635.700.10

Broad-sense heritability (H2), and SD stands for standard deviation

Spearman rank correlations (upper right), scatterplots (lower left), and histograms (diagonal) and for the yield components evaluated in the MP01 mapping population.

Significance levels are ***p < 0.001; **p < 0.01; *p < 0.05. NPH is the total number of pods harvested; NHPH is the total number of healthy pods harvested; DBW is the dry bean weight of a single bean; PI is the pod index; Yield is the yield average Description, summary statistics, and broad-sense heritability of cacao yield components evaluated in the MP01 mapping population Broad-sense heritability (H2), and SD stands for standard deviation We present the amount of variation retained (eigenvalues) by the first two principal components for the MP01 individuals and yield components variables (Fig. 2, Supplementary Table 1 and S2). The first two principal components (Dim.1 and Dim.2) accounted for 86.91% of the cumulative phenotypic variation, in which the first principal component (Dim.1) retained 54.4%, while the second (Dim.2) retained 32.5% (Fig. 2). Together with the third component (Dim.3 = 7.8%), the principal components explained 94.77% of the cumulative phenotypic variations (Supplementary Table S3). The first principal component showed a contrast between DBW and the variables NPH, NHPH, PI, and Yield, which indicates a regulation of partitioning of assimilated carbon because of the increased number of harvestable organs at the plant level. The variables that positively contributed in the first principal component were NPH, NHPH and Yield (Supplementary Table 2), which are the main variables contributing in the phenotypic variation of the individuals and yield components data (Fig. 2). In the second principal component, the variables DBW and PI showed the highest and most contrasting eigenvalues, 0.81 and −0.83, respectively (Supplementary Table 2).
Fig. 2

Biplot of the first and second principal components from phenotypic yield data from the MP01 mapping population.

NPH = number of total pods harvested, Yield = average yield, DBW = dry bean weight of a single bean, and PI = pod index. Arrow length and direction indicate the association with a particular component, while the clustering of vectors indicates the correlation among variables

Biplot of the first and second principal components from phenotypic yield data from the MP01 mapping population.

NPH = number of total pods harvested, Yield = average yield, DBW = dry bean weight of a single bean, and PI = pod index. Arrow length and direction indicate the association with a particular component, while the clustering of vectors indicates the correlation among variables

Detection of QTLs associated with cacao yield components

The QTL intervals flanked by SNP markers with LOD scores higher than 3.1 were considered as significant. We used the genetic linkage map from the MP01 population (‘TSH 1188’ × ‘CCN 51’), published in 2016[11]. Figure 3 provides the genomic positions and Table 2 the percentage of phenotypic variation explained by each region flanked by the SNP markers with significant LOD scores. In total, we mapped 24 QTL positions across eight different chromosomes, except for the chromosomes VII and X.
Fig. 3

Positions of QTLs identified for the yield components identified in the mapping population MP01 (‘TSH 1188’ × ‘CCN 51’).

On the top of each panel are the chromosome numbers, and the x-axis is showing the positions of the SNP markers in centimorgans (cM). The y-axis is presenting the logarithm of the odds (LOD) scores from the interval mapping analysis. The panels are presented as following: DBW (a), NPH (b), NHPH (c), Yield (d) and PI (e). The dotted line represents the LOD score threshold of 3.1 (p-values < 0.05)

Table 2

QTLs identified for the total number of pods harvested (NPH), the number of healthy pods harvested (NHPH), dry bean weight (DBW), pod index (PI), and average yield

VariableChr.ParentcMLOD% Expl.TSH 1188CCN 51
T1T2C1C2
NPHTcm001s01308520IP25.384.993.8CTCT
Tcm004s00615809IVP13.2923.519.7AGAG
NHPHTcm001s01308520IP15.384.213.3CTCT
Tcm004s00615809IVP23.2928.0223.6AGAG
DBWTcm001s18406546IP158.048.212.7TCTC
Tcm002s07453904IIP141.223.581.2GAGA
Tcm002s15300388IIP253.653.281.1TCTC
Tcm003s00621672IIIP23.27.182.4CAAC
Tcm003s27977985IIIP257.544.21.4AGGA
Tcm003s30366194IIIP168.3416.65.7CTCT
Tcm004s00289192IVP10.1137.5914.6AGAG
Tcm004s30466731IVP166.5710.663.6CTCT
Tcm005s29846151VP158.566.152GTGT
Tcm005s33377306VP272.3511.413.8TGTG
Tcm005s38257511VP294.524.241.4AGAG
Tcm006s20470604VIP233.8612.334.2CACA
Tcm006s22739149VIP147.269.273.1ACAC
Tcm008s04113686VIIIP227.4123.688.5CTCT
Tcm009s03430458IXP121.718.592.9CTCT
Tcm009s39845182IXP1101.354.351.4TCTC
PITcm001s18406546IP158.044.752.8TCTC
Tcm002s07453904IIP141.224.252.4GAGA
Tcm002s23708704IIP257.949.235.4GAGA
Tcm003s00621672IIIP23.23.251.4CAAC
Tcm003s29414854IIIP164.595.162.2CTCT
Tcm004s01127580IVP15.167.14.1GAAG
Tcm005s24838905VP144.454.182.4GAAG
Tcm006s22739149VIP147.268.815.2ACAC
Tcm008s00170802VIIIP20.114.42.5TCTC
Tcm009s39845182IXP1101.353.413TCTC
YieldTcm003s29414854IIIP164.593.113.4CTCT
Tcm004s00615809IVP13.37.977.2AGAG
Tcm009s36667972IXP285.13.43CTCT

Markers with the highest LOD scores for each trait are indicated in bold. P1 and P2 are referring to which parent the QTL is from, ‘TSH 1188’ and ‘CCN 51’, respectively, according to the MQM mapping

Positions of QTLs identified for the yield components identified in the mapping population MP01 (‘TSH 1188’ × ‘CCN 51’).

On the top of each panel are the chromosome numbers, and the x-axis is showing the positions of the SNP markers in centimorgans (cM). The y-axis is presenting the logarithm of the odds (LOD) scores from the interval mapping analysis. The panels are presented as following: DBW (a), NPH (b), NHPH (c), Yield (d) and PI (e). The dotted line represents the LOD score threshold of 3.1 (p-values < 0.05) QTLs identified for the total number of pods harvested (NPH), the number of healthy pods harvested (NHPH), dry bean weight (DBW), pod index (PI), and average yield Markers with the highest LOD scores for each trait are indicated in bold. P1 and P2 are referring to which parent the QTL is from, ‘TSH 1188’ and ‘CCN 51’, respectively, according to the MQM mapping Of those QTLs, we highlight a region at the top of chromosome IV that was flanked by three significant SNP markers; they are Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580. Their LOD values ranged from 7.10 to 37.59, while the percentage of phenotypic variation ranged from 4.10 to 23.60%.

Analysis of haplotype–phenotype associations

The haplotypes from the mother ‘TSH 1188’ are represented with “T”, and with “C” for the father, ‘CCN 51’. For each trait-marker association, we performed the variance analysis on the means of the four parental haplotype combinations (T1C1, T1C2, T1C2, and T2C2), which showed different levels of significance (p < 0.001, p < 0.01 and p < 0.05) (Table 3). There was no statistically significant difference between the haplotype groups for the markers Tcm003s29414854 for PI, Tcm005s38257511, and Tcm009s03430458 for DBW. Tukey’s test showed that the haplotype T2 on chromosome IV (Tcm004s00615809, p < 0.001) had a significant effect on increase the total number of pods harvested (NPH and NHPH). For variable DBW, the haplotype T1 showed a significant effect on also chromosome IV (Tcm004s00289192 and Tcm004s30466731, p < 0.001), while T2 was significant for the SNP markers Tcm001s18406546, Tcm003s30366194, and Tcm006s22739149 (p < 0.001). For the SNP markers Tcm002s07453904, Tcm002s15300388, and Tcm008s04113686, the haplotype C1 has a significant effect in increasing DBW (p < 0.001). For the other markers, Tukey’s test did not show a clear distinction between groups.
Table 3

Tukey post hoc test for multiple comparisons to identify the difference in the effect of the parental haplotype combinations on the cacao yield components

VariableSNP nameMeanTukey groupsP-value
T1:C1T1:C2T2:C1T2:C2T1:C1T1:C2T2:C1T2:C2
NPHTcm001s013085207.395.966.684.79aabab<0.01
Tcm004s006158094.403.828.767.87bbaa<0.001
NHPHTcm001s013085204.833.904.363.26aabab<0.01
Tcm004s006158092.932.635.525.33bbaa<0.001
DBWTcm001s184065461.451.381.521.56bccaba<0.001
Tcm002s074539041.571.461.511.41abcabc<0.001
Tcm002s153003881.581.451.521.40abcabc<0.001
Tcm003s006216721.511.501.551.37aaab<0.001
Tcm003s279779851.411.471.531.54babaa<0.01
Tcm003s303661941.401.451.531.56cbcaba<0.001
Tcm004s002891921.591.611.381.35aabb<0.001
Tcm004s304667311.551.561.431.42aabb<0.001
Tcm005s298461511.491.561.421.48ababab<0.01
Tcm005s333773061.481.551.431.48ababab<0.001
Tcm005s382575111.551.491.451.48aabbab0.073
Tcm006s204706041.501.371.551.52abaa<0.001
Tcm006s227391491.391.451.501.61cbcba<0.001
Tcm008s041136861.401.541.441.59baba<0.001
Tcm009s034304581.531.511.461.45aaaa0.062
Tcm009s398451821.401.521.511.53baaa<0.001
PITcm001s1840654621.5320.4919.4618.62aabbb<0.01
Tcm002s0745390418.1419.9819.4421.97bbba<0.001
Tcm002s2370870417.6819.9219.2122.87cbbca<0.001
Tcm003s0062167219.6420.5318.2821.29ababa<0.01
Tcm003s2941485420.7420.2419.2419.35aaaa0.188
Tcm004s0112758019.2418.4021.1321.14abbaa<0.001
Tcm005s2483890518.9518.7520.7821.27bccaba<0.01
Tcm006s2273914921.1320.9019.0918.50aabbcc<0.01
Tcm008s0017080221.1919.1920.5218.67aababb<0.01
Tcm009s3984518221.3419.3418.9319.85aabbab<0.05
YieldTcm003s29414854235.30246.62322.09308.08babaa<0.01
Tcm004s00615809222.86195.62361.34337.28bbaa<0.001
Tcm009s36667972326.57243.07287.49238.24ababb<0.01
Tukey post hoc test for multiple comparisons to identify the difference in the effect of the parental haplotype combinations on the cacao yield components The markers from the mother are also the ones that affect the pod index (PI) for the majority of the QTL regions. T1 was significant for the marker Tcm001s18406546, Tcm006s22739149, Tcm008s00170802, and Tcm009s39845182. The marker with higher LOD score for PI, Tcm002s23708704, the haplotype C2 from the ‘CCN 51’ was associated with an increased pod index. Finally, the T2 for the marker Tcm004s00615809 (p < 0.001) is the one associated with increased yield (Table 3). Although we mapped multiple significant QTLs linked to multi-variables of cacao yield, we focused on the main QTL regions mapped on chromosome IV, flanked by the markers Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580 (for NPH, NHPH, DBW, PI, and Yield), and on chromosome II with the marker Tcm002s23708704 (only for PI). Those QTL regions were selected because the markers on chromosome IV showed a higher LOD score for all variables evaluated, and Tcm002s23708704 showed LOD score of 9.23, the highest for PI. The distance from Tcm004s00289192 to Tcm004s00615809 is 326.6 kbp, while Tcm004s00615809 is 511.8 kbp away from Tcm004s01127580. For analysis of haplotype–phenotype associations, we considered the interval from Tcm004s00289192 to Tcm004s01127580 as a unique haplotype. For that, each variable evaluated was classified into four phenotypical classes, into which we counted the haplotype frequency of each parental haplotype (T1, T2, C1, and C2; Fig. 4) and their combinations (T1:C1, T1:C2, T2:C1, T2:C2, Fig. 5). The alleles from ‘TSH 1188’ (T1 and T2) were the most significantly associated with yield components (p < 0.01, Table 4, Figs. 4 and 5). The frequency of the T1 allele, which corresponds with SNP alleles AAG, increases along with a rise in DBW (Fig. 5a). On the other hand, the haplotype T2 (GGA) was associated with an increase in NPH, and yield (Fig. 5b, c). The same pattern observed for the marker Tcm002s23708704, in which the haplotype T2 (A) is associated with higher PI. Those results reflect the negative correlation between DBW and all the other variables. Therefore, the maternal parent, ‘TSH 1188’, is the one conferring the QTL on those chromosome regions.
Fig. 4

Haplotype frequencies for different phenotypical ranges of cacao yield components.

The haplotype frequencies are presented in panels (a) to (d). The phenotypic ranges of cacao yield components are shown on the x-axis, while the y-axis shows the percentages for each parental haplotype. Panels (a), (b), and (c) are for the main SNP markers on chromosome IV (Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580), mapped for dry bean weight (a), the total number of pods harvested (b), and average yield (c). Panel (d) is for the main SNP marker (Tcm002s23708704) for pod index on chromosome II

Fig. 5

Haplotype combination frequencies for different phenotypical ranges of cacao yield components.

The haplotypes frequencies are presented in panels (a) to (d). The different phenotypical ranges of cacao yield components are shown on the x-axis, while the y-axis shows the percentages of each parental haplotype. Panels (a), (b), and (c) are for the main SNP markers on chromosome IV (Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580), mapped for dry bean weight (a), the total number of pods harvested (b), and average yield (c). Panel (d) is for the main SNP marker (Tcm002s23708704) for pod index on chromosome II

Table 4

Associations between the parental haplotypes (T1, T2, C1, and C2) on chromosomes IV (Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580) and II (Tcm002s23708704) and the relative frequency for each phenotypical class evaluated

VariableChr.SNPallelePhenotypical rangesP-value
1–33–55–77–10
NPHIVAAGT10.320.170.130.007.18E−76
GGAT20.180.330.370.502.13E−21
AAAC10.280.310.300.211.23E−45
GGGC20.220.190.200.291.01E−38
0.5–1.01.0–1.51.5–2.02.0–2.5
DBWIVAAGT10.050.190.350.501.77E−37
GGAT20.450.310.150.001.30E−52
AAAC10.230.290.280.302.22E−42
GGGC20.270.210.220.201.98E−30
22–486486–972972–14581458–1944
YieldIVAAGT10.280.143.34E−126
GGAT20.210.380.440.504.71E−70
AAAC10.290.260.190.503.52E−116
GGGC20.210.210.318.70E−81
11–2323–3535–4747–60
PIIVAAGT10.290.180.250.501.08E−76
GGAT20.210.320.250.003.70E−47
AAAC10.280.300.380.258.57E−64
GGGC20.220.200.130.258.71E−55
11–2323–3535–4747–60
PIIIGT10.290.220.130.003.11E−74
AT20.210.280.380.501.71E−46
GC10.290.150.250.005.20E−81
AC20.210.350.250.501.15E−46

Phenotypical ranges were defined based on the average data for each variable

Haplotype frequencies for different phenotypical ranges of cacao yield components.

The haplotype frequencies are presented in panels (a) to (d). The phenotypic ranges of cacao yield components are shown on the x-axis, while the y-axis shows the percentages for each parental haplotype. Panels (a), (b), and (c) are for the main SNP markers on chromosome IV (Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580), mapped for dry bean weight (a), the total number of pods harvested (b), and average yield (c). Panel (d) is for the main SNP marker (Tcm002s23708704) for pod index on chromosome II

Haplotype combination frequencies for different phenotypical ranges of cacao yield components.

The haplotypes frequencies are presented in panels (a) to (d). The different phenotypical ranges of cacao yield components are shown on the x-axis, while the y-axis shows the percentages of each parental haplotype. Panels (a), (b), and (c) are for the main SNP markers on chromosome IV (Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580), mapped for dry bean weight (a), the total number of pods harvested (b), and average yield (c). Panel (d) is for the main SNP marker (Tcm002s23708704) for pod index on chromosome II Associations between the parental haplotypes (T1, T2, C1, and C2) on chromosomes IV (Tcm004s00289192, Tcm004s00615809, and Tcm004s01127580) and II (Tcm002s23708704) and the relative frequency for each phenotypical class evaluated Phenotypical ranges were defined based on the average data for each variable

Identification of recombinant events and candidate genes

To identify candidate genes, we examined trees exhibiting recombination events for both parental haplotypes within this region and between flanking SNP markers. We identified 33 trees displaying recombination events in at least one parental haplotype (Fig. 6a, b). Of those trees, 19 trees possessed recombination events between the maternal haplotypes (Fig. 6a) in an interval from 196,163 to 1,140,441 bp, while another 14 trees had a recombination event between the paternal haplotypes within the interval from 351,282 to 1,086,667 bp (Fig. 6b). Within this region, we identified recombination events occurring in ten different genomic positions, but with some recombination events occurring at the same spots.
Fig. 6

Haplotype analysis of trees exhibiting recombination events in the main QTL region on chromosome IV.

a Trees showing recombination events between the maternal haplotypes, b trees showing recombination events between the paternal haplotypes. Maternal and paternal haplotypes are shown at the top of each panel. We highlighted the position where the recombination events are occurring. Boxes indicate the locations of the candidate genes at the top of each panel

Haplotype analysis of trees exhibiting recombination events in the main QTL region on chromosome IV.

a Trees showing recombination events between the maternal haplotypes, b trees showing recombination events between the paternal haplotypes. Maternal and paternal haplotypes are shown at the top of each panel. We highlighted the position where the recombination events are occurring. Boxes indicate the locations of the candidate genes at the top of each panel To evaluate the effect of those recombination events, we calculated the average yield and dry bean weight for each haplotype combination and the different recombination events (Fig. 7a, b). The first three SNPs alleles represent the maternal haplotype, separated by colons from the three alleles of the paternal haplotype (e.g., AAG:AAA, maternal and paternal alleles, respectively). We highlighted in bold the position of each recombination event. The MP01 recombinants in which the maternal haplotype switched from T1 to T2 showed a lower average yield (84.33 kg/ha) than the population average (144.85 kg/ha). Those trees possessed the haplotypes AGA:GGG, AGA:AAA, AAA:GGG, and AAA:AAA. The average yield was of 83.49 kg/ha in the other recombinant trees possessing the T1 haplotype but combined with a recombination event on the paternal haplotype. Those recombinant haplotypes are AAG:GAA, AAG:GGA, and AAG:AAG (Fig. 7a).
Fig. 7

Boxplot of yield (a) and dry bean weight (b) data per haplotype in the main QTL region on chromosome IV. The trees with recombination events within the region are colored according to the parental haplotype where the recombination event occurred: non-recombinants AAG:AAA (T1:C1) and AAG:GGG (T1:C2) are colored in green; non-recombinants GGA:AAA (T2:C1) and GGA:GGG (T2:C2) are colored in dark green. Trees possessing recombination events on the maternal haplotype T1 (AGA:GGG, AAA:GGG, AAA:AAA, and AGA:AAA) are colored in dark blue. Trees possessing recombination events on the maternal haplotype T2 (GAG:AAA, GGG:AAA, GGG:GGG, and GAG:GGG) are colored in blue. Recombination events on paternal haplotype C1 (AAG:GAA, AAG:AGG, AAG:GGA, and AAG:AAG) are colored in dark red, and recombination events on C2 (GGA:GGA, GGA:GAA, and GGA:AAG) are colored in red. The dotted line represents the population average for yield and dry bean weight, respectively

Boxplot of yield (a) and dry bean weight (b) data per haplotype in the main QTL region on chromosome IV. The trees with recombination events within the region are colored according to the parental haplotype where the recombination event occurred: non-recombinants AAG:AAA (T1:C1) and AAG:GGG (T1:C2) are colored in green; non-recombinants GGA:AAA (T2:C1) and GGA:GGG (T2:C2) are colored in dark green. Trees possessing recombination events on the maternal haplotype T1 (AGA:GGG, AAA:GGG, AAA:AAA, and AGA:AAA) are colored in dark blue. Trees possessing recombination events on the maternal haplotype T2 (GAG:AAA, GGG:AAA, GGG:GGG, and GAG:GGG) are colored in blue. Recombination events on paternal haplotype C1 (AAG:GAA, AAG:AGG, AAG:GGA, and AAG:AAG) are colored in dark red, and recombination events on C2 (GGA:GGA, GGA:GAA, and GGA:AAG) are colored in red. The dotted line represents the population average for yield and dry bean weight, respectively On the other hand, we observed an average yield of 123.85 kg/ha in the recombinant trees in which the maternal haplotype switched from T2 to T1, and in the ones in which the maternal haplotype T2 is combined with a recombinant haplotype from the paternal parent (Fig. 7a). For those recombinant trees the average yield was 164.62 kg/ha, which is 13% more than the population average. Those results confirm that the haplotype T2 (GGA) indeed is the favorable one to increase the yield in MP01 population. In contrast, trees possessing the haplotype T1 (AAG) showed higher dry bean weight compared with the others (Fig. 7b). The dry bean weight was 1.60 g in trees with recombination events on the haplotype T1 (Fig. 7b), while for the MP01 recombinants on the T2 haplotype was 1.49 g. Based on the examination of the trees with recombination events between the maternal haplotypes, we delimited the region regulating the phenotypic variation of yield components from 196,163 to 1,140,441 (944.3 kbp). In this region, we found 13 gene models that may be considered as potential candidates regulating the yield components (Table 5). Of those 13 candidate genes, nine are annotated as transmembrane transporters (GO:0055085) that are specialized in sugar transport (PF03083), two genes are involved in carbohydrate metabolism (GO:0005975), one gene is involved in lipid metabolism (ko00001), and one gene is involved in glucose metabolism (GO:0006011).
Table 5

Candidate genes identified within the main region on chromosome IV

NameStartEndLength (bp)StrandDescriptionCriollo v2
Thecc1EG016788t1192,183196,3374154+GluconokinaseTc04v2_t000080.1
Thecc1EG016835t1427,029431,4014372Nodulin MtN21/EamA-like transporter familyTc04v2_t000370.1
Thecc1EG016836t1431,839433,5231684Nodulin MtN21/EamA-like transporter familyTc04v2_t000380.1
Thecc1EG016837t1433,570434,5921022Nodulin MtN21/EamA-like transporter familyTc04v2_t000390.1
Thecc1EG016838t1436,287438,7342447Nodulin MtN21/EamA-like transporter familyTc04v2_t000400.1
Thecc1EG016840t1443,603446,1172514Nodulin MtN21/EamA-like transporter familyTc04v2_t000420.1
Thecc1EG016842t1450,314452,8532539Nodulin MtN21/EamA-like transporter familyTc04v2_t000410.1
Thecc1EG016844t1455,719458,6582939Nodulin MtN21/EamA-like transporter familyTc04v2_t000440.1
Thecc1EG016865t1569,595571,7492154+Nodulin MtN3/SWEET familyTc04v2_t000620.1
Thecc1EG016866t1574,635577,4502815+Nodulin MtN3/SWEET familyTc04v2_t000630.1
Thecc1EG016882t1645,786649,2403454GDSL-like Lipase/AcylhydrolaseTc04v2_t000790.1
Thecc1EG016942t1999,6691,007,7728103+Raffinose synthase family proteinTc04v2_t001280.1
Thecc1EG016965t11,134,8681,140,9846116+ABC transporter B family member 26Tc04v2_t001480.1
Candidate genes identified within the main region on chromosome IV

Hierarchical clustering on principal components of the yield components

We applied the hierarchical clustering on those principal components[12], to identify the groups of more productive trees (higher NPH, NHPH, and Yield) from MP01 population. The analysis identified three groups (clusters), of which DBW defines group 1, group 2 by PI and group 3 by the variables NPH, NHPH, and Yield (Fig. 8a and Supplementary Table 5). We provide the values for the cluster assignments for each individual of MP01 (Supplementary Table 5). From this analysis, group 3 contains the more productive trees from the MP01 population and may be used for a further selection of new higher-yielding varieties.
Fig. 8

Phenotypical profile and haplotype frequency within each cluster identified after hierarchical clustering analysis on the principal components of the yield variables from the MP01 population.

Panel (a) presents the radial charts for each cluster and five yield-related traits (DBW, NHP, NHPH, Yield, and PI), while panel (b) shows the frequency of the main haplotype on chromosome IV for each cluster

Phenotypical profile and haplotype frequency within each cluster identified after hierarchical clustering analysis on the principal components of the yield variables from the MP01 population.

Panel (a) presents the radial charts for each cluster and five yield-related traits (DBW, NHP, NHPH, Yield, and PI), while panel (b) shows the frequency of the main haplotype on chromosome IV for each cluster

Source of the alleles that increases the yield component and yield

To investigate the ancestry of the QTL with high LOD score on chromosome IV that increases the yield and yield components, we created a neighbor-joining tree of the ancestral haplotypes from a diversity panel that includes the parents of the F1 mapping population (‘TSH 1188’ and ‘CCN 51’), to identify the ancestry of the parental alleles (T1, T2, C1, and C2) associated with the cacao yield components on chromosome IV. The phylogenetic analysis of the main region on chromosome IV showed that ‘TSH 1188’ most likely inherited the haplotype T1 (AAG) from one of its great grandparents, ‘SCAVINA 6’ (SCA 6). In contrast, the second haplotype T2 (GGA) grouped with three varieties belonging to Iquitos genetic cluster, i.e., ‘IMC 12’, ‘IMC 47’, and ‘IMC 50’[13]. These results were expected, since another variety from Iquitos cluster[14], ‘IMC 67’, is part of the ‘TSH 1188’ lineage[15,16]. In turn, the first paternal haplotype, C1 (AAA), grouped with the first haplotype of ‘CCN 10’. The second haplotype of ‘CCN 51’, C2 (GGG), grouped with varieties from Iquitos genetic group, including ‘IMC 67’, which is the paternal parent of ‘CCN 51’[14] (Fig. 9). The haplotypes associated with increased yield in the mapping population, T2 (GGA) and C2 (GGG), are inherited from a member of the Iquitos genetic group.
Fig. 9

Neighbor-Joining tree analysis showing the origins of the haplotypes on the main QTL region on chromosome IV.

Sixty-two markers in the QTL on chromosome IV, corresponding to 838.8 kb. The blue highlighted names are the first (H1) and second (H2) haplotype for each mapping population parent, ‘TSH 1188’ and ‘CCN 51’

Neighbor-Joining tree analysis showing the origins of the haplotypes on the main QTL region on chromosome IV.

Sixty-two markers in the QTL on chromosome IV, corresponding to 838.8 kb. The blue highlighted names are the first (H1) and second (H2) haplotype for each mapping population parent, ‘TSH 1188’ and ‘CCN 51’

Discussion

Correlation and multivariate analysis of the yield components

It is well known that plant productivity and crop yield rely on the translocation and partitioning of assimilated carbon (photoassimilates) and nutrients over the development of vegetative and harvestable organs[17]. Growing vegetative tissues and harvestable organs compete for the available assimilated carbon in the translocation stream. Competition for assimilated carbon has been shown in which changes in number of fruits (fruit load) affect fruit growth, fresh weight and carbohydrate concentration in Malus domestica[18], Citrus clementina[19], and Actinidia Chinensis[20]. In our study, we found a significant negative correlation between DBW with all the other variables, which indicates a competition for the assimilated carbon among those harvestable organs. The strongest correlation was between DBW and PI, which is essentially the number of pods required to obtain 1 kg of dry bean weight (Fig. 1). In quantitative terms, a heavier dry bean weight reflects in more photoassimilates allocated to the beans, which reduces the pod index. In this sense, PI is a key yield component that represents the extent of photoassimilate partitioning to the cacao beans. Therefore, it is an important variable to support the selection of high-yielding varieties. The variables NPH, NHPH, and Yield, are more correlated with the first principal component (Dim.1) and, therefore, they are the most important in explaining the variability in the yield components from MP01. On the other hand, the variables PI and DBW are more correlated in the second (Dim.2) principal component (Fig. 2 and Supplementary Table 2). Therefore, combing the PCA eigenvalues from the first and second principal component would result in the selection of varieties that will be more productive but also have a lower PI.

QTL mapping analysis

In our QTL mapping analysis, we found 24 genomic positions spread over eight chromosomes (no QTLs on chromosomes VII and X) associated with cacao yield components, and those regions may contain the primary candidate genes that control the phenotypic variation observed in MP01. Overall, most of the QTLs explained a low percentage of the variance, in which the highest values were for the markers on chromosome IV (from 4.1% to 23.60%), chromosome VIII (8.5%), chromosome III (5.7%), chromosome II (5.4%), and chromosome 6 (from 4.2% to 5.2%). Although the variance explained for most QTL was low, the total QTL variance explained per trait was large. The QTLs for NPH and NHPH explained 23.5% and 26.9% of the phenotypic variation, respectively. The variables DBW and PI showed the highest values, 60.0% and 31.4%, respectively. Finally, the variable Yield explained only 9.0% of the phenotypic variation. Among the traits evaluated, Yield showed the lowest value of heritability (H2 = 0.10) what demonstrated a high influence of the environment on the expression of this trait. The variability in rainfall may be one of the main causes of yield losses in rainfed farming systems[21], such as in the area where the MP01 was grown. The yield components and yield are complex traits, and therefore, are controlled by multiple genes that can be affected by the different interactions with the environment, crop management, and growth and development processes. In our study, DBW and PI were the traits that showed the majority of QTLs with a lower contribution to the total percentage of variation explained. Those are quantitative traits, which are controlled by multiple genes involved in the physiological process that control seed size and weight[22], and in the supply-demand balance of photoassimilates to sink organs[23]. The actual phenotype that breeders are looking for commercial production is a tree that produces at least 50 pods per year, with a dry bean weight ranging from 1.0 to 2.0 g, such as ‘CCN 51’, the paternal parent of the MP01 population. ‘CCN 51’ is the most important variety in many countries in Central America and South America, with a potential production of at least 3 tons/ha. Since there are only two QTLs controlling NPH and NHPH, compared with the large number of QTLs found for the variable DBW, it will be more feasible to develop marker-assisted selection for trees producing higher number of pods. That is one of the reasons why we selected the regions on chromosome IV for discussion in this paper. The beginning of chromosome IV seems to be linked to many important traits. In our study, the region is associated with yield components. In other studies, the region from 1414 bp to 1,686,245 bp on chromosome IV is associated with other traits that may directly affect the cacao production, such as cherelle wilt ratio, fresh bean weight, fresh weight/pod, total number of pods, and disease resistance[24], fatty acid composition[25] and sexual-compatibility[26]. Another QTL associated with fat content in the MP01 population was also found on chromosome IV from 19,637,361 bp to 26,233,319 bp[25]. The top position of this QTL is located at the distance of 4,132,875 bp of one SNP mapped in our study for DBW, Tcm003s30366194. In this study, fat content represents 50.2–62.4% of the total dry weight of a bean[25], therefore, it is also an important yield component. In our study, the most significant region was located from 0.11 to 5.16 cM of chromosome IV. This region harbored at least one significant SNP marker for each of the yield components evaluated and provided a significant level of contribution to the observed phenotypic variation (Table 2). Multiple QTL regions associated with cacao yield components were also mapped[2] on chromosomes I, II, IV, V, and IX using phenotypic data from several years. In a second study, were found QTL regions at 72.3 cM on chromosome IV for traits such as bean length, shape index (the ratio of bean length to thickness), and fresh weight and the number of ovules per ovary[3]. Compared with our study, both QTL positions[2,3] are closer to our second minor QTL located at 66.6 cM on chromosome IV for DBW (Table 2). In addition, the regions we mapped on chromosome II at 41.2 and 57.9 cM for DBW and PI are most likely the same regions mapped for bean traits (length, width, and thickness) and the number of ovules[3]. A significant region at the top of chromosome IV was also reported by Livingstone et al.[24] for cherelle wilt, the total number of pods, and fresh bean weight. This QTL falls within one of the regions mapped for self-incompatibility[26], which is a crucial yield factor[27]. The yield efficiency component, defined as the ratio of production to a cross-section of the trunk in kg cm−3, is significantly higher in self-compatible cacao varieties than in self-incompatible groups[28]. Those authors found a significant positive correlation between self-compatibility and yield efficiency, which also indicates a positive relationship between self-compatibility and final yield. Our findings demonstrate the importance of this region on chromosome IV, flanked by the markers Tcm004s00289192, Tcm004s00615809, Tcm004s01127580, to select high-yielding varieties.

Main haplotypes increasing yield components and selection of candidate genes

Our analysis showed that the maternal haplotypes had the more significant effect on the phenotypical variation of the traits evaluated (Tables 3–4 and Figs. 4–7). So far in the MP01 population, the maternal alleles have been reported as the ones more affecting the economically important traits, such as disease resistance[11,29], and fat content and fatty acid composition[25]. Likewise, our study showed that the alleles from ‘TSH 1188’ are the ones controlling the QTL regions with a higher LOD scores, such as the one found on chromosome IV. Only for the SNP markers Tcm002s23708704 and Tcm008s04113686 that the alleles from ‘CCN 51’ had a significant effect controlling the phenotypic variation of the yield components in our study. These results indicated a high maternal influence on the inheritance of complex traits in the MP01 population. Given that we found a QTL explaining a large percentage of the phenotypic variation of the yield components, we can use the region on chromosome IV combined with haplotype effects on the other QTL regions, to select higher yield cacao varieties. A clear effect of the maternal haplotypes on the yield components was observed for the markers on the region on chromosome IV. In the MP01 population, the presence of the maternal haplotype T1 (AAG) was associated with an increase in dry bean weight. On the other hand, we observed a significant association of the haplotype T2 (GGA) with an increase in the number of pods and higher yield. These results are most likely reflecting the negative correlations found between the total number of pods and dry bean weight, as also observed between DBW and PI. Examination of MP01 trees that showed recombination events in this region confirmed that the inheritance of each maternal haplotype differentially influenced the dry bean weight (T1 = AAG) or the cacao yield (T2 = GGA). Besides, the frequencies of those haplotypes also changed according to the groups generated from hierarchical clustering on the principal components analysis (Fig. 8b). The frequency of T1:C1 and T1:C2 are higher on group 1 (cluster 1), which is defined by DBW, while decreased from group 2 (cluster 2) to 3 (cluster 3). However, the haplotype T2:C1 and T2:C2 are more frequent on group 3, which contains the more productive trees from MP01 population. Within this third group 85% of the trees possess the haplotype T2, being 50% for T2:C1 (GGA:AAA) and 35% for T2:C2 (GGA:GGG) (Fig. 8b). Those results point to the importance of haplotype T2 for selection of higher-yielding cacao varieties. The trees displaying recombination events between the main markers on chromosome IV also allowed us to select candidate genes associated with cacao yield components. We identified the genes within a 944.1 kbp genomic region from 196,163 to 1,140,441 bp on chromosome IV. In this region, we identified candidate genes potentially involved in source-sink regulation and lipid metabolism. For instance, Thecc1EG016788t1 encodes a gluconate kinase, which is an essential enzyme of the oxidative pentose phosphate pathway[30] that supplies NADPH during fatty acid synthesis in developing embryos[31]. Another possible lipid enzyme, Thecc1EG016882t1, may act in the breakdown of lipids and conversion to carbohydrates, and participate in the fatty acid metabolism during seed germination[32]. We also identified candidate genes belonging to this lipid enzyme family within recombination events near the markers on chromosomes II, V, VII, and IX (Supplementary Table 6). A major group of nine candidate genes belong to the transmembrane solute carrier family were found located between 427,029 and 577,450 bp (150 kb) on chromosome IV, and twelve copies of the same group of genes are present within the QTL regions identified on chromosomes II, III, V and VI. The list with the names of the markers, marker position (bp), and the recombination interval and region size is provided in Supplementary Table 6. Such transmembrane carriers are crucial for multiple aspects of plant growth and development, particularly in plant responses to biotic and abiotic stresses[33]. Seven of those genes are annotated as MtN21/EamA-like transporters, which might play a significant role in the transport of amino acids[34] and auxin[35] throughout the whole plant. Those genes are within the recombination region close to the markers Tcm003s27977985 and Tcm003s30366194 on chromosome III (Supplementary Table 6). The transport of amino acids represents the main route to supply reduced nitrogen from source-to-sink tissues[36]. Besides, the growth and development of fruit and seeds require a stable supply of nitrogen for the production of storage proteins[37]. Finally, the supply of amino acids also controls biomass production and seed yield in Pisum sativum L.[6]. Two other genes are part of the MtN3/SWEET gene family, which encodes a protein that principally acts like glucose, fructose, or sucrose transporters[38], but it can also export plant essential micronutrients from source-to-sink organs[39]. Besides, we found the MtN3/SWEET genes within the QTL regions on chromosomes II (Tcm002s15300388) and III (Tcm003s00621672 and Tcm003s27977985) (Supplementary Table 6). The MtN3/SWEET proteins are crucial to phloem loading and soluble carbohydrate transportation during fruit development and seed filling in many crops[7,40,41]. Various SWEET genes participate in the partitioning of non-structural carbohydrates to the fruits in M. domestica[42], and seed filling in Oryza sativa and Zea mays L.[43]. The genomic region where we identified copy-number variations of SWEET genes falls between the markers Tcm004s00289192 and Tcm004s00615809 on chromosome IV. Two other candidate genes are located between markers Tcm004s00615809 and Tcm004s01127580. The gene model Thecc1EG016942t1 is a raffinose synthase (EC: 2.4.1.82). Members of the raffinose oligosaccharide family function as carbon reserve compounds required during seed maturation and protection against abiotic stresses[8]. The other T. cacao candidate gene, Thecc1EG016965t1, is a member of ATP-binding cassette transporter family (ABC transporter) that are associated with transportation of diverse metabolites[44]. Furthermore, those proteins transport fatty acids for lipid synthesis during the seed filling of Arabidopsis thaliana[45] and are crucial during pollen development in Ananas comosus[9]. Other candidate genes involved in synthesis and breakdown of lipid and carbohydrates were also found nearby the QTL regions on the other chromosomes identified in this study. The candidate genes annotated as phosphatidic acid phosphatase, non-specific phospholipase and beta-ketoacyl-[acyl-carrier-protein] synthase II (Supplementary Table 6) appear to affect the production and accumulation of storage lipid during seed development of Jatropha curcas[46]. A putative soluble inorganic pyrophosphatase and two putative plant invertase/pectin methylesterase inhibitors (Supplementary Table 6) may participate in starch and sucrose metabolism pathways[47]. All candidate genes flanked by the marker Tcm004s30466731 on chromosome IV seem to be involved in the synthesis of 1-aminocyclopropane-1-carboxylate oxidase homolog 1, which is a direct precursor of ethylene during fruit ripening in Solanum lycopersicum[48]. Likewise, the putative protein UDP-glucosyltransferase, found near the markers Tcm001s18406546 and Tcm008s00170802, also appears to be involved in fruit ripening[49], and regulates secondary metabolites availability in peach[50]. Overall, the majority of the candidate genes identified within the QTL regions from our study may mediate important steps of lipid and carbohydrate metabolic pathways. Therefore, our study identified not only crucial candidate genes to be tested in functional gene expression studies but may also contribute in the development of molecular tools for improvement of cacao yield via breeding efforts.

Conclusion

Here we report SNP-based QTL regions that are associated with different cacao yield components such as the total number of pods harvested, dry bean weight, yield, and pod index. The total number of healthy pods, yield, and pod index were the most important for the identification of the higher productive genotypes from the MP01 population. Then, those variables may be used for further selection of new varieties. The SNP markers associated with these yield components will be used to screen and to select high-yielding varieties via marker-assisted selection and genomic selection. Identification of the QTLs combined with the information from trees that showed recombination events in these QTL regions helped to identify candidate gene models affecting the phenotypic variation of yield components in cacao. Those candidate genes are not specific to the traits evaluated, because those genes may have multiples functions, but certainly they do contribute to the yield formation in other crops. Therefore, they are important to understand the yield formation in T. cacao as well. In other crops, such genes seem to play a significant role in source-sink transport of sugars and lipid metabolism. Therefore, those genes are the primary candidate to influence the preferential remobilization of carbohydrates, for instance, to set pods and the subsequent pod development and bean filling, which are economically essential components of cacao yield. In addition, the identification of haplotypes that contribute to either more pods per tree or to heavier bean weight per pod might help to select trees that are better fitting for the farmers in terms of labor, pest control, and management of the (post) harvest processes.

Materials and methods

MP01 segregating mapping population

The mapping population evaluated in this study, referred to as MP01, is part of the Mars cacao-breeding program located at the Mars Center for Cocoa Science (MCCS), Bahia, Brazil. MP01 comprises 459 trees from a cross between ‘TSH 1188’, used as the female parent, and ‘CCN 51’, used as the male parent. The female parent is related to Iquitos and Nanay genetic groups[13,24], and the male parent has predominant ancestries from Iquitos, Criollo, and Amelonado[13,14,24]. Those parents contrast for many important traits, and the MP01 progenies segregate for disease resistance[11,29,51], pod color[5], fat content and fat composition[25]. The MP01 population also segregates for yield. ‘CCN 51’ is self-compatible[52,53] and worldwide recognized as one of the most productivity cacao varieties with the capacity to reach 3 tons/ha in high productive farming system. ‘CCN 51’ also has a good level of cross-compatibility with many varieties, including ‘TSH 1188’. ‘TSH 1188’ is self-incompatible[53], which results in lower pod production per tree and lower yield, compared with ‘CCN 51’. A study that evaluated the economically important traits showed that ‘TSH 1188’ had a dry weight of a single bean of 1.24 g and pod index ranging from 13 to 25[15]. ‘CCN 51’ had a dry weight of a single bean 1.62 g and pod index of 18. In terms of yield (dry bean weight in tons/hectare), ‘CCN 51’ showed values from 1034 to 1332 kg ha−1[16]. The evaluation of 20 trees in a clonal trial at MCCS from 2017 to 2019, demonstrated that we harvested three pods pods/tree/year for ‘TSH 1188’, while for ‘CCN 51’ we harvested 21 pods/tree/year. Overall, ‘CCN 51’ produced 0.539 kg/tree/year and ‘TSH 1188’ 0.074 kg/tree/year. Thus, the parents of the MP01 are also contrasting for yield components and yield, and their offspring segregate for those traits.

Fresh to dry bean weight conversion factor

We counted and weighed the fresh beans (FBW) of five to ten pods of 100 progenies in MP01 to measure the fresh to dry bean weight conversion factor. We counted the beans and weighed the fresh beans for each pod. We then fermented the beans in polyethylene net bags for 7 days. After that, the beans were sun-dried to 7% moisture content as measured with a grain moisture and impurity analyzer (model G650, Gehaka, São Paulo, Brazil). The dry beans were weighed, and we found that the average dry weight was 0.36 of the fresh bean weight. After that, we used this ratio to estimate dry bean weight from fresh bean weight.

Yield phenotypic data collection

The MP01 progeny dataset used in this study was collected from January 2007 through September 2018. Data were collected monthly on an individual tree basis. Per tree, we evaluated the total number of harvested pods (NPH), the number of healthy pods harvested (NHPH), which was calculated as the difference between NPH and the total pod lost by disease and pests. The total yield (Yield, kg/ha/year) was estimated 0.36 × FBW × 1111.11 (plant density, as number of trees per ha))/1000 for each year evaluated. The pod index (PI), which is defined as the number of pods required to obtain 1 kg of dry bean weight, was calculated as PI = NHPH × 1000/FBW × 0.36. The calculations were based on Mustiga et al.[54], with modifications in fresh to dry bean weight conversation factor, as explained above. We also calculated the single dry bean weight (DBW), which was calculated as DBW = FBW × 0.36/total number of beans per pod. The DBW data were not calculated for 2009, 2010, and 2011 as FBW was not measured during those years.

Data analysis

We calculated the best linear unbiased predictions (BLUP) from a linear mixed model fitted by maximum likelihood as described in Bates et al.[55]. The equation for the general linear mixed model fitted for each variable was: Where y is the vector of the response variable, μ is the variable mean, a is the random effect vector for individuals, b is the fixed-effect parameter for years, and e is the error. X and Z are incidence matrix values. Variance component estimates were used to calculate broad-sense heritability (H2). Also, we used BLUP values to carry out the principal components analysis for cacao yield components.

Linkage map and QTL mapping

To perform the QTL mapping, we used a cacao linkage[11] map containing 3526 SNPs and based on 459 trees from the MP01 population. For the initial detection of QTL with main effects, the first round of interval mapping (IM) was carried out to select SNP markers that significantly segregate for the traits evaluated from MP01. We used a regression approximation to maximum likelihood interval[56] to estimate the QTL effects. The significant logarithm of odds (LOD) was determined by analyzing 10,000 permutations with p-values ≤0.05[57]. Afterward, the SNP markers showing the highest LOD values were selected as cofactors for multiple interval mapping (MQM) with MapQTL (version 6; Kyazma BV, Wageningen, the Netherlands)[58]. Graphical representations of chromosomes containing QTLs with significant effects and LOD score peaks were drawn using MapChart software, version 2.3[59].

Haplotype–phenotype associations

We used phased SNP haplotype data[60] to identify the favorable haplotype–phenotype associations between yield and related variables. First, we calculate average data for each variable, and then we defined the phenotypic ranges using the frequency distribution function available at the R package ‘fdth’[61], with the number of class intervals (k) set to four. The analysis applied Pearson’s chi-squared test for counting each allele combination within phenotypical classes for each variable evaluated, to test the significant haplotype–phenotype associations for each SNP marker with the highest LOD score. Parental haplotype/allele combinations were designated as T1 and T2 for ‘TSH 1188’, and C1 and C2 for ‘CCN 51’. We recorded the frequency of each parental haplotype and their combinations (T1C1, T1C2, T2C1, and T2C2) presented in each progeny of MP01.

Candidate genes within the QTL regions

To identify potential candidate genes that drive the QTL effects and the phenotypic variations observed, we used the genomic region of the QTLs mapped in this study with the gene model annotation from Matina 1–6 v1.1[5] and Criollo genome v2.0[4]. We explored the available gene model annotations in conjunction with the families, domains, and functional sites of proteins (INTERPRO), and other enrichment tools, such as biological pathways maps (KEGG) and enzymatic reactions (EC). We looked for the genes classified as involved in carbohydrate metabolism, e.g., membrane transport of sugars and amino acids, and synthesis and degradation of carbohydrates and starch. Besides, we searched for genes that might be involved in lipid metabolism, e.g., fatty acid biosynthesis and degradation.

Phylogeny analysis of the main QTL region

We used the phased data of 106 SNP sequences from a diversity panel with 52 members from the ten T. cacao genetic groups[13]. Sixty-two SNP markers on the main QTL region on chromosome IV were used for this phasing. The markers go from Tcm004s00289192 to Tcm004s01127580, and correspond to a region of 838.8 kb. We used the neighbor-joining tree method[62] to infer the phylogenetic relationship between diversity panel members. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) are shown next to the branches[63]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The distances were computed using the maximum composite likelihood method[64]. All positions containing gaps and missing data were eliminated (complete deletion option). The phylogeny was conducted in MEGA X[65], and we edited the neighbor-joining tree according to Letunic and Bork[66]. Supplementary Table S6- Candidate genes found in the QTLs regions associated with yield components on MP01 population Supplementary Table S1 - Coordinates of the first two Principal components for MP01 individuals. Supplementary Table S2 - Coordinates of the first two Principal components for yield components. Supplementary Table S3 - The principal component eigenvalues and their percentage of variance. Supplementary Table S4 - Description of each cluster by quantitative variables. Supplementary Table S5 - Cluster assignments from hierarchical clustering on principal components.
  52 in total

1.  MapChart: software for the graphical presentation of linkage maps and QTLs.

Authors:  R E Voorrips
Journal:  J Hered       Date:  2002 Jan-Feb       Impact factor: 2.645

2.  Prospects for inferring very large phylogenies by using the neighbor-joining method.

Authors:  Koichiro Tamura; Masatoshi Nei; Sudhir Kumar
Journal:  Proc Natl Acad Sci U S A       Date:  2004-07-16       Impact factor: 11.205

3.  Fatty acid synthesis and the oxidative pentose phosphate pathway in developing embryos of oilseed rape (Brassica napus L.).

Authors:  David Hutchings; Stephen Rawsthorne; Michael J Emes
Journal:  J Exp Bot       Date:  2004-12-20       Impact factor: 6.992

4.  Siliques are Red1 from Arabidopsis acts as a bidirectional amino acid transporter that is crucial for the amino acid homeostasis of siliques.

Authors:  Friederike Ladwig; Mark Stahl; Uwe Ludewig; Axel A Hirner; Ulrich Z Hammes; Ruth Stadler; Klaus Harter; Wolfgang Koch
Journal:  Plant Physiol       Date:  2012-02-06       Impact factor: 8.340

5.  Leaf fructose content is controlled by the vacuolar transporter SWEET17 in Arabidopsis.

Authors:  Fabien Chardon; Magali Bedu; Fanny Calenge; Patrick A W Klemens; Lara Spinner; Gilles Clement; Giorgiana Chietera; Sophie Léran; Marina Ferrand; Benoit Lacombe; Olivier Loudet; Sylvie Dinant; Catherine Bellini; H Ekkehard Neuhaus; Françoise Daniel-Vedele; Anne Krapp
Journal:  Curr Biol       Date:  2013-04-11       Impact factor: 10.834

6.  Emerging functions of nodulin-like proteins in non-nodulating plant species.

Authors:  Nicolas Denancé; Boris Szurek; Laurent D Noël
Journal:  Plant Cell Physiol       Date:  2014-01-27       Impact factor: 4.927

7.  Impaired phloem loading in zmsweet13a,b,c sucrose transporter triple knock-out mutants in Zea mays.

Authors:  Margaret Bezrutczyk; Thomas Hartwig; Marc Horschman; Si Nian Char; Jinliang Yang; Bing Yang; Wolf B Frommer; Davide Sosso
Journal:  New Phytol       Date:  2018-02-16       Impact factor: 10.151

8.  The FGGY carbohydrate kinase family: insights into the evolution of functional specificities.

Authors:  Ying Zhang; Olga Zagnitko; Irina Rodionova; Andrei Osterman; Adam Godzik
Journal:  PLoS Comput Biol       Date:  2011-12-22       Impact factor: 4.475

9.  Apoplastic and intracellular plant sugars regulate developmental transitions in witches' broom disease of cacao.

Authors:  Joan Barau; Adriana Grandis; Vinicius Miessler de Andrade Carvalho; Gleidson Silva Teixeira; Gustavo Henrique Alcalá Zaparoli; Maria Carolina Scatolin do Rio; Johana Rincones; Marcos Silveira Buckeridge; Gonçalo Amarante Guimarães Pereira
Journal:  J Exp Bot       Date:  2014-12-24       Impact factor: 6.992

10.  Deciphering the Theobroma cacao self-incompatibility system: from genomics to diagnostic markers for self-compatibility.

Authors:  Claire Lanaud; Olivier Fouet; Thierry Legavre; Uilson Lopes; Olivier Sounigo; Marie Claire Eyango; Benoit Mermaz; Marcos Ramos Da Silva; Rey Gaston Loor Solorzano; Xavier Argout; Gabor Gyapay; Herman Ebai Ebaiarrey; Kelly Colonges; Christine Sanier; Ronan Rivallan; Géraldine Mastin; Nicholas Cryer; Michel Boccara; Jean-Luc Verdeil; Ives Bruno Efombagn Mousseni; Karina Peres Gramacho; Didier Clément
Journal:  J Exp Bot       Date:  2017-10-13       Impact factor: 6.992

View more
  2 in total

1.  Theobroma cacao L. cultivar CCN 51: a comprehensive review on origin, genetics, sensory properties, production dynamics, and physiological aspects.

Authors:  Ramon E Jaimez; Luigy Barragan; Miguel Fernández-Niño; Ludger A Wessjohann; George Cedeño-Garcia; Ignacio Sotomayor Cantos; Francisco Arteaga
Journal:  PeerJ       Date:  2022-01-05       Impact factor: 2.984

2.  Genome-wide association analysis of anthracnose resistance in sorghum [Sorghum bicolor (L.) Moench].

Authors:  Girma Mengistu; Hussein Shimelis; Ermias Assefa; Dagnachew Lule
Journal:  PLoS One       Date:  2021-12-20       Impact factor: 3.240

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.