Anderson Antonio Carvalho Alves1, Rebeka Magalhães da Costa1, Larissa Fernanda Simielli Fonseca1, Roberto Carvalheiro1,2, Ricardo Vieira Ventura3, Guilherme Jordão de Magalhães Rosa4, Lucia Galvão Albuquerque1,2.
Abstract
This study aimed to perform a genome-wide association analysis (GWAS) using the Random Forest (RF) approach to scan for candidate genes for age at first calving (AFC) in Nellore cattle. Additionally, potential epistatic effects were investigated using linear mixed models that included pairwise interactions between all markers with high importance scores in the non-linear tree-ensemble structure. Data from Nellore cattle were used, including records of animals born between 1984 and 2015 and raised in commercial herds located in different regions of Brazil. Estimated breeding values (EBV) were computed and used as the response variable in the genomic analyses. After quality control, 3,174 animals and 360,130 SNPs remained for analysis. Five independent RF analyses were carried out, each with a different initialization seed. The importance score of each SNP was averaged across the independent RF analyses to rank the markers according to their predictive relevance. A total of 117 SNPs associated with AFC were identified, spanning 10 autosomes (2, 3, 5, 10, 11, 17, 18, 21, 24, and 25). In total, 23 non-overlapping genomic regions harbored 262 candidate genes for AFC. Enrichment analysis and previous evidence in the literature revealed that many candidate genes annotated close to the lead SNPs have key roles in fertility, including embryo pre-implantation and development, embryonic viability, male germ cell maturation, and pheromone recognition. Furthermore, some genomic regions previously associated with fertility and growth traits in Nellore cattle were also detected in the present study, reinforcing the effectiveness of RF for pre-screening candidate regions associated with complex traits.
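The abstract does not state how the 117 lead SNPs were grouped into 23 non-overlapping regions. One common approach expands each lead SNP into a fixed window and merges overlapping windows on the same autosome. The Python sketch below does this under an assumed ±250 kb window (`HALF_WINDOW_BP` is a hypothetical value, not taken from the study).

```python
from typing import List, Tuple

# Hypothetical half-window size; the abstract does not state the window used.
HALF_WINDOW_BP = 250_000


def merge_snp_windows(snps: List[Tuple[int, int]],
                      half_window: int = HALF_WINDOW_BP) -> List[Tuple[int, int, int]]:
    """Turn (autosome, position) lead SNPs into non-overlapping regions.

    Each SNP is expanded to [pos - half_window, pos + half_window] (clipped at 0),
    and overlapping windows on the same autosome are merged.
    Returns a sorted list of (autosome, start, end) tuples.
    """
    windows = sorted((chrom, max(0, pos - half_window), pos + half_window)
                     for chrom, pos in snps)
    regions: List[Tuple[int, int, int]] = []
    for chrom, start, end in windows:
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlaps the previous region on the same autosome: extend it.
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions
```

The five BTA 3 SNPs in the epistasis table (261,913 to 1,207,469 bp) chain into a single region under this assumed window. The two BTA 11 SNPs (2,811,617 and 4,969,068 bp) stay separate. A different window size would change the region count.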
Complementary analyses revealed that many SNPs top-ranked in the RF-based GWAS did not show a strong marginal linear effect but are potentially involved in epistatic hotspots between genomic regions on different autosomes, notably BTA 3, 5, 11, and 21. The reported results are expected to enhance the understanding of the genetic mechanisms involved in the biological regulation of AFC in this cattle breed.
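A pairwise epistasis test adds the product of two genotype codes as an extra covariate next to the two marginal SNP effects. The Python sketch below fits y = b0 + b1·g1 + b2·g2 + b3·g1·g2 by ordinary least squares. This is a simplification: the study used linear mixed models, which also include a random polygenic effect, and the sketch does not compute test statistics or p-values. It also counts how many pairs the 117 RF-selected SNPs produce.

```python
from math import comb
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_pairwise_interaction(y: Sequence[float], g1: Sequence[int],
                             g2: Sequence[int]) -> List[float]:
    """OLS estimates [b0, b1, b2, b3] for y = b0 + b1*g1 + b2*g2 + b3*g1*g2.

    g1 and g2 are genotypes coded 0/1/2; b3 is the additive-by-additive
    interaction. OLS stands in for the study's linear mixed model here.
    """
    rows = [[1.0, float(a), float(b), float(a * b)] for a, b in zip(g1, g2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve_linear(xtx, xty)


# Number of distinct SNP pairs among the 117 RF-selected SNPs.
N_PAIRS = comb(117, 2)  # 6786
```

With noise-free data simulated from known coefficients, the fit recovers those coefficients. In a real analysis the EBV would be the response and each of the 6,786 pairs would be tested in turn.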
Keywords: beef cattle; candidate genes; ensemble learning; fertility traits; non-parametric methods; physiological epistasis
Year: 2022 PMID: 35692843 PMCID: PMC9178659 DOI: 10.3389/fgene.2022.834724
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1 Density plot of estimated breeding values (EBV) for age at first calving in Nellore cattle (A) and their respective accuracy (B), by sex category.
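The caption does not say how EBV accuracy was obtained. In BLUP-based evaluations, accuracy is commonly derived from each EBV's prediction error variance (PEV) and the additive genetic variance σ²_a as r = sqrt(1 − PEV/σ²_a). The Python sketch below implements that conversion under the assumption that this formula applies. The inputs in any usage are hypothetical.

```python
import math


def ebv_accuracy(pev: float, additive_variance: float) -> float:
    """Accuracy of an EBV from its prediction error variance (PEV).

    Uses the common BLUP expression r = sqrt(1 - PEV / sigma2_a); whether the
    study used this exact formula is an assumption.
    """
    if additive_variance <= 0:
        raise ValueError("additive variance must be positive")
    reliability = 1.0 - pev / additive_variance
    # Clamp small negative values caused by rounding in PEV.
    return math.sqrt(max(0.0, reliability))
```

For example, a PEV of 36 against an additive variance of 100 gives a reliability of 0.64 and an accuracy of 0.8.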
FIGURE 2 Influence of the random forest hyperparameters (Mtry and Ntree) on the out-of-bag prediction error for age at first calving in Nellore cattle.
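The out-of-bag (OOB) error is computed from the records each tree never saw. A bootstrap sample of size n leaves out, on average, a fraction (1 − 1/n)^n ≈ e^(−1) ≈ 36.8% of the records. The Python sketch below illustrates this. It then picks the (Mtry, Ntree) combination with the lowest OOB error from a dictionary of precomputed errors. Fitting the forests is omitted, and any grid values used with it are hypothetical.

```python
import random
from typing import Dict, Set, Tuple


def out_of_bag_indices(n: int, seed: int) -> Set[int]:
    """Indices of records NOT drawn in one bootstrap sample (with replacement) of size n."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n) for _ in range(n)}
    return set(range(n)) - in_bag


def expected_oob_fraction(n: int) -> float:
    """Probability that a given record is left out of a size-n bootstrap sample."""
    return (1.0 - 1.0 / n) ** n


def best_hyperparameters(oob_error: Dict[Tuple[int, int], float]) -> Tuple[int, int]:
    """Return the (mtry, ntree) key with the lowest OOB error."""
    return min(oob_error, key=oob_error.get)
```

With n = 3,174 animals, roughly 37% of the records are out-of-bag for any single tree. Averaging the predictions of the trees for which a record was out-of-bag therefore gives an error estimate without a separate validation set.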
FIGURE 3 Manhattan plots for age at first calving (AFC) in Nellore cattle considering the relative importance scores computed for each SNP in five independent Random Forest (RF) analyses (A–E) and averaged across the RF replicates (F). Negative importance scores were plotted as zero. The blue dashed line corresponds to the threshold value for SNP selection.
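The averaging, plotting, and selection conventions described for Figure 3 can be sketched in Python as follows. The SNP identifiers and the threshold value are hypothetical, because the caption does not state how the threshold was chosen. Negative scores are set to zero only for plotting, while selection uses the raw averaged scores.

```python
from statistics import mean
from typing import Dict, List


def average_importance(replicates: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each SNP's importance score across independent RF runs (different seeds).

    Assumes every replicate scores the same set of SNP ids.
    """
    snp_ids = replicates[0].keys()
    return {snp: mean(run[snp] for run in replicates) for snp in snp_ids}


def clip_for_plotting(scores: Dict[str, float]) -> Dict[str, float]:
    """Set negative importance scores to zero, as in the Manhattan plots."""
    return {snp: max(0.0, s) for snp, s in scores.items()}


def select_snps(scores: Dict[str, float], threshold: float) -> List[str]:
    """SNP ids whose averaged score exceeds the threshold, ranked from highest to lowest."""
    kept = [snp for snp, s in scores.items() if s > threshold]
    return sorted(kept, key=lambda snp: scores[snp], reverse=True)
```

Averaging across seeds damps the run-to-run variability of permutation-type importance scores, so a SNP must rank consistently high to pass the threshold.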
FIGURE 4 Heatmap of the linkage disequilibrium (LD), measured with the r² metric, among the 117 SNPs identified in the Random Forest analysis. SNPs located on the same chromosome are identified with the BTA label on the left or right side.
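With unphased genotypes, LD r² between two SNPs is often approximated by the squared Pearson correlation of their 0/1/2 genotype codes. The caption does not say whether r² was computed this way or from phased haplotypes, so the Python sketch below is an assumption-labeled illustration of one entry of such a heatmap.

```python
from typing import Sequence


def genotype_r2(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Squared Pearson correlation between two SNPs' genotype codes (0/1/2).

    A common genotype-based approximation of LD r^2 (assumption: not
    necessarily the estimator used in the study).
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return cov * cov / (v1 * v2)
```

Because r² squares the correlation, perfectly opposite genotype patterns also give r² = 1. Filling a 117 × 117 matrix with this function produces the layout of the heatmap.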
FIGURE 5 Gene network for Age at First Calving (AFC) in Nellore cattle. Different node colors represent the functional groups in which the candidate genes are involved.
FIGURE 6 Heatmap of the −log10(p-values) for the marginal (diagonal) and pairwise interaction (off-diagonal) effects computed via mixed-model analyses for the 117 lead SNPs identified in the Random Forest genome-wide scan for Age at First Calving in Nellore cattle. The color key (right side) indicates the significance of the main and interaction effects on the −log10(p-value) scale. Side color bars (top and left) indicate the Bos taurus autosome (BTA) on which each marker is located.
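The heatmap combines two kinds of p-values in one symmetric matrix: marginal SNP effects on the diagonal and pairwise interaction effects off the diagonal, both on the −log10 scale. The Python sketch below assembles such a matrix. The input format (a list of marginal p-values plus a dictionary keyed by SNP index pairs) is an assumption for illustration.

```python
import math
from typing import Dict, List, Tuple


def neg_log10_matrix(marginal: List[float],
                     pairwise: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Symmetric matrix of -log10(p): marginal effects on the diagonal,
    pairwise interaction effects off the diagonal.

    Pairs missing from `pairwise` stay at 0.0, i.e. they are treated as p = 1.
    """
    n = len(marginal)
    mat = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(marginal):
        mat[i][i] = -math.log10(p)
    for (i, j), p in pairwise.items():
        mat[i][j] = mat[j][i] = -math.log10(p)
    return mat
```

On this scale, p = 10⁻⁴ maps to 4, so darker cells in the heatmap correspond to smaller p-values.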
Significant pairwise epistatic effects in the mixed-model analyses considering the subset of SNPs pre-selected with the Random-Forest-based genome-wide scan for Age at First Calving in Nellore cattle. Only pairs with at least one marker within a different candidate gene are shown.
| SNP 1 BTA | SNP 1 position (bp) | SNP 1 nearest gene | SNP 1 distance | SNP 2 BTA | SNP 2 position (bp) | SNP 2 nearest gene | SNP 2 distance | GPD | p-value |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 1207469 | | intron | 11 | 4969068 | | intron | 0.075 | 3.0 × 10⁻⁴* |
| 3 | 633619 | | intron | 11 | 4969068 | | intron | 0.101 | 2.7 × 10⁻⁵** |
| 3 | 261913 | | exon | 11 | 4969068 | | intron | 0.075 | 3.6 × 10⁻⁵** |
| 3 | 981696 | | intron | 11 | 4969068 | | intron | 0.081 | 1.0 × 10⁻⁴* |
| 3 | 1135161 | | intron | 11 | 4969068 | | intron | 0.084 | 7.3 × 10⁻⁴* |
| 3 | 1207469 | | intron | 11 | 2811617 | | intron | 0.068 | 3.0 × 10⁻⁴* |
| 5 | 19823030 | | 108817 | 18 | 63426736 | | intron | 0.051 | 7.7 × 10⁻⁴* |
| 5 | 19823030 | | 108817 | 21 | 5004507 | | intron | 0.039 | 1.0 × 10⁻⁴* |
| 5 | 19823030 | | 108817 | 21 | 1164298 | | exon | 0.043 | 6.4 × 10⁻⁴* |
| 5 | 19823030 | | 108817 | 21 | 1195551 | | exon | 0.043 | 6.4 × 10⁻⁴* |
| 5 | 19823030 | | 108817 | 21 | 1940618 | | intron | 0.043 | 5.7 × 10⁻⁴* |
| 11 | 2811617 | | intron | 24 | 55609449 | | 133740 | 0.069 | 1.0 × 10⁻⁵** |
| 11 | 4969068 | | intron | 24 | 55609449 | | 133740 | 0.053 | 6.2 × 10⁻⁴* |
BTA, Bos taurus autosome; bp, base pairs; GPD, gametic-phase disequilibrium.
* and ** denote significance at false discovery rate (FDR) thresholds of 0.1 and 0.05, respectively.
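The footnote does not name the FDR procedure. The Python sketch below assumes the widely used Benjamini-Hochberg step-up method. It converts raw p-values into adjusted values (q-values) and assigns the table's markers at the 0.10 and 0.05 levels. The example p-values in any usage are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_marker(q: float) -> str:
    """'**' if q <= 0.05, '*' if q <= 0.10, otherwise '' (mirroring the table footnote)."""
    if q <= 0.05:
        return "**"
    if q <= 0.10:
        return "*"
    return ""
```

Because the adjustment scales each p-value by m/rank, the same raw p-value can fall on either side of a threshold depending on how many pairs are tested. The number of tested pairs therefore matters when reading the markers in the table.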
FIGURE 7 Protein-protein interaction analysis of genes surrounding SNPs involved in significant inter-chromosomal hotspots (p < 8.9 × 10⁻⁴) for age at first calving in Nellore cattle. Different node colors represent genes clustered according to their functional similarity. Edges represent protein-protein associations, and edge thickness represents the confidence of the interaction (thicker edges indicate higher confidence). Dotted lines represent interactions between clusters. The original figure was edited to include the autosomes (BTAs) on which the genes are located.
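Treating a "hotspot" as a SNP with several significant inter-chromosomal interaction partners is an interpretation chosen for illustration, not necessarily the authors' definition. Under that assumption, the Python sketch below counts partners per SNP. It uses the 13 pairs listed in the epistasis table and the p < 8.9 × 10⁻⁴ threshold quoted in the Figure 7 caption.

```python
from collections import Counter
from typing import Iterable, Tuple

Snp = Tuple[int, int]  # (BTA, position in bp)

P_THRESHOLD = 8.9e-4  # threshold quoted in the Figure 7 caption

# (SNP 1, SNP 2, p-value) pairs copied from the epistasis table.
TABLE_PAIRS = [
    ((3, 1207469), (11, 4969068), 3.0e-4),
    ((3, 633619), (11, 4969068), 2.7e-5),
    ((3, 261913), (11, 4969068), 3.6e-5),
    ((3, 981696), (11, 4969068), 1.0e-4),
    ((3, 1135161), (11, 4969068), 7.3e-4),
    ((3, 1207469), (11, 2811617), 3.0e-4),
    ((5, 19823030), (18, 63426736), 7.7e-4),
    ((5, 19823030), (21, 5004507), 1.0e-4),
    ((5, 19823030), (21, 1164298), 6.4e-4),
    ((5, 19823030), (21, 1195551), 6.4e-4),
    ((5, 19823030), (21, 1940618), 5.7e-4),
    ((11, 2811617), (24, 55609449), 1.0e-5),
    ((11, 4969068), (24, 55609449), 6.2e-4),
]


def interchromosomal_partners(pairs: Iterable[Tuple[Snp, Snp, float]],
                              p_threshold: float = P_THRESHOLD) -> Counter:
    """Count, for each SNP, its significant interactions with SNPs on other autosomes."""
    counts: Counter = Counter()
    for snp1, snp2, p in pairs:
        if p < p_threshold and snp1[0] != snp2[0]:
            counts[snp1] += 1
            counts[snp2] += 1
    return counts
```

With these pairs, the BTA 11 SNP at 4,969,068 bp has six partners (five on BTA 3 and one on BTA 24). The BTA 5 SNP at 19,823,030 bp has five partners (one on BTA 18 and four on BTA 21). This is consistent with the abstract's emphasis on BTA 3, 5, 11, and 21.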