Ataul Haleem1,2, Selina Klees1,3, Armin Otto Schmitt1,3, Mehmet Gültas2,3.
Abstract
Maize is one of the most widely grown cereals in the world. However, to address the challenges in maize breeding arising from climatic anomalies, there is a need to develop novel strategies that harness the power of multi-omics technologies. In this regard, pleiotropy is an important genetic phenomenon that can be utilized to simultaneously enhance multiple agronomic phenotypes in maize. In addition to pleiotropy, another aspect is the consideration of regulatory SNPs (rSNPs), which are likely to have causal effects in phenotypic development. By incorporating both aspects in our study, we performed a systematic analysis based on multi-omics data to reveal novel pleiotropic signatures of rSNPs in a global maize population. For this purpose, we first applied Random Forests and then the Markov clustering algorithm to decipher the pleiotropic signatures of rSNPs, based on which hierarchical network models were constructed to elucidate the complex interplay among transcription factors, rSNPs, and phenotypes. The results obtained in our study could help to understand the genetic programs orchestrating multiple phenotypes and thus could provide novel breeding targets for the simultaneous improvement of several agronomic traits.
Keywords: gene expression profiles; hierarchical network model; incremental feature selection; markov clustering; multi-omics; random forest; regulatory SNPs
Year: 2022 PMID: 35563516 PMCID: PMC9100765 DOI: 10.3390/ijms23095121
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. Overview of the analysis pipeline, highlighting the key machine learning algorithms used to identify pleiotropic signatures of regulatory SNPs (rSNPs) and to establish the complex interplay of transcription factors (TFs), rSNPs and multiple phenotypes. The genotypic data (A1), consisting of 1.03 million SNP markers, were filtered for MAF (<0.05), and 31,000 SNPs found within the promoter regions of 37,407 maize genes were considered for association analysis with 20 quantitative agronomic traits (A2). The RNA-seq dataset (A3) was used to validate the effect of pleiotropic rSNPs on the underlying gene expression. As a first step in the data analysis, rSNPs were identified (B) by their impact on the gain or loss of transcription factor binding sites (TFBSs), after which their association with multiple phenotypes was determined using random forest (RF) with the Boruta algorithm and the incremental feature selection (IFS) technique (C). Pleiotropic signatures of rSNPs were then established by pruning weaker connections in the overall network into smaller, non-overlapping, fully connected clusters using the Markov clustering (MCL) algorithm (D), which provided the basis for constructing hierarchical network models with three distinct layers modelling the complex interplay of TFs, rSNPs and multiple phenotypes (B). Further, the boxplots show the impact of pleiotropic rSNPs at the gene expression level as a function of gain or loss of TFBSs (E).
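The minor allele frequency (MAF) filter mentioned in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2) with no missing calls, and it assumes that SNPs with a MAF below 0.05 are the ones discarded; the caption does not spell out either point. The function names and the toy genotypes are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def minor_allele_frequency(genotypes: List[int]) -> float:
    """MAF of one biallelic SNP; genotypes are alternate-allele counts (0, 1 or 2)."""
    n_alleles = 2 * len(genotypes)          # diploid: two alleles per line
    alt_freq = sum(genotypes) / n_alleles   # frequency of the alternate allele
    return min(alt_freq, 1.0 - alt_freq)    # the rarer of the two alleles


def filter_by_maf(snps: Dict[str, List[int]], threshold: float = 0.05) -> Dict[str, List[int]]:
    """Keep SNPs whose MAF is at least `threshold` (assumed filter direction)."""
    return {
        snp_id: calls
        for snp_id, calls in snps.items()
        if minor_allele_frequency(calls) >= threshold
    }


# Hypothetical toy genotypes for 20 diploid lines (not data from the paper).
toy_snps = {
    "snp_a": [0] * 19 + [1],        # alternate allele frequency 1/40
    "snp_b": [0] * 10 + [2] * 10,   # alternate allele frequency 20/40
    "snp_c": [2] * 18 + [1, 0],     # alternate allele frequency 37/40
}
kept = filter_by_maf(toy_snps)
```

In this toy input only `snp_a` falls below the threshold; `snp_c` is retained because its reference allele is the minor one.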
Figure 2. Plot of the change in values against the number of rSNPs associated with the phenotype pollen shed. The incremental feature selection (IFS) curves were drawn using the ranking of rSNPs. The value peaked when the first 90 rSNPs were considered, and these rSNPs were used for the further analysis of this phenotype.
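The IFS idea behind this curve can be sketched in a few lines of Python: score the top-k ranked rSNPs for every k and keep the k at which the score peaks. This is only an illustration under stated assumptions. The scoring callback `score_subset` stands in for whatever model evaluation the study used (the caption does not name the metric), and `toy_score` and the rSNP identifiers are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score_subset: Callable[[Sequence[str]], float],
) -> Tuple[int, float, List[float]]:
    """Score the top-k features for k = 1..n; return best k, its score, and the curve."""
    curve = [
        score_subset(ranked_features[:k])
        for k in range(1, len(ranked_features) + 1)
    ]
    best_score = max(curve)
    best_k = curve.index(best_score) + 1  # first k at which the peak is reached
    return best_k, best_score, curve


def toy_score(subset: Sequence[str]) -> float:
    """Hypothetical stand-in for a model evaluation (e.g. cross-validated performance)."""
    informative = {"rsnp_1", "rsnp_2", "rsnp_3"}
    hits = sum(1 for feature in subset if feature in informative)
    # Reward informative features, penalize uninformative ones.
    return hits / 3 - 0.05 * (len(subset) - hits)


# Hypothetical ranking, most important feature first.
ranking = ["rsnp_1", "rsnp_2", "rsnp_x", "rsnp_3", "rsnp_y"]
best_k, best_score, curve = incremental_feature_selection(ranking, toy_score)
```

With this toy scorer the curve peaks at the first four features, which mirrors how the optimal number of rSNPs per phenotype is read off the IFS curve.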
Phenotypes and the optimal numbers of their associated rSNPs, as determined by the incremental feature selection (IFS) procedure.
| Phenotype | Max | #rSNPs |
|---|---|---|
| Leaf number above ear | 0.490740 | 89 |
| Ear leaf width | 0.484029 | 70 |
| Cob diameter | 0.445720 | 64 |
| Ear height | 0.509523 | 109 |
| Kernel width | 0.418115 | 172 |
| Ear leaf length | 0.553292 | 112 |
| Tassel main axis length | 0.498562 | 96 |
| Pollen shed | 0.581765 | 90 |
| Heading date | 0.537987 | 49 |
| Ear length | 0.434011 | 82 |
| Silking time | 0.506520 | 122 |
| Ear diameter | 0.481445 | 110 |
| Cob weight | 0.460850 | 37 |
| X100 grain weight | 0.389332 | 51 |
| Tassel branch number | 0.507112 | 142 |
| Ear row number | 0.491663 | 46 |
| Kernel number per row | 0.350717 | 27 |
| Plant height | 0.532837 | 72 |
| Kernel length | 0.580691 | 168 |
| Kernel thickness | 0.437589 | 64 |
Figure 3. Number of associated rSNPs determined by the incremental feature selection (IFS) procedure for each phenotype, and their overlap, represented in matrix layouts using the UpSet technique [67]. Black circles in the matrix layout mark the phenotypes that are part of the intersection. For the sake of clarity, not all intersections are displayed.
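The overlaps summarized in an UpSet-style plot can be computed by grouping each rSNP according to the exact combination of phenotypes that selected it. The following Python sketch shows one way to do this; the rSNP identifiers are hypothetical, only the phenotype names come from the table above, and the plotting step itself is omitted.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def exclusive_intersections(sets_by_name: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group each element by the exact combination of named sets that contain it."""
    membership: Dict[str, Set[str]] = defaultdict(set)
    for name, members in sets_by_name.items():
        for element in members:
            membership[element].add(name)

    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for element, names in membership.items():
        groups[frozenset(names)].add(element)
    return dict(groups)


# Hypothetical rSNP sets selected per phenotype (not data from the paper).
selected = {
    "Heading date": {"rsnp_1", "rsnp_2", "rsnp_3"},
    "Pollen shed": {"rsnp_2", "rsnp_3", "rsnp_4"},
    "Silking time": {"rsnp_3", "rsnp_5"},
}
groups = exclusive_intersections(selected)

# rSNPs selected for more than one phenotype are candidate pleiotropic rSNPs.
pleiotropic = {
    snp for combo, snps in groups.items() if len(combo) > 1 for snp in snps
}
```

Each key of `groups` corresponds to one column of the matrix layout, and the size of its value is the height of the bar above that column.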
Results of the Markov clustering (MCL) algorithm, including the numbers of pleiotropic rSNPs, their related genes, and their associated phenotypes.
| Cluster | Number of pleiotropic rSNPs | Number of genes | Phenotypes |
|---|---|---|---|
| Cluster-1 | 15 | 10 | Heading date, Pollen shed, Silking time and Ear height |
| Cluster-2 | 9 | 7 | Cob weight, Heading date, Pollen shed, Tassel main axis length, Ear leaf length, Plant height, Ear leaf width, Ear row number and Ear height |
| Cluster-3 | 7 | 6 | Kernel length, Kernel thickness, Kernel number per row, Ear diameter and X100 grain weight |
| Cluster-4 | 6 | 5 | Ear diameter, Cob diameter and Ear row number |
| Cluster-5 | 6 | 4 | Heading date, Pollen shed, Silking time, Ear height and Tassel branch number |
| Cluster-6 | 5 | 3 | Ear diameter, Cob diameter and Ear row number |
| Cluster-7 | 3 | 3 | Ear height, Plant height and Ear row number |
| Cluster-8 | 3 | 3 | Kernel width, Kernel length, Kernel thickness and X100 grain weight |
| Cluster-9 | 2 | 2 | Ear leaf length, Leaf number above ear and Kernel length |
| Cluster-10 | 2 | 2 | Ear length and Kernel number per row |
| Cluster-11 | 2 | 2 | Ear diameter, Tassel main axis length and Cob weight |
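To make the clustering step more concrete, here is a minimal pure-Python sketch of the basic MCL iteration: expansion (squaring the column-stochastic matrix) alternated with inflation (element-wise powering followed by column normalization). It is an illustration only and not the implementation used in the study; the inflation value, the convergence tolerance, and the six-node toy graph are assumptions chosen for the example.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adjacency: Matrix, inflation: float = 2.0,
                   iterations: int = 50, tol: float = 1e-9) -> List[List[int]]:
    """Basic MCL: alternate expansion and inflation until the matrix stops changing."""
    n = len(adjacency)
    # Add self-loops, then convert edge weights to transition probabilities.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(iterations):
        expanded = matmul(m, m)  # expansion: random walks of length two
        inflated = normalize_columns(  # inflation: strengthen strong flows
            [[value ** inflation for value in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that still carry mass are attractors; their non-zero columns form a cluster.
    clusters = {
        frozenset(j for j in range(n) if m[i][j] > 1e-6)
        for i in range(n)
        if any(value > 1e-6 for value in m[i])
    }
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical toy graph: two triangles joined by one weak edge between nodes 2 and 3.
toy_graph = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
clusters = markov_cluster(toy_graph)
```

In this toy graph the weak edge carries little flow, so inflation suppresses it and the two triangles end up in separate clusters, which is the kind of pruning of weaker connections that yields the non-overlapping clusters listed in the table.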
Figure 4. Hierarchical network model constructed using Cluster-7 to elucidate the complex interplay among TFs, rSNPs (genes) and phenotypes. (A–C) Significant changes in gene expression values resulting from the pleiotropic rSNPs. (D) Hierarchical network model with three layers.