Maria B Rabaglino, Haja N Kadarmideen.
Abstract
The main goal was to apply machine learning (ML) methods to integrated multi-transcriptomic data in order to identify endometrial genes whose expression patterns can predict uterine receptivity in the cow. Public data from five studies were re-analyzed. In all of them, endometrial samples were obtained at day 6-7 of the estrous cycle from cows or heifers of four different European breeds, classified as pregnant (n = 26) or not (n = 26). First, gene selection was performed through supervised and unsupervised ML algorithms. Then, the predictive ability of potential key genes was evaluated with a support vector machine classifier, trained on the expression levels of the samples from all breeds but one and tested on the samples from the held-out breed. Finally, the biological meaning of the key genes was explored. Fifty genes were identified, and they predicted uterine receptivity with an overall accuracy of 96.1%, regardless of the animal's breed and category. Genes with higher expression in the pregnant cows were related to circadian rhythm, the Wnt receptor signaling pathway, and embryonic development. This novel and robust combination of computational tools allowed the identification of a group of biologically relevant endometrial genes that could support pregnancy in cattle.
Year: 2020 PMID: 33046742 PMCID: PMC7550564 DOI: 10.1038/s41598-020-72988-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Characteristics of each dataset selected for data integration and analysis.
| Accession number in GEO | Animal breed | Animal category | Method to induce pregnancy | Method of sample collection | Number of samples | Platform for transcriptomic determination | Authors & date of data publication |
|---|---|---|---|---|---|---|---|
| GSE115756 | Holstein | Lactating cows (1st to 3rd lactation) | IVP-ET | Biopsy instrument | R = 8, nonR = 9 | Illumina HiSeq 2500 (RNA-sequencing) | Mazzoni et al. |
| GSE107741 | Japanese Black | Cows | AI / IV-ET | Biopsy instrument | R = 6, nonR = 5 | Agilent-023647 B. taurus Oligo Microarray v2 | Matsuyama et al. |
| GSE29853 | Charolais × Limousine | Heifers | AI | Postmortem peeling from the uterine myometrium | R = 6, nonR = 6 | Affymetrix Bovine Genome Array | Killen et al. |
| GSE36080 | Simmental | Heifers | IV-ET | Cytobrush | R = 3, nonR = 3 | Affymetrix Bovine Genome Array | Ponsuksili and Wimmers |
| GSE20974 | Simmental | Heifers | IV-ET | Cytobrush | R = 3, nonR = 3 | Affymetrix Bovine Genome Array | Salilew-Wondim et al. |
For all the experiments, endometrial samples from Bos taurus cattle were obtained around day 7 of the estrous cycle and were classified retrospectively or prospectively according to the pregnancy results. IVP-ET: transfer of in-vitro produced embryos; IV-ET: transfer of in-vivo produced embryos; AI: artificial insemination; R and nonR: animals classified as receptive or non-receptive, respectively.
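The abstract describes training the classifier on all breeds but one and testing it on the held-out breed. The sketch below only illustrates how such leave-one-breed-out splits could be built from the sample counts in the table above; it does not fit any model. The names `DATASETS`, `expand_samples`, and `leave_one_breed_out` are hypothetical, and assigning GSE20974 to Simmental is an assumption (consistent with five studies covering four breeds), not something the flattened table states explicitly.

```python
# Sample counts per dataset: (GEO accession, breed, receptive, non-receptive).
# Assumption: GSE20974 is treated as Simmental, like GSE36080.
DATASETS = [
    ("GSE115756", "Holstein", 8, 9),
    ("GSE107741", "Japanese Black", 6, 5),
    ("GSE29853", "Charolais x Limousine", 6, 6),
    ("GSE36080", "Simmental", 3, 3),
    ("GSE20974", "Simmental", 3, 3),
]


def expand_samples(datasets):
    """Return one (breed, status) record per animal."""
    samples = []
    for _accession, breed, n_receptive, n_non_receptive in datasets:
        samples += [(breed, "R")] * n_receptive
        samples += [(breed, "nonR")] * n_non_receptive
    return samples


def leave_one_breed_out(samples):
    """Yield (held_out_breed, train_indices, test_indices) for each breed."""
    breeds = sorted({breed for breed, _status in samples})
    for held_out in breeds:
        train = [i for i, (breed, _s) in enumerate(samples) if breed != held_out]
        test = [i for i, (breed, _s) in enumerate(samples) if breed == held_out]
        yield held_out, train, test


SAMPLES = expand_samples(DATASETS)
SPLITS = {breed: (train, test) for breed, train, test in leave_one_breed_out(SAMPLES)}

for breed, (train, test) in SPLITS.items():
    print(f"hold out {breed}: train on {len(train)} samples, test on {len(test)}")
```

In a real analysis, the index lists would select rows of the expression matrix for fitting and evaluating the classifier; splitting by breed rather than at random keeps every test animal's breed unseen during training.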
List of the 50 endometrial genes identified as biomarkers to determine pregnancy status around day 7 of the estrous cycle in the Bos taurus cattle.
| Ensembl gene ID | Gene symbol | Gene name | Direction |
|---|---|---|---|
| ENSBTAG00000001069 | TP53 | Tumor protein p53 | UP |
| ENSBTAG00000001568 | PPIC | Peptidylprolyl isomerase C | UP |
| ENSBTAG00000002108 | YWHAQ | Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein theta | UP |
| ENSBTAG00000002130 | SMPD4 | Sphingomyelin phosphodiesterase 4 | UP |
| ENSBTAG00000003397 | CTBP2 | C-terminal binding protein 2 | DOWN |
| ENSBTAG00000003532 | TLE4 | Transducin like enhancer of split 4 | UP |
| ENSBTAG00000003718 | HACL1 | 2-hydroxyacyl-CoA lyase 1 | DOWN |
| ENSBTAG00000003843 | SMARCAL1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a like 1 | UP |
| ENSBTAG00000004459 | TMEM45A | Transmembrane protein 45A | UP |
| ENSBTAG00000004769 | NEIL2 | Nei like DNA glycosylase 2 | DOWN |
| ENSBTAG00000005092 | ROR2 | Receptor tyrosine kinase like orphan receptor 2 | UP |
| ENSBTAG00000005462 | FXR2 | FMR1 autosomal homolog 2 | DOWN |
| ENSBTAG00000006002 | DIDO1 | Death inducer-obliterator 1 | UP |
| ENSBTAG00000007007 | WDR20 | WD repeat domain 20 | DOWN |
| ENSBTAG00000008083 | SEL1L | SEL1L ERAD E3 ligase adaptor subunit | UP |
| ENSBTAG00000008181 | CHAF1A | Chromatin assembly factor 1 subunit A | UP |
| ENSBTAG00000008943 | ZSCAN12 | Zinc finger and SCAN domain containing 12 | DOWN |
| ENSBTAG00000009121 | STAG2 | Stromal antigen 2 | DOWN |
| ENSBTAG00000009541 | SUCLG2 | Succinate-CoA ligase GDP-forming beta subunit | DOWN |
| ENSBTAG00000009863 | BHLHE40 | Basic helix-loop-helix family member e40 | UP |
| ENSBTAG00000010416 | RIN3 | Ras and Rab interactor 3 | UP |
| ENSBTAG00000011205 | PPP1R42 | Protein phosphatase 1 regulatory subunit 42 | UP |
| ENSBTAG00000011818 | COL26A1 | Collagen type XXVI alpha 1 chain | UP |
| ENSBTAG00000012454 | SLC35A3 | Solute carrier family 35 member A3 | DOWN |
| ENSBTAG00000014217 | HHEX | Hematopoietically expressed homeobox | UP |
| ENSBTAG00000014393 | CLK2 | CDC like kinase 2 | UP |
| ENSBTAG00000014644 | DUS2 | Dihydrouridine synthase 2 | UP |
| ENSBTAG00000014713 | RARRES1 | Retinoic acid receptor responder 1 | UP |
| ENSBTAG00000014838 | VPS26B | VPS26, retromer complex component B | DOWN |
| ENSBTAG00000015390 | FN3KRP | Fructosamine 3 kinase related protein | DOWN |
| ENSBTAG00000016977 | FUNDC2 | FUN14 domain containing 2 | UP |
| ENSBTAG00000017505 | PAXIP1 | PAX interacting protein 1 | UP |
| ENSBTAG00000017833 | RNF19A | Ring finger protein 19A, RBR E3 ubiquitin protein ligase | DOWN |
| ENSBTAG00000019155 | FRS2 | Fibroblast growth factor receptor substrate 2 | DOWN |
| ENSBTAG00000020611 | GLB1L | Galactosidase beta 1 like | UP |
| ENSBTAG00000020943 | CTU2 | Cytosolic thiouridylase subunit 2 | DOWN |
| ENSBTAG00000021151 | MYH10 | Myosin heavy chain 10 | UP |
| ENSBTAG00000021680 | SKA2 | Spindle and kinetochore associated complex subunit 2 | UP |
| ENSBTAG00000021768 | CCNG2 | Cyclin G2 | DOWN |
| ENSBTAG00000023179 | TRIB1 | Tribbles pseudokinase 1 | UP |
| ENSBTAG00000024240 | ACADM | Acyl-CoA dehydrogenase, C-4 to C-12 straight chain | DOWN |
| ENSBTAG00000026290 | PIK3C3 | Phosphatidylinositol 3-kinase catalytic subunit type 3 | DOWN |
| ENSBTAG00000031385 | RFWD2 | Ring finger and WD repeat domain 2 | UP |
| ENSBTAG00000032613 | SCG5 | Secretogranin V | UP |
| ENSBTAG00000034693 | SYT1 | Synaptotagmin 1 | UP |
| ENSBTAG00000037757 | EBF4 | Early B-cell factor 4 | UP |
| ENSBTAG00000038866 | UBE2I | Ubiquitin conjugating enzyme E2 I | UP |
| ENSBTAG00000039980 | CLECL1 | C-type lectin-like 1 | UP |
| ENSBTAG00000043964 | ARL5B | ADP ribosylation factor like GTPase 5B | DOWN |
| ENSBTAG00000045550 | TSPAN6 | Tetraspanin 6 | UP |
The “Direction” column indicates whether the gene was more (UP) or less (DOWN) expressed in the endometria of the animals that became pregnant.
Evaluation metrics for the classification of pregnancy status in each breed, based on the expression of the 50 endometrial genes. For each breed, a support vector machine classifier was trained on all samples except those of the breed being tested.
| Metric | Holstein | Charolais × Limousine | Japanese Black | Simmental |
|---|---|---|---|---|
| Accuracy | 0.941 | 0.917 | 1.000 | 1.000 |
| Kappa | 0.883 | 0.833 | 1.000 | 1.000 |
| Accuracy Lower | 0.713 | 0.615 | 0.715 | 0.735 |
| Accuracy Upper | 0.999 | 0.998 | 1.000 | 1.000 |
| Accuracy Null | 0.529 | 0.500 | 0.545 | 0.500 |
| Accuracy PValue | 0.000 | 0.003 | 0.001 | 0.000 |
| McNemar PValue | 1.000 | 1.000 | NA | NA |
| Sensitivity | 0.889 | 1.000 | 1.000 | 1.000 |
| Specificity | 1.000 | 0.833 | 1.000 | 1.000 |
| Positive Predictive Value | 1.000 | 0.857 | 1.000 | 1.000 |
| Negative Predictive Value | 0.889 | 1.000 | 1.000 | 1.000 |
| Precision | 1.000 | 0.857 | 1.000 | 1.000 |
| Recall | 0.889 | 1.000 | 1.000 | 1.000 |
| F1 | 0.941 | 0.923 | 1.000 | 1.000 |
| Prevalence | 0.529 | 0.500 | 0.455 | 0.500 |
| Detection Rate | 0.471 | 0.500 | 0.455 | 0.500 |
| Detection Prevalence | 0.471 | 0.583 | 0.455 | 0.500 |
| Balanced Accuracy | 0.944 | 0.917 | 1.000 | 1.000 |
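Most rows of the table above follow directly from a 2 × 2 confusion matrix. The sketch below recomputes several of them as a worked check. The confusion counts in `COUNTS` are not reported in the source: they are reconstructed here from the table's prevalence, sensitivity, and specificity values together with the sample sizes, so they should be read as an assumption. The function name `binary_metrics` is hypothetical, and the p-value is computed as a one-sided exact binomial test of accuracy against the "Accuracy Null" rate, which is assumed to be the proportion of the larger class.

```python
from math import comb

# Assumed confusion counts per breed (reconstructed from the table, not reported).
COUNTS = {
    "Holstein": dict(tp=8, fn=1, fp=0, tn=8),
    "Charolais x Limousine": dict(tp=6, fn=0, fp=1, tn=5),
    "Japanese Black": dict(tp=5, fn=0, fp=0, tn=6),
    "Simmental": dict(tp=6, fn=0, fp=0, tn=6),
}


def binary_metrics(tp, fn, fp, tn):
    """Compute summary metrics from a 2x2 confusion matrix.

    No guard against empty rows or columns: fine for the counts above.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # also reported as recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # also reported as precision
    npv = tn / (tn + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)
    prevalence = (tp + fn) / n
    null_rate = max(prevalence, 1 - prevalence)
    # One-sided exact binomial test: P(X >= correct) if accuracy were null_rate.
    correct = tp + tn
    p_value = sum(
        comb(n, k) * null_rate**k * (1 - null_rate) ** (n - k)
        for k in range(correct, n + 1)
    )
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "prevalence": prevalence,
        "null_rate": null_rate,
        "p_value": p_value,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


RESULTS = {breed: binary_metrics(**c) for breed, c in COUNTS.items()}

# Pooled accuracy over all held-out predictions.
total_correct = sum(c["tp"] + c["tn"] for c in COUNTS.values())
total_samples = sum(sum(c.values()) for c in COUNTS.values())
POOLED_ACCURACY = total_correct / total_samples

for breed, m in RESULTS.items():
    print(breed, {name: round(value, 3) for name, value in m.items()})
print("pooled accuracy:", round(POOLED_ACCURACY, 4))
```

Under these assumed counts, 50 of the 52 held-out samples are classified correctly, which is in line with the overall accuracy of about 96% quoted in the abstract.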
Figure 1. Principal component analysis of samples’ distribution according to the expression of the 50 biomarker genes. The plots show the distribution of samples collected from receptive (R) and non-receptive (nonR) cows, and samples obtained from: (A) pregnant cows but treated with a progesterone device from day 3 (PH) or with normal progesterone levels (PN); (B) ovariectomized cows receiving a progesterone treatment for 6 days plus estradiol at day 6 (E2 + P4) or only the progesterone treatment (P4).