Faheem Ahmed Khan1, Hui Liu1, Hao Zhou1, Kai Wang1, Muhammad Tahir Ul Qamar2, Nuruliarizki Shinta Pandupuspitasari1,3, Zhang Shujun1.
Abstract
Sperm biology, the capacity of sperm to fertilize an egg, and the role of sperm in determining sex ratio are major questions in reproductive biology. To address these questions, we integrated X and Y chromosome transcriptomes across two species, Bos taurus and Sus scrofa, and identified reproductive driver genes using the Weighted Gene Co-Expression Network Analysis (WGCNA) algorithm. This strategy yielded 11007 and 10445 unique genes, organized into 9 and 11 reproductive modules, in Bos taurus and Sus scrofa, respectively. The consensus module calculation yielded 167 overlapping genes, which were mapped to 846 differentially expressed genes (DEGs) in Bos taurus to obtain a final list of 67 dual-feature genes. We built a gene co-expression network from these 67 genes; it consists of 58 nodes (27 down-regulated and 31 up-regulated genes) enriched in 66 GO biological process (BP) terms, including 6 GO annotations related to reproduction, and two KEGG pathways. We also identified significantly related transcription factors (ISRE, AP1FJ, RP58, CREL) and miRNAs (bta-miR-181a, bta-miR-17-5p, bta-miR-146b, bta-miR-146a) that target genes in the co-expression network. In addition, we performed genetic analyses of the most important reproductive driver genes, PRM1, PPP2R2B and PAFAH1B1, including phylogenetic analysis, functional domain identification, epigenetic modification analysis and mutation analysis, and finally performed a protein docking analysis to visualize their therapeutic and gene expression regulation ability.
Keywords: WGCNA; fertilization; reproduction; spermatogenesis; spermiogenesis
Year: 2017 PMID: 28903352 PMCID: PMC5589591 DOI: 10.18632/oncotarget.17081
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. A multi-step strategy
Schematic diagram of the multi-step strategy used to identify reproductive driver genes.
Figure 2. Adjacency matrix weight parameter (power) plot
The horizontal axis represents the weight parameter (power); the vertical axis represents the squared correlation coefficient between log(k) and log(p(k)).
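The quantity on the vertical axis of Figure 2 can be illustrated with a minimal sketch. It assumes the common WGCNA soft-thresholding convention (adjacency is the absolute correlation raised to the power, and connectivity k is the row sum of the adjacency matrix) and a simple equal-width binning of k. The function names, the bin count and the toy connectivity values are illustrative assumptions, not details taken from the paper.

```python
import math


def connectivity_for_power(corr, power):
    """Connectivity k of each gene: sum of |r|**power over all other genes."""
    n = len(corr)
    return [
        sum(abs(corr[i][j]) ** power for j in range(n) if j != i)
        for i in range(n)
    ]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n < 3:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def scale_free_fit_r2(connectivity, n_bins=10):
    """R^2 between log10(k) and log10(p(k)), with k grouped into equal-width bins."""
    k = [v for v in connectivity if v > 0]
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for v in k:
        idx = min(int((v - lo) / width), n_bins - 1)
        sums[idx] += v
        counts[idx] += 1
    # One point per non-empty bin: mean connectivity vs. fraction of genes.
    xs = [math.log10(s / c) for s, c in zip(sums, counts) if c]
    ys = [math.log10(c / len(k)) for c in counts if c]
    return r_squared(xs, ys)


# Toy connectivity values (hypothetical), roughly following a power law.
toy_k = [1] * 100 + [2] * 50 + [4] * 25 + [8] * 12 + [16] * 6
print(round(scale_free_fit_r2(toy_k), 3))
```

In practice one would compute this index for a range of candidate powers and pick a power at which the curve levels off at a high value; the cut-off applied in the study is not stated here.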
Figure 3. Module clustering tree of Bos taurus (A) and Sus scrofa (B). Different colours in the module bar indicate different modules. D1 and D2 refer to the Bos taurus and Sus scrofa datasets, respectively. M1 to M11 are module numbers.
Figure 4. Correspondence of Bos taurus-specific and Sus scrofa-specific modules
Numbers in boxes give the number of overlapping genes for each pair of modules. The color bar on the left shows the significance (p value) of the module consensus; 0-20 represents –log2(p value).
Characterization of Bos taurus and Sus scrofa modules
| Bos taurus Module | Sus scrofa Module | Module color | Bos taurus module gene count | Sus scrofa module gene count | Overlap number | Overlap p value |
|---|---|---|---|---|---|---|
| D1M1 | D2M2 | black-blue | 49 | 69 | 12 | 0.00817 |
| D1M1 | D2M7 | black-red | 49 | 38 | 2 | 0.0423 |
| D1M2 | D2M2 | blue-blue | 85 | 69 | 10 | 0.00846 |
| D1M2 | D2M3 | blue-brown | 85 | 50 | 11 | 0.00679 |
| D1M2 | D2M8 | blue-turquoise | 85 | 69 | 12 | 0.000142 |
| D1M3 | D2M2 | brown-blue | 76 | 69 | 9 | 0.00035 |
| D1M3 | D2M8 | brown-turquoise | 76 | 69 | 10 | 0.000482 |
| D1M4 | D2M3 | green-brown | 54 | 50 | 8 | 0.0116 |
| D1M4 | D2M8 | green-turquoise | 54 | 69 | 10 | 0.0047 |
| D1M6 | D2M2 | pink-blue | 49 | 69 | 8 | 0.0144 |
| D1M8 | D2M2 | turquoise-blue | 114 | 69 | 10 | 0.0102 |
| D1M8 | D2M4 | turquoise-green | 114 | 41 | 12 | 0.00047 |
| D1M8 | D2M6 | turquoise-pink | 114 | 33 | 11 | 0.008203 |
| D1M8 | D2M8 | turquoise-turquoise | 114 | 69 | 11 | 0.00848 |
| D1M8 | D2M10 | turquoise-magenta | 114 | 31 | 7 | 0.00134 |
| D1M9 | D2M7 | yellow-red | 68 | 38 | 7 | 0.00123 |
| D1M9 | D2M8 | yellow-turquoise | 68 | 69 | 8 | 0.00136 |
| D1M9 | D2M9 | yellow-yellow | 68 | 44 | 9 | 0.000435 |
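The record does not state which test produced the overlap p values in the table above. One common choice for this kind of module comparison is a one-sided hypergeometric (Fisher-type) test, sketched below. The background size of 1000 genes is a hypothetical placeholder, so the printed value is not expected to reproduce the table entry.

```python
from math import comb


def hypergeom_overlap_p(n_background, n_a, n_b, n_overlap):
    """P(overlap >= n_overlap) when modules of size n_a and n_b are drawn
    at random from a shared background of n_background genes."""
    upper = min(n_a, n_b)
    total = comb(n_background, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_background - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Module sizes and overlap from the D1M1 / D2M2 row; the background size
# (1000) is an assumption made only for this illustration.
p = hypergeom_overlap_p(n_background=1000, n_a=49, n_b=69, n_overlap=12)
print(f"{p:.3g}")
```

The result depends strongly on the background size, so the gene universe shared by both datasets has to be fixed before such p values can be compared across module pairs.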
Figure 5. (A) Volcano plot of the statistical test for gene expression in Bos taurus. (B) Heatmap of DEGs in Bos taurus.
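As a rough illustration of how points on a volcano plot such as Figure 5A are usually derived and labelled, the sketch below converts a p value to -log10(p) and classifies genes by fold change and significance. The cut-offs (|log2 fold change| of at least 1 and p below 0.05), the gene names and the values are all hypothetical; the thresholds used to call the 846 DEGs are not given in this record.

```python
import math


def classify_gene(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down' or 'ns' (not significant)."""
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "ns"


def volcano_point(log2_fc, p_value):
    """Coordinates of one gene on a volcano plot: (log2 FC, -log10 p)."""
    return log2_fc, -math.log10(p_value)


# Hypothetical test results: gene -> (log2 fold change, p value).
results = {"geneA": (2.1, 0.001), "geneB": (-1.4, 0.02), "geneC": (0.3, 0.6)}
labels = {gene: classify_gene(fc, p) for gene, (fc, p) in results.items()}
for gene, (fc, p) in results.items():
    x, y = volcano_point(fc, p)
    print(gene, labels[gene], round(x, 2), round(y, 2))
```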
Figure 6. Co-expression network of the 67 selected genes
Red and green nodes denote up- and down-regulated genes in Bos taurus; dashed and solid edges denote negative and positive correlation coefficients, respectively.
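The edge convention in Figure 6 (solid for positive, dashed for negative correlation) can be sketched as follows. This is an illustrative construction using plain Pearson correlation with a hypothetical cut-off of 0.8 on toy expression profiles; the gene labels, values and threshold are assumptions, and the network in the paper may have been built differently.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expression, cutoff=0.8):
    """Return (gene_a, gene_b, r, style) for every pair with |r| >= cutoff."""
    edges = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= cutoff:
            style = "solid" if r > 0 else "dashed"
            edges.append((a, b, round(r, 3), style))
    return edges


# Toy expression profiles across four samples (hypothetical values).
expression = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 3],
}
edges = coexpression_edges(expression)
for edge in edges:
    print(edge)
```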
Figure 7. Bar plot of reproduction-related GO annotations
The horizontal axis represents gene count, the vertical axis represents GO annotations, and the color bar shows –log(p), where p is the enrichment significance.
Reproduction related GO annotations list
| Term | Count | P Value | Genes |
|---|---|---|---|
| GO:0003006~reproductive developmental process | 6 | 0.002729 | INHBB, HMGB2, PAFAH1B1, EPOR, PRM1, PPP2R2B |
| GO:0019953~sexual reproduction | 6 | 0.026424 | OVGP1, HMGB2, GPX4, PAFAH1B1, PRM1, PPP2R2B |
| GO:0007283~spermatogenesis | 5 | 0.026725 | HMGB2, GPX4, PAFAH1B1, PRM1, PPP2R2B |
| GO:0007565~female pregnancy | 4 | 0.00777 | OVGP1, EPOR, PLAU, RPL29 |
| GO:0007286~spermatid development | 3 | 0.016952 | PAFAH1B1, PRM1, PPP2R2B |
| GO:0048515~spermatid differentiation | 3 | 0.018774 | PAFAH1B1, PRM1, PPP2R2B |
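The color scale of the GO bar plot is described as –log(p). A small sketch of that transformation applied to the six reproduction-related terms listed above is shown here; the counts and p values are copied from the table, while the choice of base 10 for the logarithm is an assumption, since the caption does not specify the base.

```python
import math

# (GO ID, term, gene count, p value) as listed in the table above.
go_terms = [
    ("GO:0003006", "reproductive developmental process", 6, 0.002729),
    ("GO:0019953", "sexual reproduction", 6, 0.026424),
    ("GO:0007283", "spermatogenesis", 5, 0.026725),
    ("GO:0007565", "female pregnancy", 4, 0.00777),
    ("GO:0007286", "spermatid development", 3, 0.016952),
    ("GO:0048515", "spermatid differentiation", 3, 0.018774),
]


def neg_log10(p_value):
    """Significance score used for coloring: larger means more significant."""
    return -math.log10(p_value)


# Rank terms from most to least significant.
ranked = sorted(go_terms, key=lambda term: term[3])
for go_id, name, count, p_value in ranked:
    print(f"{go_id}\t{count}\t{neg_log10(p_value):.2f}\t{name}")
```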
KEGG pathway annotations list
| Term | Count | P Value | Genes |
|---|---|---|---|
| hsa04514:Cell adhesion molecules (CAMs) | 6 | 0.002756 | SELP, CD86, CD34, PECAM1, CD2, CD4 |
| hsa04640:Hematopoietic cell lineage | 5 | 0.003632 | IL4, CD34, CD2, EPOR, CD4 |
Related TF list
| Category | Term | Count | P Value | Genes |
|---|---|---|---|---|
| UCSC_TFBS | ISRE | 27 | 0.005515 | FGFR2, LALBA, PPARG, PRDX5, CACNB4, PNN, CDKN2B, CCL21, SULT1A1, CD2, PLA2G1B, FAU, PAFAH1B1, CD4, CHRNA1, PPP2R2B, IL4, SELP, MAOA, ACACA, F9, PIGR, RGS16, IFNAR1, GPI, CD86, EPOR |
| UCSC_TFBS | AP1FJ | 19 | 0.008982 | FGFR2, FHL3, ACACA, CACNB4, PTGFR, RPL29, MMP20, CD86, CKM, CDKN2B, CD34, CCL21, GPX4, PECAM1, EPOR, PAFAH1B1, CD4, NGB, PPP2R2B |
| UCSC_TFBS | RP58 | 30 | 0.032744 | FGFR2, LALBA, PPARG, FHL3, DCN, CACNB4, MMP20, CDKN2B, GPX4, PLA2G1B, FAU, PAFAH1B1, CHRNA1, PPP2R2B, FGF2, OPTC, SELP, MAOA, ACACA, F9, PIGR, GUCY2C, RGS16, PTGFR, INHBB, GPI, CD34, PECAM1, NPPC, EPOR |
| UCSC_TFBS | CREL | 12 | 0.044246 | IL4, MMP20, CD34, GPX4, PPARG, FHL3, ACACA, EPOR, CD4, CACNB4, PIGR, FGF2 |
Figure 8. miRNA-targeted co-expression network
Red and green nodes denote up- and down-regulated genes in Bos taurus; white squares are miRNAs; dashed and solid edges denote negative and positive correlation coefficients, respectively; red arrows point from miRNAs to their target genes.
Figure 9. Phylogenetic tree of PPP2R2B, PRM1 and PAFAH1B1 in Bos taurus and Sus scrofa
Figure 10. Consensus sequences of PAFAH1B1 (A), PRM1 (B) and PPP2R2B (C) in Bos taurus and Sus scrofa. Sequences with a black background are consensus sequences.
Domain information list of PPP2R2B, PRM1 and PAFAH1B1
| Position | Signature | Description |
|---|---|---|
| 79-93 | PR55_1 | Protein phosphatase 2A regulatory subunit PR55 signature 1 |
| 170-184 | PR55_2 | Protein phosphatase 2A regulatory subunit PR55 signature 2 |
| 187-201 | WD_REPEATS_1 | Trp-Asp (WD) repeats signature |
| 2-13 | PROTAMINE_P1 | Protamine P1 signature |
| 7-39 | LISH | LIS1 homology (LisH) motif |
| 104-410 | WD_REPEATS_2 | Trp-Asp (WD) repeats |
Figure 11. Visualization of the domains in PPP2R2B, PRM1 and PAFAH1B1
Figure 12. CpG islands in PPP2R2B and PAFAH1B1 in Bos taurus and Sus scrofa
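For readers unfamiliar with how CpG islands are typically called, the sketch below applies the widely used Gardiner-Garden and Frommer style criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) to a sequence. These thresholds and the function names are assumptions for illustration; the record does not say which tool or criteria were used for Figure 12.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three classic thresholds to a whole sequence."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


# Synthetic sequences (not taken from PPP2R2B or PAFAH1B1).
cg_rich = "CG" * 100
at_rich = "AT" * 100
print(cpg_stats(cg_rich), looks_like_cpg_island(cg_rich))
print(cpg_stats(at_rich), looks_like_cpg_island(at_rich))
```

A real scan would slide a window along the promoter or gene sequence and merge adjacent windows that pass; this sketch only evaluates one sequence as a whole.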
Sequence variations from dbSNP
| SNP ID | Position (strand) | Sequence context | Functional annotation |
|---|---|---|---|
| rs11547494 | 146,701,106(-) | CACGG(G/T)AGAAT | nc-transcript-variant, reference, missense |
| rs160967 | 146,803,933(+) | gaggc(C/T)gaggc | intron-variant |
| rs160968 | 146,829,614(+) | TGGCC(C/T)ATTTC | intron-variant |
| rs160969 | 146,837,254(-) | atcat(C/G/T)tcatg | intron-variant |
| rs160970 | 146,836,216(-) | AAAAC(A/G)TAACC | intron-variant |
| rs121434482 | 2,670,209(+) | AGGAC(A/G)TACAG | reference, missense |
| rs121434484 | 2,670,268(+) | CCTGT(C/T)CTGCA | reference, missense |
| rs121434486 | 2,665,431(+) | AGTTT(C/T)TAAAA | reference, missense |
Structural variations from Database of Genomic Variants (DGV)
| Variant ID | Type | Subtype | PubMed ID |
|---|---|---|---|
| dgv1656e212 | CNV | Loss | 25503493 |
| dgv356n21 | CNV | Loss | 19592680 |
| esv1159529 | CNV | Insertion | 17803354 |
| esv2053966 | CNV | Deletion | 18987734 |
| esv2422173 | CNV | Deletion | 20811451 |
| esv2659662 | CNV | Deletion | 23128226 |
| esv2660911 | CNV | Deletion | 23128226 |
| esv2664858 | CNV | Deletion | 23128226 |
| esv26695 | CNV | Loss | 19812545 |
| esv2670593 | CNV | Deletion | 23128226 |
| esv2672689 | CNV | Deletion | 23128226 |
| esv2675913 | CNV | Deletion | 23128226 |
| esv2730881 | CNV | Deletion | 23290073 |
| esv2730882 | CNV | Deletion | 23290073 |
| esv2730883 | CNV | Deletion | 23290073 |
| esv2759385 | CNV | Loss | 17122850 |
| esv2763505 | CNV | Loss | 21179565 |
| esv3304864 | CNV | mobile element insertion | 20981092 |
| esv3306710 | CNV | mobile element insertion | 20981092 |
| esv3310476 | CNV | novel sequence insertion | 20981092 |
| esv3324641 | CNV | Insertion | 20981092 |
| esv3394455 | CNV | Insertion | 20981092 |
| esv3427676 | CNV | Duplication | 20981092 |
| esv3429928 | CNV | Insertion | 20981092 |
| esv3444421 | CNV | Insertion | 20981092 |
| esv3570481 | CNV | Loss | 25503493 |
| esv3570482 | CNV | Loss | 25503493 |
| esv3570485 | CNV | Loss | 25503493 |
| esv3607080 | CNV | Loss | 21293372 |
| esv3607081 | CNV | Loss | 21293372 |
| esv3607083 | CNV | Loss | 21293372 |
| esv3607084 | CNV | Loss | 21293372 |
| esv3607085 | CNV | Loss | 21293372 |
| esv3607086 | CNV | Loss | 21293372 |
| esv988317 | CNV | Insertion | 20482838 |
| nsv1117590 | CNV | Deletion | 24896259 |
| nsv1145843 | CNV | Deletion | 26484159 |
| nsv327242 | CNV | Insertion | 16902084 |
| nsv328338 | CNV | Deletion | 16902084 |
| nsv499741 | CNV | Loss | 21111241 |
| nsv5052 | CNV | Insertion | 18451855 |
| nsv5053 | CNV | Deletion | 18451855 |
| nsv514327 | CNV | Loss | 21397061 |
| nsv519847 | CNV | gain+loss | 19592680 |
| nsv528827 | CNV | Loss | 19592680 |
| nsv599938 | CNV | Gain | 21841781 |
| nsv823286 | CNV | Loss | 20364138 |
| nsv969003 | CNV | Duplication | 23825009 |
| nsv1040428 | CNV | Gain | 25217958 |
| nsv571453 | CNV | Loss | 21841781 |
| esv2660206 | CNV | Deletion | 23128226 |
| esv2664869 | CNV | Deletion | 23128226 |
| esv2715502 | CNV | Deletion | 23290073 |
| esv2715503 | CNV | Deletion | 23290073 |
| esv275250 | CNV | gain+loss | 21479260 |
| esv3572343 | CNV | Gain | 25503493 |
| esv3582486 | CNV | Loss | 25503493 |
| esv3639714 | CNV | Loss | 21293372 |
| esv3639715 | CNV | Loss | 21293372 |
| nsv1055844 | CNV | Gain | 25217958 |
| nsv1065489 | CNV | Gain | 25217958 |
| nsv1070794 | CNV | Deletion | 25765185 |
| nsv1123087 | CNV | Deletion | 24896259 |
| nsv516756 | CNV | Gain | 19592680 |
| nsv833339 | CNV | Loss | 17160897 |
| nsv954919 | CNV | Deletion | 24416366 |
Docking results of vital reproductive genes
| MPD3 ID | Ligands | S-Score | RMSD Value | Interacting Residues |
|---|---|---|---|---|
| 2068 | Tannic Acid | -21.31 | 3.61 | Ser76, Leu194, Asn104, His79, Arg141, His106, Gln78, Thr12, Pro13, Gln15, Gln50, Gly73, Val17, Asp20, Arg22, Asp16, Trp23, Met24, Gly21, Leu26 |
| 778 | 5,6,7,4'-Tetrahydroxyflavanone 6,7-diglucoside | -18.41 | 2.10 | Asp213, Cys174, Gly173, Pro175, Glu299, Glu300, Lys170, Lys147, Asp137, Val139 |
| 536 | Oolonghomobisflavan A | -26.58 | 2.42 | Leu46, Leu54, Ile56, Ile84, Val92, Met107, Glu108, Lys91, Pro38, Leu39, Leu87, Asn90, Phe105, Ile71, Gln120, Leu58, Val41, Lys60, Lys69, Tyr42, Leu122 |
Figure 13. Interaction analysis of docked proteins: (A) (1, 2, 3) PAFAH1B1, (B) (1, 2, 3) PPP2R2B, (C) (1, 2, 3) PRM1.