Daniil Nikitin, Andrew Garazha, Maxim Sorokin, Dmitry Penzar, Victor Tkachev, Alexander Markov, Nurshat Gaifullin, Pieter Borger, Alexander Poltorak, Anton Buzdin.
Abstract
BACKGROUND: Retroelements (REs) are transposable elements that occupy ~40% of the human genome and can regulate genes by providing transcription factor binding sites (TFBS). The profile of RE-linked TFBS can therefore serve as a marker of the evolution of gene transcriptional regulation, and this approach makes it possible to interrogate the regulatory evolution of organisms with RE-rich genomes. We aimed to characterize the evolution of transcriptional regulation of human genes and molecular pathways, using the accumulation of RE-linked TFBS as a metric.
Keywords: ChIP-seq; Human genome evolution; gene ontology; molecular pathways; omics approach in genetics; retrotransposons; transcription factor
Year: 2019 PMID: 30736359 PMCID: PMC6406739 DOI: 10.3390/cells8020130
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. RE insertion near transcription start sites can introduce new TFBS and drastically alter gene expression.
Figure 2. Comparison of NGRE and NPII scores between different cell lines for all REs. Colors denote the organ of origin of each cell line. (A) Anatomical map of the origins of the cell lines investigated in this study. (B) Distribution of Pearson correlation coefficients between the NGRE scores of the K562 cell line and those of the 12 other cell lines investigated. (C) Distribution of Pearson correlation coefficients between the NPII scores of the K562 cell line and those of the 12 other cell lines. (D) Distribution of all pairwise Pearson correlation coefficients of NGRE scores across all 13 cell lines. (E) Distribution of all pairwise Pearson correlation coefficients of NPII scores across all 13 cell lines.
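Panels B to E of Figure 2 summarize Pearson correlations between per-cell-line score vectors. The sketch below shows both summaries in plain Python: one reference line (K562) correlated against every other line, as in panels B and C, and all unordered pairs of lines, as in panels D and E. It assumes each cell line's scores are aligned over the same genes or pathways; the score values and every cell-line name other than K562 are hypothetical placeholders.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def against_reference(scores, reference):
    """Correlate the reference line with every other line (panels B and C)."""
    ref = scores[reference]
    return {line: pearson(ref, vals) for line, vals in scores.items() if line != reference}


def all_pairs(scores):
    """Correlation for every unordered pair of lines (panels D and E)."""
    return [pearson(scores[a], scores[b]) for a, b in itertools.combinations(sorted(scores), 2)]


# Hypothetical data: 13 cell lines with 50 random scores each.
rng = random.Random(0)
names = ["K562"] + [f"line_{i}" for i in range(2, 14)]
scores = {name: [rng.gauss(0, 1) for _ in range(50)] for name in names}

print(len(against_reference(scores, "K562")))  # 12 coefficients
print(len(all_pairs(scores)))                  # 78 = 13 * 12 / 2
```

With 13 cell lines, the reference comparison yields 12 coefficients and the pairwise comparison yields 13 × 12 / 2 = 78.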
Overall TFBS statistics. Note the large proportion of round numbers among the mapped TFBS counts, a consequence of the IDR thresholds used in the standard ENCODE peak-calling pipeline [46].
| Cell Line | Number of TFs Profiled | Number of Mapped TFBS | Number of TFBS Mapped on SINEs | Percentage of TFBS Mapped on SINEs | Number of TFBS Mapped on LINEs | Percentage of TFBS Mapped on LINEs | Number of TFBS Mapped on LTR/ERVs | Percentage of TFBS Mapped on LTR/ERVs |
|---|---|---|---|---|---|---|---|---|
|  | 265 | 78,021,500 | 25,078,428 | 32.1 | 22,646,141 | 29 | 10,394,662 | 13.3 |
|  | 175 | 51,982,065 | 16,406,062 | 31.6 | 14,010,477 | 27 | 6,493,562 | 12.5 |
|  | 177 | 53,100,000 | 19,214,015 | 36.2 | 15,708,687 | 29.6 | 6,089,813 | 11.5 |
|  | 127 | 37,688,353 | 10,185,415 | 27 | 9,927,943 | 26.3 | 4,727,719 | 12.5 |
|  | 80 | 23,851,396 | 7,039,271 | 29.5 | 7,499,703 | 31.4 | 3,297,443 | 13.8 |
|  | 44 | 13,044,409 | 3,930,407 | 30.1 | 3,569,214 | 27.4 | 1,667,464 | 12.8 |
|  | 15 | 4,500,000 | 1,112,669 | 24.7 | 1,036,667 | 23 | 566,618 | 12.6 |
|  | 15 | 4,500,000 | 874,572 | 19.4 | 958,401 | 21.3 | 473,342 | 10.5 |
|  | 4 | 1,200,000 | 402,872 | 33.6 | 333,771 | 27.8 | 181,562 | 15.1 |
|  | 4 | 1,200,000 | 268,209 | 22.4 | 266,191 | 22.2 | 141,530 | 11.8 |
|  | 17 | 5,100,000 | 1,297,870 | 25.4 | 1,725,589 | 33.8 | 618,745 | 12.1 |
|  | 3 | 900,000 | 214,458 | 23.8 | 215,576 | 24 | 103,925 | 11.5 |
|  | 7 | 2,100,000 | 559,260 | 26.6 | 433,194 | 20.6 | 253,577 | 12.1 |
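The note on round numbers can be checked against the table itself: eight of the thirteen totals are exact multiples of 300,000, i.e. 300,000 peaks for every profiled TF, and no total exceeds that bound. The sketch below assumes a fixed cap of 300,000 peaks per TF, a figure inferred from the table rather than stated in it; `PEAKS_PER_TF` and `shortfall` are hypothetical names.

```python
# (TFs profiled, mapped TFBS) for each row of the overall TFBS table.
ROWS = [
    (265, 78_021_500), (175, 51_982_065), (177, 53_100_000),
    (127, 37_688_353), (80, 23_851_396), (44, 13_044_409),
    (15, 4_500_000), (15, 4_500_000), (4, 1_200_000),
    (4, 1_200_000), (17, 5_100_000), (3, 900_000), (7, 2_100_000),
]

PEAKS_PER_TF = 300_000  # assumed per-TF peak cap, inferred from the table


def shortfall(n_tfs, total, cap=PEAKS_PER_TF):
    """Peaks missing relative to `cap` peaks for every profiled TF."""
    return n_tfs * cap - total


capped = [row for row in ROWS if shortfall(*row) == 0]
print(f"{len(capped)} of {len(ROWS)} totals equal TFs x {PEAKS_PER_TF:,}")
print(shortfall(265, 78_021_500))                 # 1478500
print(all(shortfall(*row) >= 0 for row in ROWS))  # True
```

Rows with a positive shortfall are those in which at least one TF contributed fewer than the assumed cap.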
Gene neighborhood-linked TFBS statistics.
| Cell Line | Number of TFs Profiled | Number of Mapped TFBS | Number of TFBS Mapped on SINEs | Percentage of TFBS Mapped on SINEs | Number of TFBS Mapped on LINEs | Percentage of TFBS Mapped on LINEs | Number of TFBS Mapped on LTR/ERVs | Percentage of TFBS Mapped on LTR/ERVs |
|---|---|---|---|---|---|---|---|---|
|  | 260 | 12,547,055 | 4,667,810 | 37.2 | 2,508,956 | 20 | 1,081,084 | 8.6 |
|  | 175 | 8,803,748 | 3,023,414 | 34.3 | 1,559,929 | 17.7 | 669,907 | 7.6 |
|  | 177 | 8,074,128 | 3,235,573 | 40.1 | 1,569,002 | 19.4 | 587,798 | 7.3 |
|  | 127 | 5,626,411 | 1,699,882 | 30.2 | 970,626 | 17.3 | 400,502 | 7.1 |
|  | 80 | 3,067,441 | 1,029,118 | 33.5 | 598,982 | 19.5 | 238,005 | 7.8 |
|  | 44 | 2,035,359 | 662,458 | 32.5 | 358,084 | 17.6 | 142,842 | 7 |
|  | 15 | 720,963 | 193,365 | 26.8 | 109,683 | 15.2 | 47,759 | 6.6 |
|  | 15 | 679,254 | 139,230 | 20.5 | 90,313 | 13.3 | 36,751 | 5.4 |
|  | 4 | 202,223 | 71,590 | 35.4 | 38,293 | 18.9 | 16,878 | 8.3 |
|  | 4 | 194,053 | 52,211 | 26.9 | 28,867 | 14.9 | 12,768 | 6.6 |
|  | 17 | 518,287 | 162,440 | 31.3 | 115,555 | 22.3 | 42,242 | 8.2 |
|  | 3 | 145,892 | 43,930 | 30.1 | 25,500 | 17.5 | 10,304 | 7.1 |
|  | 7 | 427,212 | 94,464 | 22.1 | 48,947 | 11.5 | 21,515 | 5 |
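Each percentage column is the corresponding count divided by the number of mapped TFBS in the same row. The sketch below recomputes all three percentages for every row of the gene-neighborhood table and lists any value that differs from the printed one-decimal figure by more than rounding allows (0.05 percentage points). The helper names are hypothetical.

```python
# Gene neighborhood-linked TFBS table, one tuple per row:
# (mapped TFBS, on SINEs, on LINEs, on LTR/ERVs, SINE %, LINE %, LTR/ERV %)
ROWS = [
    (12_547_055, 4_667_810, 2_508_956, 1_081_084, 37.2, 20.0, 8.6),
    (8_803_748, 3_023_414, 1_559_929, 669_907, 34.3, 17.7, 7.6),
    (8_074_128, 3_235_573, 1_569_002, 587_798, 40.1, 19.4, 7.3),
    (5_626_411, 1_699_882, 970_626, 400_502, 30.2, 17.3, 7.1),
    (3_067_441, 1_029_118, 598_982, 238_005, 33.5, 19.5, 7.8),
    (2_035_359, 662_458, 358_084, 142_842, 32.5, 17.6, 7.0),
    (720_963, 193_365, 109_683, 47_759, 26.8, 15.2, 6.6),
    (679_254, 139_230, 90_313, 36_751, 20.5, 13.3, 5.4),
    (202_223, 71_590, 38_293, 16_878, 35.4, 18.9, 8.3),
    (194_053, 52_211, 28_867, 12_768, 26.9, 14.9, 6.6),
    (518_287, 162_440, 115_555, 42_242, 31.3, 22.3, 8.2),
    (145_892, 43_930, 25_500, 10_304, 30.1, 17.5, 7.1),
    (427_212, 94_464, 48_947, 21_515, 22.1, 11.5, 5.0),
]

TOLERANCE = 0.05 + 1e-9  # half of the last printed digit, plus float slack


def percent(part, total):
    return 100.0 * part / total


def mismatches(rows, tol=TOLERANCE):
    """Return (row index, count, printed %) for each percentage outside rounding."""
    bad = []
    for i, (total, *rest) in enumerate(rows):
        counts, printed = rest[:3], rest[3:]
        for count, pct in zip(counts, printed):
            if abs(percent(count, total) - pct) > tol:
                bad.append((i, count, pct))
    return bad


print(mismatches(ROWS))  # [] means every printed percentage is consistent
```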
Figure 3. Comparison of GRE and NGRE scores across cell lines for all REs. (A) Comparison of mean GRE and mean NGRE scores. Each dot represents a single gene; GRE and NGRE scores were averaged across all cell lines. Genes enriched in RRE (regulation by retroelements) are shown as red dots, and genes deficient in RRE as green dots. (B) Comparison of mean GRE and mean NGRE scores, both averaged across cell lines. Color depth is proportional to the number of dots (each representing a single gene) in each bin. Univariate distributions of GRE and NGRE are shown in the plot margins. (C) Comparison of GRE and NGRE scores for individual cell lines.
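Panels A and B of Figure 3 rest on two simple operations: averaging each gene's score over the cell lines, and counting how many genes fall into each cell of a two-dimensional grid. A minimal sketch under stated assumptions: scores are stored per cell line as hypothetical `{gene: score}` dictionaries, a gene's mean is taken only over the lines that report it, and square bins of a chosen width stand in for whatever binning the published plot uses. All gene names and values are hypothetical.

```python
import math
from collections import Counter


def mean_across_cell_lines(per_line):
    """Average each gene's score over the cell lines that report it.

    per_line maps cell line -> {gene: score}; returns {gene: mean score}.
    """
    sums, counts = {}, Counter()
    for scores in per_line.values():
        for gene, score in scores.items():
            sums[gene] = sums.get(gene, 0.0) + score
            counts[gene] += 1
    return {gene: sums[gene] / counts[gene] for gene in sums}


def grid_counts(points, width):
    """Count (x, y) points per square bin of side `width`; the count sets color depth."""
    return Counter((math.floor(x / width), math.floor(y / width)) for x, y in points)


# Hypothetical GRE and NGRE scores for two cell lines.
gre = {"line_A": {"GENE1": 2.0, "GENE2": 4.0}, "line_B": {"GENE1": 4.0, "GENE2": 6.0}}
ngre = {"line_A": {"GENE1": 0.5, "GENE2": 1.5}, "line_B": {"GENE1": 1.5, "GENE2": 2.5}}

mean_gre = mean_across_cell_lines(gre)    # GENE1 -> 3.0, GENE2 -> 5.0
mean_ngre = mean_across_cell_lines(ngre)  # GENE1 -> 1.0, GENE2 -> 2.0
points = [(mean_gre[g], mean_ngre[g]) for g in sorted(mean_gre)]
print(grid_counts(points, width=2.0))
```

The same two steps apply unchanged to the pathway-level PII and NPII scores in Figure 4.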
Figure 4. Comparison of PII and NPII scores across cell lines for all REs. (A) Comparison of mean PII and mean NPII scores. Each dot represents a single pathway; PII and NPII scores were averaged across all cell lines. Pathways enriched in RRE (regulation by retroelements) are shown as red dots, and pathways deficient in RRE as green dots. (B) Comparison of mean PII and mean NPII scores, both averaged across cell lines. Color depth is proportional to the number of dots (each representing a single pathway) in each bin. Univariate distributions of PII and NPII are shown in the plot margins. (C) Comparison of PII and NPII scores for individual cell lines.
RRE-enriched and deficient intracellular processes according to Gene Ontology (GO) and molecular pathway analysis (all REs).
| ID | Group of Processes | RRE-Enriched Pathways | RRE-Deficient Pathways | RRE-Enriched GO Terms | RRE-Deficient GO Terms | Overall Status |
|---|---|---|---|---|---|---|
| 1 | Posttranscriptional silencing by small RNAs | 1 | 0 | 1 | 0 | RRE enriched |
| 2 | DNA repair | 2 | 0 | 5 | 0 | RRE enriched |
| 3 | Amino acids, Peptides and Polyamines Metabolism | 20 | 5 | 13 | 8 | RRE enriched |
| 4 | Lipid Metabolism | 14 | 7 | 11 | 0 | RRE enriched |
| 5 | Detoxication, Metabolism of Xenobiotics and Rare Molecules | 13 | 0 | 4 | 0 | RRE enriched |
| 6 | Sensory Perception and Neurotransmission | 7 | 0 | 10 | 0 | RRE enriched |
| 7 | Fertilization | 1 | 0 | 9 | 0 | RRE enriched |
| 8 | Cellular Immune Response (T cells and NK cells) | 11 | 0 | 7 | 6 | RRE enriched |
| 9 | Nucleic Base, Nucleosides and Nucleotides Metabolism | 6 | 9 | 0 | 24 | RRE deficient |
| 10 | DNA metabolism and Chromatin structure | 0 | 4 | 0 | 151 | RRE deficient |
| 11 | Translation and Protein Quality Control | 0 | 12 | 8 | 130 | RRE deficient |
| 12 | Intracellular Signaling | 22 | 94 | 5 | 48 | RRE deficient |
| 13 | Response to Viruses | 0 | 3 | 0 | 17 | RRE deficient |
| 14 | Vitamin Metabolism | 4 | 0 | 0 | 0 | RRE enriched |
| 15 | Hormones | 6 | 0 | 0 | 0 | RRE enriched |
| 16 | Molecular Transport | 10 | 0 | 0 | 0 | RRE enriched |
| 17 | Sulfur Metabolism and Linked Redox Reactions | 5 | 0 | 0 | 0 | RRE enriched |
| 18 | Metal Metabolism | 0 | 0 | 6 | 0 | RRE enriched |
| 19 | Response to Phorbol Acetate | 0 | 0 | 0 | 3 | RRE deficient |
| 20 | Electron Transfer Reactions | 0 | 0 | 5 | 17 | RRE deficient |
| 21 | Mitochondria | 0 | 0 | 5 | 17 | RRE deficient |
| 22 | RNA Synthesis and Degradation | 0 | 0 | 0 | 139 | RRE deficient |
| 23 | Cell Adhesion and Interaction | 0 | 0 | 0 | 15 | RRE deficient |
| 24 | Cell Cycle and Mitosis | 0 | 0 | 0 | 55 | RRE deficient |
| 25 | Cell Death | 0 | 0 | 0 | 41 | RRE deficient |
| 26 | Protein Localization and Modification | 0 | 0 | 0 | 19 | RRE deficient |
| 27 | Response to Physical and Chemical Stress | 0 | 0 | 0 | 24 | RRE deficient |
| 28 | Carbohydrates Metabolism | 5 | 3 | 0 | 9 | Ambiguous Pattern |
| 29 | Immunity | 36 | 16 | 23 | 45 | Shown separately |
| 30 | Other/Too General Terms | 0 | 0 | 13 | 17 | N/A |
RRE-enriched and deficient immunity-linked processes according to Gene Ontology (GO) and molecular pathway analysis (all REs).
| Group of Processes | RRE-Enriched Pathways | RRE-Deficient Pathways | RRE-Enriched GO Terms | RRE-Deficient GO Terms | Overall Status |
|---|---|---|---|---|---|
| Autoimmunity | 4 | 0 | 0 | 0 | RRE enriched |
| Blood Clotting | 2 | 0 | 0 | 0 | RRE enriched |
| Innate Immunity | 8 | 0 | 0 | 5 | Ambiguous |
| Inflammation | 3 | 5 | 0 | 0 | Ambiguous |
| Cellular Immune Response (T cells and NK cells) | 11 | 0 | 7 | 6 | RRE enriched |
| Activation of Antigen-Presenting Cells by T-helper cells | 2 | 7 | 0 | 0 | RRE deficient |
| Other/Too General Terms | 6 | 1 | 8 | 11 | Ambiguous |
| Immune Cells Migration and Activation | 0 | 0 | 7 | 0 | RRE enriched |
| Activity and maturation of B cells | 0 | 0 | 0 | 6 | RRE deficient |
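The tables do not state how the Overall Status was assigned. One hypothetical rule that reproduces most rows: each analysis (pathways, GO) votes for enrichment if it has more RRE-enriched than RRE-deficient items and for deficiency if it has fewer; a group is RRE enriched or RRE deficient when its non-tied votes agree, and ambiguous when they conflict or when both analyses are tied. This is a reconstruction for illustration, not the authors' stated criterion.

```python
def vote(enriched, deficient):
    """+1 if enriched items outnumber deficient ones, -1 if fewer, 0 on a tie."""
    return (enriched > deficient) - (enriched < deficient)


def overall_status(pw_enriched, pw_deficient, go_enriched, go_deficient):
    """Hypothetical rule: combine the pathway vote and the GO vote, ignoring ties."""
    votes = {vote(pw_enriched, pw_deficient), vote(go_enriched, go_deficient)} - {0}
    if votes == {1}:
        return "RRE enriched"
    if votes == {-1}:
        return "RRE deficient"
    return "Ambiguous"  # conflicting votes, or no informative vote at all


# Rows from the all-RE tables: (enriched pws, deficient pws, enriched GO, deficient GO).
print(overall_status(20, 5, 13, 8))   # Amino acids, Peptides and Polyamines Metabolism
print(overall_status(22, 94, 5, 48))  # Intracellular Signaling
print(overall_status(5, 3, 0, 9))     # Carbohydrates Metabolism
print(overall_status(3, 5, 0, 0))     # Inflammation
```

Applied to every row of the two all-RE tables that carries an enriched, deficient, or ambiguous label, this rule matches the printed status except for Inflammation (3, 5, 0, 0), which the table lists as ambiguous while the rule calls it RRE deficient. The authors' criterion may therefore include a margin or significance threshold not stated here.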
Figure 5. Hierarchically ordered GO terms detected by the GOrilla software for RRE-enriched (A) and RRE-deficient (B) genes for all REs.
RRE-enriched and deficient microRNA and lncRNA genes (all REs).
| Gene Group | miRNA Genes in Group | Genes in Group | p-value | Conclusion |
|---|---|---|---|---|
| RRE-enriched | 177 | 1219 | 2.416 × 10⁻¹⁸ | miRNA genes are enriched |
| RRE-deficient | 72 | 1219 | 0.0138 | miRNA genes are not enriched |

Total: 1865 miRNA genes among the 25,075 genes of the human genome.

| Gene Group | lncRNA Genes in Group | Genes in Group | p-value | Conclusion |
|---|---|---|---|---|
| RRE-enriched | 150 | 1219 | 1.95 × 10⁻¹⁷ | lncRNA genes are enriched |
| RRE-deficient | 18 | 1219 | 2.42 × 10⁻¹⁶ | lncRNA genes are not enriched |

Total: 1505 lncRNA genes among the 25,075 genes of the human genome.
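These p-values compare the number of miRNA or lncRNA genes in each group with what a random draw of genes from the genome would give. The table does not name the test; the sketch below assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) and assumes that 1219 is the size of each gene group, so its p-values are illustrative and need not reproduce the printed ones. Under these assumptions a random group of 1219 genes is expected to contain 1219 × 1865 / 25,075 ≈ 90.7 miRNA genes, against 177 observed in the RRE-enriched group and 72 in the RRE-deficient group.

```python
from math import comb

N_GENES = 25_075    # genes in the human genome (from the table)
GROUP_SIZE = 1_219  # assumed size of each RRE gene group
MIRNA_GENES = 1_865
LNCRNA_GENES = 1_505


def expected_count(class_size, group_size=GROUP_SIZE, total=N_GENES):
    """Mean number of class members in a random group of `group_size` genes."""
    return group_size * class_size / total


def _ways(i, class_size, group_size, total):
    # Ways to pick i class members and (group_size - i) non-members.
    return comb(class_size, i) * comb(total - class_size, group_size - i)


def p_at_least(k, class_size, group_size=GROUP_SIZE, total=N_GENES):
    """One-sided P(X >= k) for a hypergeometric X (enrichment)."""
    hi = min(group_size, class_size)
    num = sum(_ways(i, class_size, group_size, total) for i in range(k, hi + 1))
    return num / comb(total, group_size)  # exact big-int ratio, rounded to float


def p_at_most(k, class_size, group_size=GROUP_SIZE, total=N_GENES):
    """One-sided P(X <= k) for a hypergeometric X (depletion)."""
    lo = max(0, group_size - (total - class_size))
    num = sum(_ways(i, class_size, group_size, total) for i in range(lo, k + 1))
    return num / comb(total, group_size)


print(round(expected_count(MIRNA_GENES), 1))   # 90.7
print(p_at_least(177, MIRNA_GENES))            # RRE-enriched group vs. chance
print(p_at_most(72, MIRNA_GENES))              # RRE-deficient group vs. chance
print(round(expected_count(LNCRNA_GENES), 1))  # 73.2
```

Exact integer arithmetic (`math.comb`, Python 3.8+) avoids the underflow that multiplying many small floating-point probabilities would cause for p-values this small.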
Figure 6. Top five RRE-enriched and RRE-deficient pathways sorted by NPII for all REs.
Figure 7. RRE-enriched and RRE-deficient molecular processes for all REs.
Figure 8. Random control of GO process enrichment in RRE-enriched (A) and RRE-deficient (B) genes for all REs.
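A random control asks how often gene sets of the same size, drawn at random from the genome, overlap a GO process at least as strongly as the observed set does. The exact procedure behind Figure 8 is not described here; the sketch below shows one common form of such a control, an empirical p-value over random same-size gene sets with a +1 pseudocount. The gene identifiers, the annotation, the trial count, and the function names are all hypothetical.

```python
import random


def overlap(genes, annotated):
    """Number of genes in the set that carry the GO annotation."""
    return len(genes & annotated)


def random_control(observed, annotated, universe, trials=1000, seed=0):
    """Empirical p-value: share of random same-size gene sets whose overlap
    with `annotated` is at least the observed overlap (+1 pseudocount)."""
    rng = random.Random(seed)
    target = overlap(observed, annotated)
    pool = sorted(universe)  # random.sample needs a sequence, not a set
    hits = sum(
        overlap(set(rng.sample(pool, len(observed))), annotated) >= target
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)


# Hypothetical data: 1000 genes, 100 of them annotated to one GO process.
universe = set(range(1000))
annotated = set(range(100))
observed = set(range(50)) | set(range(500, 550))  # 50 of its 100 genes annotated
print(random_control(observed, annotated, universe))
```

A random set of 100 genes is expected to contain about 10 annotated genes here, so an observed overlap of 50 is essentially never matched and the p-value falls to its floor of 1 / (trials + 1).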
RRE-enriched and deficient intracellular processes according to Gene Ontology (GO) and molecular pathway analysis (evolutionary young REs).
| ID | Group of Processes | RRE-Enriched Pathways | RRE-Deficient Pathways | RRE-Enriched GO Terms | RRE-Deficient GO Terms | Overall Status |
|---|---|---|---|---|---|---|
| 1 | Lipid metabolism | 22 | 8 | 2 | 0 | RRE enriched |
| 2 | Signaling | 28 | 40 | 8 | 31 | RRE deficient |
| 3 | Immune System | 18 | 14 | 3 | 12 | Shown Separately |
| 4 | Cell cycle | 1 | 6 | 0 | 62 | RRE deficient |
| 5 | Cell death | 7 | 6 | 0 | 35 | Ambiguous Pattern |
| 6 | Amino acids and polyamines metabolism | 13 | 5 | 0 | 0 | RRE enriched |
| 7 | Metabolism and detoxication of xenobiotics | 8 | 0 | 2 | 0 | RRE enriched |
| 8 | Sulfur-linked reactions | 6 | 0 | 0 | 0 | RRE enriched |
| 9 | Vitamin metabolism | 10 | 0 | 0 | 0 | RRE enriched |
| 10 | Carbohydrates and related molecules metabolism | 9 | 5 | 0 | 6 | Ambiguous Pattern |
| 11 | Nucleic base, nucleotides and nucleosides metabolism | 5 | 3 | 0 | 14 | Ambiguous Pattern |
| 12 | Transport of small molecules | 4 | 0 | 0 | 0 | RRE enriched |
| 13 | Blood Clotting | 3 | 0 | 0 | 0 | RRE enriched |
| 14 | Cytoskeleton, cell adhesion and migration | 0 | 18 | 0 | 16 | RRE deficient |
| 15 | Endocytosis | 0 | 4 | 0 | 0 | RRE deficient |
| 16 | Translation and protein quality control | 0 | 23 | 0 | 105 | RRE deficient |
| 17 | Viruses | 0 | 7 | 0 | 18 | RRE deficient |
| 18 | Signal perception and neurotransmission | 0 | 0 | 22 | 0 | RRE enriched |
| 19 | RNA Synthesis and Degradation | 0 | 0 | 0 | 80 | RRE deficient |
| 20 | DNA metabolism and chromatin | 0 | 0 | 0 | 66 | RRE deficient |
| 21 | Protein Localization and Modification | 0 | 0 | 0 | 20 | RRE deficient |
| 22 | Response to Physical and Chemical Stress | 0 | 0 | 0 | 15 | RRE deficient |
| 23 | Oxidative Phosphorylation in Mitochondria | 0 | 0 | 0 | 19 | RRE deficient |
| 24 | Other/Too General Terms | 18 | 12 | 18 | 231 | N/A |
RRE-enriched and deficient immunity-linked processes according to Gene Ontology (GO) and molecular pathway analysis (evolutionary young REs).
| Group of Processes | RRE-Enriched Pathways | RRE-Deficient Pathways | RRE-Enriched GO Terms | RRE-Deficient GO Terms | Overall Status |
|---|---|---|---|---|---|
| Innate immunity | 5 | 2 | 0 | 9 | Ambiguous Pattern |
| Inflammation | 3 | 3 | 0 | 0 | Ambiguous Pattern |
| T-cell-mediated immunity | 4 | 3 | 0 | 0 | RRE enriched |
| Other/Too General Terms | 5 | 0 | 3 | 3 | |
Figure 9. Hierarchically ordered GO terms detected by the GOrilla software for RRE-deficient genes for evolutionary young REs.
Figure 10. RRE-enriched and RRE-deficient molecular processes for evolutionary young REs.