Fabian Tobar-Tosse, Patricia E Veléz, Eliana Ocampo-Toro, Pedro A Moreno.
Abstract
BACKGROUND: Repetitive DNA sequences (Repeats) are significant regions of the human genome with a specific genomic distribution and structure, and they provide several binding sites relevant to genome architecture and function. Consequently, the configurations of Repeats in specific and dynamic regions such as gene promoters could define footprints of molecular mechanisms, pathways, and cell functions beyond their density in the genome. Here we explored the distribution of Repeats in the upstream promoter region of human coding genes with the aim of identifying specific configurations and clusters of these elements, and their functional meaning. Our method includes structural descriptions, hierarchical clustering, pathway association, and functional enrichment analysis.
Keywords: Cell signaling pathway; Functional enrichment; Human coding genes; Repetitive DNA sequences; Upstream promoter region
Year: 2018; PMID: 30537933; PMCID: PMC6288848; DOI: 10.1186/s12864-018-5196-6
Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969
Fig. 1 Structural features of Repeats in the upstream promoter region (UPR) of human coding genes. a. Absolute frequency of genes according to the number of Repeats in the UPR. b. Absolute frequency of gene lengths according to the number of Repeats in the UPR. c. Comparison of patterns with two Repeats in the UPR, using the number of genes in the original and inverted distributions. d. Absolute frequency of genes by type of Repeat. e. Cumulative frequency of Repeats with respect to the transcription start site (TSS) within the 1000 bp exploration range; the distant orange, blue, purple and green lines represent Simple_repeat and Low_complexity Repeats. f. Mean Repeat length by type.
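Panel e plots the cumulative frequency of Repeats as a function of distance from the TSS within a 1000 bp window. A minimal sketch of that computation, assuming each Repeat is summarized by a single hypothetical distance (bp upstream of the TSS) and that frequencies are normalized to the Repeats falling inside the window; the step size and distances are illustrative, not values from the study:

```python
from bisect import bisect_right

def cumulative_frequency(distances, window=1000, step=100):
    """Return (cutoff, fraction) pairs: the fraction of in-window Repeats whose
    distance to the TSS is <= cutoff, for cutoffs step, 2*step, ..., window.
    Normalizing by in-window Repeats is an assumption of this sketch."""
    inside = sorted(d for d in distances if 0 <= d <= window)
    cutoffs = range(step, window + 1, step)
    if not inside:
        return [(c, 0.0) for c in cutoffs]
    total = len(inside)
    # bisect_right counts how many sorted distances are <= c.
    return [(c, bisect_right(inside, c) / total) for c in cutoffs]

# Hypothetical distances (bp upstream of the TSS) for one Repeat type;
# 1500 lies outside the 1000 bp window and is ignored.
alu_distances = [120, 340, 350, 610, 990, 1500]
for cutoff, frac in cumulative_frequency(alu_distances, step=250):
    print(cutoff, frac)
```

One curve per Repeat type, computed this way, gives the kind of comparison shown in the panel.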
Fig. 2 Dendrogram of Repeats based on hierarchical clustering analysis. Colors indicate high (red) or low (blue) frequency of Repeats in the UPR. Asterisks mark exceptions to the tendency of frequent Repeats to group in subclusters with a high frequency of configuration.
Configurations of Repeats in the upstream promoter region of human coding genes. The table lists 18 of the 2729 configurations analyzed in the study.
| Repeat Configurations | # Genes | # Repeats |
|---|---|---|
| TSS<>Alu | 1240 | 1 |
| TSS<>Low_complexity | 1062 | 1 |
| TSS<>MIR | 1012 | 1 |
| TSS<>Simple_repeat | 722 | 1 |
| TSS<>LINE/L2 | 553 | 1 |
| TSS<>Alu<>Alu | 439 | 2 |
| TSS<>LINE/L1 | 359 | 1 |
| TSS<>Low_complexity<>Low_complexity | 309 | 2 |
| TSS<>hAT-Charlie | 236 | 1 |
| TSS<>MIR<>MIR | 222 | 2 |
| TSS<>MIR<>Alu | 218 | 2 |
| TSS<>Low_complexity<>Alu | 190 | 2 |
| TSS<>ERV1 | 179 | 1 |
| TSS<>Low_complexity<>MIR | 160 | 2 |
| TSS<>Low_complexity<>Simple_repeat | 160 | 2 |
| TSS<>Simple_repeat<>Low_complexity | 149 | 2 |
| TSS<>LINE/L2<>Alu | 133 | 2 |
| TSS<>Alu<>Alu<>Alu | 126 | 3 |
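Each configuration string reads as the ordered list of Repeat types found upstream of the TSS, and the last column is the number of Repeats in it. The sketch below shows one way to derive such strings and count genes per configuration; it assumes, hypothetically, that each gene's Repeats are given as (distance from TSS, type) pairs and that types are listed from the TSS outward. Gene names and coordinates are invented.

```python
from collections import Counter

def configuration(repeats):
    """Build a configuration string such as 'TSS<>MIR<>Alu'.

    `repeats` is a list of (distance_bp_upstream_of_TSS, repeat_type) tuples;
    ordering from the TSS outward is an assumption of this sketch."""
    ordered = [rtype for _, rtype in sorted(repeats)]
    return "<>".join(["TSS"] + ordered)

# Hypothetical annotations: gene -> Repeats found in its upstream promoter region.
genes = {
    "geneA": [(420, "Alu")],
    "geneB": [(80, "MIR"), (610, "Alu")],
    "geneC": [(700, "Alu"), (150, "MIR")],
    "geneD": [(300, "Alu"), (900, "Alu")],
}
counts = Counter(configuration(r) for r in genes.values())
for config, n_genes in counts.most_common():
    # The number of "<>" separators equals the number of Repeats.
    print(config, n_genes, config.count("<>"))
```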
Fig. 3 Hierarchical clustering of Repeats based on their frequency in the UPR and a gene-level value of association with cell signaling pathways. a. Hierarchical clustering for six general categories of pathways; subclusters G (General) were defined as those whose minimum frequency of configuration was higher than 0.7. b. Hierarchical clustering for 33 subcategories of pathways; the same threshold (grey line) was applied to define subclusters S (Specific).
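The 0.7 threshold groups Repeats whose minimum frequency of configuration exceeds the cut-off. One clustering rule with exactly that property is complete linkage on a similarity matrix: two clusters merge only if every cross pair is above the threshold. The sketch below implements that rule in plain Python; treating the frequency of configuration as a pairwise similarity in [0, 1], and the labels A to E with their values, are assumptions for illustration, not the study's data or its exact procedure.

```python
from itertools import combinations

def complete_linkage_subclusters(labels, sim, threshold=0.7):
    """Agglomerative clustering with complete linkage on similarities.

    `sim` maps frozenset({a, b}) to a similarity in [0, 1] (assumed here to be
    a pairwise frequency of configuration). Two clusters merge only when the
    minimum similarity over all cross pairs exceeds `threshold`; the pair with
    the highest such minimum is merged first."""
    clusters = [[label] for label in labels]
    while True:
        best, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: the weakest cross pair decides the link.
            link = min(sim[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            if link > best:
                best, best_pair = link, (i, j)
        if best_pair is None:
            return clusters  # no remaining merge clears the threshold
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

# Hypothetical similarities between five Repeat types A..E (default 0.2).
labels = ["A", "B", "C", "D", "E"]
known = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.75, ("D", "E"): 0.85}
sim = {frozenset(p): known.get(p, 0.2) for p in combinations(labels, 2)}
print(complete_linkage_subclusters(labels, sim))
```

Raising the threshold shrinks the subclusters: at 0.95 no pair qualifies and every Repeat stays on its own.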
Clusters of Repeats with a high frequency of configuration and their specific pathways
| Subcluster | Repeats | General Pathway | Specific Pathways |
|---|---|---|---|
| S1 | LINE/L2, Alu, LINE/L1, hAT-Charlie | Genetic Information Processing (GIP) | GIP_Replication and repair; GIP_Transcription; GIP_Translation |
| S3 | MIR, Simple_repeat | Environmental Information Processing (EIP) | EIP_Signal transduction; OS_Aging; OS_Environmental adaptation; OS_Excretory system; CP_Cellular community |
| S10 | ERVL-MaLR, ERV1 | Metabolism (M) | M_Carbohydrate metabolism; M_Energy metabolism; M_Lipid metabolism; M_Xenobiotics biodegradation and metabolism |
Fig. 4 Functional enrichment analysis based on Gene Ontology (GO). a. Hierarchical clustering of GO Cellular Component terms. b. Significant compartments by Repeat; the value in parentheses is log10(p-value). c. Clustering of GO Biological Process terms. The p-values define the hierarchy, and the dendrograms show co-associations among categories.
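Significance in this figure is reported as log10(p-value). The enrichment test behind it is not specified here; a common choice for GO over-representation is the one-sided hypergeometric test, sketched below with Python's `math.comb` (Python 3.8+). All counts are hypothetical, and no multiple-testing correction is applied.

```python
import math

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background; K: background genes annotated to a GO term;
    n: genes in the query set (e.g. genes with a given Repeat in the UPR);
    k: query genes annotated to the term."""
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / math.comb(N, n)

# Hypothetical counts, not taken from the study.
p = hypergeom_enrichment_p(k=3, n=4, K=5, N=20)
print(p, math.log10(p))
```

Because the tail is summed over exact integer binomial coefficients before a single division, the result stays accurate even for small p-values in moderately sized gene sets.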
Genes with a significant number of Repeats at the upstream promoter region
| Name | Gene ID | Function | # Repeats | Repeat composition |
|---|---|---|---|---|
| S1PR4 | GeneID:8698 | Sphingosine-1-phosphate receptor 4 | 15 | LINE/L2(1),MIR(14) |
| FBXL12 | GeneID:54850 | F-box and leucine rich repeat protein 12 | 9 | Simple_repeat(2),Alu(2),LINE/L1(1),Alu(3),LINE/L1(1) |
| PSMF1 | GeneID:9491 | Proteasome inhibitor subunit 1 | 9 | LINE/L2(1),hAT-Tip100(1),Low_complexity(1),Alu(1),Low_complexity(1),LINE/L1(3),Alu(1) |
| IFITM3 | GeneID:10410 | Interferon Induced Transmembrane Protein 3 | 8 | Alu(1),ERV1(7) |
| ZFP69 | GeneID:339559 | ZFP69 Zinc Finger Protein | 8 | Simple_repeat(1),LINE/L2(1),Low_complexity(1),Simple_repeat(2),Alu(2),LINE/L1(1) |
| SOX8 | GeneID:30812 | SRY (Sex Determining Region Y)-Box 8 | 8 | Simple_repeat(1),Low_complexity(3),Simple_repeat(3),Low_complexity(1) |
| CFAP73 | GeneID:387885 | Cilia And Flagella Associated Protein 73 | 8 | Alu(1),MIR(1),LINE/L1(1),Low_complexity(2),Alu(1),Simple_repeat(1),Alu(1) |
| PFN3 | GeneID:345456 | Profilin 3 | 8 | Alu(1),Low_complexity(1),Alu(1),Low_complexity(1),hAT-Charlie(1),Low_complexity(1),Simple_repeat(1),Alu(1) |
| COMMD5 | GeneID:28991 | Hypertension-Related Calcium-Regulated Gene Protein | 8 | ERV1(1),LINE/L1(2),Alu(1),ERVL-MaLR(1),Alu(1),hAT-Charlie(1),Alu(1) |
| TOMM5 | GeneID:401505 | Translocase Of Outer Mitochondrial Membrane 5 | 8 | Alu(1),Simple_repeat(1),MIR(1),Low_complexity(3),Alu(1),LINE/L1(1) |
(n) = number of copies
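In the table above, each Repeat entry has the form Type(n), where n is the number of copies, and the listed copies add up to the gene's repeat count. A small parser, written here as a hypothetical helper, makes that consistency check explicit:

```python
import re

# One entry such as "MIR(14)": a Repeat type followed by its copy number.
REPEAT_TOKEN = re.compile(r"([^,()]+)\((\d+)\)")

def parse_repeats(field):
    """Split 'LINE/L2(1),MIR(14)' into [('LINE/L2', 1), ('MIR', 14)], in order."""
    return [(name.strip(), int(n)) for name, n in REPEAT_TOKEN.findall(field)]

def total_copies(field):
    """Sum the copy numbers of all entries in a Repeats field."""
    return sum(n for _, n in parse_repeats(field))

# Rows copied from the table: gene -> (number of Repeats, Repeats field).
rows = {
    "S1PR4": (15, "LINE/L2(1),MIR(14)"),
    "IFITM3": (8, "Alu(1),ERV1(7)"),
    "TOMM5": (8, "Alu(1),Simple_repeat(1),MIR(1),Low_complexity(3),Alu(1),LINE/L1(1)"),
}
for gene, (n_rep, field) in rows.items():
    print(gene, total_copies(field) == n_rep)
```

Order is preserved, so repeated types (for example two separate Alu entries) remain distinct positions in the promoter rather than being merged.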