Masatoshi Matsunami, Naruya Saitou.
Abstract
Vertebrate genomes include gene regulatory elements in protein-noncoding regions. Some gene regulatory elements are expected to be conserved because of their functional importance, so evolutionarily conserved noncoding sequences (CNSs) might be good candidates for those elements. In addition, paralogous CNSs, which are highly conserved among both orthologous loci and paralogous loci, may control overlapping expression patterns of their adjacent paralogous protein-coding genes. The two-round whole-genome duplications (2R WGDs), which most probably occurred in the common ancestor of vertebrates, generated large numbers of paralogous protein-coding genes and their regulatory elements. These events could have contributed to the emergence of vertebrate features. However, the evolutionary history and influences of the 2R WGDs are still unclear, especially in noncoding regions. To address this issue, we identified paralogous CNSs. A region-focused Basic Local Alignment Search Tool (BLAST) search of each synteny block revealed 7,924 orthologous CNSs and 309 paralogous CNSs conserved among eight high-quality vertebrate genomes. The paralogous CNSs we found comprise 115 previously reported and 194 newly detected ones. Through comparisons with the VISTA Enhancer Browser and available ChIP-seq data, one-third (103) of the paralogous CNSs detected in this study showed gene regulatory activity in the brain at several developmental stages. Their genomic locations are highly enriched near transcription factor-coding regions that are expressed in the brain and neural systems. These results suggest that paralogous CNSs are conserved mainly because they maintain gene expression in the vertebrate brain.
Year: 2013 PMID: 23267051 PMCID: PMC3595034 DOI: 10.1093/gbe/evs128
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Number of Paralogous CNS Harboring Genes: Gene and CNS Loss Pattern of Duplicated Regions
| Conservation Level | No. of Paralogous Gene Group (No. of Genes) | No. of Paralogous CNS Group (No. of CNSs) |
|---|---|---|
| 4 | 50 (50 × 4 = 200) | 0 |
| 3 | 220 (220 × 3 = 660) | 3 (3 × 3 = 9) |
| 2 | 861 (861 × 2 = 1,722) | 150 (150 × 2 = 300) |
| 1 | 8,036 (8,036 × 1 = 8,036) | 7,341 (7,341 × 1 = 7,341) |
| Total | 9,167 (10,618) | 7,494 (7,650) |
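The totals in the table above follow from multiplying each group count by its conservation level. The short sketch below recomputes them as a consistency check; the variable names are illustrative, and the only inputs are the values copied from the table. It also assumes, as the abstract suggests, that the paralogous CNSs are those at conservation level 2 or higher.

```python
# (conservation level, paralogous gene groups, paralogous CNS groups),
# copied from the table above.
rows = [
    (4, 50, 0),
    (3, 220, 3),
    (2, 861, 150),
    (1, 8036, 7341),
]

# Group totals are plain sums; member totals weight each group by its level.
gene_groups = sum(groups for _, groups, _ in rows)
genes = sum(level * groups for level, groups, _ in rows)
cns_groups = sum(groups for _, _, groups in rows)
cns = sum(level * groups for level, _, groups in rows)

# Assumption: a CNS counts as paralogous when it is retained in two or
# more duplicated regions (conservation level >= 2).
paralogous_cns = sum(level * groups for level, _, groups in rows if level >= 2)

print(gene_groups, genes)   # gene groups and genes
print(cns_groups, cns)      # CNS groups and CNSs
print(paralogous_cns)       # CNSs retained in two or more regions
```

Under that assumption, the level 2 and level 3 rows together give the 309 paralogous CNSs reported in the abstract.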
Number of Paralogous CNS Harboring Genes
| No. of Paralogous CNSs | No. of CNS Harboring Genes |
|---|---|
| Quartets of paralogous genes | |
| 4 | 0 |
| 3 | 2 |
| 2 | 13 |
| 0 | 36 |
| Total | 50 |
| Trios of paralogous genes | |
| 3 | 1 |
| 2 | 29 |
| 0 | 190 |
| Total | 220 |
| Pairs of paralogous genes | |
| 2 | 31 |
| 0 | 830 |
| Total | 861 |
Figure: Paralogous synteny blocks within the human genome. The genomic distribution of paralogous synteny blocks is shown. (A) Di-, (B) tri-, and (C) tetraparalogous blocks are identified by gene order and homology.
Figure: Scheme of the Hox-linked paralogous block. Conservation of paralogous genes and paralogous CNSs is shown. Hox-linked paralogous synteny blocks show prominent conservation of not only coding regions but also noncoding regions. Tetraparalogous, triparalogous, and diparalogous genes are represented as thin red, blue, and green lines, respectively, and diparalogous CNSs are represented as thick green lines. We could not identify tetraparalogous or triparalogous CNSs in these regions.
Figure: Paralogous CNSs shared between the POU3F2 and POU3F3 genes. The genomic location of each orthologous CNS in the human genome and the alignment of the paralogous CNSs are shown. This paralogous CNS pair is located near the POU3 paralogs POU3F2 (BRN2) and POU3F3 (BRN1), which are derived from the 2R WGDs. These CNSs are strong candidates for gene regulatory sequences of these paralogs.
List of Paralogous CNSs Harboring Genes: Pair of Paralogous CNSs
| Number of Pairs | Pair of Harboring Genes |
|---|---|
| 6 | FOXP1&FOXP2, ZNF503&ZNF703 |
| 5 | IRX1&IRX3 |
| 4 | PBX1&PBX3, SALL1&SALL3 |
| 3 | EBF1&EBF3, EVX1&EVX2, NR2F1&NR2F2, POU4F1&POU4F2, SOX5&SOX6 |
| 2 | ESRP1&ESRP2, FOXB1&FOXB2, HOXA5&HOXB5, LMO1&LMO3, LRBA&NBEA, LRP3&LRP12, NEUROD1&NEUROD2, NRXN1&NRXN3, OTX1&OTX2, POU3F1&POU3F2, POU3F2&POU3F3, PRDM16&MECOM, SLIT2&SLIT3, SOX14&SOX21, TCF4&TCF12, TFAP2A&TFAP2B, TOX&TOX3, TSHZ1&TSHZ2, VRK1&VRK2 |
| 1 | ACTL6A&ACTL6B, ARL5A&ARL5C, ARSB&ARSI, ARSJ&ARSB, BACE1&BACE2, BMP3&GDF10, CCNL1&CCNL2, CPA1&CPA2, CUX1&CUX2, DNM1&DNM3, ENPP2&ENPP3, FOXO1&FOXO3, FOXP2&FOXP4, GNB2&GNB4, GPC2&GPC6, GPC3&GPC5, GPM6A&PLP1, GRIA2&GRIA3, HMGB1&HMGB3, HOXA4&HOXD4, HSF2&HSF4, ING1&ING2, INPP5D&SH2D1A, IRX2&IRX5, KANK1&KANK4, KCNK9&KCNK15, KHDRBS2&KHDRBS3, LASS3&LASS6, MACF1&DST, MBNL1&MBNL2, MCTP1&MCTP2, MEF2A&MEF2C, MEIS1&MEIS2, NFIA&NFIB, NPNT&EGFL6, ODZ2&ODZ3, P4HA1&P4HA2, PDE4B&PDE4C, PIK3C2A&PIK3C2B, PLS1&PLS3, PTCH1&PTCH2, QSOX1&QSOX2, R3HCC1&c10orf28, RALA&RALB, RHAG&RHCG, RNF38&RNF44, SALL1&SALL4, SEC24C&SEC24D, SEPT6&SEPT10, SGMS1&SGMS2, SH3RF3&SORBS2, SHH&IHH, SLC12A1&SLC12A3, SLC12A2&SLC12A3, SLC4A4&SLC4A10, SLC6A15&SLC6A18, SLC9A2&SLC9A3, SLIT1&SLIT2, SMAD2&SMAD3, SOX1&SOX2, SOX2&SOX3, ST8SIA3&ST8SIA4, SULF1&GNS, TFAP2A&TFAP2C, ZEB2&KIAA0087, ZFHX3&ZFHX4, ZIC2&ZIC3, ZNF423&ZNF521 |
List of Paralogous CNSs Harboring Genes: Trio of Paralogous CNSs
| Number of Trios | Trio of Harboring Genes |
|---|---|
| 1 | MEF2A&MEF2C&MEF2D, NFIA&NFIB&NFIX, GRIA1&GRIA2&GRIA4 |
Figure: The locations of paralogous CNSs. Triparalogous CNSs were identified only near (A) the GRIA gene family, (B) the NFI gene family, and (C) the MEF2 gene family. Only five gene pairs harbor more than three pairs of paralogous CNSs: (D) FOXP1/P2, (E) ZNF503/703, (F) IRX1/3, (G) SALL1/3, and (H) PBX1/3. Protein-coding regions of paralogous CNS-harboring genes are represented by black boxes. Orange ellipses represent paralogous CNSs. The connecting lines show paralogous conservation of each CNS.
Overrepresented Gene Functions of Host Genes
| GO Term | Adjusted P Value |
|---|---|
| Sequence-specific DNA binding (GO:0043565) | 3.39E–15 |
| Ionotropic glutamate receptor activity (GO:0004970) | 7.69E–05 |
| Phosphoinositide binding (GO:0035091) | 6.05E–05 |
| Lipid kinase activity (GO:0001727) | 5.33E–04 |
| 1-Phosphatidylinositol-3-kinase activity (GO:0016303) | 8.92E–06 |
| Follicle-stimulating hormone receptor activity (GO:0004963) | 1.77E–04 |
| Low-density lipoprotein receptor activity (GO:0005041) | 3.41E–04 |
Note.—Adjusted P values are calculated by comparing the distribution of the host genes with that of human genes.
Proportion of Enhancer Activities
| Expression | Paralogous CNSs | All Sequences in Database |
|---|---|---|
| No expression | 22 (26.51%) | 815 (50.34%) |
| At brain region | 42 (50.60%) | 416 (25.69%) |
| At other region | 19 (22.89%) | 388 (23.97%) |
| Total | 83 | 1,619 |
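The percentages in the table above can be reproduced from the raw counts, and the ratio of the two brain proportions gives a rough sense of how strongly brain enhancer activity is enriched among the paralogous CNSs. The sketch below is only an illustration: the fold-enrichment figure is not reported in the source, and the names `counts`, `percentages`, and `brain_fold` are hypothetical.

```python
# Counts copied from the table above:
# (paralogous CNSs, all sequences in the database).
counts = {
    "No expression": (22, 815),
    "At brain region": (42, 416),
    "At other region": (19, 388),
}

# Column totals, which should match the "Total" row of the table.
total_cns = sum(cns for cns, _ in counts.values())
total_db = sum(db for _, db in counts.values())

# Percentage of each expression class within each column.
percentages = {
    label: (round(100 * cns / total_cns, 2), round(100 * db / total_db, 2))
    for label, (cns, db) in counts.items()
}

# Illustrative fold enrichment (not a value stated in the source):
# share of brain-active paralogous CNSs over the database-wide share.
brain_cns, brain_db = counts["At brain region"]
brain_fold = (brain_cns / total_cns) / (brain_db / total_db)

for label, (pct_cns, pct_db) in percentages.items():
    print(f"{label}: {pct_cns}% vs {pct_db}%")
print(f"Brain fold enrichment: {brain_fold:.2f}")
```

By this rough measure, the share of tested paralogous CNSs with brain enhancer activity is about twice the share seen across all sequences in the database. The comparison is only indicative if the paralogous CNSs are themselves part of the database column.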
Figure: Relationship between the numbers of paralogs and paralogous CNSs derived from the 2R WGDs. Paralogs and paralogous CNSs within the synteny blocks were counted and plotted on a scatter plot. The horizontal axis is the number of conserved paralogous genes. The vertical axis is the number of conserved paralogous CNSs. The red line is an approximate linear regression of the tetraparalogous block points. There is a clear positive correlation.