Anastasia Levchenko1, Alexander Kanapin1,2, Anastasia Samsonova1,2, Raul R Gainetdinov1,3.
Abstract
The review discusses, in a format of a timeline, the studies of different types of genetic variants, present in Homo sapiens, but absent in all other primate, mammalian, or vertebrate species, tested so far. The main characteristic of these variants is that they are found in regions of high evolutionary conservation. These sequence variations include single nucleotide substitutions (called human accelerated regions), deletions, and segmental duplications. The rationale for finding such variations in the human genome is that they could be responsible for traits, specific to our species, of which the human brain is the most remarkable. As became obvious, the vast majority of human-specific single nucleotide substitutions are found in noncoding, likely regulatory regions. A number of genes, associated with these human-specific alleles, often through novel enhancer activity, were in fact shown to be implicated in human-specific development of certain brain areas, including the prefrontal cortex. Human-specific deletions may remove regulatory sequences, such as enhancers. Segmental duplications, because of their large size, create new coding sequences, like new functional paralogs. Further functional study of these variants will shed light on evolution of our species, as well as on the etiology of neurodevelopmental disorders.
Keywords: deletions; duplications; genes; neurodevelopmental disorders; psychiatry; substitutions
Year: 2018 PMID: 29149249 PMCID: PMC5767953 DOI: 10.1093/gbe/evx240
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Examples of sequence variants described in the present review. (a–c) Actual examples of a HAR, several duplications, and a deletion. Screen shots from the UCSC Genome Browser are used. The species, genome assembly, chromosomal coordinates, and size of the region in bp are shown. (a) HAR1 is depicted. The coordinates of the region in the human NCBI genome assembly 35 are as defined in Pollard et al. (2006a). The twelve human-specific substitutions are denoted with asterisks. Dots within the alignment denote bases identical in human and other species. (b) The paralog SRGAP2B and the partially duplicated and inverted pseudogene SRGAP2D are indicated by arrows. Neighboring genes FAM72 are depicted; this picture, however, contradicts results in Dennis et al. (2017): SRGAP2B duplicated within the same genomic fragment as FAM72B and SRGAP2D duplicated within the same genomic fragment as FAM72D. This contradiction illustrates the incompleteness of the current human genome assembly. The depicted gene FAM72C is actually located on chromosome 1p11.2. The gene NBPF15 with primate-specific DUF120 domain amplification, denoted with a star, is also shown in this genomic fragment. The multispecies alignment depicts factual absence of this gene in nonprimate animals. The picture is similar for the 100 vertebrates’ alignment. (c) hCONDEL.122 as defined in McLean et al. (2011). Build 2 Version 1 (Oct 2005) of the chimpanzee genome assembly, produced by the Chimpanzee Sequencing and Analysis Consortium, is shown. The corresponding human sequence contains a gap (the deletion). Other species are shown for comparison. Single lines signify gaps (sequence absent) and double lines signify more complex cases that are more difficult to resolve.
In cases where multiple chains of the same species align over a region of the chimpanzee genome, the chains with single-lined gaps are often due to retrotransposed pseudogenes, whereas chains with double-lined gaps are more often due to paralogs and duplicated pseudogenes. (d and e) Generalized examples of functional variants. The examples do not represent particular regulatory sequences or coding genes, but illustrate the principles by which these sequences function, according to the studies described in the present review. The schematics are not to scale. (d) A HAR can act as an enhancer. In this case, transcriptional activity and/or expression pattern in tissues are altered. (e) Segmental duplications may result in paralogs with new functions. The paralogs are found in different chromosomal regions. The new paralogs are also often inverted relative to the parental sequence. The fact that the duplications are partial explains differences between the ancestral gene and the paralogs. Additionally, new sequence variants, denoted with a star, may be introduced in paralogs and will result in further alterations of the protein sequence. An example is the splicing variant in the gene ARHGAP11B. Although pseudogenes are often regarded as “dead genes,” there is evidence that these noncoding genes may generate functional RNAs that regulate gene expression (Pink et al. 2011).
Table 1. Summary of Genomic Regions Affected by Human-Specific Genetic Variations
| Genomic Region Category | Evidence of Conservation | Number of Regions | Associated Genes with Some Functional Evidence | References |
|---|---|---|---|---|
| HARs (Pollard) | vertebrates | 202 | | |
| HACNSs (Prabhakar) | vertebrates | 992 | | |
| ANCs (Bird) | vertebrates | 1,356 | | |
| HARs (Bush) | mammals | 63 | unknown | |
| 2xHARs (Lindblad-Toh) | mammals | 563 | | |
| haDHSs (Gittelman) | primates | 524 | unknown | |
| HSDs | apes | 218 | | |
| hCONDELs | vertebrates | 510 | | |
Note.—Associated genes with an experimentally confirmed role in brain development are also indicated.
The shown genes are limited to the scope of this review.
To distinguish the different HAR data sets, the name of the first author of the original publication is given in parentheses.
Fig. 2.—The timeline of studies describing human-specific sequence variations, mentioned in the present review.
Fig. 3.—Intersection statistics for the six HAR data sets. (a) A heatmap showing pairwise comparison between data sets. (b) Intersection between data sets. Each data set is represented by a black filled circle. A vertical black line connects the circles to emphasize intersections between corresponding data sets. The number of intersections between HARs is shown as a bar chart. The arrow indicates intersection between the six data sets. Bush HARs from Bush and Lahn (2008), Pollard HARs from Pollard et al. (2006a), Gittelman haDHSs from Gittelman et al. (2015), Lindblad-Toh 2xHARs from Lindblad-Toh et al. (2011), Prabhakar HACNSs from Prabhakar et al. (2006), and Bird ANCs from Bird et al. (2007).
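The kind of intersection statistics summarized in this figure can be illustrated with a minimal Python sketch. It is not the procedure used to produce the figure: the data set names, the coordinates, and the helper functions (`overlaps`, `count_overlapping`, `pairwise_overlap_matrix`, `shared_by_all`) are hypothetical, and regions are assumed to be half-open `(chromosome, start, end)` intervals on a common genome assembly.

```python
def overlaps(a, b):
    """Return True if two half-open (chrom, start, end) intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query, target):
    """Count regions in `query` that overlap at least one region in `target`."""
    return sum(any(overlaps(q, t) for t in target) for q in query)


def pairwise_overlap_matrix(datasets):
    """Map every ordered pair of data set names to its overlap count."""
    names = list(datasets)
    return {
        (x, y): count_overlapping(datasets[x], datasets[y])
        for x in names
        for y in names
    }


def shared_by_all(datasets, reference):
    """Count regions of `reference` overlapped by every other data set."""
    others = [regions for name, regions in datasets.items() if name != reference]
    return sum(
        all(any(overlaps(r, t) for t in regions) for regions in others)
        for r in datasets[reference]
    )


# Toy coordinates, invented for illustration only (not real HAR positions).
toy = {
    "set_A": [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)],
    "set_B": [("chr1", 150, 250), ("chr2", 10, 40)],
    "set_C": [("chr1", 190, 210), ("chr1", 550, 560), ("chr2", 70, 90)],
}

matrix = pairwise_overlap_matrix(toy)
core = shared_by_all(toy, "set_A")

for (x, y), n in matrix.items():
    print(f"{x} regions overlapping {y}: {n}")
print(f"set_A regions shared by all sets: {core}")
```

The nested loops compare every region with every other region, which is adequate for a few thousand regions. For larger data sets a sorted sweep or a dedicated interval tool would be a more practical choice.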
Table 2. Comparison of Methods Used to Define HARs
| HAR Data Set | Only Noncoding Regions Considered? | Human Genome Used to Define Conserved Regions? | Evidence of Conservation | Tools Used for Alignments and Estimation of Conservation | Statistical Method Used to Estimate Acceleration | Percentage of HARs Explained by Positive Selection | Additional Functional Evidence |
|---|---|---|---|---|---|---|---|
| HARs (Pollard) | No | Yes | 17 vertebrates | MultiZ, PhastCons | LRT | 76% | no |
| HACNSs (Prabhakar) | Yes | Yes | 8 vertebrates | UCSC Genome Browser (PhastCons) | Human-acceleration | not estimated | no |
| ANCs (Bird) | Yes | Yes | 17 vertebrates | MultiZ, PhastCons | χ2-based relative rate test | 15–19% | no |
| HARs (Bush) | Yes | Yes | 6 mammals | MultiZ, PhastCons | LRT | not estimated | no |
| 2xHARs (Lindblad-Toh) | No | No | 29 mammals | MultiZ, PhastCons | LRT | ∼85% | no |
| haDHSs (Gittelman) | No | Yes | 6 primates | Ensembl Genome Browser | LRT | 70% | yes |
LRT, likelihood ratio test.
The total numbers of discovered HARs are indicated in table 1.
A likelihood ratio test with different author-defined parameters.
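To make the idea of a likelihood ratio test for acceleration concrete, here is a deliberately simplified Python sketch. It is an assumption-laden toy, not the method of any study listed in the table: the published analyses fit phylogenetic substitution models, whereas this example treats the number of human-lineage substitutions in one element as a Poisson count and compares a null model (rate fixed at the value expected from conservation) with an alternative in which the rate is free. The function name `poisson_lrt` and the numbers are hypothetical.

```python
import math


def poisson_lrt(observed, expected):
    """Toy LRT for an elevated substitution count in one conserved element.

    Null model: count ~ Poisson(expected).
    Alternative: count ~ Poisson(rate), with the rate estimated as `observed`.
    Returns (statistic, p_value, accelerated), where p_value is the upper
    tail of a chi-square distribution with one degree of freedom.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed == 0:
        statistic = 2.0 * expected
    else:
        statistic = 2.0 * (
            observed * math.log(observed / expected) - (observed - expected)
        )
    # For one degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    # A large statistic can also reflect a deficit, so check the direction.
    accelerated = observed > expected
    return statistic, p_value, accelerated


# Illustrative numbers only: 12 observed substitutions where 1.5 are expected.
stat, p, accelerated = poisson_lrt(observed=12, expected=1.5)
print(f"LRT statistic = {stat:.3f}, p = {p:.2e}, accelerated = {accelerated}")
```

The statistic is twice the difference in maximized log-likelihood between the two models. The direction check matters because an element with fewer substitutions than expected also yields a large statistic.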