Abstract
Olduvai protein domains (also known as DUF1220 or NBPF) have undergone the greatest human-specific increase in the copy number of any coding region in the genome. Their repeat number was strongly associated with the evolutionary expansion of brain volumes, neuron counts and cognitive abilities, as well as with disorders of the autistic spectrum. Nevertheless, the domain function and cellular mechanisms underlying the positive selection of Olduvai DNA sequences in higher primates remain obscure. Here, I show that the inclusion of Olduvai exon doublets in mature transcripts is facilitated by a potent splicing enhancer that was created through duplication within the first exon. The enhancer is the strongest among the NBPF transcripts and further promotes the already high splicing activity of the unexpanded first exons of the two-exon domains, safeguarding the expanded Olduvai exon doublets in the mature transcriptome. The duplication also creates a predicted RNA guanine quadruplex that may regulate the access to spliceosomal components of the super-enhancer and influence the splicing of adjacent exons. Thus, positive Olduvai selection during primate evolution is likely to result from a combination of multiple targets in gene expression pathways, including RNA splicing.
Keywords: DUF1220; NBPF; Olduvai; RNA guanine quadruplex; autism; brain evolution; brain size; neurons; pre-mRNA splicing; repeat
Year: 2022 PMID: 35884681 PMCID: PMC9313022 DOI: 10.3390/brainsci12070874
Source DB: PubMed Journal: Brain Sci ISSN: 2076-3425
Figure 1. Splicing enhancer activities of the first exon in expanded and unexpanded Olduvai exon doublets and RNA G4 predictions. (A) ESE/ESSseq scores for overlapping hexamers in representative pG4-containing (pG4+, top panel) and pG4-lacking (pG4-, bottom panel) NBPF10 exons. Exon (E) numbers correspond to the longest transcripts, with E21 and E23 representing examples of HLS sub-types with unexpanded first exons of the Olduvai doublet and E31 representing the expanded version. Horizontal dotted lines at the top of each chart denote the maximum ESE/ESSseq values (1.034 for the strongest splicing hexamer AGAAGA, ref. [8]). The horizontal dashed line shows mean values for control human exons [9]. Horizontal black boxes at the top panel denote duplicated regions in the expanded first exons of Olduvai doublets. The 68 nt pG4 sequence [6] is in a red box. (B) The alignment of the NBPF10 exons that contain (+) or lack (−) pG4. Exonic sequences are in upper case and flanking intronic sequences are in lower case. pG4 is in a red box; black boxes denote duplicated regions. (C) Predicted effects of Olduvai exon expansions on RNA processing. Top, non-homologous allelic recombination was previously proposed to explain Olduvai amplifications [6]. Black dots denote pG4 sequences, red line denotes the location of an intron recombination breakpoint [6]. exp, expanded Olduvai sub-type. Bottom, the putative impact of the recombination event on pre-mRNA splicing. Splicing is shown as diagonal lines for canonical (dotted lines) or alternative (dashed lines) events; the line widths correspond to expected exon usage frequencies. Red arrows at penultimate exons illustrate a lack of splicing dependencies downstream of pG4-containing exons in CON3. Retention of the last intron might also lead to stable translation of truncated proteins, bypassing nonsense-mediated RNA decay of transcripts with premature termination codons further upstream [10].
(D) The ESE/ESSseq profile in an NBPF9 region between the first expanded HLS1 exon and the terminal CON3 exon (top). Nt numbering is from the first position of the expanded HLS1 exon. Bottom, pG4-containing insertion in HLS1 predicted to form RNA G4 by the indicated methods. Score predictions were carried out with the G4RNA screener [11] (v. 0.2, window length 60, window step 10). Thresholds were >4.5 for the consecutive guanine over consecutive cytosine (cGcC) score, >0.9 for the G4Hunter (G4H) score and >0.5 for the neural network (G4NN) score. Each method identified the RNA G4 structures in the absence of ligands.
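The windowed scoring described for panel (D) can be illustrated with a minimal Python sketch. It is not the G4RNA screener code: the function names are hypothetical, the handling of short sequences and of the trailing remainder is an assumption, and the score shown is a simplified G4Hunter-style measure (each G in a run of n scores min(n, 4), each C scores the negative, averaged over the window). The window length, step and the three cutoffs are the values quoted in the caption; the input sequence is a toy example, not the pG4 sequence.

```python
from typing import Dict, Iterator, Tuple

# Cutoffs quoted in the caption; a score must exceed its cutoff to count as a hit.
THRESHOLDS: Dict[str, float] = {"cGcC": 4.5, "G4H": 0.9, "G4NN": 0.5}


def windows(seq: str, length: int = 60, step: int = 10) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for overlapping fixed-length windows.

    Assumption: a sequence no longer than one window is scored whole, and a
    trailing remainder that does not fill a full window gets no window of its own.
    """
    if len(seq) <= length:
        yield 0, seq
        return
    for start in range(0, len(seq) - length + 1, step):
        yield start, seq[start:start + length]


def g4hunter_style_score(seq: str) -> float:
    """Simplified G4Hunter-style score: mean per-base score over the sequence.

    Every base in a G run of length n contributes min(n, 4); every base in a
    C run contributes -min(n, 4); all other bases contribute 0.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    total = 0
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:
            j += 1  # extend the current run of identical bases
        run = j - i
        if base == "G":
            total += run * min(run, 4)
        elif base == "C":
            total -= run * min(run, 4)
        i = j
    return total / len(seq)


def passes_thresholds(scores: Dict[str, float]) -> Dict[str, bool]:
    """Report, per method, whether the score exceeds the caption's cutoff."""
    return {name: scores[name] > cutoff for name, cutoff in THRESHOLDS.items()}


if __name__ == "__main__":
    toy_rna = "GGGAGGGAGGGAGGG" + "AUCU" * 20  # toy sequence, for illustration only
    for start, sub in windows(toy_rna):
        score = g4hunter_style_score(sub)
        print(start, round(score, 2), score > THRESHOLDS["G4H"])
```

Requiring all three methods to exceed their cutoffs, as in the caption, is a conservative way to call a window a predicted RNA G4; `passes_thresholds` returns the per-method result so that the agreement can be inspected rather than collapsed into a single flag.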