Stuart Cantsilieris, Susan M Sunkin, Matthew E Johnson, Fabio Anaclerio, John Huddleston, Carl Baker, Max L Dougherty, Jason G Underwood, Arvis Sulovari, PingHsun Hsieh, Yafei Mao, Claudia Rita Catacchio, Maika Malig, AnneMarie E Welch, Melanie Sorensen, Katherine M Munson, Weihong Jiang, Santhosh Girirajan, Mario Ventura, Bruce T Lamb, Ronald A Conlon, Evan E Eichler.
Abstract
BACKGROUND: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP).
Keywords: Gene fusion; Genomic instability; LCR16a; Nuclear pore interacting protein; Segmental duplication
Year: 2020 PMID: 32778141 PMCID: PMC7419210 DOI: 10.1186/s13059-020-02074-4
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Estimation of LCR16a copy number in primate lineages
| Primate lineage | Genomic library | Coverage | # clones | Copy number^ |
|---|---|---|---|---|
| Gray mouse lemur | CHORI-257 | 7.7x | 4 | 1 |
| Dusky titi | LBNL-5 | 9.4x | 12 | < 2 |
| Owl monkey | CHORI-258 | 5x | 15 | < 2 |
| Squirrel monkey | CHORI-254 | N/A | 40 | 6 |
| Marmoset | CHORI-259 | N/A | 110 | 18 |
| Macaque | CHORI-250 | 5.5-6x | 14 | 1* |
| Baboon | RPCI-41 | 5.2x | 9 | 1* |
| Orangutan | CHORI-253 | 5-6x | 127 | 20* |
| Gorilla | CHORI-255 | 6-7x | 113 | 16* |
| Chimpanzee | RPCI4-43/CHORI-251 | 5-6x | 212 | 37* |
| Human | RPCI-11 | 5-7x | – | 17* |
*Reported previously [14]
^Copy number estimated based on number of LCR16a BAC clones
Fig. 1 Distribution of LCR16a duplications in primate genomes. The locations of LCR16a duplication blocks in marmoset (green dash), macaque (black dash), orangutan (blue dash), and human (red dash) are mapped against the GRCh38 human ideogram. The single-copy macaque locus maps to chr16p13, the ancestral origin from which all other copies were derived. Human LCR16a has expanded intrachromosomally across chromosome 16, predominantly on the short arm (chr16p). A more ancient copy of LCR16a, which is no longer expressed, maps to human chromosome 18. Marmoset and orangutan show expansions on other chromosomes, including chr11 and chr13, respectively. Single-copy LCR16a duplication blocks are also mapped to chromosomes 20, 4, and 17/13 in the marmoset lineage
LCR16a-associated gene-containing segmental duplications
| Lineage identified | HSA Location | Size (kbp) | Duplicon | Genes* | RefSeq gene |
|---|---|---|---|---|---|
| Primates | chr16:14711689-14726338 | 9 | LCR16a | Nuclear pore complex interacting protein | |
| Marmoset/Orangutan | chr16:11527314-11548048 | 20.7 | LCR16a-001 | Lipopolysaccharide induced TNF factor | |
| Marmoset/Orangutan | | 24.0 | LCR16a-001a | RecQ mediated genome instability 2 | |
| Marmoset/Orangutan | chr16:11320235-11452411 | 132.8 | LCR16a-002 | Pseudogene-predicted transcript | |
| Primates | chr4:155451842-155466390 | 14.5 | LCR16a-003 | Pseudogene-predicted transcript | |
| Primates | chr9:33565330-33577380 | 12 | LCR16a-004 | Ankyrin repeat domain 18B | |
| | | | | RNA exonuclease 1 homolog | |
| Marmoset | chr11:134239610-134264044 | 24.5 | LCR16a-005 | Retromer complex component B | |
| | | | | Thymocyte nuclear protein 1 | |
| | | | | Acyl-CoA dehydrogenase family member 8 | |
| Marmoset | chr11:93073398-93231243 | 157.8 | LCR16a-007 | Solute carrier family 36 member 4 | |
| Primates | chr16:14965442-15044835 | 79.3 | LCR16a-009 | Pyridoxal dependent decarboxylase domain containing 1 | |
| Primates | chr16:15387600-15416537 | 28.9 | LCR16a-010 | Mitochondrial inner membrane protein like | |
| Marmoset/Gorilla | chr16:14681632-14616125 | 65.5 | LCR16a-011 | Poly(A)-specific ribonuclease | |
| | | | | Phospholipase A2 | |
| | | | | Bifunctional apoptosis regulator | |
| Primates | chr1:148979684-149033477 | 36.9 | LCR16a-012 | Phosphodiesterase 4D interacting protein | |
| Marmoset | chr20:19864676-20010962 | 146.3 | LCR16a-015 | Regulation of Rab5-mediated early endocytosis | |
| Marmoset | chr20:20016850-20035552 | 18.7 | LCR16a-016 | N (alpha)-acetyltransferase 20 | |
| | | | | Crooked neck pre-mRNA splicing factor 1 | |
| Marmoset | chr16:10666581-10697471 | 30.9 | LCR16a-017 | Tektin 5 | |
| Marmoset | chr3:150318404-150342624 | 24.2 | LCR16a-018 | Long intergenic non-protein-coding RNA 1214 | |
| Marmoset | chr2:132835355-132908086 | 72.7 | LCR16a-019 | NCK associated protein 5 | |
| Marmoset | chr11:924494-962797 | 38.3 | LCR16a-26 | Adaptor related protein complex 2 subunit alpha 2 | |
| Marmoset | chr8:50623663-50639745 | 16.1 | LCR16a-27 | Syntrophin gamma 1 | |
| Primates | chr16:15062396-15097775 | 35.4 | LCR16a-20 | RNA polymerase I transcription factor | |
| African Ape/Prosimian | chr16:20411068-20501378 | 90.3 | LCR16a-25 | Acyl-CoA synthetase medium-chain family member | |
*Most duplicate genes are incomplete, and annotation is based on RefSeq annotation of human reference genome (GRCh38)
Sequence composition analysis for donor/acceptor duplications and pre-integration sites
| LCR16a type | Nonredundant sites | %GC (E, p ± SE) | SINE (E, p ± SE) | LINE (E, p ± SE) |
|---|---|---|---|---|
| Donors | 63 | 1.07, 1.6 × 10⁻⁵ ± 0.0005 * | 1.47, 3.7 × 10⁻⁸ ± 2.4 × 10⁻⁵ * | 1.03, 0.2 ± 0.05 |
| Acceptors | 27 | 1.11, 2 × 10⁻⁹ ± 6 × 10⁻⁶ * | 1.68, 7.9 × 10⁻⁶ ± 0.00038 * | 1.19, 0.009 ± 0.013 |
| Pre-integration sites | 13 | 1.18, 0.09 ± 0.08 | 1.66, 0.0077 ± 0.024 | 1.29, 0.024 ± 0.04 |
Note: we corrected for multiple hypothesis testing using FWER for a total of nine tests. Associations with a corrected p value + SE ≤ 0.05 are denoted with an asterisk “*” (10,000 permutations). The “E” value is the enrichment coefficient, calculated as the observed value divided by the expected value, where the latter was defined as the mean of 10,000 genome-wide permutations. The retrotransposon statistics refer to the enrichment in LINE and SINE counts relative to the distributed segments
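The enrichment coefficient described in the note can be illustrated with a minimal sketch. Only the definition of E (observed divided by the mean of the permutations) and the asterisk rule (corrected p + SE ≤ 0.05) come from the note. Everything else here is an assumption for illustration: the one-sided empirical p value, the binomial standard error, the use of Bonferroni as the FWER procedure, and all function names. The source does not state how p or its SE were actually computed.

```python
from statistics import mean


def enrichment_coefficient(observed, permuted):
    """E = observed value / mean of the permuted (expected) values."""
    return observed / mean(permuted)


def empirical_p(observed, permuted):
    """Assumed one-sided p: fraction of permutations >= the observed value.

    Returns (p, se), where se is the binomial standard error of p.
    """
    n = len(permuted)
    p = sum(1 for value in permuted if value >= observed) / n
    se = (p * (1 - p) / n) ** 0.5
    return p, se


def bonferroni(p, n_tests=9):
    """Assumed FWER correction; the note only says 'FWER' for nine tests."""
    return min(1.0, p * n_tests)


def is_flagged(corrected_p, se, alpha=0.05):
    """Asterisk rule from the note: corrected p + SE <= 0.05."""
    return corrected_p + se <= alpha


if __name__ == "__main__":
    # Toy values, not data from the paper.
    permuted = [10, 12, 8, 10, 10, 14, 6, 10, 9, 11]
    observed = 14
    e = enrichment_coefficient(observed, permuted)
    p, se = empirical_p(observed, permuted)
    print(e, p, se, is_flagged(bonferroni(p), se))
```

With real data, `permuted` would hold the statistic (for example, a SINE count) from each of the 10,000 genome-wide permutations, and `observed` the same statistic for the donor, acceptor, or pre-integration sites.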
Sequence composition of LCR16a sites of integration
| Primate | Build | Breakpoint chromosome | Breakpoint start | Breakpoint end | Deletion at pre-integration site (kbp) | Repeats (%) | LTR (%) | LINE (%) | SINE (%) | Unique (%) | Duplication insertion (kbp) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Marmoset | hg38 | chr4 | 160005326 | 160008761 | 3.4 | 99.6 | . | 81.3 | 17.6 | 0.03 | 21.6 |
| Marmoset | hg38 | chr20 | 20011993 | 20016575 | 4.6 | 66.8 | 4.9 | 19.3 | 37.5 | 33 | 25.7 |
| Marmoset | hg38 | chr11 | 134267373 | 134272307 | 4.9 | 48 | 10.3 | 5.8 | 26.6 | 52 | 243 |
| Marmoset | hg38 | chr11 | 93232994 | 93232995 | 0 | . | . | . | . | . | 148 |
| Marmoset | hg38 | chr16 | 11452411 | 11525004 | 72.6 | 57.6 | 3.66 | 14.7 | 29.3 | 42.3 | 21 |
| Marmoset^ | hg38 | chr17_chr13 | 83228403 | 19349530 | . | . | . | . | . | . | 358.5 |
| Chimpanzee | rheMac8 | chr20 | 15098676 | 15131780 | 32.5 | 65.8 | 10.4 | 3.41 | 50 | 0.5 | 69.8 |
| Squirrel monkey | hg38 | chr2 | 100587592 | 100601735 | 14.1 | 69.9 | 2.6 | 4.8 | 35.3 | 30.1 | 52.5 |
| Chimpanzee* | hg38 | chr17 | 44694772 | 44700734 | 5.9 | 92.7 | 0 | 8.3 | 70.6 | 7.3 | 80 |
| Orangutan* | hg38 | chr13 | 24407831 | 24480143 | 80.1 | 55.6 | 5 | 23.1 | 22 | 44.4 | 140 |
| Orangutan* | hg38 | chr13 | 25361286 | 25366987 | 5.7 | 74.1 | 2.2 | 55.5 | 15.2 | 25.9 | 90 |
| Gorilla* | hg38 | chr16 | 27185611 | 27191436 | 5.8 | 83.7 | 30 | 26.3 | 22.8 | 16.3 | 100 |
| Gorilla* | hg38 | chr16 | 23303992 | 23307429 | 3.4 | 91.2 | 0 | 23.6 | 71.3 | 4.2 | 50 |
| Chimpanzee* | hg38 | chr16 | 2782469 | 2798105 | 16.1 | 61.8 | 17.3 | 13.4 | 23.7 | 38.2 | 30 |
Sequence composition of pre-integration site based on analysis of the orthologous location in the human genome (GRCh38)
*Reported previously [14]
^Cytogenetic rearrangement between chromosomes
Fig. 2 LCR16a-associated chromosomal evolutionary rearrangements in marmoset. a A chromosome ideogram schematic of marmoset chromosome 11 (CJA11) is compared to human and the predicted primate ancestor (PA). Synteny blocks are distinguished by colors and numbers, while the position of the FISH probe is depicted by a red mark. The colored arrows indicate evolutionary inversions, and black arrows denote the ancestral orientation. A ~33-Mbp pericentromeric inversion in marmoset (green) is defined at the centromeric boundary by an LCR16a-associated duplication block. Both the predicted primate ancestor and human (chr11q22.2-q25) are shown to be in direct orientation based on the order of the blocks analyzed in other primate lineages [15]. b A complex chromosomal rearrangement on marmoset chromosome 5 (CJA5) is identified between the ancestral chromosomes of HSA17 and HSA13; again, LCR16a defines the boundary of this event. Note that the evolutionary order of the two inversions that led to HSA17 is unknown and the sequence shown in the figure is only one of two possibilities. c Single-color FISH analysis using metaphase spreads is used to confirm the presence of chromosomal rearrangements between marmoset and human. A probe mapping to the telomeric region of HSA11 (RP11-265F9) shows a signal mapping to a syntenic region at the CJA11 centromere. At CJA5, two adjacent FISH signals (from RP11-481P7 and RP11-110K18) map to the ancestral telomeric region of HSA17 and the centromeric region of HSA13. CJA, Callithrix jacchus; HSA, Homo sapiens; PA, primate ancestor; NC, neocentromere
Fig. 3 Phylogeny and sequence composition of LCR16a duplication blocks. a Phylogenetic analysis of LCR16a copies in the marmoset lineage. The size and complexity of the mosaic LCR16a duplications are depicted by colored duplication blocks adjacent to each node (refer to Additional file 2: Table S4 for individual duplicon map locations). Map locations for the duplication blocks are depicted against the GRCh38 reference assembly. The LCR16a core element is shown as dashed lines. Nodes with < 90% bootstrap support are indicated by stars. Phylogenetic analysis reveals two distinct clades in marmoset, one mapping to chromosome 16 (group 1), the other mapping to chromosome 11 (group 2). Duplications with similar block architectures cluster together, and the two clades suggest multiple independent events in marmoset. b Phylogenetic sites of recurrent LCR16a duplication in the orangutan and human lineages. The duplication blocks are numbered according to genomic location of a locus in the chromosome, and block coordinates correspond to the GRCh38 reference assembly. Phylogenetic analysis predicts two distinct clades depicting the independent origins of human and orangutan LCR16a duplications. Regions of recurrent microdeletion/microduplication associated with intellectual disability and autism in humans are highlighted in gray. Nodes with < 90% bootstrap support are indicated by stars
Fig. 4 Structure of the ancestral chromosome 16p13 locus. The structure and organization of chr16p13 in four primate lineages are shown based on sequencing of a tiling path of BAC clones for each primate haplotype. SDs (colored arrows) and gene models (black arrows) are shown with respect to lineage-specific duplications (blue bars) identified based on sequence read-depth (WSSD) [1]. a The chromosome 16p13 region has expanded and contracted by hundreds of kilobases due to lineage-specific duplication. Note the ancestral ~160-kbp inversion between the human RP11 haplotype and all other primates. The ancestral LCR16a duplicon in macaque shows a single copy of NPIP, compared to three copies in marmoset and chimpanzee, and five copies in human. b A Miropeats comparison between two human haplotypes at the ancestral locus on chr16p13. CHM1 BACs tiling across the chr16p13 region were sequenced and assembled using PacBio SMRT sequencing to create a super contig. The SD organization is depicted using colored arrows. Miropeats between RP11 and CHM1 contigs shows pairwise differences between orthologous regions. A ~400-kbp inversion is detected in the CHM1 haplotype, flanked by LCR16a core duplicons (blue lines). CHM1 also carries an additional duplication corresponding to LCR16a-009, which contains PDXDC1 (maroon arrow), and an incomplete duplication of LCR16a-021 NOMO1 (blue arrow)
Fig. 5 Continued
Fig. 6 Comparison of mouse BAC transgenic and NPIP expression. a RT-PCR of cDNA obtained from a panel of mouse transgenic tissues. Mouse lines (I13.49) derived from baboon (RPCI41-285I13) BAC integration demonstrate robust expression in the testis. This is a single copy of NPIP and is orthologous to the ancestral location. b RT-PCR of cDNA generated from a panel of mouse transgenic tissues carrying human BAC clones with different NPIP paralogs. Mouse transgenic lines A15.3.8 (RP11-1381A15 NPIPA7), H15.1 (RP11-344H15 PKD1P6-NPIPP1), and O14.1 (RP11-1236O14 NPIPA1) each show a ubiquitous pattern of tissue expression. The H15.1 line shows two distinct bands because this locus contains a tandem duplication of exon 2, resulting in two sets of products by RT-PCR. c Organization and sequence composition of large-insert BAC clones used in mouse transgenic experiments. Annotations include SDs (colored arrows), gene models with the direction of transcription, and DupMasker annotation [20]. The baboon and three distinct human NPIP loci exhibit a diverse set of flanking duplicons and LCR16a copies
Fig. 7 In situ hybridization of NPIP transgenic lines in brain tissue. a The transgenic line is indicated in each column (A15 (human), H15 (human), and I13 (baboon), respectively), and a representative sagittal section from each transgenic line is shown. (i) Expression of NPIP is undetectable in I13. (ii) Expression in hippocampal subregions. High expression is evident in the hippocampal pyramidal cells for both the A15 and H15 lines (derived from human BAC integrations). (iii) Cerebellar expression. High, widespread expression is apparent in the cerebellar granule cells and molecular layer in H15. In A15, large scattered cells in the granule layer express NPIP. (iv) Examples of cortical expression patterns. High, widespread expression is evident in the cortex for both the A15 and H15 lines. b In situ hybridization of NPIP to human visual cortex. Human visual cortex, containing Brodmann’s areas 17 and 18, is shown hybridized with the riboprobe against the NPIP copy in the H15 transgenic line. Strong, widespread expression is clearly visible throughout the visual cortex gray matter (ii and iii) as well as the white matter (i)
Fig. 8 Model of NPIP/LCR16a evolution. Comparative analysis in primates reveals changes associated with both LCR16a duplication and the NPIP model. Diversity in map location, structural variation, expression, and selection of the NPIP family has arisen over ~58 million years of primate evolution. Note the disproportionate amount of change in the great ape lineage