Daniela C Soto1,2, Colin Shew1,2, Mira Mastoras1, Joshua M Schmidt3, Ruta Sahasrabudhe4, Gulhan Kaya1, Aida M Andrés3, Megan Y Dennis1,2.
Abstract
Recent efforts to comprehensively characterize great ape genetic diversity using short-read sequencing and single-nucleotide variants have led to important discoveries related to selection within species, demographic history, and lineage-specific traits. Structural variants (SVs), including deletions and inversions, comprise a larger proportion of genetic differences between and within species, making them an important yet understudied source of trait divergence. Here, we used a combination of long-read and long-range sequencing approaches to characterize the structural variant landscape of two additional Pan troglodytes verus individuals, one of whom carries 13% admixture from Pan troglodytes troglodytes. We performed optical mapping of both individuals followed by nanopore sequencing of one individual. After filtering for larger variants (>10 kbp) and genotyping SVs using short-read data from the Great Ape Genome Project, we identified 425 deletions and 59 inversions, of which 88 and 36, respectively, were novel. Compared with gene expression in humans, we found a significant enrichment of chimpanzee genes with differential expression in lymphoblastoid cell lines and induced pluripotent stem cells, both within deletions and near inversion breakpoints. We examined chromatin-conformation maps from human and chimpanzee using these same cell types and observed alterations in genomic interactions at SV breakpoints. Finally, we focused on 56 genes impacted by SVs in >90% of chimpanzees and absent in humans and gorillas, which may contribute to chimpanzee-specific features. Sequencing a greater set of individuals from diverse subspecies will be critical to establish the complete landscape of genetic variation in chimpanzees.
Keywords: chimpanzee; chromatin organization; comparative genomics; gene regulation; nanopore sequencing; natural selection; optical mapping; structural variation
Year: 2020 PMID: 32143403 PMCID: PMC7140787 DOI: 10.3390/genes11030276
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Genomic features of identified SVs. (A) Deletions (red), inversions (cyan), and large-scale cytogenetic inversions (yellow) are interspersed across all 24 human orthologous chromosomes, depicted as ideograms. (B) Novel variants in our dataset, defined as lacking 50% reciprocal overlap with previously reported variants in great apes. (C) Size distribution of deletions (red) and inversions (cyan). Median size is depicted as dashed lines. (D) Observed average distance of deletions (red line) and inversions (cyan line) to SDs, compared to randomly sampled regions across the genome of the same size as the deletions (red distribution) and inversions (green distribution). We observed an enrichment of SV breakpoints residing near SDs (empirical p-value = 1 × 10⁻⁴).
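The 50% reciprocal overlap criterion in panel (B) can be illustrated with a minimal sketch. It assumes half-open coordinates on a shared reference and treats exactly 50% as sufficient overlap; the caption does not specify either detail, and the function names `reciprocal_overlap` and `is_novel` are hypothetical, not taken from the study.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open intervals [start, end)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    # The overlap must cover the required fraction of BOTH intervals,
    # so the limiting value is the smaller of the two fractions.
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def is_novel(sv, known_svs, threshold=0.5):
    """True if no known SV on the same chromosome reaches the threshold.

    Assumption: the boundary is inclusive (>= threshold counts as known).
    """
    chrom, start, end = sv
    return not any(
        k_chrom == chrom
        and reciprocal_overlap(start, end, k_start, k_end) >= threshold
        for k_chrom, k_start, k_end in known_svs
    )
```

For example, a call spanning positions 0 to 100 and a known variant spanning 50 to 150 overlap by 50 bp, which is half of each interval, so under these assumptions the call would not count as novel.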
Figure 2. Description of genes overlapping identified SVs. (A) Categories of genes overlapping deletion regions ±2.5 kbp and inversion breakpoints ±50 kbp as defined by ENSEMBL biotypes. (B) Number of protein-encoding genes classified as LoF tolerant (pLI ≤ 0.1), intolerant (pLI ≥ 0.9) and middle range (pLI > 0.1 and pLI < 0.9) affected by deletion regions ±2.5 kbp and inversion breakpoints ±50 kbp. Some affected genes lack LoF information (missing category). All genes impacted by deletions were classified by VEP as either highly impacted (feature ablation or truncation) or modified, while genes impacted by inversions were either modified or no effect was predicted (overlap only). Transcribed elements with no corresponding ENSEMBL transcript ID in humans were classified as no orthology (blue). (C) Overrepresented GO terms in genes impacted by deletions and inversions as reported by DAVID (* q-value < 0.05; ** q-value < 0.001). Counts represent the number of genes annotated with each GO term.
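The pLI binning in panel (B) follows directly from the stated thresholds. The sketch below is only an illustration of those cutoffs; the function name `classify_pli`, the category labels, and the use of `None` for genes without a score are assumptions.

```python
def classify_pli(pli):
    """Bin a pLI score using the thresholds given in the caption.

    Assumption: genes without LoF information are passed as None.
    """
    if pli is None:
        return "missing"
    if pli <= 0.1:
        return "tolerant"
    if pli >= 0.9:
        return "intolerant"
    return "middle"  # 0.1 < pLI < 0.9


# Tally categories for a small, made-up list of scores.
example_scores = [0.0, 0.05, 0.5, 0.95, None]
counts = {}
for score in example_scores:
    label = classify_pli(score)
    counts[label] = counts.get(label, 0) + 1
```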
Figure 3. Enrichment and depletion tests of SVs with genomic features. Both deletions and duplications were tested within 2.5 kbp (resolution of the SV calls) and 50 kbp. All annotated genes (GENCODE v27) and protein-encoding genes were tested for depletion of SVs (top two rows) via permutation testing. Human TADs from the LCL GM12878 were tested for depletion of putatively disrupting SVs (i.e., SVs generating PDTs, third row). Human–chimpanzee DE genes from LCLs and iPSCs were also tested for enrichment in SVs via permutation testing (fourth and fifth rows). Circles are sized proportionally to the negative log of the empirical p-values and colored according to the strength of enrichment or depletion, represented by the log ratio of observed (obs; number of features intersecting SVs) and expected (exp; mean number of features intersecting 1000 permuted coordinate sets) counts.
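The quantities plotted in this figure can be sketched from a list of permuted counts. The caption does not state the log base or the exact p-value estimator, so the sketch assumes log base 2 and the common (k + 1)/(n + 1) empirical estimator; the function name `enrichment_summary` and its return keys are hypothetical.

```python
import math


def enrichment_summary(observed, permuted_counts, direction="enrichment"):
    """Summarize a permutation test for feature/SV intersections.

    observed: number of features intersecting the real SVs.
    permuted_counts: the same count for each permuted coordinate set.
    direction: "enrichment" counts permutations >= observed,
               "depletion" counts permutations <= observed.
    """
    n = len(permuted_counts)
    expected = sum(permuted_counts) / n  # mean over permutations
    if direction == "enrichment":
        extreme = sum(c >= observed for c in permuted_counts)
    else:
        extreme = sum(c <= observed for c in permuted_counts)
    # Assumed estimator: add one to numerator and denominator so the
    # empirical p-value is never exactly zero.
    p_value = (extreme + 1) / (n + 1)
    if observed > 0 and expected > 0:
        log_ratio = math.log2(observed / expected)  # assumed base 2
    else:
        log_ratio = float("nan")
    return {"expected": expected, "p_value": p_value, "log_ratio": log_ratio}
```

With 1000 permutations, the smallest p-value this estimator can return is 1/1001, which bounds the largest circle size in a plot of this kind.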
Figure 4. Genome organization of human and chimpanzee across regions with identified SVs. The Hi-C genomic landscapes of human (top) and chimpanzee (bottom) are depicted for iPSCs using Juicebox for (A) chromosome 2q12.2-q13 (chr2:106095001-109905000, GRCh38) and (B) chromosome 9q22.2-q22.32 (chr9:90200001-94010000, GRCh38). Predicted TADs (yellow triangles) were compared between species, noting differences at SVs (dotted boxes) including deletions and inversions. SDs are depicted as colored bars, taken from the UCSC Genome Browser track. Genes showing significant DE in chimpanzee versus humans are colored red (up in chimpanzee) or blue (down in chimpanzee). Genes not included in the DE analysis are in gray (Tables S11 and S12).
Protein-encoding genes impacted by chimpanzee-specific deletions and inversions.
| Gene | ENSEMBL ID | SV Type | Description |
|---|---|---|---|
|  | ENSG00000100342 | deletion | Apolipoprotein L1 |
|  | ENSG00000100336 | deletion | Apolipoprotein L4 |
|  | ENSG00000196296 | deletion | Sarcoplasmic/endoplasmic reticulum calcium ATPase 1 |
|  | ENSG00000168488 | deletion | Ataxin 2 like |
|  | ENSG00000255501 | deletion | Caspase recruitment domain family member 18 |
|  | ENSG00000153113 | inversion | Calpastatin |
|  | ENSG00000177455 | deletion | CD19 Molecule |
|  | ENSG00000007129 | deletion | CEA Cell Adhesion Molecule 21 |
|  | ENSG00000080910 | deletion | Complement Factor H Related 2 |
|  | ENSG00000134365 | deletion | Complement Factor H Related 4 |
|  | ENSG00000188603 | deletion | CLN3 Lysosomal/Endosomal Transmembrane Protein, Battenin |
|  | ENSG00000162368 | deletion | Cytidine/Uridine Monophosphate Kinase 1 |
|  | ENSG00000058453 | inversion | Ciliary Rootlet Coiled-Coil, Rootletin |
|  | ENSG00000108242 | deletion | Cytochrome P450 Family 2 Subfamily C Member 18 |
|  | ENSG00000185982 | deletion | Defensin Beta 128 |
|  | ENSG00000178852 | deletion | EF-Hand Calcium Binding Domain 13 |
|  | ENSG00000184110 | deletion | Eukaryotic Translation Initiation Factor 3 Subunit C |
|  | ENSG00000115604 | inversion | Interleukin 18 Receptor 1 |
|  | ENSG00000115602 | inversion | Interleukin 1 Receptor Like 1 |
|  | ENSG00000136696 | deletion | Interleukin 36B |
|  | ENSG00000125571 | deletion | Interleukin 37 |
|  | ENSG00000186925 | deletion | Keratin Associated Protein 19-6 |
|  | ENSG00000244362 | deletion | Keratin Associated Protein 19-7 |
|  | ENSG00000187922 | deletion | Lipocalin 10 |
|  | ENSG00000267206 | deletion | Lipocalin 6 |
|  | ENSG00000006659 | deletion | Galectin 14 |
|  | ENSG00000153208 | deletion | MER Proto-Oncogene, Tyrosine Kinase |
|  | ENSG00000255524 | deletion | Nuclear Pore Complex Interacting Protein Family Member B8 |
|  | ENSG00000196993 | deletion | Nuclear Pore Complex Interacting Protein Family Member B9 |
|  | ENSG00000176046 | deletion | Nuclear Protein 1, Transcriptional Regulator |
|  | ENSG00000122136 | deletion | Odorant Binding Protein 2A |
|  | ENSG00000177212 | deletion | Olfactory Receptor Family 2 Subfamily T Member 33 |
|  | ENSG00000179695 | deletion | Olfactory Receptor Family 6 Subfamily C Member 2 |
|  | ENSG00000205329 | deletion | Olfactory Receptor Family 6 Subfamily C Member 3 |
|  | ENSG00000205328 | deletion | Olfactory Receptor Family 6 Subfamily C Member 65 |
|  | ENSG00000184954 | deletion | Olfactory Receptor Family 6 Subfamily C Member 70 |
|  | ENSG00000187857 | deletion | Olfactory Receptor Family 6 Subfamily C Member 75 |
|  | ENSG00000185821 | deletion | Olfactory Receptor Family 6 Subfamily C Member 76 |
|  | ENSG00000106536 | deletion | POU Class 6 Homeobox 2 |
|  | ENSG00000177548 | deletion | Rabaptin, RAB GTPase Binding Effector Protein 2 |
|  | ENSG00000204628 | inversion | Receptor For Activated C Kinase 1 |
|  | ENSG00000176476 | deletion | SAGA Complex Associated Factor 29 |
|  | ENSG00000178188 | deletion | SH2B Adaptor Protein 1 |
|  | ENSG00000236396 | deletion | Solute Carrier Family 35 Member G4 |
|  | ENSG00000111700 | inversion | Solute Carrier Organic Anion Transporter Family Member 1B3 |
|  | ENSG00000196502 | deletion | Sulfotransferase Family 1A Member 1 |
|  | ENSG00000197165 | deletion | Sulfotransferase Family 1A Member 2 |
|  | ENSG00000241127 | deletion | YAE1 Maturation Factor Of ABCE1 |
|  | ENSG00000257046 | inversion | Uncharacterized |
|  | ENSG00000204003 | deletion | Uncharacterized |
* Human and chimpanzee orthologs were tested and shown to be significant DE genes in LCLs and/or iPSCs; genes in bold were found to have strong signatures of positive or balancing selection using the HKA test [45].