Sean D Smith, Joseph K Kawash, Spyros Karaiskos, Ian Biluck, Andrey Grigoriev.
Abstract
Comparative genomics studies typically limit their focus to single nucleotide variants (SNVs) and that was the case for previous comparisons of woolly mammoth genomes. We extended the analysis to systematically identify not only SNVs but also larger structural variants (SVs) and indels and found multiple mammoth-specific deletions and duplications affecting exons or even complete genes. The most prominent SV found was an amplification of RNase L (with different copy numbers in different mammoth genomes, up to 9-fold), involved in antiviral defense and inflammasome function. This amplification was accompanied by mutations affecting several domains of the protein including the active site and produced different sets of RNase L paralogs in four mammoth genomes likely contributing to adaptations to environmental threats. In addition to immunity and defense, we found many other unique genetic changes in woolly mammoths that suggest adaptations to life in harsh Arctic conditions, including variants involving lipid metabolism, circadian rhythms, and skeletal and body features. Together, these variants paint a complex picture of evolution of the mammoth species and may be relevant in the studies of their population history and extinction.
Keywords: comparative genomics; elephant; evolution; viral defense; woolly mammoth
Year: 2017 PMID: 28369217 PMCID: PMC5737375 DOI: 10.1093/dnares/dsx007
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Fixed, derived woolly mammoth CNVs
| CNV | Location | Gene | Consequence | Biotype |
|---|---|---|---|---|
| DUP | scaffold_16:17919954-17969811 | RNASEL | Transcript amplification | Protein coding |
| DEL | scaffold_2:62406050-62418121 | 5S_rRNA | Transcript ablation | rRNA |
| DEL | scaffold_21:14388123-14390331 | CD44 | Feature truncation | Protein coding |
| DEL | scaffold_4:20346665-20394204 | ENSLAFG00000031480 | Transcript ablation | Protein coding |
| DEL | scaffold_48:12765543-12778793 | U6 | Transcript ablation | snRNA |
| DEL | scaffold_55:5626970-5637970 | ENSLAFG00000027547 | Transcript ablation | Protein coding |
Ensembl gene ID used when gene symbol not available.
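The Location column uses a `scaffold:start-end` form. As a minimal sketch, assuming the coordinates are 1-based and inclusive (an assumption, though it is consistent with the single-base indels in this document being reported with start equal to end), the span of each CNV could be computed as below. The helper name `region_span` is illustrative and not part of the study.

```python
import re
from typing import Tuple

# Matches strings such as "scaffold_16:17919954-17969811".
REGION_RE = re.compile(r"^(?P<scaffold>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def region_span(location: str) -> Tuple[str, int, int, int]:
    """Return (scaffold, start, end, length) for a 'scaffold:start-end' string.

    Assumption: coordinates are 1-based and inclusive, so
    length = end - start + 1.
    """
    match = REGION_RE.match(location.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    start = int(match.group("start"))
    end = int(match.group("end"))
    if end < start:
        raise ValueError(f"end precedes start in {location!r}")
    return match.group("scaffold"), start, end, end - start + 1


# Two rows taken from the CNV table above.
cnvs = {
    "RNASEL": "scaffold_16:17919954-17969811",
    "CD44": "scaffold_21:14388123-14390331",
}
for gene, location in cnvs.items():
    scaffold, start, end, length = region_span(location)
    print(f"{gene}: {scaffold} {length} bp")
```

Under the inclusive-coordinate assumption, the RNASEL duplication spans 49,858 bp and the CD44 deletion 2,209 bp.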
Figure 1. RNase L amplification unique to woolly mammoths. Additional 10,000 bases shown upstream and downstream of duplication. Read coverage normalized (y axis maximum = 6× average genome read depth). Top four tracks show mammoths (Wrangel, Oimyakon, M4, M25). Bottom four tracks show Asian elephants (Emelia, Asha, Parvathy, Uno). Box indicates region containing RNase L exons.
Amino acid variants in mammoth RNase L
| Location | Codons (elephant/mammoth) | Amino acids (elephant/mammoth) | Protein position | Domain |
|---|---|---|---|---|
| scaffold_16:17939894 | aGt/aTt | S/I | 34 | ANK |
| scaffold_16:17940559 | Gag/Aag | E/K | 256 | ANK |
| scaffold_16:17940757 | Aca/Cca | T/A | 322 | – |
| scaffold_16:17941141 | Ttt/Ctt | F/L | 450 | Protein kinase |
| scaffold_16:17965935 | aGg/aAg | R/K | 675 | RNase |
| scaffold_16:17965975 | aaG/aaC | K/N | 688 | RNase |
Variants occur in 3–9 copies of RNase L. Mammoth predicted to have 5–9 copies of RNase L.
Figure 2. African elephant RNase L alignment to human RNase L. Woolly mammoth residues (bold text) are shown next to the corresponding elephant and human residues (boxed). Residues of interest near or coinciding with woolly mammoth amino acid substitutions are marked above with the following signs: $ (2-5A interaction site), # (self-domain dimerization), or * (ribonuclease active site). The RNase L cleavage site H683 (hH672) is marked with a plus sign (+).
Figure 3. Occurrences of woolly mammoth SNV combinations in reads for RNase L domain variants. Read counts for Wrangel, M4, Oimyakon, and M25 are shown to the left of the reads. Read counts in parentheses include inferred counts, based on the observation that SNVs for residues 675 and 680 always co-occur. The most frequent haplotypes are highlighted in gray. Residue numbers are shown above the reads. SNV combinations with no occurrences are not shown.
Nucleotide substitution comparison
| Change | Mammoth (this study), raw occurrences | Mammoth (this study), normalized | Asian elephant, raw occurrences | Asian elephant, normalized | Asian elephant, % Chg | Mammoth (Lynch), raw occurrences | Mammoth (Lynch), normalized | Mammoth (Lynch), % Chg |
|---|---|---|---|---|---|---|---|---|
| A/C | 81 | 0.035 | 103 | 0.039 | −9.3 | 78 | 0.038 | −6.1 |
| A/G | 286 | 0.125 | 417 | 0.158 | −20.9 | 266 | 0.129 | −2.8 |
| A/T | 42 | 0.018 | 63 | 0.024 | −23.1 | 50 | 0.024 | −24.1 |
| C/A | 96 | 0.042 | 87 | 0.033 | 27.3 | 92 | 0.045 | −5.7 |
| C/G | 111 | 0.049 | 173 | 0.066 | −26.0 | 112 | 0.054 | −10.4 |
| G/C | 140 | 0.061 | 156 | 0.059 | 3.5 | 126 | 0.061 | 0.5 |
| G/T | 77 | 0.034 | 103 | 0.039 | −13.7 | 90 | 0.044 | −22.7 |
| T/A | 46 | 0.020 | 70 | 0.027 | −24.2 | 53 | 0.026 | −21.5 |
| T/C | 312 | 0.137 | 432 | 0.164 | −16.7 | 261 | 0.126 | 8.1 |
| T/G | 81 | 0.035 | 94 | 0.036 | −0.6 | 69 | 0.033 | 6.1 |
Mammoth nucleotide substitutions from this study compared with Asian elephant substitutions and with previously identified mammoth substitutions (Lynch et al., 2015). The comparison uses fixed, derived non-synonymous SNVs. Most common miscodings for ancient DNA in bold.
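The % Chg columns appear to give the relative difference between the normalized frequency in this study and the normalized frequency in the comparison set. That reading is an assumption inferred from the table values, not a formula stated in the source. A minimal sketch under that assumption, using the rounded normalized values for two rows of the Asian elephant comparison:

```python
def percent_change(this_study: float, comparison: float) -> float:
    """Relative difference of this study's frequency versus a comparison set.

    Assumed formula (not stated in the source):
        (this_study - comparison) / comparison * 100
    """
    if comparison == 0:
        raise ValueError("comparison frequency must be non-zero")
    return (this_study - comparison) / comparison * 100.0


# Normalized frequencies copied from the table: (this study, Asian elephant).
rows = {
    "A/G": (0.125, 0.158),
    "C/A": (0.042, 0.033),
}
for change, (mammoth, asian_elephant) in rows.items():
    print(f"{change}: {percent_change(mammoth, asian_elephant):+.1f}%")
```

For these two rows the recomputed values match the table (−20.9 and 27.3). Other rows differ slightly when recomputed this way, presumably because the published percentages were derived from unrounded frequencies.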
Fixed, derived non-synonymous woolly mammoth variants in ABCC11
| Location | Variant | Protein position | Amino acids (elephant/mammoth) | Codons (elephant/mammoth) |
|---|---|---|---|---|
| scaffold_43:16997813 | Missense variant | 115 | S/G | Agt/Ggt |
| scaffold_43:16985516 | Missense variant | 359 | M/L | Atg/Ctg |
| scaffold_43:16957501 | Stop gained | 703 | W/* | tgG/tgA |
| scaffold_43:16957475 | Missense variant | 712 | G/E | gGa/gAa |
| scaffold_43:16919280 | Missense variant | 1,278 | Q/P | cAa/cCa |
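The codon columns capitalize the substituted base (for example, `tgG/tgA`). As a minimal sketch, assuming the standard nuclear genetic code, each codon pair can be translated and a consequence label derived from the result. The helper names `translate_pair` and `consequence` are illustrative and not part of the study's pipeline.

```python
from itertools import product
from typing import Tuple

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_pair(codons: str) -> Tuple[str, str]:
    """Translate an 'elephant/mammoth' codon pair such as 'tgG/tgA'."""
    reference, alternate = (codon.upper() for codon in codons.split("/"))
    return CODON_TABLE[reference], CODON_TABLE[alternate]


def consequence(codons: str) -> str:
    """Label the change using the same terms as the Variant column."""
    reference_aa, alternate_aa = translate_pair(codons)
    if reference_aa == alternate_aa:
        return "Synonymous variant"
    if alternate_aa == "*":
        return "Stop gained"
    if reference_aa == "*":
        return "Stop lost"
    return "Missense variant"


# Codon pairs copied from the ABCC11 table above.
for codons in ["Agt/Ggt", "Atg/Ctg", "tgG/tgA", "gGa/gAa", "cAa/cCa"]:
    reference_aa, alternate_aa = translate_pair(codons)
    print(f"{codons}: {reference_aa}/{alternate_aa} ({consequence(codons)})")
```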
Fixed, derived non-synonymous clade variants
| Clade | Location | Gene | Protein position | Amino acids (elephant/mammoth) | Codons (elephant/mammoth) |
|---|---|---|---|---|---|
| I | scaffold_25:32550639 | ZDHHC23 | 129 | K/E | Aag/Gag |
| I | scaffold_3:68117492 | ENSLAFG00000010153 | 891 | R/L | cGt/cTt |
| I | scaffold_40:10330726 | SULT6B1 | 115 | R/Q | cGa/cAa |
| I | scaffold_64:11532808 | SPTBN5 | 901 | G/R | Ggg/Agg |
| II | scaffold_125:2369691 | CCDC94 | 292 | P/L | cCg/cTg |
| II | scaffold_81:784525 | ENSLAFG00000032374 | 232 | T/M | aCg/aTg |
Non-synonymous clade variants were homozygous in the clade and had no evidence of the variant in the Asian elephants or the other clade. Ensembl gene ID used when gene symbol not available.
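The selection rule in the note above can be sketched as a simple genotype filter. Everything specific in this example is hypothetical: the genotype encoding (alternate-allele count of 0, 1, or 2 per diploid sample), the sample names, and the clade assignments are assumptions made for illustration, and "no evidence of the variant" is approximated here as an alternate-allele count of 0, which may be weaker than the read-level criterion the authors applied.

```python
from typing import Dict, Iterable


def is_clade_specific(
    genotypes: Dict[str, int],
    clade_samples: Iterable[str],
    other_samples: Iterable[str],
) -> bool:
    """Return True if a variant is homozygous in every clade sample
    and absent from every other sample.

    genotypes maps sample name -> alternate-allele count (0, 1, or 2).
    """
    clade = list(clade_samples)
    if not clade:
        return False
    homozygous_in_clade = all(genotypes[sample] == 2 for sample in clade)
    absent_elsewhere = all(genotypes[sample] == 0 for sample in other_samples)
    return homozygous_in_clade and absent_elsewhere


# Hypothetical samples and genotypes, for illustration only.
clade = ["mammoth_A", "mammoth_B"]
others = ["mammoth_C", "mammoth_D", "elephant_1", "elephant_2"]

variant_kept = {"mammoth_A": 2, "mammoth_B": 2, "mammoth_C": 0,
                "mammoth_D": 0, "elephant_1": 0, "elephant_2": 0}
variant_dropped = {"mammoth_A": 2, "mammoth_B": 1, "mammoth_C": 0,
                   "mammoth_D": 0, "elephant_1": 0, "elephant_2": 0}

print(is_clade_specific(variant_kept, clade, others))
print(is_clade_specific(variant_dropped, clade, others))
```

The first call passes the filter; the second fails because one clade member is heterozygous.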
Fixed, derived woolly mammoth indels occurring in exons
| Indel | Location | Gene | Consequence | Type |
|---|---|---|---|---|
| DEL | scaffold_1:91019166-91019166 | WWC1 | Frameshift | Protein |
| DEL | scaffold_2:12181244-12181244 | ETNK1 | Frameshift | Protein |
| DEL | scaffold_4:40567579-40567579 | ENSLAFG00000029865 | Frameshift | Protein |
| DEL | scaffold_6:80415826-80415826 | ADAMTSL1 | Frameshift | Protein |
| DEL | scaffold_6:82071600-82071600 | ENSLAFG00000027513 | Frameshift, stop lost | Protein |
| DEL | scaffold_7:76407941-76407941 | ARHGEF28 | Frameshift | Protein |
| DEL | scaffold_10:17436611-17436611 | PAX2 | Frameshift | Protein |
| DEL | scaffold_15:54351650-54351650 | ENSLAFG00000028486 | Frameshift | Protein |
| DEL | scaffold_153:798742-798742 | SBNO2 | Frameshift | Protein |
| DEL | scaffold_16:21968006-21968006 | RALGPS2 | Frameshift | Protein |
| DEL | scaffold_35:4581294-4581294 | GPR83 | Frameshift | Protein |
| DEL | scaffold_36:4137479-4137479 | ARPP21 | Frameshift | Protein |
| DEL | scaffold_63:11761905-11761905 | PGAM2 | Frameshift | Protein |
| DEL | scaffold_63:13307853-13307854 | SNORA5 | Non coding exon | snoRNA |
| DEL | scaffold_68:424956-424956 | ENSLAFG00000026930 | Frameshift | Protein |
| DEL | scaffold_91:278361-278361 | ENSLAFG00000027842 | Frameshift | Protein |
| INS | scaffold_7:56774332-56774332 | ENSLAFG00000027421 | Coding sequence | Protein |
| INS | scaffold_43:286385-286385 | KIFC3 | Coding sequence | Protein |
| INS | scaffold_96:4058660-4058660 | ENSLAFG00000032317 | Coding sequence | Protein |
| INS | scaffold_107:2600881-2600881 | NOL8 | Coding sequence | Protein |
Ensembl gene ID used when gene symbol not available.
Essentiality of genes with fixed, derived woolly mammoth indels occurring in protein-coding regions
| Gene symbol | Indel | Studies indicating gene essential |
|---|---|---|
| KIFC3 | INS | a, b |
| NOL8 | INS | a, b, c |
| WWC1 | DEL | a, b |
| ETNK1 | DEL | a, b |
| ADAMTSL1 | DEL | a |
| ARHGEF28 | DEL | a |
| PAX2 | DEL | a |
| RALGPS2 | DEL | a |
| GPR83 | DEL | a |
| ARPP21 | DEL | a |
| PGAM2 | DEL | a |
| SBNO2 | DEL | a |
Genes without symbols (seven) were excluded. a, Wang et al.; b, Hart et al.; c, Blomen et al.