Marc Tollis, Jooke Robbins, Andrew E Webb, Lukas F K Kuderna, Aleah F Caulin, Jacinda D Garcia, Martine Bèrubè, Nader Pourmand, Tomas Marques-Bonet, Mary J O'Connell, Per J Palsbøll, Carlo C Maley.
Abstract
Cetaceans are a clade of highly specialized aquatic mammals that include the largest animals that have ever lived. The largest whales can have ∼1,000× more cells than a human, with long lifespans, leaving them theoretically susceptible to cancer. However, large-bodied and long-lived animals do not suffer higher risks of cancer mortality than humans, an observation known as Peto's Paradox. To investigate the genomic bases of gigantism and other cetacean adaptations, we generated a de novo genome assembly for the humpback whale (Megaptera novaeangliae) and incorporated the genomes of ten cetacean species in a comparative analysis. We found further evidence that rorquals (family Balaenopteridae) radiated during the Miocene or earlier, and inferred that perturbations in abundance and/or the interocean connectivity of North Atlantic humpback whale populations likely occurred throughout the Pleistocene. Our comparative genomic results suggest that the evolution of cetacean gigantism was accompanied by strong selection on pathways that are directly linked to cancer. Large segmental duplications in whale genomes contained genes controlling the apoptotic pathway, and genes inferred to be under accelerated evolution and positive selection in cetaceans were enriched for biological processes such as cell cycle checkpoint, cell signaling, and proliferation. We also inferred positive selection on genes controlling the mammalian appendicular and cranial skeletal elements in the cetacean lineage, which are relevant to extensive anatomical changes during cetacean evolution. Genomic analyses shed light on the molecular mechanisms underlying cetacean traits, including gigantism, and will contribute to the development of future targets for human cancer therapies.
Keywords: cancer; cetaceans; evolution; genome; humpback whale
Year: 2019 PMID: 31070747 PMCID: PMC6657726 DOI: 10.1093/molbev/msz099
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Genomic Sequence Data Obtained for the Humpback Whale Genome.
| Libraries | Est. Number of Reads | Avg. Read Length (bp) | Est. Depth (total) |
|---|---|---|---|
| 180 bp paired-end | 1,211,320,000 | 94 | 41.3 |
| 300 bp paired-end | 25,820,000 | 123 | 1.2 |
| 500 bp paired-end | 112,400,000 | 123 | 5.0 |
| 600 bp paired-end | 395,500,000 | 93 | 13.4 |
| 2 kb mate-paired | 348,080,000 | 49 | 6.2 |
| 10 kb mate-paired | 279,000,000 | 94 | 9.0 |
| Subtotal for WGS libraries | 2,372,120,000 | | 76.1 |
| Chicago Library 1 | 72,000,000 | 100 | 5.3 |
| Chicago Library 2 | 6,000,000 | 151 | 0.7 |
| Chicago Library 3 | 190,000,000 | 100 | 13.9 |
| Chicago Library 4 | 79,000,000 | 100 | 5.8 |
| Subtotal for Chicago Libraries | 347,000,000 | | 25.6 |
| Total for all sequence libraries | 2,719,120,000 | | 101.7 |
Note.—WGS, whole-genome shotgun.
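The subtotal and total rows can be cross-checked against the per-library rows. The following is a minimal sketch in Python using values transcribed from the table above; the variable and function names are illustrative and not part of the original work. Summing the rounded Chicago depths gives 25.7 rather than the published 25.6, presumably because the published subtotal was computed from unrounded per-library depths.

```python
# (estimated reads, estimated depth) per library, transcribed from the table.
wgs = {
    "180 bp paired-end": (1_211_320_000, 41.3),
    "300 bp paired-end": (25_820_000, 1.2),
    "500 bp paired-end": (112_400_000, 5.0),
    "600 bp paired-end": (395_500_000, 13.4),
    "2 kb mate-paired": (348_080_000, 6.2),
    "10 kb mate-paired": (279_000_000, 9.0),
}
chicago = {
    "Chicago Library 1": (72_000_000, 5.3),
    "Chicago Library 2": (6_000_000, 0.7),
    "Chicago Library 3": (190_000_000, 13.9),
    "Chicago Library 4": (79_000_000, 5.8),
}

def subtotal(libraries):
    """Sum read counts and depths; depth is rounded to one decimal place."""
    reads = sum(r for r, _ in libraries.values())
    depth = round(sum(d for _, d in libraries.values()), 1)
    return reads, depth

wgs_reads, wgs_depth = subtotal(wgs)
chicago_reads, chicago_depth = subtotal(chicago)
total_reads = wgs_reads + chicago_reads

print(wgs_reads, wgs_depth)
print(chicago_reads, chicago_depth)
print(total_reads)
```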
Statistics for the Humpback Whale Genome Assembly.
| Feature | Initial Assembly | Final Assembly |
|---|---|---|
| Assembly length | 2.27 Gb | 2.27 Gb |
| Contig N50 | 12.4 kb | 12.3 kb |
| Longest scaffold | 2.2 Mb | 29.4 Mb |
| Number of scaffolds | 24,319 | 2,558 |
| Scaffold N50 | 198 kb | 9.14 Mb |
| Scaffold N90 | 53 kb | 2.35 Mb |
| Scaffold L50 | 3,214 | 79 |
| Scaffold L90 | 11,681 | 273 |
| Percent genome in gaps | 5.36% | 5.45% |
| BUSCO | | C: 85%[D: 1.8%], F: 15%, M: 4.9%, |
| BUSCO | | C: 91.2%[D: 0.8%], F: 4.8%, M: 4.0%, |
| CEGMA | | C: 226 (91.13%), P: 240 (96.77%) |
Note.—BUSCO, Benchmarking Universal Single Copy Orthologs (Simão et al. 2015): C, complete; D, duplicated; F, fragmented; M, missing. CEGMA, Core Eukaryotic Genes Mapping Approach (Parra et al. 2009): C, complete; P, complete and/or partial.
BUSCO and CEGMA results for final assembly only.
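For readers unfamiliar with the contiguity metrics in this table, the sketch below shows the standard definitions of N50/L50 (and N90/L90): scaffolds are sorted from longest to shortest, and Nx is the length of the scaffold at which the running total first reaches x% of the assembly length, while Lx is the number of scaffolds needed to get there. The function name and the toy scaffold lengths are hypothetical and are not taken from the humpback whale assembly.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of scaffold or contig lengths.

    fraction=0.5 gives N50/L50; fraction=0.9 gives N90/L90.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("lengths must be non-empty")

# Toy example (hypothetical lengths, total = 300).
toy = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy, 0.5)
n90, l90 = nx_lx(toy, 0.9)
print(n50, l50, n90, l90)
```

Under these definitions, a more contiguous assembly has a larger N50 and a smaller L50, which matches the direction of change between the initial and final assemblies in the table.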
Repetitive Content of the Humpback Whale (Megaptera novaeangliae) Genome, Estimated with a Library of Known Mammalian Repeats (RepBase) and De Novo Repeat Identification (RepeatModeler).
| Repeat Type | RepBase: Length (bp) | RepBase: % Genome (38.85 total) | RepeatModeler: Length (bp) | RepeatModeler: % Genome (30.25 total) |
|---|---|---|---|---|
| SINEs | 137,574,621 | 6.07 | 75,509,694 | 3.33 |
| LINEs | 440,955,223 | 19.46 | 432,017,456 | 19.07 |
| LTR | 142,117,286 | 6.27 | 94,177,184 | 4.16 |
| DNA transposons | 84,243,186 | 3.72 | 54,015,996 | 2.38 |
| Unclassified | 1,303,231 | 0.06 | 4,339,463 | 0.19 |
| Satellites | 48,894,580 | 2.16 | 197,862 | 0.01 |
| Simple repeats | 20,779,839 | 0.92 | 20,848,394 | 0.92 |
| Low complexity | 4,167,187 | 0.18 | 4,281,173 | 0.19 |
Fig. 1. Substitution rates in cetacean genomes. (A) Maximum likelihood phylogeny of 12 mammals based on 2,763,828 fourfold degenerate sites. Branch lengths are given in terms of substitutions per site, except for branches with hatched lines, which are shortened for visual convenience. All branches received 100% bootstrap support. (B) Based on the phylogeny in (A), a comparison of the estimated DNA substitution rates (in terms of substitutions per site per million years) between terminal and internal cetacean branches, and terminal and internal branches of all other mammals.
Fig. 2. Timescale of humpback whale evolution. (A) Species phylogeny of 28 mammals constructed from 152 orthologs and time-calibrated using MCMCtree. Branch lengths are in terms of millions of years. Node bars indicate 95% highest posterior densities of divergence times. Cetaceans are highlighted in the gray box with mean estimates of divergence times included. (B) The effective population size (Ne) changes over time. Demographic histories of two North Atlantic humpback whales estimated from the PSMC analysis, including 100 bootstrap replicates per analysis. Mutation rate used was 1.54e-9 per year and generation time used was 21.5 years.
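The mutation rate and generation time quoted in the caption combine into a per-generation rate, which is the quantity used to convert PSMC output into years and effective population size. The sketch below assumes the scaling conventionally applied to PSMC output (N0 = θ0 / (4 μ s), time in years = 2 N0 t g, Ne = N0 λ); the bin size s = 100 and the θ0, t, and λ values are hypothetical placeholders, not values reported by the authors.

```python
MU_PER_YEAR = 1.54e-9     # mutation rate per site per year (from the caption)
GENERATION_YEARS = 21.5   # generation time in years (from the caption)

# Per-generation mutation rate implied by the two caption values.
mu_per_generation = MU_PER_YEAR * GENERATION_YEARS

def scale_psmc(theta0, t_scaled, lam, mu_gen, gen_years, bin_size=100):
    """Convert one scaled PSMC interval to (years before present, Ne).

    Assumed convention: N0 = theta0 / (4 * mu_gen * bin_size),
    years = 2 * N0 * t_scaled * gen_years, Ne = N0 * lam.
    """
    n0 = theta0 / (4.0 * mu_gen * bin_size)
    years = 2.0 * n0 * t_scaled * gen_years
    ne = n0 * lam
    return years, ne

# Hypothetical PSMC values, for illustration only.
years, ne = scale_psmc(theta0=0.02, t_scaled=0.5, lam=2.0,
                       mu_gen=mu_per_generation, gen_years=GENERATION_YEARS)
print(f"per-generation mutation rate: {mu_per_generation:.3e}")
print(f"time: {years:.0f} years, Ne: {ne:.0f}")
```

Because both axes scale inversely with the mutation rate, a different assumed rate would shift the inferred curve in time and in Ne without changing its shape.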
GO Terms for Biological Processes Overrepresented by Genes Overlapping Genomic Regions with Elevated Substitution Rates That Are Unique to the Cetacean Lineage.
| GO Term | Description | Number of Genes | Fold Enrichment | P-Value |
|---|---|---|---|---|
| GO:0007608 | Sensory perception of smell | 157 | 4.58 | 1.24E-40 |
| GO:0006956 | Complement activation | 39 | 2.91 | 4.91E-05 |
| GO:0019724 | B-cell-mediated immunity | 39 | 2.91 | 4.91E-05 |
| GO:0032989 | G-protein-coupled receptor signaling pathway | 159 | 2.44 | 1.96E-17 |
| GO:0042742 | Defense response to bacterium | 39 | 2.44 | 2.20E-03 |
| GO:0009607 | Response to biotic stimulus | 43 | 2.09 | 1.85E-02 |
| GO:0007155 | Cell adhesion | 101 | 1.99 | 1.74E-06 |
| GO:0007267 | Cell–cell signaling | 137 | 1.69 | 2.84E-05 |
P-values are given after Bonferroni correction for multiple testing.
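As a reminder of what the correction does, the sketch below applies a Bonferroni adjustment: each raw P-value is multiplied by the number of tests performed and capped at 1. The raw P-values and the 0.05 threshold are hypothetical and chosen only for illustration; the number of GO terms actually tested in the study is not stated in this section.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P-values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw P-values for three tests.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)

# A term is retained only if its adjusted P-value is at or below the threshold.
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```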
Fig. 3. Diversity in both body size and lifespan within rorqual baleen whales (Balaenopteridae) and dolphins (Delphinidae). Maximal pairings using phylogenetic targeting (Arnold and Nunn 2010) of genome assembly-enabled cetaceans resulted in the most extreme divergence in both body size and lifespan between humpback whale (Megaptera novaeangliae) and common minke whale (Balaenoptera acutorostrata) within the Balaenopteridae, facing right, and orca (Orcinus orca) and bottlenose dolphin (Tursiops truncatus) within the Delphinidae, facing left. Trait data were collected from the panTHERIA (Jones et al. 2009) and AnAge (Tacutu et al. 2012) databases.
CGC Genes with dN/dS > 1 as Revealed by Pairwise Comparisons of Cetacean Genomes.
| Comparison | Gene Symbol | Gene Name | Role in Cancer | Function |
|---|---|---|---|---|
| Minke:humpback | CD274 | CD274 molecule | TSG | Plays a critical role in induction and maintenance of immune tolerance to self |
| | ETNK1 | Ethanolamine kinase 1 | TSG | Suppresses escaping of programmed cell death |
| | IL21R | Interleukin 21 receptor | Fusion | The ligand binding of this receptor leads to the activation of multiple downstream signaling molecules, including JAK1, JAK3, STAT1, and STAT3 |
| | MYOD1 | Myogenic differentiation 1 | Fusion | Regulates muscle cell differentiation by inducing cell cycle arrest, a prerequisite for myogenic initiation |
| | PHF6 | PHD finger protein 6 | TSG | Encodes a protein with two PHD-type zinc finger domains, indicating a potential role in transcriptional regulation, that localizes to the nucleolus |
| Orca:dolphin | BTG1 | B-cell translocation gene 1; antiproliferative | TSG; fusion | Member of an antiproliferative gene family that regulates cell growth and differentiation |
| | CD274 | CD274 molecule | TSG; fusion | Plays a critical role in induction and maintenance of immune tolerance to self |
| | FANCD2 | Fanconi anemia; complementation group D2 | TSG | Suppresses genome instability and mutations; promotes escaping programmed cell death; suppresses proliferative signaling; suppresses invasion and metastasis |
| | FAS | Fas cell surface death receptor | TSG | Promotes cell replicative immortality; promotes proliferative signaling; promotes invasion and metastasis; suppresses escaping programmed cell death |
| | FGFR4 | Fibroblast growth factor receptor 4 | Oncogene | Promotes proliferative signaling; promotes invasion and metastasis |
| | GPC3 | Glypican 3 | Oncogene; TSG | Promotes invasion and metastasis; promotes suppression of growth |
| | HOXD11 | Homeobox D11 | Oncogene; fusion | The homeobox genes encode a highly conserved family of transcription factors that play an important role in morphogenesis in all multicellular organisms |
| | HOXD13 | Homeobox D13 | Oncogene; fusion | |
| | LASP1 | LIM and SH3 protein 1 | Fusion | The encoded protein has been linked to metastatic breast cancer, hematopoietic tumors such as B-cell lymphomas, and colorectal cancer |
| | MLF1 | Myeloid leukemia factor 1 | TSG; fusion | This gene encodes an oncoprotein which is thought to play a role in the phenotypic determination of hemopoietic cells. Translocations between this gene and nucleophosmin have been associated with myelodysplastic syndrome and acute myeloid leukemia |
| | MYB | v-myb myeloblastosis viral oncogene homolog | Oncogene; fusion | This gene may be aberrantly expressed or rearranged or undergo translocation in leukemias and lymphomas, and is considered to be an oncogene |
| | MYD88 | Myeloid differentiation primary response gene (88) | Oncogene | Promotes escaping programmed cell death; promotes proliferative signaling; promotes invasion and metastasis; promotes tumor promoting inflammation |
| | NR4A3 | Nuclear receptor subfamily 4; group A; member 3 (NOR1) | Oncogene; fusion | Encodes a member of the steroid–thyroid hormone–retinoid receptor superfamily that may act as a transcriptional activator |
| | PALB2 | Partner and localizer of BRCA2 | TSG | This protein binds to and colocalizes with the breast cancer 2 early onset protein (BRCA2) in nuclear foci and likely permits the stable intranuclear localization and accumulation of BRCA2 |
| | PML | Promyelocytic leukemia | TSG; fusion | Expression is cell-cycle related and it regulates the p53 response to oncogenic signals |
| | RAD21 | RAD21 homolog ( | Oncogene; TSG | Promotes invasion and metastasis; suppresses genome instability and mutations; suppresses escaping programmed cell death |
| | STIL | SCL/TAL1 interrupting locus | Oncogene; fusion | Encodes a cytoplasmic protein implicated in regulation of the mitotic spindle checkpoint, a regulatory pathway that monitors chromosome segregation during cell division to ensure the proper distribution of chromosomes to daughter cells |
| | TAL1 | T-cell acute lymphocytic leukemia 1 (SCL) | Oncogene; fusion | Implicated in the genesis of hemopoietic malignancies and may play an important role in hemopoietic differentiation |
| | TNFRSF14 | Tumor necrosis factor receptor superfamily; member 14 (herpesvirus entry mediator) | TSG | The encoded protein functions in signal transduction pathways that activate inflammatory and inhibitory T-cell immune response |
| | TNFRSF17 | Tumor necrosis factor receptor superfamily; member 17 | Oncogene; fusion | This receptor also binds to various TRAF family members, and thus may transduce signals for cell survival and proliferation |
Note.—TSG, tumor suppressor gene.
Source: RefSeq.
Source: Cancer Hallmark from CGC.
GO Terms for Biological Processes Overrepresented by Genes with Pairwise dN/dS > 1 in the Orca: Bottlenose Dolphin Comparison.
| GO Term | Description | Number of Genes | Fold Enrichment | P-Value |
|---|---|---|---|---|
| GO:0031347 | Regulation of defense response | 36 | 2.70 | 2.91E-2 |
| GO:0050776 | Regulation of immune response | 50 | 2.18 | 1.95E-3 |
| GO:0002694 | Regulation of leukocyte activation | 31 | 2.49 | 1.89E-2 |
| GO:0002275 | Myeloid cell activation involved in immune response | 29 | 2.53 | 2.37E-2 |
| GO:0002699 | Positive regulation of immune effector process | 19 | 4.46 | 1.01E-3 |
| GO:0042108 | Positive regulation of cytokine biosynthetic process | 9 | 6.87 | 2.23E-2 |
P-values are given after Bonferroni correction for multiple testing.
Fig. 4. Positively selected genes during cetacean evolution. (A) Species tree relationships of six modern cetaceans with complete genome assemblies, estimated from 152 single-copy orthologs. Branch lengths are given in coalescent units. Outgroup taxa are not shown. The complete species tree of 28 mammals is shown in supplementary figure 4, Supplementary Material online. Boxes with numbers indicate the number of positively selected genes detected on each branch that passed filters and a Bonferroni correction. (B) TreeMap from REVIGO for GO biological process terms represented by genes evolving under positive selection across all cetaceans. Rectangle size reflects semantic uniqueness of the GO term, which measures the degree to which the term is an outlier when compared semantically to the whole list of GO terms. (C) Cancer gene names and functions from COSMIC found to be evolving under positive selection in the cetacean branch-site models. Superscripts for gene names indicate as follows: T, tumor suppressor gene; O, oncogene; F, fusion gene. Asterisks indicate P-value following FDR correction for multiple testing: **P < 0.01, ***P < 0.001, ****P < 0.0001.