Maria Assunta Biscotti (1), Marco Gerdol (2), Adriana Canapa (1), Mariko Forconi (1), Ettore Olmo (1), Alberto Pallavicini (2), Marco Barucca (1), Manfred Schartl (3).
Abstract
Lungfish and coelacanths are the only living sarcopterygian fish. The phylogenetic relationship of lungfish to the last common ancestor of tetrapods and their close morphological similarity to their fossil ancestors make this species uniquely interesting. However, their genome size, the largest among vertebrates, is hampering the generation of a whole genome sequence. To provide a partial solution to the problem, a high-coverage lungfish reference transcriptome was generated and assembled. The present findings indicate that lungfish, not coelacanths, are the closest relatives to land-adapted vertebrates. Whereas protein-coding genes evolve at a very slow rate, possibly reflecting a "living fossil" status, transposable elements appear to be active and show high diversity, suggesting a role for them in the remarkable expansion of the lungfish genome. Analyses of single genes and gene families documented changes connected to the water-to-land transition and demonstrated the value of the lungfish reference transcriptome for comparative studies of vertebrate evolution.
Year: 2016 PMID: 26908371 PMCID: PMC4764851 DOI: 10.1038/srep21571
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Hierarchical clustering of the lungfish tissue samples analysed, based on gene expression data (see Supplementary Table S1 for details on tissue samples). Square-root transformed TPM values were used for hierarchical cluster analysis with the average-linkage method and a correlation-based dissimilarity matrix. Numbers on nodes indicate bootstrap support values. To improve the readability of the heat map, only transcripts with a minimum transformed TPM value of 4 in at least one sample are displayed.
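The clustering procedure described in the Figure 1 legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the sample names and TPM values are invented toy data, bootstrap support is not computed, and `sqrt_transform`, `correlation_distance`, and `average_linkage` are hypothetical helper names written for this example.

```python
import math

def sqrt_transform(tpm_by_sample):
    """Square-root transform every TPM value, sample by sample."""
    return {name: [math.sqrt(v) for v in values]
            for name, values in tpm_by_sample.items()}

def correlation_distance(x, y):
    """Dissimilarity defined as 1 minus the Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (left, right, height)."""
    names = list(profiles)
    dist = {(a, b): correlation_distance(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean of all pairwise distances between clusters.
                pairs = [dist[(p, q)] for p in clusters[i] for q in clusters[j]]
                mean_d = sum(pairs) / len(pairs)
                if best is None or mean_d < best[0]:
                    best = (mean_d, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Toy TPM values (invented): s0/s1 and s2/s3 share expression profiles.
toy_tpm = {
    "s0": [100, 50, 1, 4, 1],
    "s1": [110, 55, 2, 3, 1],
    "s2": [1, 3, 80, 40, 1],
    "s3": [2, 2, 90, 45, 2],
}
for left, right, height in average_linkage(sqrt_transform(toy_tpm)):
    print(left, right, round(height, 3))
```

On real data a library routine would normally replace the hand-written loop; the explicit version is shown only to make the average-linkage and correlation-distance steps visible.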
Repeat elements detected in the lungfish transcriptome.
| Repeat class | Number of elements | Length occupied | Percentage of sequence |
|---|---|---|---|
| Retroelements | 3,918 | 1,408,371 bp | 1.20% |
| SINEs: | 791 | 103,659 bp | 0.09% |
| LINEs: | 2,668 | 1,178,516 bp | 1.01% |
| L2/CR1/Rex | 2,118 | 883,120 bp | 0.76% |
| R2/R4/NeSL | 196 | 113,911 bp | 0.10% |
| RTE/Bov-B | 10 | 1,598 bp | 0.00% |
| L1/CIN4 | 344 | 179,887 bp | 0.15% |
| LTR elements: | 459 | 126,196 bp | 0.11% |
| Gypsy/DIRS1 | 29 | 9,956 bp | 0.01% |
| Retroviral | 429 | 116,103 bp | 0.10% |
| DNA transposons | 1,902 | 377,257 bp | 0.32% |
| hobo-Activator | 198 | 50,625 bp | 0.04% |
| Tc1-IS630-Pogo | 1,554 | 293,330 bp | 0.25% |
| Rolling-circles | 0 | 0 bp | 0.00% |
| Unclassified: | 8 | 2,109 bp | 0.00% |
| Total interspersed repeats: | | 1,787,737 bp | 1.53% |
| Small RNA: | 46 | 15,534 bp | 0.01% |
| Satellites: | 5 | 412 bp | 0.00% |
| Simple repeats: | 25,339 | 938,512 bp | 0.80% |
| Low complexity: | 3,166 | 146,658 bp | 0.13% |
Sequence repeats detected in the lungfish transcriptome by RepeatMasker 4.0.3, based on the vertebrate repeats database Dfam 1.2. A total of 74,318 assembled transcripts were scanned, with a combined length of 116,965,448 bp; overall, 2,886,461 bp (2.47% of the entire assembled transcriptome) were masked.
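The percentages in the table follow directly from the lengths and the total transcriptome size given in the legend. As a quick arithmetic check, the sketch below recomputes a few of them; the helper name `percent_of_transcriptome` is hypothetical, and all numbers are copied from the table and legend.

```python
TOTAL_BP = 116_965_448   # total length of the scanned transcripts (from the legend)
MASKED_BP = 2_886_461    # total masked length (from the legend)

def percent_of_transcriptome(length_bp, total_bp=TOTAL_BP):
    """Share of the assembled transcriptome occupied by a repeat class, in percent."""
    return 100.0 * length_bp / total_bp

# Top-level interspersed repeat classes and their occupied lengths (from the table).
interspersed = {
    "Retroelements": 1_408_371,
    "DNA transposons": 377_257,
    "Unclassified": 2_109,
}
total_interspersed = sum(interspersed.values())

print(total_interspersed)
print(round(percent_of_transcriptome(total_interspersed), 2))
print(round(percent_of_transcriptome(MASKED_BP), 2))
```

The three top-level classes sum to the "Total interspersed repeats" row, and the masked fraction rounds to the 2.47% quoted in the legend.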
Figure 2. Comparative transcription activity of the main TE classes in the P. annectens, L. menadoensis, and H. chinensis transcriptomes.
Activity is expressed as cumulative TPM values of the elements pertaining to each class. TPM values were calculated on a set of 2,111 evolutionarily conserved genes, as detailed in Materials and Methods. Retro: retroelements that could not be classified with certainty as LTRs or non-LTRs. L: liver; T: testis; WB: whole body.
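The "cumulative TPM" measure in the Figure 2 legend amounts to summing the TPM values of all elements assigned to a class within each sample. The sketch below illustrates only that aggregation step. It assumes the TPM values are already normalized as described in Materials and Methods (that normalization is not reproduced here); the records, class labels, and the function name `cumulative_tpm_by_class` are invented for illustration.

```python
from collections import defaultdict

def cumulative_tpm_by_class(records):
    """Sum TPM over all elements sharing a (TE class, tissue) pair.

    records: iterable of (te_class, tissue, tpm) tuples, one per element.
    """
    totals = defaultdict(float)
    for te_class, tissue, tpm in records:
        totals[(te_class, tissue)] += tpm
    return dict(totals)

# Invented toy records; tissue codes follow the legend (L: liver, T: testis).
toy_records = [
    ("LINE", "L", 10.0),
    ("LINE", "L", 2.5),
    ("LINE", "T", 4.0),
    ("DNA", "L", 1.5),
    ("LTR", "T", 0.5),
    ("LTR", "T", 0.25),
]
totals = cumulative_tpm_by_class(toy_records)
for key in sorted(totals):
    print(key, totals[key])
```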
Figure 3. Cumulative expression of transcripts containing reverse transcriptase and integrase domains in the P. annectens tissues analysed.
Figure 4. Phylogenetic tree of representative vertebrates based on the alignment of 226 evolutionarily conserved genes (see list in the Materials and Methods section).
Figure 5. Actinodin (and) gene evolution and protein structures.
Blue boxes: repeated motif C(N/D)PXXDPXC; black circles: and1/2; white circles: and3/4; dashed circle: putative missing gene; X: gene loss; curved arrow: duplication event; *: incomplete sequence at the N-terminus. Boxes representing proteins are drawn to scale.