Multi-tissue transcriptomes of caecilian amphibians highlight incomplete knowledge of vertebrate gene families.

María Torres-Sánchez1, Christopher J Creevey2, Etienne Kornobis3, David J Gower4, Mark Wilkinson4, Diego San Mauro1.   

Abstract

RNA sequencing (RNA-seq) has become one of the most powerful tools to unravel the genomic basis of biological adaptation and diversity. Although challenging, RNA-seq is particularly promising for research on non-model, secretive species that cannot be observed in nature easily and therefore remain comparatively understudied. Among such animals, the caecilians (order Gymnophiona) likely constitute the least known group of vertebrates, despite being an old and remarkably distinct lineage of amphibians. Here, we characterize multi-tissue transcriptomes for five species of caecilians that represent a broad level of diversity across the order. We identified vertebrate homologous elements of caecilian functional genes of varying tissue specificity that reveal a great number of unclassified gene families, especially for the skin. We annotated several protein domains for those unknown candidate gene families to investigate their function. We also conducted supertree analyses of a phylogenomic dataset of 1,955 candidate orthologous genes among five caecilian species and other major lineages of vertebrates, with the inferred tree being in agreement with current views of vertebrate evolution and systematics. Our study provides insights into the evolution of vertebrate protein-coding genes, and a basis for future research on the molecular elements underlying the particular biology and adaptations of caecilian amphibians.
© The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

Keywords:  Gymnophiona; RNA-seq; gene families; phylogenomics; skin-specific genes

Year:  2019        PMID: 30351380      PMCID: PMC6379020          DOI: 10.1093/dnares/dsy034

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


1. Introduction

High-throughput sequencing (HTS) technologies and associated bioinformatics are transforming the study of evolutionary and comparative genetics, offering an unprecedented opportunity to characterize and understand diversity and function in both model and non-model organisms. In this context, one recent revolution is the use of HTS technologies to analyse sets of RNA molecules, transcriptomes, on a massively parallel scale. The transcriptome is a snapshot in time of genes transcribed in the tissue or cells sampled. Investigation of transcriptomes can allow the identification of functional elements of genomes, reveal molecular constituents of cells and tissues, help understand organismal development and disease, and has the potential to uncover the role of tissue-specific evolution in biological diversity. Having entered the phylogenomics era, RNA-seq has also become a powerful complement to de novo genome sequencing, particularly helping with functional annotation and gene expression assessment, and is sometimes the only practical approach to scan and survey gene diversity in organisms with large genomes that still lack reference genomic data. A general strategy for this approach is to pool the mRNA data from a wide range of tissues (from different individuals and/or stages of development) to assemble a reference dataset of the genes of the species (i.e. a proxy of the reference genome of the species). We have applied the pooling of tissue-specific reads from RNA-seq to the study of tissue-specific transcriptomic landscapes of five species of caecilian amphibians (order Gymnophiona) representing four of the ten currently recognized families (Caeciliidae, Rhinatrematidae, Siphonopidae and Typhlonectidae) and a range of ecologies and degrees of evolutionary divergence (including coverage of both branches of the basal evolutionary divergence within the order). Caecilians are, along with frogs and salamanders, one of the three orders of extant amphibians. 
They are a highly specialized group with elongate, annulated, limbless bodies, reduced visual systems and with paired bilateral sensory tentacles on the snout. There are 207 currently recognized extant species classified in 32 genera, with mainly tropical distributions and mainly burrowing habits. Most are terrestrial as adults, living in soil, but several species of the Typhlonectidae (including the one sampled here) are fully aquatic. Caecilians are an old group, with at least 250 million years (myr) of separate evolution from their sister-group, the frogs and salamanders. Due to their specialized body form, ecological distinctiveness and phylogenetic position in the vertebrate tree of life, caecilians are interesting for macro-evolutionary, life history and evolutionary developmental biology research. We provide a first large-scale characterization of caecilian genomes using multi-tissue transcriptomic landscapes generated with RNA-seq. We use two complementary approaches to investigate features of caecilian protein-coding sequences in a vertebrate comparative framework. First, we assess the degree to which homologous elements of caecilian functional genes of varying tissue specificity can be identified across 51 other vertebrates. This reveals a high number of unclassified candidate gene families that are transcribed differentially across tissue types in caecilians. Comparisons between the already known vertebrate gene families and the potentially novel gene families found in caecilians highlight the relevance of skin-specific genes and the poor characterization of the molecular elements of caecilian skin. Here, we start addressing this knowledge gap by identifying protein domains for the caecilian skin-specific genes. Second, we infer the phylogenetic relationships of the five sampled caecilian species and the same set of 51 vertebrates based on candidate orthologous genes. 
This study provides new information about the functional elements of the genome and phylogenomics of caecilians and highlights distinctive and singular genes for the most neglected amphibian order.

2. Materials and methods

2.1. Sample preparation and high-throughput sequencing

This study includes novel data from five caecilian species: Rhinatrema bivittatum (Guérin-Méneville, 1838), Caecilia tentaculata Linnaeus, 1758, Typhlonectes compressicauda (Duméril & Bibron, 1841), Microcaecilia unicolor (Duméril, 1861) and Microcaecilia dermatophaga Wilkinson, Sherratt, Starace & Gower, 2013. Different tissues (skin, posterior skin [from the posterior end of the body], foregut, muscle, liver, kidney, lung, heart, spleen and testis) were collected from freshly sacrificed specimens (wild caught in French Guiana and maintained in captivity) that had been anesthetized with tricaine methanesulphonate (MS222). Biopsy samples were cut into pieces thinner than 0.25 cm in any single dimension, immediately soaked in RNAlater stabilization solution (Qiagen), incubated at 4°C overnight (to allow the solution to thoroughly penetrate the tissue) and stored at −20°C. Numbers of specimens and of tissues sampled per species, voucher and sampling information are given in Table 1 and Supplementary Table S1.
Table 1

Information on the species-specific caecilian transcriptome assemblies and their annotation

Species | N | T | Contigs | % CEGs | Protein-coding genes | veNOG annotation | KVGF annotation
Caecilia tentaculata | 1 | 10 | 142,502 | 97.18 | 27,384 | 18,368 | 12,937
Microcaecilia dermatophaga | 1 | 4 | 106,298 | 97.18 | 22,058 | 17,099 | 11,670
Microcaecilia unicolor | 2 | 9 | 146,348 | 97.58 | 26,302 | 18,487 | 12,719
Rhinatrema bivittatum | 2 | 10 | 201,584 | 97.58 | 34,654 | 19,863 | 13,429
Typhlonectes compressicauda | 1 | 7 | 134,394 | 97.58 | 27,603 | 18,302 | 12,293

N: number of specimens; T: number of tissues; % CEGs: percentage completeness core eukaryotic genes; veNOG annotation: number of genes with similarity match in veNOG database; KVGF annotation: number of known vertebrate gene families with caecilian genes.

RNA was isolated using the RNeasy Fibrous Tissue Mini Kit (Qiagen) following the manufacturer’s instructions, after tissue disruption and homogenization with TissueRuptor (Qiagen). RNA quantity and quality were assessed with a Qubit 2.0 fluorometer, a NanoDrop 1000 spectrophotometer and an Agilent 2100 Bioanalyzer (RNA Nano Chip). Forty RNA extractions with RNA integrity number (RIN) values ranging from 7.8 to 10 were selected for RNA-seq. These 40 selected samples included RNA extractions of skin, liver and kidney for all five caecilian species, as well as a selection of other tissues (foregut, muscle, lung, heart, spleen, testis) each available for only a subset of the species (see Supplementary Table S1). Unstranded paired-end sequencing after poly-A enrichment and TruSeq library preparation was carried out on the Illumina HiSeq2000 platform at Macrogen (16 RNA extraction samples) and BGI Tech Solutions (24 RNA extraction samples) using ten dual flow cells, two lanes per sample. All RNA extractions from the same tissue were sequenced by the same company.

2.2. Raw data processing and de novo assembly

Paired-end RNA-seq raw reads (100 nucleotides long) of each of the 40 tissue samples were trimmed individually and filtered by PRINSEQ 0.20.3 after inspection of the FastQC 0.11.2 quality control report. In all cases, the first 15 bases from the 5′ end of the reads, optical duplicates and reads with an average Phred quality score below 25 were removed. Separate de novo assemblies were performed for each of the five caecilian species employed in the study (species-specific transcriptome assemblies). These were carried out by pooling together all reads (filtered and trimmed) for tissue samples belonging to the same species (Supplementary Table S1). Reads were also pooled for both specimens of each of the two species for which multiple specimens were sampled. A few preliminary de novo assembly runs of separate tissue samples (single-tissue transcriptome assemblies) were conducted on the TRUFA platform to explore parameter settings and run times. De novo species-specific assemblies were performed with Trinity r20140717 using 60 Gb of RAM (--max_memory 60G) and prior in silico normalization (with otherwise default settings). TransDecoder 2.0 was used with default settings to identify candidate protein-coding genes from the subsets of contigs with open reading frames (ORFs) in the five caecilian species-specific transcriptomes. Reads were mapped back to each assembly with Bowtie 2.0.2 and post-processed with SAMtools, and gene expression was estimated using the counts of reads mapping to each assembly with HTSeq 0.6.1. Multiple measures (N50, median contig length, average contig length, alignment percentage) were used for assessing the accuracy of each of the five caecilian species-specific assemblies. Likewise, we used a computational method, CEGMA 2.4, to estimate the percentage completeness of each caecilian transcriptome, and compared these with the completeness percentages of the genome assemblies of the frog Xenopus tropicalis Gray, 1864 v9.0 and v4.1. 
Finally, we compared our species-specific transcriptomes to other transcriptomes recently generated for four species of caecilians, including for two of our sampled species (R. bivittatum, T. compressicauda, T. natans [Fischer, 1880] and Geotrypetes seraphini [Duméril, 1859]). These previously published caecilian transcriptomes are not associated with tissue-expression information and they contain fewer ORFs than do our transcriptomes for the same species. Using similarity searches, we determined that the vast majority (89.83%) of the protein-coding genes from the previous transcriptomes occur also in our transcriptomes (using BLAST, blastp version 2.2.28 with e-value threshold of 1e-20: data not shown). Thus, the previously published caecilian genomic data were not used in our subsequent analyses.
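The read-level filter described above (removal of the first 15 bases from the 5′ end, then rejection of reads with an average Phred score below 25) can be sketched in a few lines of Python. This is only an illustration of the rule, not the PRINSEQ implementation used in the study. The function name, the Phred+33 quality encoding and the choice to compute the average after trimming are assumptions, and duplicate removal is not shown.

```python
from statistics import mean

TRIM_5PRIME = 15      # bases removed from the 5' end (value from the text)
MIN_MEAN_PHRED = 25   # minimum average Phred score (value from the text)
PHRED_OFFSET = 33     # assumption: Phred+33 quality encoding


def trim_and_filter(seq, qual):
    """Return the trimmed (seq, qual) pair, or None if the read is rejected.

    Hypothetical helper: the average quality is computed on the trimmed
    read, which is an assumption rather than something the text states.
    """
    seq, qual = seq[TRIM_5PRIME:], qual[TRIM_5PRIME:]
    if not seq:
        return None  # nothing left after trimming
    scores = [ord(ch) - PHRED_OFFSET for ch in qual]
    if mean(scores) < MIN_MEAN_PHRED:
        return None  # low-quality read
    return seq, qual
```

For a 100-nucleotide read, a passing read comes back 85 nucleotides long, while a read of uniformly low quality is dropped.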

2.3. Multigene family analysis

Contigs of the five new species-specific caecilian transcriptomes containing ORFs were aligned against predefined vertebrate-specific gene families (veNOGs) from the EggNOG 4.1 database using blastp, applying a conservative e-value threshold of 1e-20 (applying less conservative 1e-10 or 1e-5 cutoffs does not result in substantially greater annotation percentages: data not shown). Contigs with expression levels below 100 total read counts were discarded and not used in subsequent analyses. We classified all caecilian annotations (from the pooled contigs of the five species) according to the gene-expression presence across the tissues sampled. For tissue expression analysis, contigs were postulated as being expressed in a particular tissue of a particular transcriptome if they had a minimum of 10 reads aligning to them. This allowed a scale of ‘tissue presence’ to be generated, ranging from those genes found expressed in every tissue type to those found expressed in only one tissue type. The distribution of all homologues of the caecilian protein-coding genes on the vertebrate taxonomy tree from the NCBI taxonomy database was generated and visualized using phyloT and ITOL, respectively. The vertebrate taxonomy tree was built using the unique identifiers (taxids) of the species that are included in the EggNOG database. Where possible, caecilian gene families were annotated with the same function as those vertebrate gene families with the best BLAST match (smallest e-value and highest bit score) in EggNOG identified above. Transcripts with no hits to the known vertebrate gene families in EggNOG were clustered using CD-HIT 4.6.4 with a threshold of 90% amino acid sequence identity to ensure same function of the sequences clustered. 
These clusters were compared against protein-coding genes from currently available amphibian genomes (Ambystoma mexicanum [Shaw & Nodder, 1798], Nanorana parkeri [Stejneger, 1927], Rana catesbeiana Shaw 1802 and Xenopus laevis [Daudin, 1802]), that are not included in the EggNOG database, using blastp, with an e-value threshold of 1e-20. Clusters that remained without similarity hits after the searches against the amphibian genomes were classified as potentially (candidate) novel caecilian gene families. Of these, we calculated the number of tissues in which any gene family was expressed (as described earlier). In addition, to characterize the different tissues with a more restrictive approach than the previously used tissue presence classification, tissue specificity was postulated when 95% of total read counts in each caecilian contig belonged to a single tissue for both unclassified and known vertebrate gene families. To test if there was a greater number of candidate novel genes specific to a particular tissue type than expected by chance, the relative abundance of known vertebrate gene families versus those of candidate novel caecilian gene families were compared using a two-tailed Fisher’s exact test conducted with R 3.3.0, with the null hypothesis that there was no difference in the numbers of tissue-specific candidate novel genes. Finally, our characterization of tissue specificity expression was completed with the inference of protein–protein interactions (PPIs) and functional enrichment pathways using STRING with the option of auto-detect organism for the known vertebrate gene families; and the Pfam annotation of the uncharacterized, candidate novel caecilian gene families using HMMER 3.0 with default parameters to identify protein domains.
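The three read-count thresholds used in this section (at least 100 total reads to keep a contig, at least 10 reads to call it expressed in a tissue, and 95% of reads in a single tissue to call it tissue specific) can be combined in a short Python sketch. The function name and the input layout (one dictionary of counts per contig) are hypothetical, and treating "95%" as "at least 95%" is an assumption. The study applied these rules per transcriptome, which this simplified version does not model.

```python
MIN_TOTAL_READS = 100        # contigs below this total were discarded
MIN_READS_PRESENT = 10       # reads needed to call a contig expressed in a tissue
SPECIFICITY_FRACTION = 0.95  # share of reads in one tissue for tissue specificity


def classify_contig(counts):
    """Classify one contig from a dict mapping tissue name -> read count.

    Returns None for discarded contigs, otherwise a dict with the number of
    tissues in which the contig is present and the tissue it is specific to
    (or None if it is not tissue specific).
    """
    total = sum(counts.values())
    if total < MIN_TOTAL_READS:
        return None
    present = sorted(t for t, n in counts.items() if n >= MIN_READS_PRESENT)
    top_tissue = max(counts, key=counts.get)
    is_specific = counts[top_tissue] >= SPECIFICITY_FRACTION * total
    return {
        "tissue_presence": len(present),
        "tissues": present,
        "specific_to": top_tissue if is_specific else None,
    }
```

Note that under these rules a contig can count as present in several tissues and still be tissue specific, because presence needs only 10 reads while specificity depends on the share of all reads.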

2.4. Orthology prediction and phylogenomic analysis

To carry out a phylogenomic analysis we identified candidate orthologous genes from across vertebrates, including our caecilian samples. To do this we used OrthoFinder 0.2.8, with all predicted protein-coding genes from the caecilian transcriptomes and all protein-coding sequences for the 51 vertebrates represented in the EggNOG database as input. From the results of the OrthoFinder analysis we filtered out any groups (orthogroups) that had more than one gene copy in one species (co-orthologues for different species and paralogues for the same species). Multiple-sequence alignments were performed individually for each of the resulting filtered orthogroups using MAFFT 7.245 with default settings, and individual gene trees were inferred using approximately maximum-likelihood with FastTree 2.1.8 and the JTT+CAT model of amino acid substitutions. We reconstructed a supertree using ASTRAL 4.10.11, which provides statistically consistent species tree inference from gene trees subject to incomplete lineage sorting, and computed posterior probabilities and quartet support for the internal branches of the main recovered topology.
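The orthogroup filter can be illustrated with a minimal Python sketch: keep a group only if no species contributes more than one gene. The minimum of four taxa is taken from the Results; the function names and the (species, gene) input layout are hypothetical and do not reflect OrthoFinder's output format.

```python
from collections import Counter

MIN_TAXA = 4  # minimum number of species represented (value from the Results)


def is_single_copy(orthogroup, min_taxa=MIN_TAXA):
    """orthogroup: list of (species, gene_id) pairs."""
    copies = Counter(species for species, _ in orthogroup)
    # keep groups with enough species and exactly one gene per species
    return len(copies) >= min_taxa and all(n == 1 for n in copies.values())


def filter_orthogroups(orthogroups):
    """orthogroups: dict mapping group name -> list of (species, gene_id)."""
    return {name: genes for name, genes in orthogroups.items()
            if is_single_copy(genes)}
```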

3. Results and discussion

3.1. De novo transcriptome assemblies

In total, RNA sequencing yielded nearly two billion reads (1,963,110,986), averaging 49 million reads per library. The five species-specific assemblies from pooled reads of all tissues of each species resulted in transcriptomes of a mean of 146,227 contigs with N50 values of 1,263–1,884 (Supplementary Table S2). Tissue-specific RNA-seq reads and species-specific de novo transcriptome assemblies are available from NCBI through BioProject ID number PRJNA387587. The maximum and minimum contig lengths were 27,126 and 201 (default minimum size parameter used in the assembly program) bases, respectively. The longest contig was reconstructed from the R. bivittatum transcriptome and only a few very long (see Supplementary Fig. S1) contigs were present in any of the species-specific caecilian transcriptomes. In addition to transcriptome metrics, we assessed the quality of the de novo assemblies by the extent to which each pair of raw reads (more than 95%) could be mapped to the same contig (Supplementary Table S2). On average, 27,600 protein-coding genes were identified from the contigs with ORFs (Table 1 and Supplementary Table S2). Our caecilian transcriptome reconstructions were supported also by the annotation. At least 241 of 248 ultra-conserved core eukaryotic genes (CEGs) occur in all five species-specific transcriptomes (Table 1). For the sake of comparison, we checked also the presence of CEGs in two different genome assemblies of X. tropicalis and found 225 CEGs in the most recent (v9.0) and 219 in an earlier version (v4.1). On the basis of the quality of our transcriptome assembly reconstructions, we obtained useful reference genomic records for caecilian amphibians, the first to our knowledge that are broad and diverse in terms of species and tissues sampled. 
Although the metrics used to assess the quality of assemblies of transcriptomic data are controversial, our caecilian transcriptome sequences contain more CEGs than the two genome assemblies of X. tropicalis used for comparison, suggesting that our reference species-specific transcriptomes are fairly complete (Table 1). Even so, the generated reference transcriptomes are not fully complete, missing specific genes related to developmental stages and to tissues not sampled in our study. As with estimates for other vertebrates, the number of protein-coding genes identified in the species-specific caecilian transcriptomes is approximately 25,000 (Table 1), and a relatively high percentage of such proteins were annotated, which is also indicative of accurate transcriptome reconstruction. Gene identification is one of the major challenges of de novo transcriptome assembly, even for Trinity assembly of paired-end sequence data that enables potentially confounding sources of variation such as alternative splicing and paralogous genes to be overcome. Thus, the numbers of protein-coding genes could be overestimated. An additional problem is that the transcriptomes are not composed solely of transcripts from protein-coding genes. Recently, it has been demonstrated that almost the entire genome is transcribed. Accordingly, caecilian contigs that are not protein-coding genes or degradation products of the same, nor possible chimeras, are postulated to be long non-coding RNAs and potentially important regulatory elements.
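For readers unfamiliar with the N50 metric quoted above, a minimal Python sketch of its usual definition follows: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name is hypothetical, and assembly tools may differ in small details such as how ties are handled.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")
```

For example, contigs of lengths 2, 2, 2, 3, 3, 4, 8 and 8 sum to 32, and the two longest contigs already cover half of that, so the N50 is 8.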

3.2. Vertebrate gene families and unclassified gene families

The vast majority of the annotated caecilian genes that are homologous with vertebrate genes in the EggNOG database are expressed in most of the (up to nine) sampled caecilian tissue types. This could be interpreted as indicative of constitutive expression and many might be housekeeping genes. Only a small proportion (see Fig. 1, one tissue fraction) of the caecilian genes with matches to EggNOG (and thus annotated) are tissue specific. This same pattern was found when comparing the pooled caecilian sample (all five species) with each of the 51 EggNOG database vertebrates, with no obvious phylogenetic pattern (Fig. 1). The number of caecilian contigs with matches to known vertebrate genes ranged from 17,099 to 19,863 per caecilian species (Table 1), representing 57.32–77.52% (mean 67.70%) of all caecilian protein-coding genes. We found that 38.75–52.91% (mean 46.36%) of the annotated caecilian genes were classified into vertebrate gene families from EggNOG.
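The percentages in this paragraph can be recomputed from Table 1. The Python sketch below divides the veNOG and KVGF columns by the number of protein-coding genes per species. Using protein-coding genes as the denominator for both ratios is an assumption made here because it appears to match the quoted ranges; the variable and function names are hypothetical.

```python
# species: (protein-coding genes, genes with veNOG match, KVGF count), from Table 1
TABLE_1 = {
    "Caecilia tentaculata": (27384, 18368, 12937),
    "Microcaecilia dermatophaga": (22058, 17099, 11670),
    "Microcaecilia unicolor": (26302, 18487, 12719),
    "Rhinatrema bivittatum": (34654, 19863, 13429),
    "Typhlonectes compressicauda": (27603, 18302, 12293),
}


def percentages(column):
    """Percentage of protein-coding genes per species for the given column."""
    return [100 * row[column] / row[0] for row in TABLE_1.values()]


venog = percentages(1)  # share of genes with a veNOG match
kvgf = percentages(2)   # KVGF count relative to protein-coding genes

for label, values in (("veNOG", venog), ("KVGF", kvgf)):
    mean_value = sum(values) / len(values)
    print(f"{label}: {min(values):.2f}-{max(values):.2f}% (mean {mean_value:.2f}%)")
```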
Figure 1

Numbers and tissue presence of the annotated genes found in caecilian transcriptomes. Genes were pooled for the five sampled species-specific transcriptomes and annotated in the 51 vertebrate species available on the EggNOG database, and mapped onto a vertebrate phylogeny inferred from the NCBI’s taxids (using phyloT and ITOL). For each vertebrate taxon, the number of caecilian annotated genes is subdivided to show the number of caecilian tissue types in which those genes are expressed.

To investigate and quantify the importance of the uncharacterized genes in caecilians, we grouped these protein-coding sequences into multigene families and filtered them by excluding clusters with close similarity to genes from the available amphibian genomes. If caecilian genomes did not contain genes novel for vertebrates, it would be expected that the vast majority of their genes would belong to some already described, known vertebrate gene family or have homologous sequences in the reported amphibian genomes. However, our results indicate that less than half of the caecilian gene families belong to known vertebrate gene families. Given the sparse taxon sampling and the currently poor genomic reference record for amphibians, at least some of the unclassified gene families in caecilians could contain genes from other vertebrate taxa or be amphibian rather than caecilian specific. The absence of homologues of these caecilian gene families in other vertebrate species might reflect gene loss events, or, alternatively, faster sequence evolution in some caecilian genes. Either way, caecilians likely have many functional elements that are novel for vertebrates. A total of 177 known vertebrate and 422 novel caecilian gene families exhibit tissue-specific expression (Table 2). A significantly greater number of novel caecilian genes were expressed only in skin (P-value = 4.5e-05, Fisher’s exact test). 
In contrast, caecilian spleen transcripts had significantly fewer tissue-specific novel gene families than expected (P-value = 0.01935, Fisher’s exact test). Among the tissue-specific known vertebrate gene families, we found significantly more predicted protein–protein interactions (PPIs) than expected by chance and functional enrichment of metabolic pathways in five caecilian tissues (foregut, kidney, liver, spleen and testis, see Supplementary Table S3). The functional enrichments observed tend to relate to well-characterized processes in these tissues such as nutrient absorption in the foregut samples (GO:0007586), organic acid, anion and amino acid transmembrane transport in the kidney samples (GO:1903825, GO:0098656, GO:0003333), and regulation of fibrinolysis in the liver (GO:0051918; see Supplementary Table S3). In contrast, in skin samples we did not observe significantly more PPIs than expected by chance, or functional enrichment of pathways in the genes with known annotations. This may be because the vast majority (87%) of genes with tissue-specific gene expression in skin did not match any known vertebrate gene families, the highest of any of the tissues examined (Table 2). This analysis suggests that skin-specific vertebrate gene families remain poorly characterized in general and likely have unknown, innovative functions and interactions.
Table 2

Novel tissue-specific genes in caecilians

Category | Foregut | Heart | Kidney | Liver | Lung | Muscle | Skin | Spleen | Testis | Total
Number of transcriptomes analysed | 4 | 2 | 5 | 7 | 4 | 3 | 11 | 2 | 2 | 40
Known vertebrate gene families | 19 | 4 | 21 | 18 | 3 | 6 | 15 | 11 | 80 | 177
Gene families shared with the other sampled non-caecilian amphibians | 6 | 2 | 6 | 1 | 5 | 2 | 32 | 2 | 25 | 81
Candidate novel caecilian gene families | 32 | 12 | 40 | 44 | 9 | 27 | 108 | 8 | 142 | 422
P-value | 0.2671 | 0.7887 | 0.4639 | 1 | 1 | 0.2355 | 4.5e-05* | 0.01935* | 0.07605 |

The number of transcriptomes determined for each tissue, and the tissue-specific gene families (caecilian gene families that are already known vertebrate gene families, caecilian gene families shared with the other four sampled non-caecilian amphibians, and candidate caecilian-specific gene families) are shown. The last row shows the P-value (significant values marked with an asterisk) for Fisher’s exact test of the difference between the abundance of known vertebrate gene families and those of uncharacterized candidate novel caecilian gene families. Skin tissue includes skin samples from different parts of the body: skin and posterior skin samples, see Supplementary Table S1.
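As an illustration of the kind of test reported in the last row of Table 2, the following Python sketch implements a two-sided Fisher's exact test for a 2×2 table using only the standard library. The study used R 3.3.0; the function names are hypothetical. The table layout used here (one tissue against all other tissues, known versus candidate novel gene families) is an assumption and may not reproduce the reported P-values exactly.

```python
from math import comb

KNOWN_TOTAL, NOVEL_TOTAL = 177, 422  # totals from Table 2


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(x):
        # hypergeometric probability of x in the top-left cell, margins fixed
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    lowest, highest = max(0, row1 + col1 - total), min(row1, col1)
    p_observed = prob(a)
    # sum the probabilities of all tables at least as extreme as the observed one
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def tissue_p_value(known, novel):
    """Assumed layout: one tissue versus all other tissues combined."""
    return fisher_exact_two_sided(known, novel,
                                  KNOWN_TOTAL - known, NOVEL_TOTAL - novel)
```

Under this assumed layout, `tissue_p_value(15, 108)` for skin is very small, while `tissue_p_value(18, 44)` for liver is close to 1, in line with the direction of the results in Table 2.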

3.3. Skin-specific genes of caecilians

Potentially novel caecilian gene families (those without hits to known genes) expressed in skin were annotated with protein domains that might be associated causally with specializations of caecilian skin. From the uncharacterized tissue-specific clusters (108 in the skin), a total of 91 different protein domains were identified (Supplementary Table S4), including 16 domains occurring exclusively in the skin in our analysis, such as diverse proteases, amino acid storage receptors and toxin-like domains. Skin forms the barrier between the organism and the environment both physically and biochemically. It is genetically and physiologically very active throughout an animal’s life. Amphibian skin is multifunctional with additional roles in respiration, water regulation, and in defence against predators and pathogens. The defensive properties of amphibian skin rely mainly on biochemical substances secreted from specialized skin granular glands. These secretions can contain numerous bioactive components, including alkaloids, biogenic amines, peptides and proteins, some of which have been isolated and studied, particularly in frogs and salamanders. The diversity of functions and biochemical activities of amphibian skin makes it unsurprising that caecilians present specific expression patterns of novel genes, particularly considering their 250+ myr of separate evolutionary history from the other major amphibian lineages and the sustained contact between the skin and soil for most caecilian species. Indeed, some of the protein domains found exclusively in caecilian skin-specific novel gene families, such as proteases and toxin-like domains (Asp_protease_2, gag-asp_proteas, Toxin_TOLIP, UPAR_LY6, see Supplementary Table S4) point to novel skin defensive mechanisms. 
The maternal skin of some caecilian species plays another unique role: in provision of nutrition to newborns (maternal dermatophagy). This behaviour occurs in several of the species sampled in this study (observed in M. dermatophaga, likely also present in M. unicolor and C. tentaculata). This phenomenon is especially interesting for understanding the evolution of viviparity because it is possibly a precursor of the oviduct feeding by fetuses that occurs in viviparous caecilians. Maternal dermatophagy involves structural and histochemical changes in the mothers’ epidermis, which becomes hypertrophied and heavily invested with energy reserves, and hence expanded gene machinery is likely needed. Amino acid storage receptor (PhaP_Bmeg, see Supplementary Table S4) is another protein domain found in skin-specific novel gene families that might be related to the unique parental care of caecilian amphibians. A final feature of caecilian skin that makes it so distinctive is the presence of scales. Scales are absent in other extant amphibians but are present, concealed in dermal pockets, in many caecilians (all except T. compressicauda of those sampled in our study). Some of the skin-specific gene families with domains of unknown function (DUF, see Supplementary Table S4) might be involved in the production and maintenance of scales. Further data and analyses are required to identify the taxonomic distribution, diversity and function of these candidate skin-specific gene families. Greater tissue sampling in the future may reveal similar patterns in other tissues, such as testis or gut, that present particularities in caecilians with respect to other amphibians that may be reflected in their genomes. For example, caecilians differ from other amphibians in that males have a copulatory organ formed from the eversible final part of the gut, as well as other autapomorphies of the sperm and internal fertilization specializations such as the Müllerian gland and the ejaculate.

3.4. Phylogenomic dataset

We obtained a total of 23,761 orthogroups, of which 1,955 were groups comprising genes with only one copy in at least four vertebrate taxa. The filtered orthogroups seemingly contain no paralogous genes, at least from the same species, and are straightforward for use in phylogenomics and the study of evolutionary processes that depend upon inferred phylogenetic relationships. The number of analysed genes found in each species is detailed in Supplementary Table S5. For each of the 1,955 orthogroups phylogenetic gene trees were inferred. A supertree was retrieved from the gene trees under a multi-species coalescent model, maximizing the number of induced quartet trees (the supertree is presented in Supplementary Fig. S2). The normalized quartet score of the main topology was 0.798 (i.e. 79.8% of the quartet trees displayed by our gene trees are displayed by the supertree). The supertree constructed from the gene trees of the candidate orthologous groups recovered the main known topology of this subset of the Tree of Life (Supplementary Fig. S2). Branches within the caecilian part of the supertree are well supported as judged by both posterior probabilities and quartet support values. Among the sampled vertebrates, Lissamphibia and Gymnophiona are recovered as monophyletic, and the inferred relationships among the five caecilian species are fully congruent with those inferred in other (non-phylogenomic and phylogenomic) molecular analyses. Our results indicate that combining the information from putative orthologous genes using a supertree approach is adequate to reconstruct the phylogenetic relationships among the sampled caecilians, and vertebrates in general.

4. Concluding remarks

As with other studies that have characterized transcriptomes, this study has a strong descriptive component, but it has yielded novel discoveries and represents an important turning point for genomic studies in caecilians (and vertebrates), improving prospects for future research. The species-specific de novo transcriptomes of caecilian amphibians presented here could be improved by additional sequencing of different tissues, individuals and developmental stages (e.g. the transcriptome of M. dermatophaga was built from only four tissue-type samples). In terms of sampling and biological replicates, only the species-specific transcriptomes of R. bivittatum and M. unicolor were reconstructed using more than one specimen (two each). Obtaining fresh biological samples has been a limiting step for research on many caecilian species, and dedicated fieldwork will likely be required to investigate broadly the genomic potential of this neglected but important group of vertebrates. Genome science has irreversibly changed the landscape of biological research. Understanding life processes and their evolutionary changes by reading the complete set of encoded instructions that each species holds is increasingly becoming a reality. Nonetheless, achieving this goal thoroughly remains a challenge for most groups of organisms. Of the almost 6,600 eukaryotic genomes available on the NCBI database, only six records are of amphibian species: A. mexicanum, N. parkeri, R. catesbeiana, Rhinella marina Linnaeus, 1758, X. laevis and X. tropicalis (21 September 2018, date last accessed). Despite the great effort made by initiatives such as the Genome 10K Project, and other genome-scale studies (e.g. Xenbase, Salamander Genome project), amphibians are the major group of vertebrates with the fewest genomic resources available, and, importantly, there are none for the order Gymnophiona.
The lack of at least one representative organism of each of the three extant amphibian orders has compromised the diversity of comparable genomic resources for vertebrates, as well as the opportunities for evolutionary and phylogenomic research. To start filling this gap, here we have reported transcriptomic data for five caecilian amphibian species, including first genomic records for three species (C. tentaculata, M. unicolor and M. dermatophaga), and characterized several unclassified candidate gene families with tissue-specific expression, especially in the skin. This provides insights into the evolution of vertebrate protein-coding genes, and further establishes the basis for gene-discovery work as well as investigation of the molecular elements underlying the singular biology of caecilian amphibians.