Chen Wu, Melissa D Jordan, Richard D Newcomb, Neil J Gemmell, Sarah Bank, Karen Meusemann, Peter K Dearden, Elizabeth J Duncan, Sefanie Grosser, Kim Rutherford, Paul P Gardner, Ross N Crowhurst, Bernd Steinwender, Leah K Tooman, Mark I Stevens, Thomas R Buckley.
Abstract
BACKGROUND: The New Zealand collembolan genus Holacanthella contains the largest species of springtails (Collembola) in the world. Using Illumina technology, we sequenced and assembled a draft genome and transcriptome from Holacanthella duospinosa (Salmon). We used this annotated assembly to investigate the genetic basis of a range of traits critical to the evolution of the Hexapoda, the phylogenetic position of H. duospinosa, and potential horizontal gene transfer events.
Keywords: Chemoreceptors; Developmental biology; Epigenetics; Genome assembly; Hexapoda; Horizontal gene transfer; Methylation; Neanuridae; Phylogenomics; RNA; Sex determination
Year: 2017 PMID: 29041914 PMCID: PMC5644144 DOI: 10.1186/s12864-017-4197-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sequencing output used to assemble the Holacanthella duospinosa genome
| Insert size | Sequencing output (Gb) | Number of reads | Genome coverage (X) |
|---|---|---|---|
| 188 bp | 26.9 | 266,061,330 | 84.1 |
| 200 bp | 6.9 | 68,600,986 | 21.6 |
| 470 bp | 34.8 | 344,690,702 | 108.8 |
| 3 kb | 1.9 | 185,408,672 | 5.9 |
| 5 kb | 1.5 | 143,938,120 | 4.7 |
| Total | 72 | 1,008,699,810 | 225.1 |
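The coverage column follows from dividing each library's sequencing output by the genome size. The sketch below illustrates that arithmetic. The genome size is not stated in this table, so it is back-calculated here from the table's own totals (72 Gb at 225.1X), which is an assumption made for illustration; the function and variable names are likewise hypothetical.

```python
def coverage(bases_gb: float, genome_size_gb: float) -> float:
    """Fold coverage = sequenced bases / genome size (both in Gb)."""
    return bases_gb / genome_size_gb


# Sequencing output per library (Gb), copied from the table above.
output_gb = {
    "188 bp": 26.9,
    "200 bp": 6.9,
    "470 bp": 34.8,
    "3 kb": 1.9,
    "5 kb": 1.5,
}

# Assumption: genome size implied by the table's totals (72 Gb at 225.1X).
genome_size_gb = 72 / 225.1

# Per-library and total fold coverage, rounded to one decimal as in the table.
per_library = {
    lib: round(coverage(gb, genome_size_gb), 1) for lib, gb in output_gb.items()
}
total_coverage = round(coverage(sum(output_gb.values()), genome_size_gb), 1)
```

Under that assumed genome size (roughly 0.32 Gb), the per-library values match the coverage column of the table.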
Summary of the Holacanthella duospinosa genome assembly
| | Size (bp) | Number |
|---|---|---|
| N90 | 147 | 103,690 |
| N80 | 3137 | 5801 |
| N70 | 17,443 | 1588 |
| N60 | 83,545 | 567 |
| N50 | 226,503 | 317 |
| Total (>100 bp) | 370,315,149 | 410,937 |
| Total (>2 kb) | 299,867,363 | 8059 |
| Longest (bp) | 2,807,427 | |
| GC (%) | 33.40 | |
| N (%) | 2.18 | |
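For readers unfamiliar with the Nx rows, the following minimal sketch shows how such statistics are conventionally computed: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the running total first reaches x% of the assembly. This is the standard definition, not necessarily the exact tool used for the table, and the function name `nx` and the toy lengths are hypothetical.

```python
def nx(lengths, x=50):
    """Return (Nx length, number of sequences needed to reach x% of the total)."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    threshold = sum(ordered) * x / 100       # bases required to reach x%
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy assembly of five sequences, 300 bp in total.
toy = [100, 80, 60, 40, 20]
n50 = nx(toy, 50)  # (80, 2): two sequences cover at least 150 bp
n90 = nx(toy, 90)  # (40, 4): four sequences cover at least 270 bp
```

Read this way, the N50 row of the table says that 317 sequences of at least 226,503 bp together contain half of the assembled bases.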
Summary of the Holacanthella duospinosa transcriptome assembly
| Transcriptome assembly | |
|---|---|
| Total (bp) | 108,127,906 |
| Number | 152,441 |
| N50 (bp) | 2129 |
| Shortest (bp) | 101 |
| Longest (bp) | 24,141 |
| Mean (bp) | 709 |
| Median (bp) | 234 |
| Number of contigs >500 bp | 44,149 |
| Number of contigs >1000 bp | 27,986 |
| Number of contigs >10 kb | 183 |
| GC% | 36.25 |
Comparison of repeat components between Holacanthella duospinosa and Drosophila melanogaster genomes
| Types | H. duospinosa length (bp) | H. duospinosa P% | D. melanogaster length (bp) | D. melanogaster P% |
|---|---|---|---|---|
| DNA | 31,620,408 | 8.42 | 4,849,763 | 2.87 |
| LINE | 5,971,075 | 1.59 | 12,119,904 | 7.18 |
| LTR | 10,439,992 | 2.78 | 21,849,378 | 12.95 |
| SINE | 110,785 | 0.00 | 52,841 | 0.03 |
| Simple repeat | 6,196,398 | 1.65 | 2733 | 0.00 |
| Other | 640,294 | 0.17 | 698,554 | 0.41 |
| Unknown | 106,352,725 | 28.32 | 11,211,970 | 6.64 |
| Total | 161,336,129 | 42.96 | 50,785,143 | 30.00 |
Fig. 1 Distribution of gene parameters for the genome assembly of Holacanthella duospinosa
Fig. 2 Kmer spectrum for the genome assembly of Holacanthella duospinosa
Fig. 3 Signatures of normalised CpG content (CpG[o/e]) reveal the presence and absence of historical DNA methylation in hexapods. Graphs are frequency histograms of CpG[o/e] with the y-axis depicting the number of genes with the specific CpG[o/e] values given on the x-axis. a Analysis of gene bodies in the honeybee (Apis mellifera), which has an intact DNA methylation system, reveals a bimodal distribution. b In contrast, the same analysis in Drosophila melanogaster, which does not have an intact DNA methylation system, reveals a unimodal distribution. c Analysis of Holacanthella duospinosa transcripts reveals a similar unimodal distribution consistent with the absence of an intact DNA methylation system in this species. The mean of this distribution is similar to the mean obtained for 1 kb fragments of the genome (d) and is consistent with a slightly lower than expected CpG content in the DNA sequence of H. duospinosa.
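As an illustration of the CpG[o/e] statistic, the sketch below uses one common definition: the observed CpG dinucleotide frequency divided by the frequency expected from the C and G content of the same sequence. The exact normalisation used for the figure is not given in this section, so this formula is an assumption, and the function name `cpg_oe` is hypothetical.

```python
def cpg_oe(seq: str) -> float:
    """Observed/expected CpG ratio: (n_CpG * L) / (n_C * n_G).

    Assumption: a commonly used normalisation, not necessarily the one
    applied in the figure.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return float("nan")  # ratio is undefined without both C and G
    return n_cpg * len(seq) / (n_c * n_g)


# Toy sequences: "CCGG" has exactly the CpG content expected from its base
# composition, while "ACGTACGT" is CpG-enriched.
balanced = cpg_oe("CCGG")      # 1 * 4 / (2 * 2) = 1.0
enriched = cpg_oe("ACGTACGT")  # 2 * 8 / (2 * 2) = 4.0
```

Applying such a function to every gene body or transcript and plotting a histogram of the values would give distributions of the kind described in the caption.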
The genomic copy numbers of the transfer RNA isotypes predicted by tRNAscan and Rfam. The Rfam predictions that did not overlap with tRNAscan predictions are in parentheses
| Isotype | Copy number |
|---|---|
| Ala | 21 |
| Arg | 26 |
| Asn | 9 |
| Asp | 9 |
| Cys | 11 |
| Gln | 11 |
| Glu | 13 |
| Gly | 17 |
| His | 6 |
| Ile | 17 |
| Leu | 28 |
| Lys | 19 |
| Met | 15 |
| Phe | 18 |
| Pro | 65 |
| Ser | 548 |
| Sup | 1 |
| Thr | 21 |
| Trp | 5 |
| Tyr | 12 |
| Val | 20 |
| Pseudo | 69 |
| SeC | 6 |
| Undetermined | 9 (+374) |
Fig. 4 Arrangement of the Hox gene cluster in Holacanthella duospinosa relative to the collembolan Folsomia candida and a hypothetical ancestral insect
Fig. 5 Phylogenetic trees of doublesex (a) protein sequences and Sex-lethal (b) protein sequences. F and M denote female and male splice variants, respectively. Numbers above branches are bootstrap proportions; only values greater than 50% are shown. The scale bars show the expected number of amino acid substitutions per site
Fig. 6 Maximum likelihood phylogenetic tree of the glycosyl hydrolase family 18 (GH18) domains from chitinase-like proteins of insects. The tree includes sequences from Aedes aegypti (Aaeg), Anopheles gambiae (Agam), Apis mellifera (Amel), Acyrthosiphon pisum (Apis), Bombyx mori (Bmor), Cerapachys biroi (Cbir), Drosophila melanogaster (Dmel), Daphnia pulex (Dpul), Helicoverpa armigera (Harm), Holacanthella duospinosa (Hduo), Nilaparvata lugens (Nlug), Ostrinia furnacalis (Ofur), Pediculus humanus corporis (Phum), Spodoptera litura (Slit), and Tribolium castaneum (Tcas). Values at the nodes are bootstrap support percentages over 50%. Chitinase-like proteins identified from H. duospinosa are indicated in bold. Classification of chitinase groups follows [75]
Fig. 7 Best phylogenetic tree inferred with a maximum likelihood approach using IQ-TREE (see Methods). Non-parametric bootstrap support was derived from 300 bootstrap replicates. The tree was rooted with Diplura. For both datasets, all inferred relationships received maximal support except for the placement of Sminthurus viridis. Black: bootstrap support for the dataset with domain-based meta-partitions (support from the SH-LRT test in parentheses). Grey: bootstrap support for the dataset with gene-based meta-partitions (support from the aLRT test in parentheses). Black dots at nodes indicate maximal bootstrap and single-branch test support. Photograph: Mark I. Stevens