Sandy Richter, Francine Schwarz, Lars Hering, Markus Böggemann, Christoph Bleidorn.
Abstract
Glyceridae (Annelida) are a group of venomous annelids distributed worldwide from intertidal to abyssal depths. To trace the evolutionary history and complexity of glycerid venom cocktails, a solid backbone phylogeny of this group is essential. We therefore aimed to reconstruct the phylogenetic relationships of these annelids using Illumina sequencing technology. We constructed whole-genome shotgun libraries for 19 glycerid specimens and 1 outgroup species (Glycinde armigera). The chosen target genes comprise 13 mitochondrial protein-coding genes, 2 mitochondrial ribosomal genes, and 4 nuclear loci (18S rRNA, 28S rRNA, ITS1, and ITS2). Based on partitioned maximum likelihood and Bayesian analyses of the resulting supermatrix, we resolved a robust glycerid phylogeny and identified three clades comprising the majority of taxa. Furthermore, we detected group II introns inside the cox1 gene of two analyzed glycerid specimens, with two different insertions in one of these species. Moreover, we generated reduced data sets comprising 10 million, 4 million, and 1 million reads from the original data sets to test the influence of sequencing depth on assembling complete mitochondrial genomes from low-coverage genome data. We estimated the coverage of mitochondrial genome sequences for each data set size by mapping the filtered Illumina reads against the respective mitochondrial contigs. Comparing the contig coverage calculated for all data set sizes gave an indication of the scalability of our genome skimming approach. This allows a more precise estimate of the minimum number of reads needed to reconstruct complete mitochondrial genomes in Glyceridae and probably in non-model organisms in general.
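The abstract does not say how the reduced data sets were drawn. As a minimal sketch, assuming uniform random subsampling without replacement, reservoir sampling can pick a fixed number of reads from a FASTQ stream in one pass. The function names, the seed, and the toy reads below are hypothetical and only illustrate the idea.

```python
import io
import random


def read_fastq(handle):
    """Yield FASTQ records as tuples of four lines (header, sequence, '+', qualities)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:  # an empty header line means end of file
            return
        yield record


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < k:
                sample[j] = record  # replace with probability k / (i + 1)
    return sample


# Toy input: five made-up reads standing in for a full library.
toy_fastq = "".join(f"@read{n}\nACGTACGT\n+\nIIIIIIII\n" for n in range(5))
all_reads = list(read_fastq(io.StringIO(toy_fastq)))
subset = reservoir_sample(read_fastq(io.StringIO(toy_fastq)), k=2, seed=42)
```

For paired-end libraries the two mates of a pair would have to be sampled together, for example by applying the same selected indices to both files.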
Keywords: Glyceridae; group II introns; mitogenomics; sequencing coverage; venomous annelids; whole-genome shotgun sequencing
Year: 2015 PMID: 26590213 PMCID: PMC4700955 DOI: 10.1093/gbe/evv224
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
List of Glycerid Species (Glyceridae, Annelida) and One Outgroup Species Glycinde armigera (FS17) for Which WGS Libraries Were Constructed
| Species | Origin of Species | Labcode | Accession |
|---|---|---|---|
| | Drakes Bay, 45 m depth, CA, coll. April 2003 | FS01 | |
| Glycera americana | Barnstable Harbor, MA, coll. September 2001 | FS12 | KT989321 |
| Glycera americana | Tampa Bay, FL, coll. March 2013 | FS23 | KT989330 |
| Glycera capitata | Bamfield, Vancouver Island, BC, Canada, coll. March 2008 | FS10 | KT989319 |
| Glycera capitata | White Sea Biological Station, Russia, coll. July 2010 | FS11 | KT989320 |
| | Antarctica, Long 2°59.33′ W, Lat 62°0.64′ S, coll. December 2007 | FS09 | |
| Glycera dibranchiata | Wellfleet/Loagy Bay, MA, coll. September 2001 | FS05 | KT989318 |
| Glycera fallax | Roscoff, France, coll. April 2011 | FS14 | KT989323 |
| | Baie de Morlaix, France, coll. June 2012 | FS06 | |
| | Baie de Morlaix, France, coll. June 2012 | FS07 | |
| | Asamushi, Japan, coll. August 2012 | FS22 | |
| Glycera oxycephala? | Monterey Bay, CA, coll. March 2003 | FS21 | KT989329 |
| | Antarctica, Long 0°01.12′ W, Lat 52°01.98′ S, coll. December 2007 | FS08 | |
| Glycera tesselata | Banyuls-sur-Mer, France, coll. November 2003 | FS18 | KT989326 |
| Glycera tridactyla | Roscoff, France, coll. April 2010 | Glytri | KT989331 |
| Glycera cf. tridactyla | Eilat, Israel, coll. March 2011 | FS19 | KT989327 |
| Glycera cf. tridactyla | Saint-Efflam, France, coll. April 2011 | FS20 | KT989328 |
| Glycera unicornis | Banyuls-sur-Mer, France, coll. November 2003 | FS15 | KT989324 |
| Glycinde armigera | Bellingham Bay, WA, coll. August 2002 | FS17 | KT989325 |
| Hemipodia simplex | Bamfield, Vancouver Island, BC, Canada, coll. March 2008 | FS13 | KT989322 |
Note.—The sequences of complete mitochondrial genomes have been deposited at GenBank under the mentioned accession numbers.
Fig. 1. Overview of the methodological approaches used in this study. The studied data sets comprise the original and reduced data sets (10 million, 4 million, and 1 million reads) (boxes marked in green). To address the respective scientific questions (boxes marked in red), several methodological approaches were used (boxes marked in white).
Data Sets Used for Maximum Likelihood (ML) Analyses and Bayesian Inference (BI)
| Data Set | Loci | Third Codon Position Included | Alignment Positions (bp) |
|---|---|---|---|
| MLall/BIall | mt, 18S, 28S, ITS1, ITS2 | Yes | 19,991 |
| MLall/BIall | mt, 18S, 28S, ITS1, ITS2 | No | 16,274 |
| MLmt + nucl/BImt + nucl | mt, 18S, 28S | Yes | 18,802 |
| MLmt + nucl/BImt + nucl | mt, 18S, 28S | No | 15,085 |
| MLmt/BImt | mt | Yes | 13,270 |
| MLmt/BImt | mt | No | 9,553 |
| MLnucl + ITS/BInucl + ITS | 18S, 28S, ITS1, ITS2 | — | 6,721 |
| MLnucl/BInucl | 18S, 28S | — | 5,532 |
Note.—The third codon positions of the protein-coding genes were either included or excluded from the analyses. mt, mitochondrial genome, comprising the 13 protein-coding mitochondrial genes and the 2 ribosomal mitochondrial genes (sRNA and lRNA).
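In the table above, each data set without third codon positions is 3,717 positions shorter than its counterpart with them. A minimal sketch of the exclusion step, assuming an in-frame coding sequence (or a codon-aligned row of the alignment); the function name and the toy sequence are hypothetical, and the source does not state which tool was actually used:

```python
def drop_third_positions(cds: str) -> str:
    """Keep codon positions 1 and 2 of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


# Toy example: codons ATG GCA TTA lose their third base.
reduced = drop_third_positions("ATGGCATTA")

# Alignment lengths (bp) copied from the table above.
with_third = {"all": 19_991, "mt + nucl": 18_802, "mt": 13_270}
without_third = {"all": 16_274, "mt + nucl": 15_085, "mt": 9_553}
removed = {name: with_third[name] - without_third[name] for name in with_third}
```

The constant difference is what one would expect if the same protein-coding partition is shared by all three data sets and only that partition is trimmed.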
Fig. 2. Nucleotide composition of complete mitochondrial genomes in 13 glycerid taxa (mean values ± standard deviations) and the outgroup species Glycinde armigera (FS17). (A) AT content, (B) AT skew, (C) GC skew. The values are shown for the complete mitochondrial genome (Genome), the 13 protein-coding genes considering all codon positions (all) as well as only the first and second (1 + 2) and the third (3rd) codon position, the 2 rRNAs and 22 tRNAs (tRNA), respectively. Note the disparity in the outgroup species for most of the parameters.
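The caption does not spell out the formulas. A minimal sketch using the commonly applied definitions, which are an assumption here: AT content = (A + T) / total, AT skew = (A − T) / (A + T), and GC skew = (G − C) / (G + C). The function name and the toy sequence are hypothetical.

```python
from collections import Counter


def composition_stats(seq: str) -> dict:
    """Return AT content, AT skew, and GC skew for one strand of a DNA sequence."""
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,
    }


# Toy sequence with A=3, T=2, G=3, C=2.
stats = composition_stats("AAATTCGGGC")
```

Skews are strand-specific, so the same strand should be used for every genome being compared.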
Fig. 3. Mitochondrial gene orders and group II introns within Glyceridae. (A) Mitochondrial gene order arrangements of the complete mitochondrial genome of (Δ) the outgroup species Glycinde armigera (FS17), (○) the glycerid species Glycera tesselata (FS18), (•) Glycera dibranchiata (FS05), Glycera tridactyla (Glytri), Glycera cf. tridactyla (FS19), Glycera cf. tridactyla (FS20) (cf. clade 1 in figs. 4 and 5), and (▪) Glycera americana (FS12), G. americana (FS23), Glycera capitata (FS10), G. capitata (FS11), Glycera fallax (FS14), Glycera oxycephala? (FS21), Glycera unicornis (FS15), and Hemipodia simplex (FS13) (cf. clades 2 and 3 in figs. 4 and 5). (B) Group II introns identified inside the cox1 gene of G. fallax (FS14) (I1 and I2) and G. unicornis (FS15). Note that the group II intron (I2) of G. fallax and the group II intron of G. unicornis start at exactly the same position of the cox1 CDS (directly after CDS position 700). (C) Intergenic region of approximately 2,988 bp located between the genes nad3 and nad2 in G. unicornis (FS15).
Fig. 4. Phylogeny of Glyceridae based on ML and Bayesian inference for a data set comprising the 13 protein-coding mitochondrial genes, 2 ribosomal mitochondrial genes, and four loci from the nuclear ribosomal cluster (18S rRNA, 28S rRNA, ITS1, and ITS2). The data set includes the third codon position of the protein-coding genes. Scale bars indicate the number of substitutions per site. (A) The ML phylogeny obtained with RAxML v.8.0.5 represents the best tree under a data set-specific GTR + GAMMA + I substitution model. Bootstrap support values (>50%) from 1,000 pseudoreplicates are given at the nodes. (B) For the Bayesian analysis, the 50% majority rule consensus tree was obtained from two independent runs using PhyloBayes MPI v.1.5a (CAT-GTR; 30,000 generations each, burn-in 5,000 each). Posterior probability values (>0.50) are given at the nodes.
Fig. 5. Phylogeny of Glyceridae based on ML and Bayesian inference for a data set comprising the 13 protein-coding mitochondrial genes and the 2 ribosomal mitochondrial genes. The data set includes the third codon position of the protein-coding genes. Scale bars indicate the number of substitutions per site. (A) The ML phylogeny obtained with RAxML v.8.0.5 represents the best tree under a data set-specific GTR + GAMMA + I substitution model. Bootstrap support values (>50%) from 1,000 pseudoreplicates are given at the nodes. (B) For the Bayesian analysis, the 50% majority rule consensus tree was obtained from two independent runs using PhyloBayes MPI v.1.5a (CAT-GTR; 30,000 generations each, burn-in 5,000 each). Posterior probability values (>0.50) are given at the nodes.
Fig. 6. Summary statistics of the coverage analyses in Glyceridae. (Stacked) Bar graphs (A–C) illustrating the original data sets (marked in orange) of 13 studied glycerid taxa and the corresponding reduced data sets consisting of 10 million reads (marked in black), 4 million reads (marked in dark gray) and 1 million reads (marked in light gray). Absolute numeric values are plotted for the original data sets, mean values are plotted for the reduced data sets. Mean values and standard deviations were calculated from ten subsamples analyzed for each specimen per data set size. (A) Number of reads that mapped to the species-specific mitocontig originating from the original data sets and the resulting contig coverage. The asterisk (*) indicates a multiplication factor of 10³. For details on the calculation of the contig coverage, see “Coverage Studies” in the Materials and Methods section. (B, C) Successful recovery of mitochondrial genomes depending on data set size. (B) To determine the “relative mitosize,” the cumulative length of broken mitocontigs was expressed relative to the length of the corresponding complete mitogenome (equal to 100%) originating from the original data sets. (C) Number of contigs representing the mitochondrial genome for each studied glycerid specimen and data set size (referred to as “contig count”). Boxplots showing the distribution of the relative mitosize (D) and the number of obtained mitocontigs (E) across all subsamples and data set sizes. They are based on the data of ten replicates per data set size (10 million, 4 million, and 1 million reads) per studied glycerid specimen (13 libraries), here plotted as dark gray data points. The interquartile range, comprising the middle 50% of the data points, is highlighted in orange.
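The "relative mitosize" and "contig count" described in this caption can be sketched as follows. This is an illustration only: the mitogenome length and the contig lengths for the ten subsamples are made-up values, and the function and variable names are hypothetical.

```python
from statistics import mean, stdev


def relative_mitosize(contig_lengths, complete_length):
    """Cumulative mitocontig length as a percentage of the complete mitogenome."""
    return 100.0 * sum(contig_lengths) / complete_length


# Hypothetical complete mitogenome length (bp) from the original data set.
COMPLETE_LENGTH = 15_000

# Hypothetical mitocontig lengths (bp) for ten subsamples of one specimen
# at one data set size; more than one entry means a broken assembly.
subsamples = [
    [15_000], [14_900], [9_000, 5_500], [15_000], [7_000, 4_000, 3_000],
    [15_000], [14_950], [8_000, 6_800], [15_000], [12_000],
]

sizes = [relative_mitosize(s, COMPLETE_LENGTH) for s in subsamples]
counts = [len(s) for s in subsamples]  # "contig count" per subsample

print(f"relative mitosize: {mean(sizes):.1f} ± {stdev(sizes):.1f} %")
print(f"contig count: {mean(counts):.1f} ± {stdev(counts):.1f}")
```

`stdev` is the sample standard deviation; whether the study used the sample or the population form is not stated in this caption.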
Fig. 7. Histogram of the c-values of 1,985 invertebrate species. The published c-values of two glycerid species are highlighted in red (c-value = 1.33 Glycera lapidum; c-value = 3.5 Glycera americana). Note that 62.42% of the included invertebrate species have smaller genome sizes than G. lapidum. c-Values above 10 are not shown. The c-values were taken from the Animal Genome Size Database (Gregory 2015, last accessed December 2, 2015).
Comparison of Three High-Throughput Sequencing Strategies regarding Their Application, Potential Advantages and Disadvantages, and Technological Issues
| | RNA Sequencing | Target Enrichment | Genome Skimming |
|---|---|---|---|
| Technology | | | |
| Principle | High-throughput sequencing | High-throughput sequencing | High-throughput sequencing |
| Material | RNA | DNA | DNA |
| Hints for application | | | |
| Prior genomic resources required | No | Yes | No |
| Limitations by starting material | RNA has to be available | DNA | DNA |
| Recommended taxon number | Flexible | Huge number recommended | Flexible |
| Required amount of RNA/DNA | Low | Low | Low |
| Genome size of species | Less relevant | Less relevant | Important |
| Workload | Time intensive | Time intensive | Fast and easy method |
| Application | | | |
| Ability to identify single copy genes | Yes | Yes | Maybe |
| Ability to distinguish different isoforms | Yes | No | No |
| Ability to analyze expression levels | Yes | No | No |
| Ability to analyze intron–exon structure | No | Yes (requires prior information) | Yes |