Gabriele Sales, Bruce E Deagle, Enrica Calura, Paolo Martini, Alberto Biscontin, Cristiano De Pittà, So Kawaguchi, Chiara Romualdi, Bettina Meyer, Rodolfo Costa, Simon Jarman.
Abstract
Antarctic krill (Euphausia superba) is a key species in the Southern Ocean with an estimated biomass between 100 and 500 million tonnes. Changes in krill population viability would have a catastrophic effect on the Antarctic ecosystem. One looming threat due to elevated levels of anthropogenic atmospheric carbon dioxide (CO2) is ocean acidification (lowering of sea water pH by CO2 dissolving into the oceans). The genetics of Antarctic krill has long been of scientific interest, both for the analysis of population structure and for the analysis of functional genetics. However, the genetic resources available for the species are relatively modest. We have developed the most advanced genetic database on Euphausia superba, KrillDB, which includes comprehensive data sets of former and present transcriptome projects. In particular, we have built a de novo transcriptome assembly using more than 360 million Illumina sequence reads generated from larval krill, including individuals subjected to different CO2 levels. The database gives access to: 1) the full list of assembled genes and transcripts; 2) their level of similarity to transcripts and proteins from other species; 3) the predicted protein domains contained within each transcript; 4) their predicted GO terms; 5) the level of expression of each transcript in the different larval stages and CO2 treatments. All references to external entities (sequences, domains, GO terms) are equipped with a link to the appropriate source database. Moreover, the software implements a full-text search engine that makes it possible to submit free-form queries. KrillDB represents the first large-scale attempt at classifying and annotating the full krill transcriptome. For this reason, we believe it will constitute a cornerstone of future approaches devoted to the physiological and molecular study of this key species in the Southern Ocean food web.
Year: 2017 PMID: 28187156 PMCID: PMC5302830 DOI: 10.1371/journal.pone.0171908
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Outline of the assembly process.
Raw Illumina reads were first trimmed for adapters and for low-quality bases at the 3’ end. They were then digitally normalized with the khmer software to reduce redundancy. The normalized reads were independently assembled using different software (Oases, Trinity, IDBA and SOAP) and k-mer sizes (23, 33, 43, 53). Information deriving from a previous assembly based on the 454 sequencing technology was added to further increase the transcriptome coverage. Repeated sequences were identified and removed using the RepeatMasker software in order to reduce the number of chimeric misassemblies. All surviving fragments were merged into a single transcriptome using the EvidentialGene pipeline. Results were annotated using sequence homology (BLAST) and protein domain searches (InterProScan).
Quality measures of the de novo assembly algorithms.
| Software | # Fragments | Total bases | Avg. fragment length | N50 | % Matches | # Proteins |
|---|---|---|---|---|---|---|
| IDBA | 183,688 | 78,682,848 | 428 (± 398) | 445 | 13.06% | 15,747 |
| Butterfly | 190,588 | 107,415,431 | 564 (± 596) | 770 | 12.94% | 14,927 |
| Oases | 237,717 | 159,270,607 | 670 (± 742) | 1,009 | 13.79% | 14,291 |
| SOAP | 123,855 | 54,662,513 | 441 (± 428) | 484 | 16.88% | 14,998 |
| Evigene | 133,962 | 129,183,922 | 964 (± 840) | 1,294 | 67.27% | 27,928 |
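The N50 column in the table is a contiguity statistic. Under its common definition, it is the length L such that fragments of length at least L together contain at least half of all assembled bases. The following is a minimal sketch of that definition, assuming the common variant applies here (the source does not say which tool or variant produced the values). The function name `n50` and the toy lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of fragment lengths.

    Assumes the common definition: walk the fragments from longest to
    shortest and report the length at which the running total first
    reaches half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no fragment lengths given")


# Toy fragment lengths (hypothetical, not values from the table).
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

With the toy input the total is 1,500 bases, so the running sum passes 750 at the second-longest fragment. Because N50 weights long fragments heavily, it is usually read alongside the average fragment length rather than on its own.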
Fig 2. (A) The home page of KrillDB. (B) An example of search results. Here we queried the database for all hits containing annotations related to the ‘HSP90’ protein family.
Fig 3. Further exploration of results obtained with the search shown in Fig 2.
(A) The summary page for a single transcript. Links to the sequence and similarity sections are highlighted. (B) Sequence records. Nucleotide and amino acid sequences are displayed and can be downloaded as a text file in FASTA format. (C) Sequence similarity results obtained from BLAST are summarized in a table and (D) depicted graphically to show the matches among different regions of the query and target sequences.
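Sequences downloaded in FASTA format, as described in panel B, can be read with a few lines of code. The sketch below assumes plain FASTA text, with a `>` header line followed by one or more sequence lines. The function name `read_fasta` and the sample identifiers `tx1` and `tx2` are hypothetical and do not reflect actual KrillDB identifiers or any KrillDB API.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical sample, not a real KrillDB download.
sample = ">tx1 example transcript\nACGT\nAC\n>tx2\nGG\n"
for name, seq in read_fasta(sample).items():
    print(name, len(seq))
```

For large downloads, an established parser such as Biopython's `SeqIO` would be a more robust choice than this hand-written reader.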
Fig 4. (A) Each transcript summary links to detailed sections about protein domains, gene ontology annotations and expression levels. (B) Protein domains detected within the transcript are visualized along with their ID, description, e-value and position on the sequence. (C) The list of Gene Ontology categories inferred by InterProScan. (D) Expression levels for each sequenced sample estimated by the RSEM software.
Fig 5. (A) Transcript fragments are clustered into groups putatively corresponding to genes. Each transcript page is thus linked to a group page. (B) Summary of a transcript group, showing a graphical comparison of the lengths of its members, the most significant BLAST hits and (C) pairwise alignment of all transcripts within the group.