Andrei Rozanski, HongKee Moon, Holger Brandl, José M Martín-Durán, Markus A Grohme, Katja Hüttner, Kerstin Bartscherer, Ian Henry, Jochen C Rink.
Abstract
Flatworms (Platyhelminthes) are a basally branching phylum that harbours a wealth of fascinating biology, including planarians with their astonishing regenerative abilities and the parasitic tapeworms and blood flukes that exert a massive impact on human health. PlanMine (http://planmine.mpi-cbg.de/) aims to provide both a mineable sequence repository for planarians and a resource for the comparative analysis of flatworm biology. While the original PlanMine release was entirely based on transcriptomes, the current release transitions to a more genomic perspective. Building on the recent availability of a high-quality genome assembly of the planarian model species Schmidtea mediterranea, we provide a gene prediction set that now assigns existing transcripts to defined genomic coordinates. The addition of recent single-cell and bulk RNA-seq datasets greatly expands the available gene expression information. Further, we add transcriptomes from a broad range of other flatworms and provide a phylogeny-aware interface that makes evolutionary species comparisons accessible to non-experts. At its core, PlanMine continues to utilize the powerful InterMine framework and consistent data annotations to enable meaningful inter-species comparisons. Overall, PlanMine 3.0 thus provides a host of new features that make the fascinating biology of flatworms accessible to the wider research community.
Year: 2019 PMID: 30496475 PMCID: PMC6324014 DOI: 10.1093/nar/gky1070
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Workflow for the SMESG gene predictions; see text and user guide for explanations. (B) Pie chart sub-categorization of the 82 007 gene models in the SMESG dataset. Repeat related: Models that overlap >25% with a genomic repeat annotation. Low confidence: Models with <25% repeat overlap and <0.1 rpkm expression level. HC: Models with <25% repeat overlap and >0.1 rpkm expression in the above RNA-seq dataset. (C) Ground truth comparison of SMESG predictions versus a set of 1322 HC transcripts (see user guide for details). ‘Matches’: Base pair level identity between SMESG prediction and transcript; deviations include ‘Alternative splicing’ as likely biological cause, ‘Mismatches’ with length deviation between transcript and gene model and ‘Missing’ genes without gene prediction. ‘Alignment issues’ groups transcripts that could not be assayed due to technical GMAP alignment failures. (D) Correction of transcriptome chimeras by the SMESG gene models. Pie chart summary of the fraction of 108 chimeras in the dd_Smed_v6 reference transcriptome (see user guide for details) in which both (Corrected), at least one (Partially corrected) or neither of the two ORFs (Uncorrected) were matched by SMESG models. (E) Correction of 407 genes with fragmented transcript representation in the reference transcriptome dd_Smed_v6 (see user guide for details). ‘Corrected’: genes with accurate full-length representation in SMESG; ‘Uncorrected’: genes with incomplete SMESG representation. (F and G) RNA-seq mapping reference comparison between indicated SMESG subsets and current reference transcriptomes. (F) RNA-seq data from the asexual Schmidtea mediterranea strain (20); (G) RNA-seq datasets from the sexual S. mediterranea strain (21). Mapping references are encoded by bar colour; bar height encodes the fraction of reads mappable against the respective reference. SRR accession numbers of individual RNA-seq datasets are indicated below the bars.
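The three-way sub-categorization in Figure 1B can be read as a simple decision rule. The following is a minimal sketch of that rule, not the authors' pipeline: the `GeneModel` class, its field names, the category labels and the example identifiers are hypothetical, and the caption does not say how values of exactly 25% repeat overlap or exactly 0.1 rpkm are handled, so the tie-breaking here is an assumption.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds taken from the Figure 1B caption.
REPEAT_OVERLAP_CUTOFF = 0.25  # fraction of the model overlapping a repeat annotation
RPKM_CUTOFF = 0.1             # expression level in rpkm


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical record; field names are illustrative, not PlanMine's schema."""
    model_id: str
    repeat_overlap: float  # fraction between 0 and 1
    rpkm: float


def categorize(model: GeneModel) -> str:
    """Assign a gene model to one of the three Figure 1B categories.

    Assumption: boundary values (exactly 25% overlap, exactly 0.1 rpkm)
    fall on the non-repeat and low-confidence side, since the caption
    only gives strict inequalities.
    """
    if model.repeat_overlap > REPEAT_OVERLAP_CUTOFF:
        return "repeat_related"
    if model.rpkm > RPKM_CUTOFF:
        return "high_confidence"
    return "low_confidence"


# Usage with made-up values (not data from the paper).
models = [
    GeneModel("model_a", repeat_overlap=0.60, rpkm=5.0),
    GeneModel("model_b", repeat_overlap=0.10, rpkm=0.02),
    GeneModel("model_c", repeat_overlap=0.00, rpkm=3.4),
]
counts = Counter(categorize(m) for m in models)
```

Checking repeat overlap first mirrors the caption, where expression level only separates the models that are not repeat related.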
Figure 2. Overview of the gene information page containing (A) links to our UCSC Genome Browser instance hosting the Schmidtea mediterranea genome assembly, (B) a JBrowse view giving a genomic perspective of the gene models and mapped transcript information, (C) opportunities for community feedback with regard to the included gene annotations, (D) a table containing homologous transcripts from community-contributed S. mediterranea transcriptomes that are associated with our gene predictions.
Figure 3. Gene expression data from a variety of internal and community sources are integrated into PlanMine, including (A) bulk gene expression studies with Smed-wnt11–1/dd_Smed_v6_14391_0_1 as example, (B) single-cell gene expression studies with dd_Smed_v6_6859_0_1 as example.
Figure 4. (A) The latest PlanMine release includes 25 flatworm species, comprising planarians, parasitic flatworms, macrostomids and other flatworm species. (B) Included species are presented in the form of a phylogenetic tree to represent their evolutionary relationships. (C) The phylogeny can also be used to easily select appropriate species for sequence similarity searching using BLAST. BLAST options include nematode, insect and vertebrate outgroup transcriptomes from ENSEMBL release 93 and ENSEMBL Metazoa release 40.