Marina Athanasouli1, Hanh Witte1, Christian Weiler1, Tobias Loschko1, Gabi Eberhardt1, Ralf J Sommer1, Christian Rödelsperger2.
Abstract
BACKGROUND: Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, the identification of P. pacificus orthologs of candidate genes known from C. elegans is complicated by the discrepancy in the quality of gene annotations, a common problem in nematode and invertebrate genomics.
Keywords: Caenorhabditis elegans; Evolution; Genome; Orphan genes; Parasitic nematodes
Year: 2020 PMID: 33045985 PMCID: PMC7552371 DOI: 10.1186/s12864-020-07100-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Comparison of protein lengths between one-to-one orthologs. a One-to-one orthologous genes between C. elegans and P. pacificus have highly similar protein lengths (Pearson’s r = 0.83). b Size distributions of one-to-one orthologs show a peak at around 300 amino acids. c P. pacificus genes with more than a two-fold length difference were considered for manual curation. d The P. pacificus one-to-one ortholog (PPA0494) of C. elegans lev-8 is more than twice as long as LEV-8. BLAST analysis showed that the N-terminal region has similarity to another C. elegans gene (Y37B6BL.37), suggesting that it represents an artificial gene fusion. e Manual inspection of PPA0494 in the genome browser shows two assembled RNA-seq transcripts (red) that together cover most of the original gene model, further supporting that PPA0494 is an artificially fused gene model
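The two-fold criterion from panel c can be illustrated with a minimal sketch. The function names, the tuple layout, and the example lengths are assumptions made for illustration; they are not taken from the study's own pipeline or data.

```python
def length_ratio(len_a, len_b):
    """Fold difference between two protein lengths (always >= 1)."""
    return max(len_a, len_b) / min(len_a, len_b)


def flag_length_outliers(ortholog_pairs, fold=2.0):
    """Return ortholog pairs whose protein lengths differ by more than `fold`.

    ortholog_pairs: iterable of (ppa_id, ppa_len, cel_id, cel_len) tuples,
    a hypothetical layout chosen for this example.
    """
    return [p for p in ortholog_pairs if length_ratio(p[1], p[3]) > fold]


# Made-up identifiers and lengths for illustration only; not values from the paper.
pairs = [
    ("ppa-gene-A", 310, "cel-gene-A", 295),   # similar lengths, kept
    ("ppa-gene-B", 1200, "cel-gene-B", 530),  # P. pacificus model much longer
    ("ppa-gene-C", 150, "cel-gene-C", 420),   # P. pacificus model much shorter
]
flagged = flag_length_outliers(pairs)
print([p[0] for p in flagged])
```

Taking the ratio of the longer to the shorter protein treats both directions the same way, so a possibly fused (too long) and a possibly truncated (too short) gene model are both reported.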
Comparative assessment of different P. pacificus gene annotations
| Category | V2 | V3 |
|---|---|---|
| Number of genes | 28,036 | 28,896 |
| Protein-coding sequence (Mb) | 35.3 | 35.3 |
| BUSCO Completeness (%) | 97.1 | 97.6 |
| BUSCO Duplicated (%) | 1.7 | 1.8 |
| BUSCO Fragmented (%) | 2.0 | 2.0 |
| BUSCO Missing (%) | 0.9 | 0.4 |
| Number of 1–1 orthologs (BRHs) | 8348 | 8607 |
| Number of 1–1 orthologs with variable protein length (%) | 532 | 265 |
| Number of proteins with atypical domain combinations | 1589 | 1137 |
| Number of protein family length outliers | 1388 | 1201 |
The table gives an overview of the general characteristics of different P. pacificus gene annotations
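The 1–1 ortholog counts in the table are labeled as BRHs. Assuming this stands for best reciprocal hits from a pairwise protein similarity search, the idea can be sketched as follows. The hit tuples, the use of a bit score for ranking, and all identifiers are assumptions for illustration, not the study's actual procedure.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score); on ties the first hit seen wins.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def best_reciprocal_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up hits for illustration only.
ppa_vs_cel = [("ppa1", "cel1", 250.0), ("ppa1", "cel2", 90.0), ("ppa2", "cel2", 80.0)]
cel_vs_ppa = [("cel1", "ppa1", 245.0), ("cel2", "ppa1", 95.0), ("cel2", "ppa2", 70.0)]
brhs = best_reciprocal_hits(ppa_vs_cel, cel_vs_ppa)
print(brhs)
```

Here `ppa2` is not reported, because its best hit `cel2` points back to `ppa1` rather than to `ppa2`.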
Fig. 2 Identification of candidates for manual curation. a The boxplots show the length distributions of members of 25 highly abundant gene families. The lower 10% and the upper 20% of each gene family were selected for manual inspection. b Individual screens for suspicious gene models reveal between 336 and 1077 candidates specific to each screen, indicating that the screens are highly complementary. c Manual classification of P. pacificus SSOGs shows numerous genes that overlap gene models on the opposite strand. The category “Others” denotes genes that were not systematically classified as they were part of previous curations
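The selection rule in panel a (lower 10% and upper 20% of each gene family by length) could be sketched as below. This is a rank-based simplification under stated assumptions: the function name, the rounding down of group sizes, and the example family are hypothetical and may differ from how the cutoffs were actually computed.

```python
def select_length_outliers(lengths_by_gene, lower_frac=0.10, upper_frac=0.20):
    """Pick the shortest and longest members of one gene family by rank.

    lengths_by_gene: dict mapping gene ID to protein length.
    Returns (shortest, longest) lists of gene IDs; group sizes are rounded down.
    """
    ranked = sorted(lengths_by_gene, key=lengths_by_gene.get)
    n = len(ranked)
    n_low = int(n * lower_frac)
    n_high = int(n * upper_frac)
    shortest = ranked[:n_low]
    longest = ranked[n - n_high:] if n_high else []
    return shortest, longest


# Made-up family of ten members with lengths 100, 200, ..., 1000 amino acids.
family = {f"g{i}": 100 * (i + 1) for i in range(10)}
shortest, longest = select_length_outliers(family)
print(shortest, longest)
```

The asymmetric fractions follow the caption: a larger share of the longest members is inspected than of the shortest.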
Fig. 3 Examples of unsupported SSOGs. a The P. pacificus SSOG PPA46345 overlaps exons of two other gene models that are well supported by transcriptome assemblies from strand-specific RNA-seq and Iso-seq data. b The P. pacificus SSOG PPA4618 overlaps the UTR of a well supported gene model. The absence of strand-specific transcriptomic support indicates that P. pacificus SSOGs PPA46345 and PPA4618 are likely gene prediction artifacts
Comparison of RNA-seq read alignability
| Accession | Description | Successfully assigned alignments (%), V2 | Successfully assigned alignments (%), V3 | Reference |
|---|---|---|---|---|
| ERR777792 | Mixed-stage on | 74.8 | 76.8 | [ |
| ERR777793 | Mixed-stage on | 74.9 | 76.6 | [ |
| ERR777794 | Mixed-stage on | 74.4 | 76.1 | [ |
| SRR4017216 | Adults on | 79.8 | 81.7 | [ |
| SRR4017217 | Adults on | 80.3 | 82.2 | [ |
| SRR4017218 | Adults on | 79.6 | 81.6 | [ |
| SRR4017219 | Adults on | 79.2 | 81.1 | [ |
| SRR4017220 | Adults on | 79.9 | 81.8 | [ |
| SRR4017221 | Adults on | 80.7 | 82.6 | [ |
| ERR3421261 | Adults on | 79.7 | 81.6 | [ |
| ERR3421262 | Adults on | 79.5 | 81.3 | [ |
| ERR3421263 | Adults on | 79.6 | 81.5 | [ |
| ERR3421264 | Adults on | 79.5 | 81.5 | [ |
| SRR2142256 | Adults on | 77.8 | 79.8 | [ |
| SRR2142257 | Intestines | 72.5 | 74.2 | [ |
The table shows the percentage of assigned reads from 15 RNA-seq experiments for different P. pacificus gene annotations
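As a worked check of the table, the per-sample difference between V3 and V2 can be computed directly from the listed percentages. The sketch below only restates the table's values; the variable names are chosen for illustration.

```python
from statistics import mean

# (V2, V3) percentages of successfully assigned alignments, copied from the table.
assigned = {
    "ERR777792": (74.8, 76.8), "ERR777793": (74.9, 76.6), "ERR777794": (74.4, 76.1),
    "SRR4017216": (79.8, 81.7), "SRR4017217": (80.3, 82.2), "SRR4017218": (79.6, 81.6),
    "SRR4017219": (79.2, 81.1), "SRR4017220": (79.9, 81.8), "SRR4017221": (80.7, 82.6),
    "ERR3421261": (79.7, 81.6), "ERR3421262": (79.5, 81.3), "ERR3421263": (79.6, 81.5),
    "ERR3421264": (79.5, 81.5), "SRR2142256": (77.8, 79.8), "SRR2142257": (72.5, 74.2),
}

# Gain in percentage points per sample, rounded to the table's precision.
gains = {acc: round(v3 - v2, 1) for acc, (v2, v3) in assigned.items()}
values = list(gains.values())

print(f"samples: {len(values)}")
print(f"mean gain: {mean(values):.2f} percentage points")
print(f"range: {min(values)} to {max(values)} percentage points")
```

In every one of the 15 samples the V3 percentage is higher than the V2 percentage, by roughly 1.7 to 2.0 percentage points.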