| Literature DB >> 35801562 |
Mitchell J Syberg-Olsen1, Arkadiy I Garber2, Patrick J Keeling1, John P McCutcheon2,3, Filip Husnik1,4.
Abstract
Prokaryotic genomes are usually densely packed with intact and functional genes. However, in certain contexts, such as after recent ecological shifts or extreme population bottlenecks, broken and nonfunctional gene fragments can quickly accumulate and form a substantial fraction of the genome. Identification of these broken genes, called pseudogenes, is a critical step for understanding the evolutionary forces acting upon, and the functional potential encoded within, prokaryotic genomes. Here, we present Pseudofinder, an open-source software dedicated to pseudogene identification and analysis in bacterial and archaeal genomes. We demonstrate that Pseudofinder's multi-pronged, reference-based approach can detect a wide variety of pseudogenes, including those that are highly degraded and typically missed by gene-calling pipelines, as well newly formed pseudogenes containing only one or a few inactivating mutations. Additionally, Pseudofinder can detect genes that lack inactivating substitutions but experiencing relaxed selection. Implementation of Pseudofinder in annotation pipelines will allow more precise estimations of the functional potential of sequenced microbes, while also generating new hypotheses related to the evolutionary dynamics of bacterial and archaeal genomes.Entities:
Keywords: zzm321990 dN/dSzzm321990 ; annotation; archaea; bacteria; genome; prediction; pseudogene
Mesh:
Year: 2022 PMID: 35801562 PMCID: PMC9336565 DOI: 10.1093/molbev/msac153
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 8.800
Fig. 1.Summary of benchmarking results, comparing pseudogene predictions by Pseudofinder to those of two other softwares: PGAP and DFAST (run with two different gene-callers). (A and B) ‘Upset’ plots (Conway et al. 2017), showing the overlap and differences between the three pipelines in pseudogenes predicted from Shewanella (A) and Sodalis (B). Each bar in the barplot represents the total number of pseudogenes that overlap between the pipelines denoted with dots below. (C) Barplots showing the types of pseudogenes that were predicted only by Pseudofinder in Shewanella and Sodalis (i.e., Pseudofinder-specific pseudogenes). Italicized numbers at the bottom of each bar indicate the number of Pseudofinder-specific pseudogenes predicted in each genome.
Fig. 2.Pseudofinder workflow: the main Annotate branch is shown in the top part of the workflow, where predicted coding and intergenic regions are compared against proteins from a reference database, allowing the software to identify truncated and run-on ORFs, fragmented genes, and highly degraded gene remnants that lack identifiable gene features. The Sleuth branch is shown in the bottom part of the workflow, where genes from a closely related reference genome are compared against the genome-of-interest to identify gene inactivations at a finer scale; these inactivations, or gene breakages, can include significant frameshift-inducing indels (i.e., indels that results in substantial changes to the protein sequence), nonsense substitutions, loss of start and stop codons, and relaxed selection (elevated dN/dS, measured using PAML, Yang 2007). Information obtained from these two branches are then consolidated and provided to the user in the form of GFF and FASTA files for downstream processing. Pseudofinder also provides multiple ways for users to visualize the results, including a PDF-formatted genome diagram/map, as well as an HTML-formatted files for interactive exploration of pseudogene predictions.