Eliran Avni¹, Dennis Montoya², David Lopez², Robert Modlin³, Matteo Pellegrini², Sagi Snir¹.
Abstract
BACKGROUND: Pseudogenes are non-functional sequences in the genome with homologous sequences that are functional (i.e., genes). They are abundant in eukaryotes, where they have been extensively investigated, while in prokaryotes they are significantly scarcer and less well studied. Here we conduct a comprehensive analysis of the evolution of orthologs of Mycobacterium leprae pseudogenes in prokaryotes. The leprosy pathogen M. leprae is of particular interest since it contains an unusually large number of pseudogenes, comprising approximately 40% of its entire genome. The analysis is conducted in both broad and narrow phylogenetic ranges.
Year: 2018; PMID: 30383852; PMCID: PMC6211624; DOI: 10.1371/journal.pone.0204322
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Phylogenetic data, tree similarities.
| | NPG—QP tree | NPG—QA tree | PG—QP tree | PG—QA tree | 16S tree |
|---|---|---|---|---|---|
| NPG—QP tree | 100 | 95 | 74 | 73 | 72 |
| NPG—QA tree | 95 | 100 | 78 | 77 | 67 |
| PG—QP tree | 74 | 78 | 100 | 97 | 70 |
| PG—QA tree | 73 | 77 | 97 | 100 | 69 |
| 16S tree | 72 | 67 | 70 | 69 | 100 |
Pairwise Qfit similarity (in %) between five trees: the QP trees for PGs and for NPGs, the QA trees for PGs and for NPGs, and the 16S tree. PGs and NPGs denote pseudogenes and non-pseudogenes (i.e., functional genes), respectively; QP and QA denote the set of plurality quartets and the set of all quartets, respectively. For example, the NPG—QP tree was constructed from the plurality quartets computed on the NPG set.
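The caption does not spell out how Qfit is computed. As a rough illustration only, the sketch below computes a quartet-agreement similarity between two trees: the percentage of 4-taxon subsets on which both trees induce the same resolved topology. This is an assumption about what a quartet-based tree similarity looks like, not necessarily the exact Qfit formula used in the paper. Trees are encoded as nested Python tuples of taxon names (a hypothetical format), both trees must share the same leaf set, and the function names are illustrative.

```python
from itertools import combinations


def clades(tree):
    """Return the leaf set of every subtree in a nested-tuple tree.

    Each leaf set S also defines the unrooted split S | (all taxa - S).
    """
    found = []

    def walk(node):
        if isinstance(node, str):  # a leaf is a taxon name
            leaves = frozenset([node])
        else:  # an internal node is a tuple of children
            leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def quartet_topology(splits, a, b, c, d):
    """Topology induced on {a, b, c, d} as a set of two pairs (ab|cd),
    or None if the tree leaves this quartet unresolved."""
    quartet = {a, b, c, d}
    for left, right in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        for split in splits:
            side = split & quartet
            # A split separates the two pairs if it contains exactly one of them.
            if side == set(left) or side == set(right):
                return frozenset([frozenset(left), frozenset(right)])
    return None


def quartet_similarity(tree1, tree2):
    """Percentage of 4-taxon subsets resolved identically by both trees.

    Quartets left unresolved by tree1 count as disagreements.
    """
    splits1, splits2 = clades(tree1), clades(tree2)
    taxa = sorted(max(splits1, key=len))  # the root clade holds every taxon
    agree = total = 0
    for a, b, c, d in combinations(taxa, 4):
        total += 1
        q1 = quartet_topology(splits1, a, b, c, d)
        if q1 is not None and q1 == quartet_topology(splits2, a, b, c, d):
            agree += 1
    return 100.0 * agree / total


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(quartet_similarity(t1, t2))  # 60.0 (3 of the 5 quartets agree)
```

The diagonal of the table above (100 for every tree against itself) is what such a measure gives for identical, fully resolved trees.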
Phylogenetic data, CQV.
(a)

| | Pseudogenes—single copy | Non-pseudogenes—single copy |
|---|---|---|
| CQV (%) | 76.78 | 72.75 |
| STD (%) | 20.32 | 22.41 |
| Samples (quartets) | 246,761 | 635,376 |

(b)

| | Pseudogenes—multi copy | Non-pseudogenes—multi copy |
|---|---|---|
| CQV (%) | 53.86 | 48.69 |
| STD (%) | 14.97 | 13.15 |
| Samples (quartets) | 569,615 | 635,376 |
(a) The CQV score, i.e., the average plurality vote over all plurality quartets, for single-copy gene trees, followed by its estimated standard deviation (STD) and the number of samples (quartets). (b) The same statistics for gene trees that may contain gene duplications.
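A minimal sketch of the CQV statistic as the caption describes it, assuming that a plurality quartet's vote is the fraction of gene trees supporting its most frequent topology. The input layout (a dict from a 4-taxon set to the list of topology strings observed for it across gene trees) and the function name `cqv` are hypothetical, and STD is taken here as the population standard deviation of the per-quartet fractions; the paper may estimate it differently.

```python
from collections import Counter
from statistics import mean, pstdev


def cqv(quartet_votes):
    """Return (CQV %, STD %, number of quartets).

    quartet_votes maps a 4-taxon set to the topologies (e.g. "AB|CD")
    observed for it across gene trees.
    """
    shares = []
    for topologies in quartet_votes.values():
        if not topologies:
            continue  # no gene tree covers this 4-taxon set
        # Size of the plurality (most frequent) topology for this quartet.
        plurality_count = Counter(topologies).most_common(1)[0][1]
        shares.append(plurality_count / len(topologies))
    return 100 * mean(shares), 100 * pstdev(shares), len(shares)


# Hypothetical votes for two quartets.
votes = {
    frozenset("ABCD"): ["AB|CD", "AB|CD", "AC|BD", "AB|CD"],
    frozenset("ABCE"): ["AB|CE", "AC|BE"],
}
print(cqv(votes))  # (62.5, 12.5, 2)
```

Under this reading, a higher CQV means the gene trees agree more often on each quartet's topology, which is how the single-copy pseudogene set (76.78%) compares with the non-pseudogene set (72.75%) in table (a).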
Fig 1. RIATA results.
The figure shows the parameters of the linear regression line fitted to the pseudogenes. Plotting the number of putative HGTs against the number of leaves in each gene tree reveals no unusual HGT pattern in the pseudogenes.
Pseudogenes vs. non-pseudogenes difference.
| Data set | #PG | SI (PG) | #NPG | SI (NPG) | ln(p-val) |
|---|---|---|---|---|---|
| 1 | 911 | 0.63 | 1410 | 0.71 | -12.72 |
| 2 | 627 | 0.43 | 1226 | 0.55 | -20.39 |
| 3 | 887 | 0.70 | 1379 | 0.78 | -11.68 |
| 4 | 752 | 0.54 | 1308 | 0.65 | -15.38 |
Average synteny index (SI) for pseudogenes (PG) and non-pseudogenes (NPG), with confidence values (p-val, expressed as a natural logarithm; e.g., -12.72 corresponds to p ≈ 3.0 × 10⁻⁶). Four species pairs were compared: M. tuberculosis vs. (1) M. leprae, (2) M. smegmatis, (3) M. bovis, and (4) M. marinum. Each row gives the number of PGs and NPGs compared (#PG and #NPG, respectively) and their average synteny indices.
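The caption does not define the synteny index itself. One common formulation, assumed here rather than taken from the paper, scores a gene by the fraction of its 2k nearest neighbors (k on each side) in one genome that are also among its 2k nearest neighbors in the other genome; with k = 4 the index moves in steps of 0.125, which would be consistent with the bin edges used in Fig 3. The circular-chromosome assumption, the gene-order lists, and the function names are hypothetical; orthologs are assumed to carry the same name in both genomes.

```python
def neighborhood(order, gene, k):
    """The k genes on each side of `gene` on a circular chromosome.

    Assumes len(order) > 2 * k so that the 2k neighbors are distinct.
    """
    i = order.index(gene)
    n = len(order)
    return {order[(i + offset) % n] for offset in range(-k, k + 1) if offset != 0}


def synteny_index(order1, order2, gene, k=4):
    """Fraction of gene's 2k neighbors in genome 1 that are also its neighbors in genome 2."""
    shared = neighborhood(order1, gene, k) & neighborhood(order2, gene, k)
    return len(shared) / (2 * k)


genome1 = list("ABCDEFGHIJ")
genome2 = list("ABCDEJIHGF")  # hypothetical: the F..J block is inverted
print(synteny_index(genome1, genome2, "A", k=2))  # 0.5
print(synteny_index(genome1, genome2, "C", k=2))  # 1.0
```

An inversion leaves genes inside the inverted block with intact neighborhoods (SI 1.0) while lowering the SI of genes near its boundaries, which is why SI tracks local rearrangement rather than sequence divergence.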
Fig 2. Gene expression vs. SI, and SI distribution in PGs and NPGs.
A: Synteny index (SI) values between the M. leprae and M. tuberculosis genomes were calculated and compared with the median gene expression of pseudogenes, non-pseudogenes, and all genes at each SI value. B: Distribution of gene SI values between M. leprae and M. tuberculosis, split into pseudogenes (PGs, right) and non-pseudogenes (NPGs, left).
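Panel A summarizes expression as a function of SI. A minimal sketch of that summary, assuming each gene is described by a hypothetical record (name, SI value, expression level, pseudogene flag): genes are grouped by SI value, and the median expression is taken separately for pseudogenes, non-pseudogenes, and all genes. The record layout, function name, and example values are illustrative only.

```python
from collections import defaultdict
from statistics import median


def median_expression_by_si(genes):
    """genes: iterable of (name, si, expression, is_pseudogene) tuples.

    Returns {si: {"PG": median, "NPG": median, "all": median}}, omitting a
    group when no gene of that kind has the given SI value.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for _name, si, expression, is_pseudogene in genes:
        groups[si]["PG" if is_pseudogene else "NPG"].append(expression)
        groups[si]["all"].append(expression)
    return {
        si: {kind: median(values) for kind, values in by_kind.items()}
        for si, by_kind in sorted(groups.items())
    }


genes = [
    ("g1", 0.25, 2.0, True),
    ("g2", 0.25, 8.0, False),
    ("g3", 0.25, 10.0, False),
    ("g4", 1.0, 5.0, True),
    ("g5", 1.0, 7.0, True),
]
print(median_expression_by_si(genes))
```

The median is used rather than the mean because, as in the caption, it is less sensitive to a few very highly expressed genes.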
Fig 3. Functional analysis of PGs and NPGs.
Percentage of pseudogenes and non-pseudogenes, divided into low (0.125-0.375) and high (0.625-1.0) SI, that belong to (A) intermediary metabolism and respiration and (B) lipid metabolism. (C) Total number of genes in each category.
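The low and high SI bins in the caption can be applied as follows. This sketch assumes hypothetical per-gene records of (SI, functional category, pseudogene flag); genes whose SI falls outside both ranges (for example 0.5) belong to neither bin and are skipped, since the caption defines only these two ranges.

```python
from collections import Counter


def si_bin(si):
    """Map an SI value to the caption's bins; None if it falls in neither."""
    if 0.125 <= si <= 0.375:
        return "low"
    if 0.625 <= si <= 1.0:
        return "high"
    return None


def percent_in_category(genes, category):
    """Percent of genes in each (PG/NPG, SI bin) group that belong to `category`.

    genes: iterable of (si, functional_category, is_pseudogene) tuples.
    """
    totals, hits = Counter(), Counter()
    for si, function, is_pseudogene in genes:
        bin_label = si_bin(si)
        if bin_label is None:
            continue  # SI between the two bins (or 0): not plotted
        key = ("PG" if is_pseudogene else "NPG", bin_label)
        totals[key] += 1
        if function == category:
            hits[key] += 1
    return {key: 100 * hits[key] / totals[key] for key in totals}


genes = [
    (0.25, "Lipid metabolism", True),
    (0.25, "Unknown", True),
    (0.75, "Lipid metabolism", False),
    (0.5, "Lipid metabolism", False),  # excluded: SI in neither bin
]
print(percent_in_category(genes, "Lipid metabolism"))
```

Calling `percent_in_category` once with "Intermediary metabolism and respiration" and once with "Lipid metabolism" gives the kind of values plotted in panels A and B.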
PGs and NPGs divided into functional categories.
| Function | NPG, high synteny (% of total) | NPG, high synteny (#genes) | NPG, low synteny (% of total) | NPG, low synteny (#genes) | PG, high synteny (% of total) | PG, high synteny (#genes) | PG, low synteny (% of total) | PG, low synteny (#genes) |
|---|---|---|---|---|---|---|---|---|
| Cell wall and cell processes | 25% | 147 | 32% | 18 | 19% | 45 | 24% | 18 |
| Intermediary metabolism and respiration | 30% | 175 | 25% | 14 | 24% | 58 | 21% | 16 |
| Conserved hypotheticals | 17% | 101 | 11% | 6 | 14% | 33 | 16% | 12 |
| Unknown | 0% | 0 | 0% | 0 | 19% | 45 | 14% | 11 |
| Lipid metabolism | 5% | 29 | 9% | 5 | 10% | 24 | 11% | 8 |
| Regulatory proteins | 4% | 24 | 14% | 8 | 8% | 18 | 5% | 4 |
| Virulence, detoxification, adaptation | 3% | 18 | 7% | 4 | 2% | 4 | 4% | 3 |
| Information pathways | 14% | 79 | 2% | 1 | 4% | 10 | 3% | 2 |
| PE/PPE | 1% | 5 | 0% | 0 | 0% | 1 | 3% | 2 |
Pseudogenes and non-pseudogenes divided into functional categories and separated into low (0.125-0.375) and high (0.625-1.0) SI. Percentages are relative to the total number of genes in each column.
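As a consistency check, the percentage columns of the table can be recomputed from the gene-count columns, taking each column's total over the nine listed categories as the denominator. The sketch below uses the counts and percentages exactly as reported in the table; the dictionary keys and the helper name are illustrative.

```python
# Gene counts per functional category, in table row order:
# cell wall, intermediary metabolism, conserved hypotheticals, unknown,
# lipid metabolism, regulatory, virulence, information pathways, PE/PPE.
counts = {
    "NPG high": [147, 175, 101, 0, 29, 24, 18, 79, 5],
    "NPG low": [18, 14, 6, 0, 5, 8, 4, 1, 0],
    "PG high": [45, 58, 33, 45, 24, 18, 4, 10, 1],
    "PG low": [18, 16, 12, 11, 8, 4, 3, 2, 2],
}
# Whole-number percentages as reported in the table.
reported = {
    "NPG high": [25, 30, 17, 0, 5, 4, 3, 14, 1],
    "NPG low": [32, 25, 11, 0, 9, 14, 7, 2, 0],
    "PG high": [19, 24, 14, 19, 10, 8, 2, 4, 0],
    "PG low": [24, 21, 16, 14, 11, 5, 4, 3, 3],
}


def percent_of_total(column):
    """Whole-number share of each category within one column."""
    total = sum(column)
    return [round(100 * n / total) for n in column]


for name, column in counts.items():
    # Prints the column name, its total, and whether the percentages match.
    print(name, sum(column), percent_of_total(column) == reported[name])
```

Every column matches, with totals of 578, 56, 238, and 76 genes. The small low-synteny totals mean that a single gene shifts a low-synteny NPG percentage by almost two points, which is worth keeping in mind when comparing those columns.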