Sarah Entwistle, Xueqiong Li, Yanbin Yin.
Abstract
Orphan genes (also known as ORFans [i.e., orphan open reading frames]) are new genes that enable an organism to adapt to its specific living environment. This study compares ORFans between pathogens (P) and nonpathogens (NP) of the same genus. Using the pangenome idea, we have identified 130,169 ORFans in nine bacterial genera (505 genomes) and classified these ORFans into four groups: (i) SS-ORFans (P), which are only found in a single pathogenic genome; (ii) SS-ORFans (NP), which are only found in a single nonpathogenic genome; (iii) PS-ORFans (P), which are found in multiple pathogenic genomes; and (iv) NS-ORFans (NP), which are found in multiple nonpathogenic genomes. Within the same genus, pathogens do not always have more genes, more ORFans, or more pathogenicity-related genes (PRGs), including prophages, pathogenicity islands (PAIs), virulence factors (VFs), and horizontal gene transfers (HGTs), than nonpathogens. Interestingly, in pathogens of the nine genera, the percentages of PS-ORFans are consistently higher than those of SS-ORFans, which is not true in nonpathogens. Similarly, in pathogens of the nine genera, the percentages of PS-ORFans matching the four types of PRGs are also always higher than those of SS-ORFans, but this is not true in nonpathogens. All of these findings suggest the greater importance of PS-ORFans for bacterial pathogenicity.
IMPORTANCE Recent pangenome analyses of numerous bacterial species have suggested that each genome of a single species may have a significant fraction of its gene content that is unique or shared by very few genomes (i.e., ORFans). We selected nine bacterial genera, each containing at least five pathogenic and five nonpathogenic genomes, to compare their ORFans in relation to pathogenicity-related genes. Pathogens in these genera are known to cause a number of common and devastating human diseases such as pneumonia, diphtheria, melioidosis, and tuberculosis. Thus, they are worthy of in-depth systems microbiology investigations, including the comparative study of ORFans between pathogens and nonpathogens. We provide direct evidence to suggest that ORFans shared by more pathogens are more associated with pathogenicity-related genes and thus are more important targets for development of new diagnostic markers or therapeutic drugs for bacterial infectious diseases.
Keywords: ORFan; horizontal gene transfer; orphan gene; pathogenic island; pathogenicity; prophage; virulence factor
Year: 2019 PMID: 30801025 PMCID: PMC6372840 DOI: 10.1128/mSystems.00290-18
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
Nine bacterial genera selected for the ORFan study
| Genus | Phylum | No. of genomes: Total | No. of genomes: P | No. of genomes: NP | Range of: No. of | Range of: Genome |
|---|---|---|---|---|---|---|
| | | 79 | 34 | 45 | 2,841–6,402 | 3.1–6.0 |
| | | 33 | 27 | 6 | 4,248–8,006 | 3.0–4.4 |
| | | 32 | 17 | 15 | 2,224–5,639 | 2.5–6.5 |
| | | 51 | 35 | 16 | 1,768–2,999 | 2.0–3.4 |
| | | 57 | 47 | 10 | 3,708–5,732 | 4.0–5.7 |
| | | 40 | 28 | 12 | 2,661–3,143 | 2.8–3.1 |
| | | 54 | 44 | 10 | 1,605–6,784 | 3.3–7.0 |
| | | 51 | 18 | 33 | 3,734–6,178 | 4.2–7.1 |
| | | 108 | 90 | 18 | 1,585–2,270 | 1.8–2.4 |
Comparisons of the four groups of ORFans in the P and NP genomes
| Protein group | No. (%) of ORFans in total NP genomes | No. (%) of ORFans in total P genomes |
|---|---|---|
| All proteins | | |
| SS-ORFans | 17,081 (2.60) | 17,455 (1.39) |
| PS-ORFans | | 56,196 (4.48) |
| NS-ORFans | 39,437 (6.00) | |
| Non-ORFans | 600,654 (91.40) | 1,181,929 (94.13) |
| Prophage proteins | | |
| SS-ORFans | 1,459 (8.54) | 2,138 (12.24) |
| PS-ORFans | | 10,539 (18.75) |
| NS-ORFans | 3,747 (9.50) | |
| Non-ORFans | 13,071 (2.18) | 34,366 (2.91) |
| PAI proteins | | |
| SS-ORFans | 5,091 (29.81) | 5,163 (29.58) |
| PS-ORFans | | 17,087 (30.41) |
| NS-ORFans | 8,236 (20.88) | |
| Non-ORFans | 37,412 (6.23) | 84,006 (7.11) |
| VF proteins | | |
| SS-ORFans | 78 (0.46) | 116 (0.66) |
| PS-ORFans | | 2,718 (4.84) |
| NS-ORFans | 259 (0.66) | |
| Non-ORFans | 109,216 (18.18) | 210,988 (17.85) |
| HGT proteins | | |
| SS-ORFans | 5,486 (32.12) | 4,694 (26.89) |
| PS-ORFans | | 13,857 (24.66) |
| NS-ORFans | 15,587 (39.52) | |
The results shown represent 165 genomes and 657,172 proteins for total NP genomes and 340 genomes and 1,255,580 proteins for total P genomes.
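The percentages in this table can be recomputed from the counts. The following is a minimal sketch in Python, not the authors' code. It assumes (the numbers suggest this, but the table does not state it) that rows under "All proteins" are divided by the total protein count given in the footnote, whereas rows under a PRG heading are divided by the size of the corresponding ORFan group. The helper name `pct` and the dictionary layout are illustrative only.

```python
# Counts copied from the table and its footnote.
TOTAL_PROTEINS = {"NP": 657_172, "P": 1_255_580}
ORFANS = {
    ("SS", "NP"): 17_081,
    ("SS", "P"): 17_455,
    ("NS", "NP"): 39_437,
    ("PS", "P"): 56_196,
}
PROPHAGE_ORFANS = {("SS", "NP"): 1_459, ("NS", "NP"): 3_747, ("PS", "P"): 10_539}


def pct(part, whole):
    """Percentage rounded to two decimals, as printed in the table."""
    return round(100 * part / whole, 2)


# Share of all proteins that fall in each ORFan group.
share_of_proteins = {
    key: pct(count, TOTAL_PROTEINS[key[1]]) for key, count in ORFANS.items()
}

# Share of each ORFan group located in prophages
# (assumed denominator: the size of that ORFan group).
share_in_prophages = {
    key: pct(count, ORFANS[key]) for key, count in PROPHAGE_ORFANS.items()
}

for key, value in share_of_proteins.items():
    print(key, value)
```

Under these assumptions the recomputed values agree with the printed ones for the rows used here, for example 17,081 / 657,172 for SS-ORFans in NP genomes and 10,539 / 56,196 for PS-ORFans in prophages.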
FIG 1. Pangenome idea to define different groups of ORFan genes and non-ORFan genes.
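The grouping described in the abstract can be sketched as a rule on the presence pattern of a gene family within one genus. This is an illustrative sketch only: the function name `classify_family` and its inputs are hypothetical, the screening step that establishes a family as an ORFan in the first place is not shown, and treating families shared between P and NP genomes as non-ORFans is an assumption rather than something the abstract states.

```python
def classify_family(genomes_with_gene, pathogenic):
    """Assign a gene family to a group from its presence pattern.

    genomes_with_gene: set of genome IDs (within one genus) carrying the family.
    pathogenic: dict mapping genome ID -> True (P) or False (NP).
    """
    if not genomes_with_gene:
        raise ValueError("a gene family must occur in at least one genome")
    labels = {pathogenic[g] for g in genomes_with_gene}
    if len(genomes_with_gene) == 1:
        # Found in a single genome only.
        return "SS-ORFan (P)" if labels == {True} else "SS-ORFan (NP)"
    if labels == {True}:
        return "PS-ORFan (P)"   # multiple genomes, all pathogenic
    if labels == {False}:
        return "NS-ORFan (NP)"  # multiple genomes, all nonpathogenic
    return "Non-ORFan"          # assumption: shared between P and NP genomes


# Toy genus with two pathogenic and two nonpathogenic genomes (made-up IDs).
toy_labels = {"g1": True, "g2": True, "g3": False, "g4": False}
print(classify_family({"g1", "g2"}, toy_labels))
```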
P values in Wilcoxon tests of P versus NP genomes of the nine genera on different subjects
| Null | |||||
|---|---|---|---|---|---|
| All proteins | Prophages | PAIs | VFs | HGTs | |
| 0.22435869 | 0.856125938 | 0.071359632 | |||
| 0.830680548 | 0.187561481 | 0.09171531 | |||
| 0.272854034 | 0.45490102 | ||||
| 0.114873724 | 0.704693746 | ||||
| 0.70368707 | 0.07518119 | ||||
| 0.930016112 | 0.795456343 | 0.738283882 | |||
| 0.424624239 | 0.816433738 | ||||
| 0.130966299 | 0.167064102 | ||||
| 0.112397253 | 0.538543551 | 0.38647105 | 0.356860357 | ||
Boldface P values are <0.05, supporting P > NP. Italic P values are >0.95, supporting P < NP.
Comparison of P and NP genomes in terms of the total number of protein-coding genes.
Comparison of P and NP genomes in terms of % genes located in prophages = no. of prophage genes/total no. of protein-coding genes in genome.
Comparison of P and NP genomes in terms of % genes located in PAIs = no. of PAI genes/total no. of protein-coding genes in genome.
Comparison of P and NP genomes in terms of % VF genes = no. of VF genes/total no. of protein-coding genes in genome.
Comparison of P and NP genomes in terms of % ORFan genes that are HGTs = no. of ORFans that are HGTs/total no. of ORFans in genome.
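The table reports one P value per genus and subject, read in one direction (<0.05 supports P > NP, >0.95 supports P < NP). The source does not say which Wilcoxon variant or software was used. As a sketch under the assumption that it is a one-sided, unpaired Wilcoxon rank-sum test, the exact P value for small groups can be computed by enumerating all ways of assigning the pooled ranks to the P group. The function names and input values below are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_greater(path_vals, nonpath_vals):
    """Exact one-sided P value for 'values in P genomes tend to be larger'."""
    ranks = average_ranks(list(path_vals) + list(nonpath_vals))
    n_path = len(path_vals)
    observed = sum(ranks[:n_path])
    hits = total = 0
    # Every subset of size n_path is equally likely under the null hypothesis.
    for combo in combinations(range(len(ranks)), n_path):
        total += 1
        if sum(ranks[i] for i in combo) >= observed - 1e-9:
            hits += 1
    return hits / total


# Made-up per-genome percentages for three P and three NP genomes.
print(rank_sum_p_greater([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]))
```

Full enumeration is only practical for small groups. For genera with dozens of genomes per group, a normal approximation or a statistics library would be needed instead.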
FIG 2. The percentages of different groups of ORFans. The violin boxplots are shown with genomes represented as dots of different colors corresponding to four groups of ORFans. For each genome, the percentages of different ORFan groups are calculated as, e.g., % SS-ORFans = no. of SS-ORFans/total no. of proteins in the genome. Four pairs of Wilcoxon tests were performed: (i) SS-ORFans (P) versus SS-ORFans (NP), (ii) PS-ORFans (P) versus NS-ORFans (NP), (iii) PS-ORFans (P) versus SS-ORFans (P), and (iv) NS-ORFans (NP) versus SS-ORFans (NP). Only the statistically significant differences are indicated with vertical lines and asterisks (*). Red asterisks indicate a P value of <0.05, supporting higher SS-ORFans (P) in test pair i, higher PS-ORFans (P) in test pair ii, higher PS-ORFans (P) in test pair iii, and higher NS-ORFans (NP) in test pair iv. Blue asterisks indicate the opposite.
P values in Wilcoxon tests of different groups of ORFans in the nine genera based on the percentage of ORFans in prophages
| Null | ||||
|---|---|---|---|---|
| % PS-ORFans > | % NS-ORFans > | % SS-ORFans (P) > | % PS-ORFans > | |
| 0.394433525 | 0.880832477 | |||
| 0.159456553 | 0.728183622 | 0.056349164 | ||
| 0.878883731 | 0.874371848 | 0.062736061 | ||
| 0.939696489 | 0.094187485 | |||
| 0.5 | 0.700199577 | |||
| 0.196771097 | 0.580161013 | 0.064978632 | ||
| 0.943266767 | 0.821135897 | |||
| 0.645583377 | ||||
Boldface P values are <0.05, supporting the null hypothesis in the header row. Italic P values are >0.95, supporting the alternative hypothesis. The percentages of the different ORFan groups in prophages are calculated as, e.g., % PS-ORFans = no. of PS-ORFans located in prophages/total no. of prophage proteins in genome.
P values in Wilcoxon tests of different groups of ORFans in the nine genera based on the percentage of ORFans in PAIs
| Null | ||||
|---|---|---|---|---|
| % PS-ORFans > | % NS-ORFans > | % SS-ORFans (P) > | % PS-ORFans > | |
| 0.159483531 | 0.838156431 | 0.477412277 | ||
| 0.149975088 | 0.471677773 | |||
| 0.926534723 | 0.151197704 | |||
| 0.453673917 | 0.86992928 | 0.366613929 | ||
| 0.589735247 | 0.273832977 | |||
| 0.066440993 | ||||
| 0.64218788 | 0.944064298 | |||
Boldface P values are <0.05, supporting the null hypothesis in the header row. Italic P values are >0.95, supporting the alternative hypothesis. The percentages of the different ORFan groups in PAIs are calculated as, e.g., % PS-ORFans = no. of PS-ORFans located in PAIs/total no. of PAI proteins in genome.
P values in Wilcoxon tests of different groups of ORFans in the nine genera based on the percentage of ORFans of VF origin
| Null | ||||
|---|---|---|---|---|
| % PS-ORFans > | % NS-ORFans > | % SS-ORFans (P) > | % PS-ORFans > | |
| 0.286237787 | ||||
| 0.08833981 | 0.928052027 | |||
| 0.260592071 | ||||
| 0.785348469 | 0.529331759 | |||
| 0.5 | 0.274220063 | 0.274220063 | ||
| 0.157008589 | ||||
| 0.62968916 | ||||
| 0.060758795 | ||||
Boldface P values are <0.05, supporting the null hypothesis in the header row. Italic P values are >0.95, supporting the alternative hypothesis. The percentages of the different ORFan groups of VF origin are calculated as, e.g., % PS-ORFans = no. of PS-ORFans of VF origin/total no. of VF proteins in genome.
In total, only 3 ORFans of the 40 Listeria genomes are VFs (Data Set S1), so the P values for this genus are not reliable.
P values in Wilcoxon tests of different groups of ORFans in the nine genera based on the percentage of ORFans of HGT origin
| Null | ||||
|---|---|---|---|---|
| % PS-ORFans > | % NS-ORFans > | % SS-ORFans (P) > | % PS-ORFans > | |
| 0.543668612 | 0.937425556 | |||
| 0.086742734 | 0.842279273 | 0.169238704 | ||
| 0.82867728 | 0.181149495 | |||
| 0.098310924 | ||||
| 0.89308897 | 0.659921543 | |||
| 0.416257815 | 0.931692179 | 0.148137858 | ||
| 0.889525369 | 0.397438826 | |||
| 0.17201608 | 0.832963242 | |||
| 0.123168263 | ||||
Boldface P values are <0.05, supporting the null hypothesis in the header row. Italic P values are >0.95, supporting the alternative hypothesis. The percentages of the different ORFan groups of HGT origin are calculated as, e.g., % PS-ORFans = no. of horizontally transferred PS-ORFans/total no. of horizontally transferred ORFans in genome.
FIG 3. More conserved PS-ORFans (but not NS-ORFans) are more likely to be found in prophages and PAIs. The x axis is the number of genera in which an ORFan has blastp hits. (The number is 1 for an ORFan restricted to its own genus.) The y axis is the percentage of ORFans (e.g., the number of ORFans located in prophages divided by the number of ORFans). The detailed numbers are available in Table S4.
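The quantity plotted in this figure can be sketched as a grouped percentage: ORFans are binned by the number of genera with blastp hits, and within each bin the share located in a region such as a prophage is computed. The function name, the input layout, and the toy records below are hypothetical. The actual counts are in Table S4 and are not reproduced here.

```python
from collections import defaultdict


def percent_in_region_by_breadth(orfans):
    """Percentage of ORFans inside a region, per number of genera with hits.

    orfans: iterable of (n_genera_with_hits, in_region) pairs, where
    n_genera_with_hits is 1 for an ORFan restricted to its own genus.
    """
    totals = defaultdict(int)
    inside = defaultdict(int)
    for n_genera, in_region in orfans:
        totals[n_genera] += 1
        inside[n_genera] += bool(in_region)
    return {n: 100 * inside[n] / totals[n] for n in sorted(totals)}


# Toy records: (number of genera with blastp hits, located in a prophage?)
toy = [(1, True), (1, False), (1, False), (1, False),
       (2, True), (2, True), (2, False), (2, False)]
print(percent_in_region_by_breadth(toy))
```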