Shuhaila Mat-Sharani, Mohd Firdaus-Raih.
Abstract
BACKGROUND: Small open reading frames (smORFs/sORFs) that encode short protein sequences are often overlooked during standard gene prediction, leaving many sORFs undiscovered or misannotated. For many genomes, a second round of sORF-targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding 80 amino acid residues or fewer in 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes.
Keywords: Conserved; Fungal; Small open Reading frames; sORFs; smORF
Year: 2019 PMID: 30717662 PMCID: PMC7394265 DOI: 10.1186/s12859-018-2550-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
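The length criterion stated in the abstract (ORFs encoding 80 amino acid residues or fewer) can be illustrated with a minimal six-frame scan. This is only a sketch under stated assumptions, not the authors' pipeline and not a reimplementation of GetORF or sORFfinder: the function name `find_sorfs`, the ATG-to-stop ORF definition, and the `min_aa` lower bound are all hypothetical choices made for illustration.

```python
# Minimal sketch: report ATG..stop ORFs that encode at most `max_aa` residues.
# Assumptions (not from the source): ORFs start at ATG, the standard stop
# codons apply, and a hypothetical lower bound `min_aa` filters trivial hits.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sorfs(seq, max_aa=80, min_aa=10):
    """Scan both strands in three frames each.

    Returns tuples (strand, start, end, n_aa). Coordinates are 0-based
    offsets on the scanned strand; `end` is exclusive and includes the
    stop codon. `n_aa` counts residues including the initial Met.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if min_aa <= n_aa <= max_aa:
                        hits.append((strand, start, i + 3, n_aa))
                    start = None
    return hits


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 11 + "TAA"  # 12 residues plus a stop codon
    print(find_sorfs(toy))
```

Only the first ATG after a stop codon is used as the start, so nested in-frame starts are not reported separately; a real prediction tool would also score coding potential rather than rely on length alone.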
Fig. 1 Research workflow
Fig. 2 Distribution of sORFs according to length for all 31 fungal genomes
Fig. 3 Distribution of the total number of conserved sORFs across fungal genomes
List of sORFs predicted from the whole genome and intergenic regions of fungal genomes
| Microorganism (yeast/fungi) | Phylum | Genome size (Mb) | Scaffolds (sc) / Chromosomes | Intergenic regions | Introns | ORFs | sORFs from genome annotation | Ab initio: GetORF (genome) | Ab initio: GetORF (intergenic) | Ab initio: sORFfinder | Ab initio: combined 1st & 2nd approaches | Ab initio: sORFs matching ORF homologs | Ab initio: sORFs predicted ab initio | Total sORFs predicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agaricus bisporus | Basidiomycetes | 30.78 | 29sc | 10,606 | 50,859 | 10,450 | 575 | 423,332 | 113,990 | 902,110 | 2808 | 1952 | 856 | 1431 |
| Aspergillus fumigatus | Ascomycetes | 29.39 | 16 | 9916 | 18,630 | 9630 | 180 | 329,750 | 644,099 | 746,886 | 6665 | 5212 | 1453 | 1633 |
| Aspergillus nidulans | Ascomycetes | 29.83 | 17 | 9410 | 25,192 | 9410 | 83 | 376,689 | 533,546 | 288,159 | 3031 | 1803 | 1228 | 1311 |
| Aspergillus niger | Ascomycetes | 38.50 | 20sc | 10,828 | 25,160 | 10,609 | 90 | 454,556 | 645,989 | 431,379 | 2950 | 1690 | 1260 | 1350 |
| Aspergillus oryzae | Ascomycetes | 37.91 | 27sc | 12,937 | 29,686 | 12,818 | 172 | 508,537 | 639,035 | 361,674 | 3336 | 2042 | 1294 | 1466 |
| Candida dubliniensis | Ascomycetes | 14.62 | 8 | 5499 | 169 | 5213 | 45 | 180,147 | 249,542 | 181,108 | 4591 | 2046 | 2545 | 2590 |
| Candida glabrata | Ascomycetes | 12.34 | 14 | 6580 | 33,084 | 6575 | 159 | 153,883 | 203,269 | 290,815 | 2593 | 1271 | 1322 | 1481 |
| Cryptococcus gattii | Basidiomycetes | 18.37 | 14 | 6617 | 34,336 | 6475 | 94 | 250,564 | 331,357 | 263,216 | 2447 | 1864 | 583 | 677 |
| Cryptococcus neoformans | Basidiomycetes | 2.50 | 14 | 6658 | 479 | 6290 | 110 | 254,930 | 329,135 | 247,268 | 2266 | 1673 | 593 | 703 |
| Debaryomyces hansenii | Ascomycetes | 2.25 | 8 | 2029 | 32 | 1996 | 300 | 153,782 | 190,959 | 165,774 | 4179 | 2540 | 1639 | 1939 |
| Encephalitozoon cuniculi | Microsporidia | 2.22 | 11 | 1892 | 14 | 1833 | 13 | 28,102 | 33,715 | 53,062 | 554 | 396 | 158 | 171 |
| Encephalitozoon intestinalis | Microsporidia | 2.19 | 11 | 4853 | 239 | 4434 | 18 | 26,156 | 28,950 | 54,302 | 543 | 412 | 131 | 149 |
| Eremothecium cymbalariae | Ascomycetes | 9.67 | 8 | 5356 | 276 | 4776 | 61 | 119,153 | 153,876 | 171,154 | 3384 | 1865 | 1519 | 1580 |
| Eremothecium gossypii | Ascomycetes | 9.12 | 8 | 11,624 | 25,808 | 11,628 | 80 | 94,967 | 117,788 | 121,934 | 3529 | 2265 | 1264 | 1344 |
| Gibberella zeae | Ascomycetes | 38.05 | 11 | 5649 | 1040 | 5378 | 104 | 488,905 | 698,951 | 151,211 | 4487 | 2265 | 1832 | 1936 |
| Kazachstania africana | Ascomycetes | 11.13 | 12 | 5649 | 1040 | 5378 | 88 | 144,748 | 186,855 | 141,681 | 1894 | 1247 | 647 | 735 |
| Kluyveromyces lactis | Ascomycetes | 10.73 | 7 | 5412 | 182 | 5085 | 73 | 139,746 | 182,755 | 81,044 | 3006 | 1636 | 1370 | 1443 |
| Lachancea thermotolerans | Ascomycetes | 10.39 | 8 | 5498 | 284 | 5091 | 53 | 114,598 | 145,278 | 155,291 | 3134 | 1628 | 1506 | 1559 |
| Magnaporthe grisea | Ascomycetes | 40.30 | 8 | 14,210 | 25,265 | 14,014 | 1105 | 506,420 | 699,033 | 217,609 | 4466 | 2735 | 1731 | 2836 |
| Myceliophthora thermophila | Ascomycetes | 38.74 | 7 | 9294 | 15,500 | 9099 | 419 | 343,719 | 540,155 | 140,855 | 4738 | 2547 | 2191 | 2610 |
| Naumovozyma castellii | Ascomycetes | 11.22 | 10 | 5870 | 203 | 5592 | 102 | 147,551 | 187,871 | 187,462 | 5691 | 3430 | 2261 | 2363 |
| Naumovozyma dairenensis | Ascomycetes | 13.53 | 12 | 6057 | 177 | 5772 | 99 | 185,531 | 251,917 | 231,996 | 2841 | 1323 | 1518 | 1617 |
| Pichia pastoris | Ascomycetes | 9.60 | 4 | 5040 | 578 | 5040 | 89 | 112,299 | 138,917 | 61,604 | 4161 | 2872 | 1289 | 1378 |
| Pichia stipitis | Ascomycetes | 15.44 | 8 | 5816 | 2567 | 5816 | 61 | 186,356 | 259,185 | 79,673 | 2146 | 1008 | 1138 | 1199 |
| Saccharomyces cerevisiae | Ascomycetes | 12.16 | 17 | 6349 | 366 | 5906 | 224 | 153,187 | 200,252 | 398,979 | 3478 | 2196 | 1282 | 1506 |
| Schizosaccharomyces pombe | Ascomycetes | 12.59 | 4 | 6991 | 3793 | 5133 | 124 | 165,857 | 165,864 | 33,542 | 6027 | 3304 | 2723 | 2847 |
| Tetrapisispora phaffii | Ascomycetes | 12.12 | 16 | 5460 | 141 | 5250 | 89 | 153,895 | 206,397 | 825,322 | 2959 | 1410 | 1549 | 1638 |
| Thielavia terrestris | Ascomycetes | 36.91 | 6 | 9958 | 17,290 | 9802 | 72 | 296,185 | 443,166 | 163,032 | 3291 | 1966 | 1325 | 1397 |
| Torulaspora delbrueckii | Ascomycetes | 9.22 | 8 | 5176 | 203 | 4972 | 402 | 113,250 | 138,118 | 509,930 | 4521 | 3074 | 1447 | 1849 |
| Yarrowia lipolytica | Ascomycetes | 20.55 | 7 | 7357 | 1120 | 6472 | 106 | 242,726 | 369,856 | 127,236 | 1594 | 456 | 1138 | 1244 |
| Zygosaccharomyces rouxii | Ascomycetes | 9.76 | 7 | 5332 | 166 | 4991 | 65 | 123,372 | 154,232 | 353,232 | 4429 | 2634 | 1795 | 1860 |
| TOTAL | | | | | | 210,928 | 5255 | 6,972,893 | 9,184,052 | 902,110 | 105,739 | | 42,587 | 47,842 |
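Reading across the rows of this table, the counts appear to be linked by two sums: the combined count equals the sORFs matching ORF homologs plus the sORFs predicted ab initio, and the total equals the sORFs from genome annotation plus the sORFs predicted ab initio. These relations are inferred from the printed numbers rather than stated in the record, so the sketch below treats them as assumptions; the function name `check_row` is hypothetical. The three rows are copied from the table.

```python
# Sketch of a consistency check for the table above.
# Assumed relations (inferred from the rows, not stated in the source):
#   combined == homolog_matches + ab_initio
#   total    == annotated + ab_initio

# (genome, annotated sORFs, combined, homolog matches, ab initio, total)
ROWS = [
    ("Agaricus bisporus", 575, 2808, 1952, 856, 1431),
    ("Aspergillus fumigatus", 180, 6665, 5212, 1453, 1633),
    ("Gibberella zeae", 104, 4487, 2265, 1832, 1936),
]


def check_row(row):
    """Return which of the two assumed relations hold for one table row."""
    _name, annotated, combined, homolog, ab_initio, total = row
    return {
        "combined_ok": combined == homolog + ab_initio,
        "total_ok": total == annotated + ab_initio,
    }


if __name__ == "__main__":
    for row in ROWS:
        print(row[0], check_row(row))
```

With the values as printed, the first two rows satisfy both relations, while the Gibberella zeae row satisfies only the second (2265 + 1832 is 4097, not 4487), which suggests a transcription issue in that row rather than a different counting rule.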
Fig. 4 Clustered conserved sORFs
Fig. 5 Multiple sequence alignments for sORFs that are conserved within (a) 26 fungal genomes (i-xx3497 and ii-xx4629) and (b) 2/4 fungal genomes (i-xx4249 and ii-xx6165) based on clustering. The sORFs extracted from genome annotations have identifiers with ‘*gi*’ while those computed in this work have identifiers with ‘*sf*’
Fig. 6 Phylogenetics of conserved sORFs within (a) 26 fungal genomes (xx3497) and (b) 2/4 fungal genomes (xx4249) based on clustering
Characterization of sORFs conserved in 15 fungal genomes
| sORF ID | Accession Number | Existing Description | New Description |
|---|---|---|---|
| A.bisporus-gi426200236 | EKV50160.1 | hypothetical protein AGABI2DRAFT_115218 | plasma membrane proteolipid 3 |
| A.fumigatus-gi70999334 | XP_754386.1 | stress response RCI peptide | plasma membrane proteolipid 3 |
| A.niger-gi317027004 | XP_001399936.2 | plasma membrane proteolipid 3 | plasma membrane proteolipid 3 |
| A.oryzae-gi317139792 | XP_003189201.1 | plasma membrane proteolipid 3 | plasma membrane proteolipid 3 |
| C.dubliniensis-sf4096_1 | NA | NA | plasma membrane proteolipid 3 |
| C.gattii-gi321253028 | XP_003192603.1 | cation transport-related protein | plasma membrane proteolipid 3 |
| C.neoformans-gi58264530 | XP_569421.1 | cation transport-related protein | plasma membrane proteolipid 3 |
| D.hansenii-gi50417989 | XP_457739.1 | DEHA2C01320p | plasma membrane proteolipid 3 |
| G.zeae-gi46137413 | XP_390398.1 | hypothetical protein FG10222.1 | plasma membrane proteolipid 3 |
| M.oryzae-gi389644188 | XP_003719726.1 | plasma membrane proteolipid 3 | plasma membrane proteolipid 3 |
| M.thermophila-gi367024053 | XP_003661311.1 | hypothetical protein MYCTH_2314489 | plasma membrane proteolipid 3 |
| P.stipitis-sf9282_1 | NA | NA | plasma membrane proteolipid 3 |
| S.pombe-gi429239798 | NP_595350.2 | plasma membrane proteolipid Pmp3 | plasma membrane proteolipid 3 |
| T.terrestris-gi367036855 | XP_003648808.1 | hypothetical protein THITE_2106674 | plasma membrane proteolipid 3 |
| Y.lipolytica-gi210075749 | XP_502906.2 | YALI0D16665p | plasma membrane proteolipid 3 |
List of sORFs predicted in S. cerevisiae
| From this study | Kastenmayer et al. | From this study | Kastenmayer et al. |
|---|---|---|---|
| S.cerevisiae-gi14318502 | YFL017W-A | S.cerevisiae-gi6323292 | #N/A |
| S.cerevisiae-gi398364355 | YFR032C-A | S.cerevisiae-gi6323318 | #N/A |
| S.cerevisiae-gi398365385 | YNL024C-A | S.cerevisiae-gi6323506 | #N/A |
| S.cerevisiae-gi398365605 | YLR287C-A | S.cerevisiae-gi6323558 | #N/A |
| S.cerevisiae-gi398365775 | YOR210W | S.cerevisiae-gi6323634 | #N/A |
| S.cerevisiae-gi398365789 | YDR139C | S.cerevisiae-gi6323912 | #N/A |
| S.cerevisiae-gi398366075 | YLR388W | S.cerevisiae-gi6324184 | #N/A |
| S.cerevisiae-gi6321622 | YGR183C | S.cerevisiae-gi6324259 | #N/A |
| S.cerevisiae-gi6321937 | YHR143W-A | S.cerevisiae-gi6324313 | #N/A |
| S.cerevisiae-gi6323294 | YLR264W | S.cerevisiae-gi6324619 | #N/A |
| S.cerevisiae-gi6323357 | YLR325C | S.cerevisiae-gi6324877 | #N/A |
| S.cerevisiae-gi6324070 | YNL259C | S.cerevisiae-gi6325391 | #N/A |
| S.cerevisiae-gi6324360 | YNR032C-A | S.cerevisiae-gi73858744 | #N/A |
| S.cerevisiae-gi6324741 | YOR167C | S.cerevisiae-gi7839147 | #N/A |
| S.cerevisiae-gi7839181 | YHR072W-A | S.cerevisiae-sf1119_1 | #N/A |
| S.cerevisiae-gi12621478 | #N/A | S.cerevisiae-sf19568_1 | #N/A |
| S.cerevisiae-gi147921768 | #N/A | S.cerevisiae-sf21_1 | #N/A |
| S.cerevisiae-gi33438768 | #N/A | S.cerevisiae-sf21973_1 | #N/A |
| S.cerevisiae-gi33438785 | #N/A | S.cerevisiae-sf22173_1 | #N/A |
| S.cerevisiae-gi33438820 | #N/A | S.cerevisiae-sf23868_1 | #N/A |
| S.cerevisiae-gi33438821 | #N/A | S.cerevisiae-sf27242_1 | #N/A |
| S.cerevisiae-gi33438834 | #N/A | S.cerevisiae-sf27243_1 | #N/A |
| S.cerevisiae-gi33438835 | #N/A | S.cerevisiae-sf27714_1 | #N/A |
| S.cerevisiae-gi33438838 | #N/A | S.cerevisiae-sf3100_1 | #N/A |
| S.cerevisiae-gi33438839 | #N/A | S.cerevisiae-sf31758_1 | #N/A |
| S.cerevisiae-gi398365465 | #N/A | S.cerevisiae-sf32431_1 | #N/A |
| S.cerevisiae-gi398365709 | #N/A | S.cerevisiae-sf32615_1 | #N/A |
| S.cerevisiae-gi398366109 | #N/A | S.cerevisiae-sf34463_1 | #N/A |
| S.cerevisiae-gi398366483 | #N/A | S.cerevisiae-sf35098_1 | #N/A |
| S.cerevisiae-gi398366543 | #N/A | S.cerevisiae-sf4587_1 | #N/A |
| S.cerevisiae-gi398366617 | #N/A | S.cerevisiae-sf7880_1 | #N/A |
| S.cerevisiae-gi41629681 | #N/A | S.cerevisiae-sf85063_1 | #N/A |
| S.cerevisiae-gi6226526 | #N/A | S.cerevisiae-sf85096_1 | #N/A |
| S.cerevisiae-gi6226533 | #N/A | S.cerevisiae-sf9229_1 | #N/A |
| S.cerevisiae-gi6320017 | #N/A | S.cerevisiae-gi6320482 | #N/A |
| S.cerevisiae-gi6320068 | #N/A | S.cerevisiae-gi6320734 | #N/A |
| S.cerevisiae-gi6320142 | #N/A | S.cerevisiae-gi6320819 | #N/A |
| S.cerevisiae-gi6320291 | #N/A | S.cerevisiae-gi6322272 | #N/A |
| S.cerevisiae-gi6323020 | #N/A | | |
Fig. 7 Classification of predicted conserved sORFs based on Gene Ontology. Solid colors represent cellular components, dot patterns represent biological processes, and cross patterns represent molecular functions