Klaus Neuhaus, Richard Landstorfer, Lea Fellner, Svenja Simon, Andrea Schafferhans, Tatyana Goldberg, Harald Marx, Olga N Ozoline, Burkhard Rost, Bernhard Kuster, Daniel A Keim, Siegfried Scherer.
Abstract
BACKGROUND: Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome).
Year: 2016 PMID: 26911138 PMCID: PMC4765031 DOI: 10.1186/s12864-016-2456-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1 Novel genes detected in EHEC
| Name | Classification | Start | Stop | Length [bp] | Origin | Footprints: RPKM | Footprints: gene coverage | Footprints: ribosomal coverage value | MS: LB | MS: LB-Nit | MS: LB-small | PlatProm: upstream of start codon [bp] | PlatProm: score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X001 | real | 217270 | 217488 | 219 | 1690 | 0.99 | 2.35 | −460 | 9.00 | ||||
| X002 | real | 391261 | 391725 | 465 | 56 | 0.61 | 0.64 | −211( | 8.93 | ||||
| X003 | real | 570516 | 570710 | 195 | 102 | 0.72 | 0.69 | -- | -- | ||||
| X004 | real | 667557 | 667805 | 249 | 18 | 0.59 | 0.51 | 2 | 2 | −287 | 7.90 | ||
| X005 | real | 713269 | 713421 | 150 | 190 | 0.92 | 0.89 | −54( | 7.61 | ||||
| X006 | real | 713433 | 713630 | 198 | 166 | 0.86 | 0.77 | −54( | 7.61 | ||||
| X007* | 790488 | 790682 | 195 | 79 | 0.65 | 0.80 | −563 | 9.15 | |||||
| X008 | real | 902889 | 903083 | 195 | phage | 678 | 0.71 | 0.82 | −14 | 7.63 | |||
| X009* | real | 978607 | 978747 | 141 | 17 | 0.52 | 0.50 | −129 | 7.63 | ||||
| X010a | real | 1112292 | 1112471 | 180 | 35 | 0.75 | 0.71 | −297 | 8.28 | ||||
| X010b | 1508079 | 1507899 | duplicate of X010a | ||||||||||
| X011* | real | 1146872 | 1147027 | 156 | 13 | 0.53 | 0.38 | -- | -- | ||||
| X012 | real | 1152583 | 1152795 | 213 | 57 | 0.51 | 0.42 | −2 | 7.66 | ||||
| X013 | real | 1256680 | 1256967 | 288 | phage | 230 | 0.89 | 0.92 | 2 | −590 | 7.73 | ||
| X014a | real | 1267635 | 1267820 | 186 | phage | 552 | 0.66 | 0.26 | −67 | 8.07 | |||
| X014b | 2314896 | 2314711 | duplicate of X014a | ||||||||||
| X015 | real | 1334776 | 1334931 | 156 | phage | 35 | 0.84 | 0.32 | −70( | 7.84 | |||
| X016a | real | 1346825 | 1347184 | 360 | phage | 58 | 0.65 | 0.69 | −20 | 9.29 | |||
| X016b | 3000443 | 3000802 | duplicate of X016a | ||||||||||
| X017 | real | 1353605 | 1353772 | 168 | phage | 23 | 0.52 | 0.21 | −91 | 9.27 | |||
| X018 | real | 1411438 | 1411557 | 120 | 49 | 0.8 | 0.37 | −30 | 10.92 | ||||
| X019 | real | 1680779 | 1680967 | 189 | phage | 242 | 0.77 | 3.51 | −269 | 9.01 | |||
| X020 | real | 1772962 | 1773144 | 183 | 53 | 0.6 | 1.04 | −24( | 10.52 | ||||
| X021 | real | 1843458 | 1843622 | 165 | 1029 | 0.65 | 2.56 | −625 | 7.74 | ||||
| X022* | real | 1866296 | 1866505 | 210 | phage | 2169 | 0.82 | 0.73 | −6 | 7.63 | |||
| X023* | 1866493 | 1866648 | 156 | phage | 280 | 0.88 | 1.30 | −203 | 7.63 | ||||
| X024 | real | 1881598 | 1881819 | 222 | phage | 21 | 0.37 | 0.60 | 2 | 2 | 2 | −76 | 8.93 |
| X025a | 1389500 | 1389288 | duplicate of X025b | ||||||||||
| X025b | real | 1888594 | 1888806 | 213 | phage | 524 | 0.95 | 0.08 | −112 | 8.27 | |||
| X026 | real | 1905731 | 1905850 | 120 | phage | 622 | 0.7 | 1.01 | −77(Z2121) | 12.18 | |||
| X027 | real | 2038161 | 2038382 | 222 | 75 | 0.51 | 1.46 | −313 | 7.47 | ||||
| X028* | 2101101 | 2101247 | 147 | 131 | 0.61 | 1.70 | 0 | 8.00 | |||||
| X029 | real | 2109655 | 2109921 | 267 | 629 | 0.97 | 0.85 | -- | -- | ||||
| X030 | real | 2138823 | 2139137 | 315 | phage | 1520 | 0.98 | 1.35 | −53 | 16.96 | |||
| X031 | real | 2168349 | 2168567 | 219 | 77 | 0.66 | 0.60 | −110 | 8.48 | ||||
| X032a | 1269797 | 1269913 | duplicate of X032c | ||||||||||
| X032b | 1868589 | 1868705 | duplicate of X032c | ||||||||||
| X032c | real | 2312618 | 2312734 | 117 | 650 | 0.81 | 0.74 | −447 | 7.83 | ||||
| X033* | 2379507 | 2379659 | 153 | 348 | 0.87 | 1.50 | −77 | 12.26 | |||||
| X034 | real | 2430386 | 2430598 | 213 | 47 | 0.53 | 0.22 | −9 | 11.59 | ||||
| X035* | 2480019 | 2480177 | 159 | 25 | 0.52 | 0.20 | −63 | 10.51 | |||||
| X036 | real | 2584677 | 2584847 | 171 | 52 | 0.66 | 0.17 | −162 | 12.18 | ||||
| X037 | real | 2663871 | 2664122 | 252 | 14 | 0.53 | 0.58 | −243 | 12.65 | ||||
| X038 | real | 2670869 | 2671075 | 207 | phage | 1209 | 0.8 | 0.69 | −28 | 11.39 | |||
| X039 | real | 2742703 | 2742918 | 216 | 90 | 0.58 | 0.61 | −103 | 7.60 | ||||
| X040 | real | 2777135 | 2777347 | 213 | phage | 37 | 0.57 | 0.02 | -- | -- | |||
| X041 | real | 2779284 | 2779508 | 225 | phage | 57 | 0.73 | 1.32 | -- | -- | |||
| X042 | real | 2844454 | 2844606 | 153 | 768 | 0.84 | 0.83 | −295(X043) | 8.26 | ||||
| X043 | real | 2844640 | 2844804 | 165 | 212 | 0.92 | 0.44 | −295 | 8.26 | ||||
| X044 | real | 2844865 | 2845074 | 210 | 36 | 0.53 | 0.17 | −210 | 11.00 | ||||
| X045 | real | 2845149 | 2845358 | 210 | 163 | 0.9 | 0.16 | −23 | 9.54 | ||||
| X046* | 2845408 | 2845602 | 195 | 145 | 0.69 | 0.35 | −33 | 9.54 | |||||
| X047 | real | 2966787 | 2966987 | 201 | phage | 34 | 0.71 | 0.17 | −21 | 8.08 | |||
| X048 | real | 3003688 | 3003945 | 258 | phage | 40 | 0.65 | 1.96 | −353 | 8.18 | |||
| X049 | real | 3004951 | 3005067 | 117 | phage | 241 | 0.75 | 1.39 | 3 | 2 | −93 | 9.71 | |
| X050 | real | 3013440 | 3013694 | 255 | phage | 28 | 0.64 | 0.47 | −71(Z3371) | 8.46 | |||
| X051 | real | 3261588 | 3261758 | 171 | 89 | 0.86 | 0.35 | -- | -- | ||||
| X052* | 3271689 | 3271820 | 132 | 34 | 0.79 | 0.32 | −95 | 9.93 | |||||
| X053 | real | 3453780 | 3454583 | 804 | 41 | 0.53 | 0.20 | 9 | 13 | 2 | −36 | 9.48 | |
| X054* | 3894853 | 3894993 | 141 | 98 | 0.86 | 0.56 | −220 | 8.25 | |||||
| X055 | real | 3918141 | 3918344 | 204 | 47 | 0.56 | 0.31 | -- | -- | ||||
| X056 | real | 4207372 | 4207641 | 270 | 725 | 0.92 | 0.66 | −52 | 10.58 | ||||
| X057 | real | 4240665 | 4240883 | 219 | 2974 | 0.88 | 2.01 | −24 | 13.80 | ||||
| X058* | 4441485 | 4441643 | 159 | 359 | 0.98 | 0.64 | −569 | 9.75 | |||||
| X059 | real | 4449723 | 4449821 | 99 | 19 | 0.6 | 0.08 | −96 | 7.97 | ||||
| X060 | real | 4468299 | 4468592 | 294 | 639 | 0.84 | 2.99 | −253 | 9.57 | ||||
| X061 | real | 4585965 | 4586174 | 210 | 202 | 0.92 | 1.98 | 2 | 2 | −67 | 9.03 | ||
| X062 | real | 4654347 | 4654490 | 144 | phage | 29 | 0.73 | 0.89 | −393 | 8.17 | |||
| X063* | 4730352 | 4730537 | 186 | 15 | 0.51 | 0.95 | −533 | 11.48 | |||||
| X064 | real | 4793504 | 4793737 | 234 | 20 | 0.53 | 0.28 | -- | -- | ||||
| X065 | real | 4870817 | 4870978 | 162 | 38 | 0.74 | 1.28 | −90( | 8.1 | ||||
| X066* | 4873916 | 4874122 | 207 | 117 | 0.84 | 2.58 | −104 | 7.92 | |||||
| X067 | real | 4916583 | 4916756 | 174 | 162 | 0.84 | 0.64 | −22( | 11.84 | ||||
| X068* | 5077694 | 5077831 | 138 | 2040 | 0.97 | 0.55 | −368( | 7.61 | |||||
| X069 | real | 5369765 | 5369998 | 234 | 141 | 0.94 | 0.33 | −159( | 11.47 | ||||
| X070 | real | 5456776 | 5457042 | 267 | 53 | 0.52 | 3.58 | −163( | 8.02 | ||||
| X071 | real | 5494158 | 5494394 | 237 | 45 | 0.57 | 2.82 | −27 | 8.35 | ||||
| X072 | real | 5515374 | 5515541 | 168 | 38 | 0.69 | 0.80 | −39( | 7.9 | ||||
Name: the asterisk (*) indicates genes not annotated in any other organism (blastp against GenBank, threshold E-value ≤ 10⁻¹⁰)
Classification: machine-learning classification using the set of annotated proteins (“real”) and their shuffled counterparts as the training set
Start, Stop: positions are given relative to GenBank accession no. NC_002655, the original genome sequence of strain EDL933. The genome was updated only very recently (GenBank accession no. CP008957)
Origin: genes originating from prophages are indicated
RPKM: the footprint RPKM and the footprint coverage of the actual ORF are given as the average of two replicate experiments for bacteria grown in LB medium
Gene coverage: fraction of the ORF covered by one or more footprint reads
Ribosomal coverage value: ratio of footprint RPKM to transcriptome RPKM
MS: number of individual peptide spectra obtained by mass spectrometry
PlatProm prediction: putative promoters were predicted using PlatProm. The position of the assumed transcription start site upstream of the start codon and the quality of the prediction (score) are given
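The footprint columns of the table can be related by a few lines of arithmetic. The following is a minimal sketch, assuming the standard RPKM definition (reads per kilobase of ORF per million mapped reads) and inclusive genome coordinates; the function names are hypothetical and the read counts used in any call are illustrative, not values from the study.

```python
from typing import Iterable


def orf_length(start: int, stop: int) -> int:
    """Length in bp of an ORF given inclusive coordinates on either strand."""
    return abs(stop - start) + 1


def rpkm(reads: int, length_bp: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads per kilobase per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped_reads)


def gene_coverage(covered_positions: Iterable[int], start: int, stop: int) -> float:
    """Fraction of ORF positions covered by at least one footprint read."""
    lo, hi = sorted((start, stop))
    covered = {p for p in covered_positions if lo <= p <= hi}
    return len(covered) / orf_length(start, stop)


def ribosomal_coverage_value(rpkm_footprint: float, rpkm_transcriptome: float) -> float:
    """Ratio of footprint RPKM to transcriptome RPKM, as defined in the footnote."""
    return rpkm_footprint / rpkm_transcriptome
```

With the coordinates listed for X001 (217270 to 217488), `orf_length` gives 219 bp, matching the Length column. The same holds for the reverse-oriented duplicate X014b (2314896 to 2314711, 186 bp).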
Fig 1 Four examples of new EHEC protein-coding ORFs (red arrows) discovered by ribosomal footprinting and visualized using Artemis [30]. Protein-coding ORFs are indicated by cyan arrows in the lower part of each panel. Blue lines in the upper part of each panel represent ribosomal footprint reads. a X018 is an example of a single (monocistronic) gene. b X001 is located in the upstream part of yaeO. These two genes might form a translationally coupled operon. c Two short genes, X005 and X006, are located downstream of cstA and may also be translationally coupled. d X002 might be part of the operon yahDEFGIJ, spanning from yahD to yahJ (only partly shown). The missing gene yahH was annotated at first but rejected later due to its structure (see Discussion and Fig 4)
Fig 4 Repeat structure of the REP23-containing gene yahH (the same as X002) and its protein YahH [58]. The upper part shows one repeat block folded as mRNA [59]. The DNA sequence (lower part) basically consists of five such repeated blocks with only minor differences between them (single-nt differences are in green) and a short unique sequence at the 3′ end (green stretch). When the fourth block is compared to the others, a base appears to be missing (gap marked in red), causing a change in the reading frame that is visible in the protein structure. Thus, the protein contains three large repeats and a fourth, truncated one (grey blocks; the few aa differences are indicated in blue). Downstream of the “frame shift” mutation, a different structure of two blocks is found (yellow). The protein contains many charged amino acids, either positive (RK, red print) or negative (DE, blue print)
Fig 2 Graphical overview of PredictProtein values for the novel and length-matched annotated proteins. Error bars (if given) show the SD. a The predicted percentage of the protein length comprised of helices (H), sheets (E), and loops (L) is shown. Furthermore, the percentage of buried (b) and exposed (e) amino acids is given. b On the left, the fraction of proteins possessing at least one predicted transmembrane domain (TMD) is shown. On the right, the mean number of TMDs per protein possessing at least one TMD is shown. c The fraction of proteins with a coiled-coil prediction using a window of 14 amino acids is given. d The left bars show the fraction of proteins with a low-complexity region; the right bars give the mean length of this region relative to the overall protein length for those proteins possessing such a region. e The left bars show the fraction of proteins with a disordered region; the right bars give the mean length of this region relative to the overall protein length for those proteins possessing such a region. f The fraction of proteins with at least one predicted Cys-Cys bond
Fig 3 Graphical overview of the PredictProtein values for the novel and length-matched annotated proteins, comparing localization, protein-protein binding sites, and PROSITE patterns. a Subcellular localization was predicted using LocTree3 and is shown as a percentage for the different compartments (membr., membrane). b The left bars show that all proteins have predicted protein-protein binding sites. The right bars show the percentage of amino acids predicted to be involved in this type of interaction. c The predicted number of PROSITE patterns per 100 aa is given
Transcriptome data of selected novel genes regulated under specific conditions given as fold-change compared to standard LB. Data are taken from [9]
| Name | Minimal medium | LB-Nit | pH9 | Radish sprouts | Spinach leaf juice | 15 °C | Amoeba | Antibiotics | Cow dung | Agar surface | pH4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X009* | u/c | n.r. | 9 | u/c | u/c | n.r. | 70 | u/c | u/c | n.r. | n.r. |
| X011* | 12 | u/c | 6 | 8 | 26 | 13 | 151 | n.r. | n.r. | 19 | 21 |
| X031 | u/c | u/c | u/c | u/c | 26 | u/c | u/c | u/c | u/c | u/c | −18 |
| X037 | n.r. | n.r. | n.r. | n.r. | n.r. | n.r. | 213 | n.r. | n.r. | n.r. | n.r. |
| X048 | n.r. | u/c | u/c | u/c | n.r. | u/c | n.r. | 48 | n.r. | n.r. | u/c |
| X052* | n.r. | −6 | −5 | n.r. | u/c | u/c | 12 | n.r. | n.r. | n.r. | −5 |
| X060 | u/c | 7 | u/c | 5 | 10 | 18 | u/c | −17 | u/c | u/c | u/c |
| X062 | 25 | 14 | 12 | n.r. | u/c | 7 | 23 | n.r. | n.r. | n.r. | n.r. |
| X070 | n.r. | n.r. | u/c | n.r. | n.r. | u/c | 25 | n.r. | n.r. | n.r. | n.r. |
| X071 | 122 | 14 | 9 | 5 | u/c | u/c | n.r. | n.r. | 5 | n.r. | n.r. |
Positive values, upregulated; negative values, downregulated; n.r., no reads under this condition; u/c, unchanged (threshold: ≥5-fold regulation)
The asterisk indicates genes not annotated in any other organism (see Table 1)
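The encoding used in the table cells (signed fold-change, u/c, n.r.) can be illustrated with a short sketch. It assumes that a fold-change is the ratio of a normalized expression value under the test condition to the value in standard LB, with down-regulation written as the negative reciprocal; how the values in [9] were actually normalized is not stated here, so the function names and inputs are hypothetical.

```python
FOLD_THRESHOLD = 5.0  # threshold from the table footnote: >= 5-fold regulation


def signed_fold_change(condition: float, reference: float) -> float:
    """Positive ratio if up-regulated, negative reciprocal ratio if down-regulated.

    Assumes both values are positive and normalized the same way.
    """
    if condition >= reference:
        return condition / reference
    return -(reference / condition)


def table_entry(condition: float, reference: float,
                threshold: float = FOLD_THRESHOLD) -> str:
    """Render one cell: 'n.r.', 'u/c', or the rounded signed fold-change."""
    if condition == 0:
        return "n.r."  # no reads under this condition
    if reference <= 0:
        # Assumption: a fold-change is undefined without signal in standard LB.
        raise ValueError("reference expression must be positive")
    fc = signed_fold_change(condition, reference)
    if abs(fc) < threshold:
        return "u/c"  # unchanged: below the 5-fold threshold
    return str(round(fc))
```

For example, a value 18 times lower than in LB would be rendered as `-18` (ASCII minus in this sketch), while a 3-fold increase would be rendered as `u/c`.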
Phenotype in calves of transposon hits in or near the novel genes. The threshold is defined as a 5-fold or higher regulation. Negative values indicate down-regulation. Data are taken from [72]
| Name | Position of Tn insertion | Direct hit [H] or bases upstream [b] | Fold-change output versus input |
|---|---|---|---|
| X033* | 2379421 | 86 | −33 |
| X036 | 2584780 | H | −13 |
| X045 | 2845234 | H | −50 |
The asterisk indicates genes not annotated in any other organism (see Table 1)
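The "Direct hit [H] or bases upstream [b]" column can be reproduced from the insertion position and the gene coordinates in Table 1. This is a minimal sketch, assuming that strand is inferred from the order of start and stop and that the upstream distance is counted from the first base of the start codon; the function name is hypothetical.

```python
from typing import Optional


def classify_insertion(tn_pos: int, gene_start: int, gene_stop: int) -> Optional[str]:
    """Return 'H' for a hit inside the gene, the distance in bp as a string
    for an insertion upstream of the start codon, or None if downstream."""
    lo, hi = sorted((gene_start, gene_stop))
    if lo <= tn_pos <= hi:
        return "H"
    # Forward-strand genes have start < stop; upstream then means smaller coordinates.
    if gene_start <= gene_stop:
        distance = gene_start - tn_pos
    else:
        distance = tn_pos - gene_start
    return str(distance) if distance > 0 else None


# Coordinates taken from the tables above.
print(classify_insertion(2379421, 2379507, 2379659))  # X033
print(classify_insertion(2584780, 2584677, 2584847))  # X036
print(classify_insertion(2845234, 2845149, 2845358))  # X045
```

Under these assumptions the three calls agree with the table: 86 bases upstream for X033 and direct hits for X036 and X045.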