Kun Zhang, Yan Fu, Wen-Feng Zeng, Kun He, Hao Chi, Chao Liu, Yan-Chang Li, Yuan Gao, Ping Xu, Si-Min He.
Abstract
MOTIVATION: Proteogenomics is widely accepted as a tool for discovering novel genes. Most conventional proteogenomic studies apply a global false discovery rate (FDR) to filter out false positives when identifying credible novel peptides. However, the actual level of false positives among novel peptides has often been found to be out of control, and it behaves differently for different genomes.
Year: 2015 PMID: 26076724 PMCID: PMC4595894 DOI: 10.1093/bioinformatics/btv340
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
The probabilities and their estimates used in this article
| Identification type | True positive | False positive |
|---|---|---|
| Annotated identification | | |
| Novel identification | | |
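The abstract's central point, that a single global FDR can hide a much higher error rate among novel peptides, can be illustrated with a toy calculation that splits identifications into the annotated and novel rows of the table above. This is a minimal sketch, not the authors' method: it assumes the common target-decoy estimate (FDR ≈ accepted decoys / accepted targets), a hypothetical `is_novel` flag on each peptide-spectrum match (PSM), and invented counts that are not data from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Psm:
    is_decoy: bool   # True if the match is to a decoy sequence
    is_novel: bool   # hypothetical flag: match lies outside annotated genes
    score: float     # search-engine score; higher is better


def target_decoy_fdr(psms):
    """Estimate FDR as (#decoy matches) / (#target matches)."""
    targets = sum(1 for p in psms if not p.is_decoy)
    decoys = sum(1 for p in psms if p.is_decoy)
    return decoys / targets if targets else 0.0


def fdr_by_class(psms, threshold):
    """Compare the pooled (global) FDR with FDRs computed per identification class."""
    accepted = [p for p in psms if p.score >= threshold]
    annotated = [p for p in accepted if not p.is_novel]
    novel = [p for p in accepted if p.is_novel]
    return {
        "global": target_decoy_fdr(accepted),
        "annotated": target_decoy_fdr(annotated),
        "novel": target_decoy_fdr(novel),
    }


# Invented counts: 900 annotated targets with 9 decoys, 100 novel targets with 11 decoys.
toy = ([Psm(False, False, 1.0)] * 900 + [Psm(True, False, 1.0)] * 9
       + [Psm(False, True, 1.0)] * 100 + [Psm(True, True, 1.0)] * 11)
print(fdr_by_class(toy, threshold=0.5))
```

With these made-up counts the pooled estimate is 2%, which looks acceptable, while the novel subset alone sits at 11%, because the abundant annotated matches dilute the novel ones.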
Fig. 1. The upper and lower bounds of when
Fig. 2. Simulation results on the E. coli and M. tuberculosis datasets. To simulate partial annotation, we randomly removed some annotated genes from the database. Genes were sampled according to θ, which was varied from 0 to 1 in steps of 0.1, with the values 0.95 and 0.99 also included. (A) On the E. coli dataset, the experimental values (red crosses) fit the theoretical values (blue line) well. The deduced values of θ were nearly identical to the sampled ones, as shown by the magenta triangles lying on the diagonal. (B) On the M. tuberculosis dataset, genes were sampled 10 times for each value of θ. The experimental values (red boxes) fit the theoretical values (blue line) well when θ is less than 0.9. Because truly novel peptides may exist, the experimental value diverges from its theoretical counterpart at larger θ: it is 0.69 when the sampled θ = 1, and the corresponding deduced θ is 0.996. Nevertheless, all deduced values of θ still match the sampled ones (green boxes), since the annotation completeness ratio is very close to 1.
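The partial-annotation simulation described in the Fig. 2 caption can be sketched as follows. This assumes θ is the annotation completeness ratio, i.e. the fraction of annotated genes kept in the database, with the removed genes playing the role of "novel" genes. The function names are hypothetical, and the code only reproduces the sampling step, not the article's procedure for deducing θ.

```python
import random


def theta_grid():
    """θ from 0 to 1 in steps of 0.1, plus 0.95 and 0.99 (per the Fig. 2 caption)."""
    grid = [round(0.1 * i, 1) for i in range(11)]  # round() removes float noise such as 0.30000000000000004
    return sorted(grid + [0.95, 0.99])


def sample_annotation(genes, theta, rng):
    """Simulate partial annotation by keeping a random fraction theta of annotated genes.

    Assumption: theta is the fraction of genes retained (annotation completeness);
    the removed genes are treated as novel in the simulated search.
    """
    k = round(theta * len(genes))
    kept = set(rng.sample(genes, k))
    removed = [g for g in genes if g not in kept]
    return sorted(kept), removed


rng = random.Random(0)
genes = [f"gene{i}" for i in range(100)]  # hypothetical gene identifiers
for theta in theta_grid():
    kept, removed = sample_annotation(genes, theta, rng)
    print(theta, len(kept), len(removed))
```

For the M. tuberculosis experiment, where genes were sampled 10 times per θ, the same call would simply be repeated ten times per grid value with the shared `rng`, producing a distribution of outcomes like the boxes in panel B.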