Rosario M Piro, Ivan Molineris, Ugo Ala, Paolo Provero, Ferdinando Di Cunto.
Abstract
MOTIVATION: The identification of genes involved in specific phenotypes, such as human hereditary diseases, often requires the time-consuming and expensive examination of a large number of positional candidates selected by genome-wide techniques such as linkage analysis and association studies. Even considering the positive impact of next-generation sequencing technologies, the prioritization of these positional candidates may be an important step for disease-gene identification.
Year: 2010 PMID: 20823330 PMCID: PMC2935433 DOI: 10.1093/bioinformatics/btq396
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic representation of the candidate prioritization method. The procedure is exemplified with two of the hypothetical candidate genes. The list of candidate genes, the phenotype p and the spatial gene expression profiles are considered as given.
Table 1. Results of the leave-one-out tests
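| Organism, phenotypes | N | Candidates (average) | g–p pairs | Ranked first: Obs. | Ranked first: Exp. | Ranked first: P-value | Ranked 1st–10th: Obs. | Ranked 1st–10th: Exp. | Ranked 1st–10th: P-value | Ranked ≤10%: Obs. | Ranked ≤10%: Exp. | Ranked ≤10%: P-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|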
| Mouse | 50 | 85.1 | 860 | 26 | 10 | 1.57e-05*** | 169 | 101 | 1.38e-11*** | 144 | 86 | 5.95e-10*** |
| Mouse | 100 | 160.4 | 877 | 18 | 5 | 1.47e-05*** | 115 | 55 | 6.77e-14*** | 152 | 88 | 1.78e-11*** |
| Mouse | 200 | 298.8 | 880 | 13 | 3 | 1.20e-05*** | 65 | 29 | 4.79e-09*** | 147 | 88 | 5.67e-10*** |
| Human, mol.basis unkn. ( | 50 | 73.8 | 797 | 16 | 11 | 7.95e-02 | 149 | 108 | 2.64e-05*** | 126 | 80 | 1.80e-07*** |
| Human, mol.basis unkn. ( | 100 | 137.7 | 844 | 13 | 6 | 9.84e-03** | 105 | 61 | 6.26e-08*** | 132 | 84 | 1.93e-07*** |
| Human, mol.basis unkn. ( | 200 | 256.6 | 847 | 6 | 3 | 1.16e-01 | 61 | 33 | 4.84e-06*** | 139 | 85 | 5.08e-09*** |
N determines the size of the artificial loci, which contain at most 2N+1 genes. The average numbers of effective candidates with ABA profiles and the numbers of evaluated g–p pairs are shown. The observed and expected numbers of g–p pairs for which the true phenotype-causing gene g ranks first, among the top 10, and within the best 10% of the prioritized list are reported along with the corresponding P-values (one-tailed Fisher exact test). Significant P-values are highlighted (*P<0.05; **P<0.01; ***P<0.001).
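The expected counts in the table are consistent with a simple random-ranking baseline. The following is a minimal sketch, assuming that under the null hypothesis the true gene is equally likely to occupy any position in the prioritized list, and using the average number of candidates per locus in place of per-locus counts (which the table does not give). The function name `expected_counts` is hypothetical and this is not necessarily how the authors computed the values.

```python
# (label, N, average candidates, evaluated g-p pairs), copied from the table above.
ROWS = [
    ("Mouse", 50, 85.1, 860),
    ("Mouse", 100, 160.4, 877),
    ("Mouse", 200, 298.8, 880),
    ("Human", 50, 73.8, 797),
    ("Human", 100, 137.7, 844),
    ("Human", 200, 256.6, 847),
]


def expected_counts(avg_candidates, pairs):
    """Expected hits if the true gene's rank were uniformly random.

    Assumption: every locus has `avg_candidates` genes, which only
    approximates the real per-locus candidate counts.
    """
    first = pairs / avg_candidates  # P(rank == 1) = 1 / candidates
    top10 = pairs * min(10, avg_candidates) / avg_candidates  # P(rank <= 10)
    top_decile = pairs * 0.10  # P(rank within best 10%)
    return round(first), round(top10), round(top_decile)


for label, n, avg, pairs in ROWS:
    print(label, n, expected_counts(avg, pairs))
```

With the table's inputs, this approximation yields the same rounded "Exp." values as listed for all six rows, which suggests the baseline is of this form; the P-values themselves depend on how the Fisher test contingency tables were built and are not recomputed here.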
Table 2. Best 20 candidates of the prioritization (evaluation via similar phenotypes) of the chromosome X genes resequenced by Tarpey et al. (2009) (see also Supplementary Table S2)
| Rank | Gene | Entrez ID | Disorder | Mut. score | Score |
|---|---|---|---|---|---|
| 1 | BRWD3 | 254065 | XLMR | 2.86 | 7.16e-78 |
| 2 | | 3654 | – | 1.94 | 1.84e-71 |
| 3 | SYP | 6855 | XLMR | − | 8.97e-69 |
| 4 | | 331 | other | 37.04 | 4.28e-68 |
| 5 | | 9500 | – | 5.14 | 5.63e-68 |
| 6 | | 9643 | – | − | 3.37e-66 |
| 7 | | 55609 | – | − | 5.07e-65 |
| 8 | SYN1 | 6853 | XLMR | − | 1.15e-64 |
| 9 | | 10046 | other | 12.00 | 1.19e-64 |
| 10 | ATP6AP2 | 10159 | XLMR | − | 2.70e-64 |
| 11 | | 3054 | – | 27.82 | 1.65e-61 |
| 12 | | 64219 | – | 2.44 | 1.82e-61 |
| 13 | NGFRAP1 | 27018 | – | − | 1.91e-61 |
| 14 | | 9130 | – | 11.62 | 5.63e-61 |
| 15 | HUWE1 | 10075 | XLMR | 46.75 | 1.62e-60 |
| 16 | GRIA3 | 2892 | XLMR | 13.00 | 1.14e-59 |
| 17 | | 5277 | other | − | 3.24e-59 |
| 18 | OGT | 8473 | – | 15.90 | 4.15e-59 |
| 19 | | 54552 | – | 22.49 | 1.68e-58 |
| 20 | WDR40C | 340578 | – | 0.22 | 3.35e-58 |
Associations to disorders and mutation scores are as reported by Tarpey et al. Mutation scores reflect the conservation scores at missense positions and are summed over the single missense mutations found for each gene. Genes in bold face overlap with those in Table 3.
Table 3. Best 20 candidates of the prioritization (prediction via true XLMR genes) of the chromosome X genes resequenced by Tarpey et al. (2009) (see also Supplementary Table S3)
| Rank | Gene | Entrez ID | Disorder | Mut. score | Score |
|---|---|---|---|---|---|
| 1 | | 9643 | – | − | 5.09e-99 |
| 2 | | 64219 | – | 2.44 | 7.70e-97 |
| 3 | | 55609 | – | − | 1.91e-93 |
| 4 | | 9500 | – | 5.14 | 1.55e-91 |
| 5 | MAGEE1 | 57692 | – | 2.04 | 1.60e-85 |
| 6 | | 331 | other | 37.04 | 1.02e-84 |
| 7 | GRIPAP1 | 56850 | – | 8.17 | 3.13e-82 |
| 8 | | 10046 | other | 12.00 | 3.75e-81 |
| 9 | | 54552 | – | 22.49 | 4.07e-81 |
| 10 | | 9130 | – | 11.62 | 7.72e-81 |
| 11 | PGRMC1 | 10857 | – | 9.70 | 8.65e-81 |
| 12 | GPM6B | 2824 | – | − | 4.12e-79 |
| 13 | | 3654 | – | 1.94 | 8.19e-79 |
| 14 | | 3054 | – | 27.82 | 1.28e-78 |
| 15 | | 5277 | other | − | 1.65e-78 |
| 16 | RPS4X | 6191 | – | − | 4.97e-78 |
| 17 | REPS2 | 9185 | – | − | 2.81e-77 |
| 18 | ARMCX2 | 9823 | – | − | 1.67e-75 |
| 19 | DRP2 | 1821 | – | 41.79 | 3.64e-74 |
| 20 | MED14 | 9282 | – | 9.54 | 1.41e-73 |
Associations to disorders and mutation scores are as reported by Tarpey et al. Mutation scores reflect the conservation scores at missense positions and are summed over the single missense mutations found for each gene. Genes in bold face overlap with those in Table 2.
Fig. 2. Distribution of relative ranks k(c, r)/kmax of the three best XLMR candidates (within the co-expression lists of the 58 reference genes), compared to the distribution of relative ranks of all 471 candidates. Data points represent bins of width 0.025.
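The binning described in the caption can be sketched as follows. This is an illustrative example only: it assumes ranks k(c, r) start at 1 and that kmax is the length of a reference gene's co-expression list; the function name `relative_rank_histogram` and the input ranks are hypothetical, not data from the study.

```python
from collections import Counter

BIN_WIDTH = 0.025  # bin width stated in the Fig. 2 caption


def relative_rank_histogram(ranks, k_max):
    """Map each bin's lower edge to the fraction of relative ranks k / k_max in it."""
    n_bins = round(1 / BIN_WIDTH)
    counts = Counter()
    for k in ranks:
        rel = k / k_max
        # A relative rank of exactly 1.0 is placed in the last bin.
        idx = min(int(rel / BIN_WIDTH), n_bins - 1)
        counts[idx] += 1
    total = len(ranks)
    return {round(i * BIN_WIDTH, 3): counts[i] / total for i in sorted(counts)}


# Hypothetical ranks of one candidate in four reference genes' co-expression lists.
hist = relative_rank_histogram([1, 20, 30, 1000], k_max=1000)
print(hist)
```

A candidate that is consistently co-expressed with the reference genes would concentrate its mass in the lowest bins, whereas an unrelated candidate would spread roughly evenly across all 40 bins.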