Ronald J Nowling, Jenica L Abrudan, Douglas A Shoue, Badi' Abdul-Wahid, Mariha Wadsworth, Gwen Stayback, Frank H Collins, Mary Ann McDowell, Jesús A Izaguirre.
Abstract
BACKGROUND: The control of vector-borne diseases, such as malaria, dengue fever, and typhus fever, is often achieved with the use of insecticides. Unfortunately, insecticide resistance is becoming common among different vector species. There are currently no chemical alternatives to these insecticides because new human-safe classes of molecules have yet to be brought to the vector-control market. The identification of novel targets offers opportunities for the rational design of new chemistries to control vector populations. One target family, G protein-coupled receptors (GPCRs), has remained relatively underexplored in terms of insecticide development.
Year: 2013 PMID: 23705687 PMCID: PMC3680159 DOI: 10.1186/1756-3305-6-150
Source DB: PubMed Journal: Parasit Vectors ISSN: 1756-3305 Impact factor: 3.876
Figure 1. Flowchart describing the Ensemble*, GPCRHMM*, and Pfam* classifiers. Diamonds represent sequences, while rectangles represent steps of the classifiers. Dashed rectangles indicate which classifier each component belongs to. The input sequences are the sequences to be classified, while the training sequences are sequences with known designations of either GPCR or not GPCR.
Data set sources and construction
| VectorBase, v. AaegL1.2 | VB and GPCRDB annotations | |
| VectorBase, v. AgamP3.5 | [ | |
| Beebase, v. Pre-Release 2 OGS | GPCRDB annotations | |
| Flybase, v. 5.29 | Flybase and GPCRDB annotations | |
| Ensembl, v. 37.59 | [ | |
| VectorBase, v. 1.2 | VB and GPCRDB annotations | |
Figure 2. ROC curves comparing classifier performance on the combined data set. The dashed (GPCRHMM*), dotted (Pfam*), and solid (Ensemble*) curves plot the percentage of training set GPCRs correctly predicted (true positives) against the percentage of sequences misclassified as GPCRs (false positives) as the likelihood score threshold is decreased from 1 (accepting nothing) to 0 (accepting everything). GPCRHMM (diamond), Pfam (circle), and PredCouple (square) produce only true/false predictions for each sequence and are therefore represented as single points.
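The threshold sweep described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each sequence carries one likelihood score in [0, 1] and is called a GPCR when its score is strictly greater than the threshold (so a threshold of 1 accepts nothing). The function name `roc_points` is hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float],
    labels: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return one (false positive %, true positive %) pair per threshold.

    Assumption: a sequence is predicted to be a GPCR when its likelihood
    score is strictly greater than the threshold.
    """
    n_pos = sum(1 for is_gpcr in labels if is_gpcr)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both GPCR and non-GPCR sequences")

    points = []
    for t in thresholds:
        # Count accepted true GPCRs (TP) and accepted non-GPCRs (FP).
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and not y)
        points.append((100.0 * fp / n_neg, 100.0 * tp / n_pos))
    return points


# Sweep the threshold downward from 1 to 0 in steps of 0.1.
sweep = [i / 10 for i in range(10, -1, -1)]
```

For example, `roc_points([0.9, 0.6, 0.4, 0.2], [True, True, False, False], [1.0, 0.5, 0.0])` returns `[(0.0, 0.0), (0.0, 100.0), (100.0, 100.0)]`. Classifiers that only emit true/false calls yield a single such pair, which is why they appear as points rather than curves.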
Classifiers’ true positive rates for different false positive rates
| Classifier | | | | | |
|---|---|---|---|---|---|
| Ensemble* | 87% | 89% | 89% | 91% | 91% |
| GPCRHMM | 81% | -- | -- | -- | 81% |
| GPCRHMM* | 81% | 81% | 81% | 83% | 83% |
| Pfam | -- | -- | 82% | -- | 82% |
| Pfam* | 80% | 80% | 80% | 80% | 80% |
| PredCouple | -- | 86% | -- | -- | 86% |
*The percentage of training set GPCRs correctly predicted (true positive rate) is given for fixed percentages of sequences misclassified as GPCRs (false positive rates). The false positive rates (FPRs) were chosen to match those of GPCRHMM, PredCouple, and Pfam, plus the FPR at which Ensemble*'s true positive rate plateaus, to provide the most direct comparisons between the classifiers.
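For a score-based classifier such as Ensemble*, each entry amounts to reading the ROC curve at a chosen false positive rate. One simple reading, assumed here rather than taken from the paper, is to report the best true positive rate among operating points whose FPR does not exceed the target, with no interpolation. The function name `tpr_at_fpr` and the example points are hypothetical.

```python
from typing import Iterable, Tuple


def tpr_at_fpr(points: Iterable[Tuple[float, float]], max_fpr: float) -> float:
    """Best true positive rate (%) among ROC points with FPR (%) <= max_fpr.

    `points` are (fpr, tpr) pairs, e.g. from a threshold sweep.
    Returns 0.0 if no point satisfies the FPR limit.
    """
    eligible = [tpr for fpr, tpr in points if fpr <= max_fpr]
    return max(eligible, default=0.0)


# Hypothetical ROC points, not values from the paper.
example_points = [(0.0, 0.0), (0.5, 60.0), (1.0, 75.0), (3.0, 90.0), (100.0, 100.0)]
```

Here `tpr_at_fpr(example_points, 2.0)` gives `75.0`, because the 90% operating point is only reached at an FPR of 3%. Fixing the FPR at the single operating point of GPCRHMM, Pfam, or PredCouple is what makes the comparison with those classifiers direct.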
Figure 3. ROC curves comparing classifier performance on individual organisms. For each ROC curve, the classifiers were trained on five of the organisms and then evaluated on the sixth (given in the title). The solid (Ensemble*), dashed (GPCRHMM*), and dotted (Pfam*) lines plot the percentage of correctly predicted training set GPCRs (true positives) against the percentage of incorrectly predicted non-GPCRs (false positives) as the likelihood score threshold is decreased from 1 (accept nothing) to 0 (accept everything). GPCRHMM (diamond), Pfam (circle), and PredCouple (square) produce only true/false predictions for each sequence and are therefore represented as single points.
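The per-organism evaluation is a leave-one-organism-out scheme. The sketch below assumes a generic `train` callable that returns a scoring function; both are hypothetical stand-ins, since the caption does not describe the classifiers' internals.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, bool]       # (protein sequence, is GPCR)
Scorer = Callable[[str], float]  # maps a sequence to a likelihood score


def leave_one_organism_out(
    data: Dict[str, List[Example]],
    train: Callable[[List[Example]], Scorer],
) -> Dict[str, List[Tuple[float, bool]]]:
    """For each organism, train on all other organisms and score the held-out one.

    Returns, per held-out organism, (score, true label) pairs that can be fed
    into a threshold sweep to draw that organism's ROC curve.
    """
    results = {}
    for held_out in data:
        # Pool the training examples from every organism except the held-out one.
        training = [
            ex for org, examples in data.items() if org != held_out for ex in examples
        ]
        model = train(training)
        results[held_out] = [(model(seq), label) for seq, label in data[held_out]]
    return results
```

With the six organisms of the study, each call to `train` sees five organisms, so no sequence from the evaluated organism is used in training.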
Number of test set sequences found / missed by species
| Species | | | | |
|---|---|---|---|---|
| | 73 / 61 | 111 / 23 | 101 / 33 | |
| | 105 / 32 | 115 / 22 | 113 / 24 | |
| | 45 / 11 | 54 / 2 | 54 / 2 | |
| | 176 / 19 | 156 / 39 | 180 / 15 | |
| | 759 / 133 | 712 / 180 | 778 / 114 | |
| | 72 / 31 | 89 / 14 | 86 / 17 | |
| Vectors (374) | 250 / 124 | 315 / 59 | 300 / 74 | |
| Total (1517) | 1230 / 287 | 1237 / 280 | 1312 / 205 | |
* The number of test set sequences that each classifier identified (found) and failed to identify (missed) as GPCRs is given by species, for the vectors, and in total. GPCRHMM, Pfam, and PredCouple were run using default settings. For Ensemble*, sequences with a positive likelihood score were considered predicted GPCRs. The best results are in bold.
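The found/missed counts follow from the positive-score rule stated for Ensemble*. The sketch below, with hypothetical helper names, counts one organism's test GPCRs and sums per-organism rows into a subtotal such as the Vectors or Total row.

```python
from typing import Iterable, List, Tuple


def found_missed(gpcr_scores: Iterable[float]) -> Tuple[int, int]:
    """Count known GPCRs called (score > 0) versus missed by Ensemble*."""
    scores = list(gpcr_scores)
    found = sum(1 for s in scores if s > 0)
    return found, len(scores) - found


def subtotal(rows: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum (found, missed) pairs across organisms."""
    return sum(f for f, _ in rows), sum(m for _, m in rows)
```

Because every test GPCR is either found or missed, found + missed in any row must equal that group's size (374 for the vectors, 1517 in total), which gives a quick consistency check on the table.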
Figure 4. GPCR validation pipeline. Diamonds indicate input and output sequences. Rectangles represent the validation programs and filters. Arrows indicate the movement of sequences and are labeled with the number of sequences that passed from one validation step to the next. Numbers for sequences that were identified as something other than GPCRs are not shown. The first filter identifies sequences that are annotated as GPCRs in the database and removes any sequences that were annotated as something other than GPCRs or were identified by ScanPROSITE as having non-GPCR domains. The second filter uses a combination of BLAST hits, I-TASSER results, and identification of GPCR domains by ScanPROSITE to categorize the final 52 predicted but unconfirmed sequences.
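The first filter can be sketched as a three-way split. The record fields below (`annotated_gpcr`, `non_gpcr_domains`) are hypothetical stand-ins for the database annotation and the ScanPROSITE output, and checking the annotation before the domain hits is an assumption, since the caption does not give an order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    seq_id: str
    # True: annotated as a GPCR; False: annotated as something else; None: unannotated.
    annotated_gpcr: Optional[bool] = None
    # Non-GPCR domains reported by ScanPROSITE (empty if none).
    non_gpcr_domains: List[str] = field(default_factory=list)


def first_filter(
    preds: List[Prediction],
) -> Tuple[List[Prediction], List[Prediction], List[Prediction]]:
    """Split predictions into (known GPCRs, removed, passed to the second filter)."""
    known, removed, remaining = [], [], []
    for p in preds:
        if p.annotated_gpcr is True:
            known.append(p)
        elif p.annotated_gpcr is False or p.non_gpcr_domains:
            removed.append(p)
        else:
            remaining.append(p)
    return known, removed, remaining
```

Only the third list, sequences with no annotation and no conflicting domain hits, moves on to the evidence-based categorization of the second filter.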
Newly discovered GPCRs identified by Ensemble*†
| AAEL013430-PA | 0.346 | ScanPROSITE and BLAST and I-TASSER | putative GPCR class a orphan receptor 5 | |
| AAEL000818-PA | 0.188 | ScanPROSITE and BLAST | class C metabotropic glutamate-like G-protein coupled receptor GPRmgl4, putative | |
| AGAP002229-PA | 0.691 | ScanPROSITE and BLAST | serotonin receptor | |
| AGAP004930-PA | 0.658 | ScanPROSITE and BLAST and I-TASSER | G protein coupled receptor | |
| AGAP000383-PA | 0.480 | ScanPROSITE and BLAST | G protein coupled receptor | |
| AGAP005229-PA | 0.440 | ScanPROSITE and BLAST and I-TASSER | G protein coupled receptor | |
| AGAP000606-PA | 0.411 | ScanPROSITE and BLAST | alpha-2 adrenergic receptor | |
| AGAP013324-PA | 0.409 | ScanPROSITE and BLAST | G protein-coupled receptor | |
| AGAP008703-PA | 0.278 | ScanPROSITE and BLAST and I-TASSER | G protein-coupled receptor | |
| AGAP008237-PA | 0.250 | BLAST and I-TASSER | G protein-coupled receptor 143 | |
| AGAP011320-PA | 0.236 | ScanPROSITE and BLAST and I-TASSER | Beta-3 adrenergic receptor | |
| AGAP012824-PA | 0.167 | ScanPROSITE and BLAST and I-TASSER | tachykinin receptor | |
| AGAP011646-PA | 0.112 | ScanPROSITE and BLAST | class C metabotropic glutamate-like G-protein coupled receptor | |
| PHUM074100-PA | 0.375 | ScanPROSITE and BLAST | Frizzled-7 | |
| PHUM010590-PA | 0.342 | ScanPROSITE and I-TASSER | PREDICTED: cadherin EGF LAG seven-pass G-type receptor 3-like | |
| PHUM423330-PA | 0.289 | ScanPROSITE and I-TASSER | N/A | |
| PHUM447490-PA | 0.250 | ScanPROSITE and I-TASSER | neuropeptide receptor A31 | |
| PHUM618870-PA | 0.208 | ScanPROSITE and I-TASSER | G protein-coupled receptor | |
| PHUM128700-PA | 0.167 | ScanPROSITE and I-TASSER | G protein-coupled receptor | |
| AAEL005994-PA | 0.278 | Unconfirmed | uridine cytidine kinase i | |
| AAEL002694-PA | 0.250 | Unconfirmed | transmembrane protein 87A | |
| AAEL000851-PA | 0.217 | Unconfirmed | N/A | |
| AAEL010852-PA | 0.214 | Unconfirmed | Transmembrane protein 145 | |
| AGAP000130-PA | 0.430 | Unconfirmed | PREDICTED: latrophilin 2-like | |
| AGAP005356-PA | 0.423 | Unconfirmed | PREDICTED: alpha-1A adrenergic receptor-like | |
| AGAP011701-PA | 0.400 | Unconfirmed | GPRmgl4 | |
| AGAP004783-PA | 0.227 | Unconfirmed | neuropeptide receptor A16 | |
| AGAP007896-PA | 0.141 | Unconfirmed | sprinter | |
| PHUM596180-PA | 0.476 | Unconfirmed | transmembrane protein 145 | |
| PHUM288590-PA | 0.136 | Unconfirmed | N/A | |
† Sequences were predicted using the Ensemble* classifier. Independent validation was performed using ScanPROSITE, I-TASSER, and homology to GPCRs in other organisms as determined by a BLAST search of the NCBI nr database. A sequence was considered a newly discovered GPCR if it was identified by the Ensemble* classifier and no contrary evidence existed; a confirmed GPCR additionally had to be validated by at least two independent methods.
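The status rule in this footnote can be written as a small decision function. The evidence representation (a set of method names plus a single contrary-evidence flag), the label `"rejected"`, and the function name are hypothetical.

```python
from typing import AbstractSet

# The three independent validation methods named in the footnote.
VALIDATION_METHODS = frozenset({"ScanPROSITE", "BLAST", "I-TASSER"})


def gpcr_status(
    predicted_by_ensemble: bool,
    supporting_methods: AbstractSet[str],
    contrary_evidence: bool,
) -> str:
    """Classify a sequence as 'confirmed', 'unconfirmed', or 'rejected'."""
    # Newly discovered requires an Ensemble* prediction and no contrary evidence.
    if not predicted_by_ensemble or contrary_evidence:
        return "rejected"
    # Confirmation additionally requires support from at least two independent methods.
    if len(VALIDATION_METHODS & set(supporting_methods)) >= 2:
        return "confirmed"
    return "unconfirmed"
```

For instance, AGAP008237-PA (supported by BLAST and I-TASSER) comes out `"confirmed"`, while a sequence supported only by ScanPROSITE would remain `"unconfirmed"`.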
Figure 5. Distribution of the newly discovered vector GPCR sequences by GPCR status for each vector species.
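Figure 5 itself is not reproduced here. Assuming its status categories match the confirmed/unconfirmed split of the table of newly discovered GPCRs, the per-species counts can be re-derived by grouping that table's sequence IDs on their VectorBase prefix (AAEL for Aedes aegypti, AGAP for Anopheles gambiae, PHUM for Pediculus humanus). The ID lists are copied from the table; the grouping code is an illustrative sketch.

```python
from collections import Counter
from typing import Dict, List

# Sequence IDs from the table of newly discovered GPCRs, grouped by status.
IDS_BY_STATUS: Dict[str, List[str]] = {
    "confirmed": [
        "AAEL013430-PA", "AAEL000818-PA",
        "AGAP002229-PA", "AGAP004930-PA", "AGAP000383-PA", "AGAP005229-PA",
        "AGAP000606-PA", "AGAP013324-PA", "AGAP008703-PA", "AGAP008237-PA",
        "AGAP011320-PA", "AGAP012824-PA", "AGAP011646-PA",
        "PHUM074100-PA", "PHUM010590-PA", "PHUM423330-PA", "PHUM447490-PA",
        "PHUM618870-PA", "PHUM128700-PA",
    ],
    "unconfirmed": [
        "AAEL005994-PA", "AAEL002694-PA", "AAEL000851-PA", "AAEL010852-PA",
        "AGAP000130-PA", "AGAP005356-PA", "AGAP011701-PA", "AGAP004783-PA",
        "AGAP007896-PA",
        "PHUM596180-PA", "PHUM288590-PA",
    ],
}


def tally(ids_by_status: Dict[str, List[str]]) -> Counter:
    """Count sequences per (ID prefix, status) pair."""
    counts: Counter = Counter()
    for status, ids in ids_by_status.items():
        for seq_id in ids:
            prefix = seq_id[:4]  # e.g. "AGAP" from "AGAP002229-PA"
            counts[(prefix, status)] += 1
    return counts
```

`tally(IDS_BY_STATUS)` gives 2 confirmed and 4 unconfirmed sequences for AAEL, 11 and 5 for AGAP, and 6 and 2 for PHUM, 30 sequences in all.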