Shawn M Gomez, Karin Eiglmeier, Beatrice Segurens, Pierre Dehoux, Arnaud Couloux, Claude Scarpelli, Patrick Wincker, Jean Weissenbach, Paul T Brey, Charles W Roth.
Abstract
We describe the preliminary analysis of over 35,000 clones from a full-length enriched cDNA library from the malaria mosquito vector Anopheles gambiae. The clones define nearly 3,700 genes, of which around 2,600 significantly improve current gene definitions. An additional 17% of the genes were not previously annotated, suggesting that an equal percentage may be missing from the current Anopheles genome annotation.
Year: 2005 PMID: 15833126 PMCID: PMC1088967 DOI: 10.1186/gb-2005-6-4-r39
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Flow chart of sequence processing and categorization.
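Distribution of cDNA clusters across the Anopheles genome (number of clusters per chromosome arm, with the percentage of the row total in parentheses; UNKN, scaffolds not assigned to a chromosome)
| | 2R | 2L | 3R | 3L | X | UNKN |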
| Predicted | 950 (31%) | 676 (22%) | 588 (19%) | 439 (15%) | 229 (8%) | 146 (5%) |
| Novel | 130 (24%) | 141 (26%) | 99 (18%) | 92 (17%) | 50 (9%) | 31 (6%) |
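Each percentage in the table is a cluster count divided by its row total (3,028 predicted and 543 novel clusters). The short Python check below recomputes the rows from the counts alone; the function name `row_percentages` is illustrative and not from the paper.

```python
# Cluster counts per chromosome arm, copied from the table above.
ARMS = ["2R", "2L", "3R", "3L", "X", "UNKN"]
COUNTS = {
    "Predicted": [950, 676, 588, 439, 229, 146],
    "Novel": [130, 141, 99, 92, 50, 31],
}


def row_percentages(counts, decimals=1):
    """Express each count as a percentage of the row total, rounded."""
    total = sum(counts)
    return [round(100 * c / total, decimals) for c in counts]


for label, counts in COUNTS.items():
    cells = ", ".join(
        f"{arm} {pct}%" for arm, pct in zip(ARMS, row_percentages(counts))
    )
    print(f"{label} (n={sum(counts)}): {cells}")
```

The Novel row reproduces exactly when rounded to whole percentages. For 3L in the Predicted row, 439/3,028 is 14.498%, which rounds to 14% directly; the table's 15% may therefore come from rounding in two steps (first to 14.5%).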
Figure 2. Classification of clusters with Gene Ontology. Numbers above bars indicate the number of novel clusters in the given category.
Pfam domains within novel ORFs
| Pfam domain | Description | Number |
| adh_short | Short chain dehydrogenase | 1 |
| Aldo_ket_red | Aldo/keto reductase family | 1 |
| Amidase_2 | N-acetylmuramoyl-L-alanine amidase | 1 |
| Ank | Ankyrin repeat | 3 |
| Bin3 | Bicoid-interacting protein 3 (Bin3) | 1 |
| CBFD_NFYB_HMF | Histone-like transcription factor (CBF/NF-Y) and archaeal histone | 1 |
| CH | Calponin homology (CH) domain | 1 |
| CRAL_TRIO | CRAL/TRIO domain | 1 |
| Death | Death domain | 1 |
| DEP | Domain found in Dishevelled, Egl-10, and Pleckstrin | 1 |
| Dsrm | Double-stranded RNA binding motif | 2 |
| DUF1395 | Protein of unknown function (DUF1395) | 1 |
| DUF227 | Domain of unknown function (DUF227) | 1 |
| DUF783 | Protein of unknown function (DUF783) | 1 |
| Efhand | EF hand | 4 |
| Exonuc_X-T | Exonuclease | 1 |
| F-box | F-box domain | 1 |
| FYRC | F/Y rich C-terminus | 1 |
| G_glu_transpept | Gamma-glutamyltranspeptidase | 1 |
| GST_C | Glutathione S-transferase, C-terminal domain | 1 |
| HIT | HIT domain | 1 |
| Ins_allergen_rp | Insect allergen related repeat | 1 |
| Linker_histone | Linker histone H1 and H5 family | 1 |
| LRR | Leucine rich repeat | 4 |
| LSM | LSM domain | 1 |
| MtN3_slv | MtN3/saliva family | 2 |
| p450 | Cytochrome P450 | 2 |
| Pkinase | Protein kinase domain | 1 |
| Psf2 | Partner of SLD five, PSF2 | 1 |
| Radical_SAM | Radical SAM superfamily | 1 |
| Retrotrans_gag | Retrotransposon gag protein | 1 |
| Ribosomal_L27e | Ribosomal L27e protein family | 1 |
| Ribosomal_L36e | Ribosomal protein L36e | 1 |
| Ribosomal_L37e | Ribosomal protein L37e | 1 |
| Ribosomal_S8 | Ribosomal protein S8 | 1 |
| RRM_1 | RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain) | 1 |
| SAM_1 | SAM domain (sterile alpha motif) | 1 |
| Serpin | Serpin (serine protease inhibitor) | 1 |
| Tetraspannin | Tetraspanin family | 2 |
| THAP | THAP domain | 2 |
| TIL | Trypsin inhibitor like cysteine rich domain | 1 |
| TIP49 | TIP49 C-terminus | 1 |
| TPR | TPR Domain | 1 |
| TraB | TraB family | 1 |
| Trypsin | Trypsin | 1 |
| Tubulin | Tubulin/FtsZ family, GTPase domain | 1 |
| Tubulin_C | Tubulin/FtsZ family, C-terminal domain | 1 |
| UNC-50 | UNC-50 family | 1 |
| UPF0224 | Uncharacterized protein family (UPF0224) | 1 |
| WD40 | WD domain, G-beta repeat | 3 |
| zf-C2H2 | Zinc finger, C2H2 type | 18 |
| zf-C3HC4 | Zinc finger, C3HC4 type (RING finger) | 1 |
Figure 3. GC content of cDNA clusters and Ensembl transcripts.
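GC content is the fraction of G and C bases among a sequence's bases. The minimal sketch below assumes sequences are plain strings and that ambiguous bases such as N are left out of the denominator. That is a common convention, but the caption does not say how the authors handled ambiguous bases.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction of seq, ignoring non-ACGT characters (e.g. N)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / acgt


print(gc_content("ATGCGCNNAT"))  # 4 G/C out of 8 A/C/G/T bases -> 0.5
```

Applying such a function to every cluster sequence and every transcript sequence gives the two distributions that a plot like Figure 3 compares.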
Figure 4. Putative novel member of the CLIPA protein subfamily. (a) Phylogenetic tree of CLIPA subfamily proteins and the novel member described here, PUT CLIPA5B. The protein CG5390 is the closest Drosophila relative of this protein. Bootstrap values are shown as percentages of 1,000 replications (see Materials and methods). (b) Genomic region containing the putative gene. Yellow bars indicate Ensembl 16 transcripts; cDNA evidence is shown in red and cDNA clusters in green. Sequence similarity and genomic proximity suggest that this novel member probably arose through a recent duplication of CLIPA5.
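A bootstrap value is the percentage of replicate trees (here 1,000) that recover a given clade. The minimal sketch below shows only that tally. It assumes each replicate tree has already been reduced to a set of clades, with each clade stored as a frozenset of leaf names. Building the replicate trees is out of scope, and the four toy replicates with leaves A to D are invented for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def bootstrap_support(replicates: Iterable[Set[Clade]]) -> Dict[Clade, float]:
    """Percentage of replicate trees in which each clade appears."""
    counts: Counter = Counter()
    n = 0
    for clades in replicates:
        n += 1
        counts.update(clades)  # each clade counted at most once per tree
    if n == 0:
        raise ValueError("no replicate trees supplied")
    return {clade: 100 * c / n for clade, c in counts.items()}


# Toy replicates (hypothetical): the clade {A, B} appears in 3 of 4 trees.
pair_ab = frozenset({"A", "B"})
pair_cd = frozenset({"C", "D"})
pair_bc = frozenset({"B", "C"})
reps = [{pair_ab, pair_cd}, {pair_ab, pair_cd}, {pair_ab}, {pair_bc}]
support = bootstrap_support(reps)
print(support[pair_ab])  # 75.0
```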
Figure 5. Peptidoglycan recognition protein LD. Cluster of cDNAs (cluster_2935, in green) associated with the peptidoglycan recognition protein LD gene. Note that the current Ensembl definition of PGRPLD (in cyan) is truncated and does not reflect the available transcript information.
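The comparison in Figure 5 asks whether an annotated transcript spans the full genomic extent supported by the cDNAs. The minimal sketch below makes three simplifying assumptions. It reduces cDNA alignments to (start, end) intervals on one strand of one chromosome. It groups overlapping alignments with simple single-linkage interval merging, which is chosen here for illustration and is not necessarily the authors' clustering procedure. All coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive genomic coordinates


def merge_into_clusters(alignments: List[Interval]) -> List[Interval]:
    """Merge overlapping cDNA alignment intervals into cluster spans."""
    clusters: List[Interval] = []
    for start, end in sorted(alignments):
        if clusters and start <= clusters[-1][1]:
            # Overlaps the current cluster: extend its end if needed.
            last_start, last_end = clusters[-1]
            clusters[-1] = (last_start, max(last_end, end))
        else:
            clusters.append((start, end))
    return clusters


def is_truncated(transcript: Interval, cluster: Interval) -> bool:
    """True if the cluster extends past either end of the transcript.

    Assumes the two intervals describe the same locus (they overlap).
    """
    return cluster[0] < transcript[0] or cluster[1] > transcript[1]


cdnas = [(1000, 2500), (1200, 3100), (5000, 5600)]  # hypothetical alignments
clusters = merge_into_clusters(cdnas)
print(clusters)  # [(1000, 3100), (5000, 5600)]
print(is_truncated((1400, 2800), clusters[0]))  # True
```

A real pipeline would also split alignments by chromosome and strand and work with exon structures rather than whole spans. The sketch keeps only the span comparison that the caption describes.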