Alejandro Padrón, Alvaro Molina-Cruz, Mariam Quinones, José Mc Ribeiro, Urvashi Ramphul, Janneth Rodrigues, Kui Shen, Ashley Haile, José Luis Ramirez, Carolina Barillas-Mury.
Abstract
BACKGROUND: Genome sequencing of Anopheles gambiae was completed more than ten years ago and has accelerated research on malaria transmission. However, annotation needs to be refined and verified experimentally, as most predicted transcripts have been identified by comparative analysis with genomes from other species. The mosquito midgut, the first organ to interact with Plasmodium parasites, mounts effective antiplasmodial responses that limit parasite survival and disease transmission. High-throughput Illumina sequencing of the midgut transcriptome was used to identify new genes and transcripts, contributing to the refinement of An. gambiae genome annotation.
Year: 2014 PMID: 25073905 PMCID: PMC4131051 DOI: 10.1186/1471-2164-15-636
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Heat map of coverage of the Illumina reads for the midgut transcriptome. A) Coverage of Illumina reads obtained for An. gambiae G3 strain along the An. gambiae genome. B) Coverage of Illumina reads obtained for An. gambiae L3-5 strain along the An. gambiae genome. Within a data zoom of 3, the colors scale linearly from blue to green to red (low to high coverage). The gap region of poorly expressed genes in chromosome 3R corresponds to the heterochromatic region (orange arrow) near subdivision 35B/C.
Distribution of midgut transcripts by Cufflinks class code
| Transcript class codes | Number | Percentage |
|---|---|---|
| Complete match | 5483 | 23.95 |
| Novel isoform | 4940 | 21.58 |
| Within reference intron | 2550 | 11.14 |
| Read mapping errors | 15 | 0.07 |
| Overlap | 517 | 2.26 |
| Pre-mRNA | 470 | 2.05 |
| Exonic overlap to opposite strand | 131 | 0.57 |
| Polymerase run-on | 1821 | 7.96 |
| Unknown intergenic | 5450 | 23.81 |
| Repeat | 707 | 3.09 |
| Multiple classifications | 805 | 3.52 |
Number of Anopheles gambiae midgut transcripts for each Cufflinks class code and as a percentage of the total.
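The percentage column can be reproduced from the counts alone. The following is a minimal sketch, not the authors' code: the dictionary and variable names are illustrative, and the counts are copied from the table above.

```python
# Counts per Cufflinks class code, copied from the table above.
counts = {
    "Complete match": 5483,
    "Novel isoform": 4940,
    "Within reference intron": 2550,
    "Read mapping errors": 15,
    "Overlap": 517,
    "Pre-mRNA": 470,
    "Exonic overlap to opposite strand": 131,
    "Polymerase run-on": 1821,
    "Unknown intergenic": 5450,
    "Repeat": 707,
    "Multiple classifications": 805,
}

# Total number of midgut transcripts across all class codes.
total = sum(counts.values())

# Share of the total for each class, rounded to two decimals as in the table.
percentages = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name:35s} {counts[name]:6d} {pct:6.2f}")
```

Summing the counts gives 22889 transcripts, which is the denominator behind every percentage in the table.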
Figure 2. Coding probability of midgut transcripts and functional classification of midgut transcripts generated by a genome-based analysis. A) Intergenic (light pink) and genic (blue) transcripts show a bimodal distribution that defines two major populations of transcripts with different coding probability. B) Numbers of potentially novel and previously annotated An. gambiae transcripts, functionally classified by BLAST against different databases. Abbreviated titles are “RNA Processing/Transc/Transl”: RNA Processing, Transcription, Translation; “Cytosk/Stor/Secr/ExMtrx”: Cytoskeletal, Secretion, Extracellular Matrix; “Transp/Channels”: Transporters and Channels; “Post trnsl mod/Prot mach”: Post-translational modification and proteasome machinery; “Nuc export & Reg”: Nuclear Export and Regulation; “Protease/Protease inhib”: Protease and protease inhibitors; “Transp. Element”: Transposable Element.
Distribution of midgut lncRNA by Cufflinks class code
| Transcript class codes | Amount | LncRNA | Percentage |
|---|---|---|---|
| Complete match | 5483 | 321 | 5.8 |
| Novel isoform | 4940 | 601 | 12.2 |
| Within reference intron | 2550 | 1616 | 63.3 |
| Read mapping errors | 15 | 10 | 66.7 |
| Overlap | 517 | 151 | 29.2 |
| Pre-mRNA | 470 | 219 | 46.6 |
| Exonic overlap to opposite strand | 131 | 90 | 68.7 |
| Polymerase run-on | 1821 | 1511 | 83.0 |
| Unknown intergenic | 5450 | 4335 | 79.5 |
| Repeat | 707 | 377 | 53.3 |
| Multiple classifications | 805 | 632 | 78.5 |
Number of An. gambiae midgut long non-coding RNA (lncRNA) by Cufflinks class codes and as a percentage of the class code transcript total.
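Each percentage in this table is the lncRNA count divided by the transcript count of the same class code. For example, 601 of 4940 novel isoforms is about 12.2%. The sketch below recomputes the column; the names are illustrative and the rounding convention is an assumption, so the last digit may differ from the published value for some rows.

```python
# (class code, transcripts in class, lncRNAs in class), copied from the table above.
rows = [
    ("Complete match", 5483, 321),
    ("Novel isoform", 4940, 601),
    ("Within reference intron", 2550, 1616),
    ("Read mapping errors", 15, 10),
    ("Overlap", 517, 151),
    ("Pre-mRNA", 470, 219),
    ("Exonic overlap to opposite strand", 131, 90),
    ("Polymerase run-on", 1821, 1511),
    ("Unknown intergenic", 5450, 4335),
    ("Repeat", 707, 377),
    ("Multiple classifications", 805, 632),
]

# Percentage of each class that is lncRNA (one decimal; rounding is an assumption).
lnc_percent = {name: round(100 * lnc / n, 1) for name, n, lnc in rows}

# Overall lncRNA count across all class codes.
total_lnc = sum(lnc for _, _, lnc in rows)

for name, n, lnc in rows:
    print(f"{name:35s} {lnc:5d} / {n:5d} = {lnc_percent[name]:5.1f}%")
```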
Figure 3. Frequency of midgut transcripts by length. Transcripts were generated either by a genome-based strategy using TopHat/Cufflinks (black line) or by a de novo strategy using ABySS (orange line).
Alignment comparison of de novo and genome-based assembly strategies for the midgut transcriptome
| Query | # Transcripts | Avg. length (bp) | Reference genes detected: unique | Reference genes detected: shared |
|---|---|---|---|---|
| | 67011 | 678 | 1009 | 6881 |
| | 20273 | 2039 | 167 | |
Alignment comparison was done using BLAST with de novo contigs as the query and with genome-based assembly transcripts as the subject and vice versa.
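One way to read the "unique" and "shared" columns is as set operations on the reference genes detected by each assembly. The sketch below assumes that interpretation: the function name and the gene identifiers are hypothetical, and the BLAST step itself is not shown.

```python
def compare_detected_genes(de_novo_hits, genome_based_hits):
    """Split reference genes into those detected by one assembly only or by both.

    Each argument is an iterable of reference gene IDs that had a BLAST hit
    from the corresponding assembly (de novo contigs or genome-based transcripts).
    """
    de_novo = set(de_novo_hits)
    genome_based = set(genome_based_hits)
    return {
        "unique_de_novo": de_novo - genome_based,
        "unique_genome_based": genome_based - de_novo,
        "shared": de_novo & genome_based,
    }

# Toy, made-up gene IDs (not data from the study).
result = compare_detected_genes(
    de_novo_hits=["GENE_A", "GENE_B", "GENE_C"],
    genome_based_hits=["GENE_B", "GENE_C", "GENE_D"],
)
print({key: sorted(genes) for key, genes in result.items()})
```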
Figure 4. Density of variants and expressed genes across 100-kb loci for the midgut transcripts on each chromosomal arm. Variant density (blue filled graph) and gene density (red filled graph) show that variants occur across the entire genome, with a tendency to decrease toward centromeric regions (“C” label). The gap region with no variants or expressed genes in chromosome 3R corresponds to the heterochromatic region (orange arrow) near subdivision 35B/C. A normalized ratio of variant density to gene density (blue line graph) shows regions of high and low polymorphism.
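A ratio of this kind can be sketched by binning coordinates into 100-kb windows and dividing variant counts by gene counts per window. The caption does not state how the ratio was normalized, so scaling by the maximum ratio below is an assumption, as are the function names and the toy coordinates.

```python
from collections import Counter

BIN_SIZE = 100_000  # 100-kb loci, as in the figure


def bin_counts(positions, bin_size=BIN_SIZE):
    """Count how many positions fall in each bin (bin index = position // bin_size)."""
    return Counter(pos // bin_size for pos in positions)


def normalized_variant_gene_ratio(variant_positions, gene_starts, bin_size=BIN_SIZE):
    """Variants per expressed gene in each bin, scaled so the highest bin equals 1."""
    variants = bin_counts(variant_positions, bin_size)
    genes = bin_counts(gene_starts, bin_size)
    # The ratio is only defined for bins that contain at least one expressed gene.
    ratios = {b: variants.get(b, 0) / g for b, g in genes.items()}
    peak = max(ratios.values(), default=0)
    if peak == 0:
        return {b: 0.0 for b in ratios}
    # Assumed normalization: divide by the maximum ratio.
    return {b: r / peak for b, r in ratios.items()}


# Toy, made-up coordinates on one chromosomal arm (not data from the study).
variant_positions = [1_200, 50_000, 99_999, 150_000, 250_000, 260_000]
gene_starts = [10_000, 120_000, 130_000, 210_000]
ratio = normalized_variant_gene_ratio(variant_positions, gene_starts)
print(ratio)
```

Bins without expressed genes are left out rather than assigned a value, which mirrors the gap described for the heterochromatic region.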
Figure 5. Annotation of the variants detected in mapped reads from the midgut transcriptome. Variant types are shown for the G3 and L3-5 An. gambiae strains relative to the reference genome of the pink-eyed laboratory strain of An. gambiae (PEST; AgamP3.6). Variants were annotated with SnpEff, a program for annotating and predicting the effects of single-nucleotide polymorphisms.