Bonnie A Fraser, Cameron J Weadick, Ilana Janowitz, F Helen Rodd, Kimberly A Hughes.
Abstract
BACKGROUND: Next-generation sequencing is providing researchers with a relatively fast and affordable option for developing genomic resources for organisms that are not among the traditional genetic models. Here we present a de novo assembly of the guppy (Poecilia reticulata) transcriptome using 454 sequence reads, and we evaluate potential uses of this transcriptome, including detection of sex-specific transcripts and deployment as a reference for gene expression analysis in guppies and a related species. Guppies have been model organisms in ecology, evolutionary biology, and animal behaviour for over 100 years. An annotated transcriptome and other genomic tools will facilitate understanding the genetic and molecular bases of adaptation and variation in a vertebrate species with a uniquely well-known natural history.
Year: 2011 PMID: 21507250 PMCID: PMC3113783 DOI: 10.1186/1471-2164-12-202
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Run and assembly statistics for 454 sequencing used for the transcriptome assembly.
| Statistic | Value |
|---|---|
| Total reads (n) | 1,665,609 |
| Total bases (bp) | 336,869,979 |
| Assembled reads (n) | 1,162,670 |
| Bases assembled (bp) | 25,534,864 |
| Singletons (n) | 171,305 |
| Total contigs (n) | 54,987 |
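Two summary figures follow directly from the table above: the mean read length and the share of reads that were assembled. The short sketch below derives them from the reported totals; the variable names are illustrative and not taken from the source.

```python
# Values copied from the run and assembly statistics table.
total_reads = 1_665_609
total_bases = 336_869_979
assembled_reads = 1_162_670

# Mean read length in bp: total bases divided by total reads.
mean_read_length = total_bases / total_reads

# Fraction of all reads that were assembled.
assembled_fraction = assembled_reads / total_reads

print(f"Mean read length: {mean_read_length:.1f} bp")
print(f"Reads assembled: {assembled_fraction:.1%}")
```

With these inputs the mean read length is roughly 202 bp and about 70% of reads were assembled.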
Figure 1. Gene ontology (GO) ID representations for our guppy transcriptome database (white) and the zebrafish transcriptome (grey). Three comparisons are shown: (a) biological processes ontology; (b) molecular function ontology; (c) cellular component ontology. Asterisks denote significant differences between species for each category. Significance was determined via χ² tests with a p-value corrected for multiple tests.
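As a rough illustration of the kind of per-category test the Figure 1 caption describes, the sketch below runs a Pearson χ² test on a 2×2 table (sequences inside versus outside one GO category, for each species) and then adjusts the p-value for the number of categories tested. The counts and category names are hypothetical placeholders, and the Bonferroni adjustment is an assumption: the caption does not say which correction was used.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts per category: ((guppy in, guppy out), (zebrafish in, zebrafish out)).
categories = {
    "category_A": ((120, 880), (100, 900)),
    "category_B": ((300, 700), (150, 850)),
}

n_tests = len(categories)
for name, ((g_in, g_out), (z_in, z_out)) in categories.items():
    stat, p = chi2_2x2(g_in, g_out, z_in, z_out)
    p_adjusted = min(1.0, p * n_tests)  # Bonferroni correction (assumed, not from the source)
    print(f"{name}: chi2 = {stat:.2f}, adjusted p = {p_adjusted:.3g}")
```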
Candidate genes with annotations from the Swiss-Prot or NR databases (with examples of candidate gene studies).
| Category | Gene description | Accession number | Number in cluster | Database | E-value | Percent coverage |
|---|---|---|---|---|---|---|
| Non-visual opsins | Kallikrein-8; AltName: Neuropsin | O88780.1 | 1 | SP | 6e-16 | 93% |
| | Kallikrein-8; Short = mK8; AltName: Neuropsin | Q61955.1 | 1 | SP | 2e-12 | 81% |
| | Melanopsin-like; AltName: Opsin-4-like | Q1JPS6.1 | 1 | SP | 8e-43 | 23% |
| Visual opsins | Green-sensitive opsin-1; AltName: Green cone | P32311.1 | 12 | SP | 6e-87 | 60% |
| | Rhodopsin | P79756.1 | 1 | SP | e-172 | 65% |
| | Rhodopsin | P79848.1 | 183 | SP | e-64 | 65% |
| | Blue-sensitive opsin; AltName: Blue cone | P87365.1 | 4 | SP | e-154 | 66% |
| | Green-sensitive opsin; AltName: Green cone | P87366.1 | 13 | SP | 9e-108 | 79% |
| | Green-sensitive opsin-4; AltName: Green cone | Q9W6A6.2 | 2 | SP | 3e-61 | 32% |
| | Red-sensitive opsin; AltName: Red cone | P87367.1 | 6 | SP | 1e-76 | 75% |
| | Putative violet-sensitive opsin; AltName: Violet | P87368.1 | 12 | SP | 2e-85 | 80% |
| MHC class I | Mhc, class IA | CAA90790.1 | 2 | NR | 5e-100 | 20% |
| | Mhc, class IA | CAA90793.1 | 1 | NR | 6e-27 | 55% |
| | Mhc, class 1b | CAA90782.1 | 1 | NR | 3e-16 | 31% |
| | classical MHC class I antigen | ACN49159.1 | 1 | NR | 1e-04 | 9% |
| | classical MHC class I antigen | ACN49175.1 | 1 | NR | 3e-11 | 26% |
| | MHC class I related gene | O19477.2 | 5 | SP | 4e-8 | 21% |
| | MHC class I related gene | Q5RD09.1 | 1 | SP | 8e-14 | 51% |
| | MHC class I receptor | AAY79253.1 | 1 | NR | 5e-15 | 31% |
| MHC class II | MHC class II alpha subunit | AAO19852.1 | 1 | NR | 1e-8 | 12% |
| | MHC class II antigen | AAP20186.1 | 1 | NR | 1e-10 | 17% |
| | MHC II invariant chain | AAS77256.1 | 1 | NR | 5e-22 | 31% |
| | MHC class II antigen beta chain | ABX44766.1 | 1 | NR | 2e-7 | 30% |
| | MHC class II antigen | ACN72667 | 1 | NR | 9e-15 | 41% |
| | HLA class II histocompatibility antigen, DRB1-8 beta | Q30134.2 | 2 | SP | 2e-7 | 50% |
| Behaviour genes | D(4) dopamine receptor; AltName: Dopamine D4 | P21917.2 | 1 | SP | 2e-22 | 15% |
| | D(2)-like dopamine receptor | P53453.1 | 2 | SP | 4e-61 | 28% |
| | Early growth response protein 1 | P26632.2 | 1 | SP | 2e-42 | 24% |
| | Target of EGR1 protein 1 | Q17QN2.1 | 1 | SP | 6e-20 | 37% |
| | Target of EGR1 protein 1 | Q96GM8.1 | 1 | SP | 1e-76 | 38% |
| | Proto-oncogene protein c-fos | P53450.1 | 3 | SP | 8e-30 | 37% |
| | c-FosLb protein | CAD56866.1 | 1 | NR | 9e-11 | 14% |
| Pigment genes | D-dopachrome decarboxylase-A | Q68FI3.1 | 1 | SP | 1e-33 | 100% |
| | Melanocyte protein Pmel 17; AltName: Silver | Q98917.3 | 11 | SP | 9e-15 | 14% |
| | L-dopachrome tautomerase | O93505.1 | 2 | SP | 5e-57 | 29% |
| | Melanocyte-stimulating hormone receptor | P55167.1 | 1 | SP | 2e-36 | 36% |
| | Dihydropteridine reductase; AltName: HDHPR | P09417.2 | 2 | SP | 5e-57 | 74% |
| | Pterin-4-alpha-carbinolamine dehydratase | Q91901.3 | 4 | SP | 1e-14 | 52% |
| | Pterin-4-alpha-carbinolamine dehydratase 2 | Q9CZL5.2 | 1 | SP | 2e-40 | 76% |
| | Putative pterin-4-alpha-carbinolamine dehydratase | Q9TZH6.3 | 1 | SP | 5e-41 | 58% |
Presented are the gene description found in the protein database, the accession number, the number of contigs and EST sequences in that annotation cluster, the database (Swiss-Prot: SP, non-redundant: NR), the mean e-value, and mean percent coverage of the reference sequence by the contigs or EST sequences.
Contigs tested for male specificity by PCR
| ID | Gene Description (accession ID) | Number of reads | Length | Number of putative SNPs | Male-specific expression confirmed |
|---|---|---|---|---|---|
| contig44905 | lipocalin-type prostaglandin D synthase-like protein (second top search) (BAB88224.1) | 59 | 781 | 3 | Yes |
| contig50719 | Islet amyloid polypeptide precursor (ACO09255.1) | 63 | 339 | 8 | No |
| contig44896 | Putative heparin-binding growth factor 1 (Q6PBT8.1) | 64 | 711 | 2 | Yes |
| contig42251 | EF-hand calcium-binding domain-containing protein 2 (Q9CQ46.1) | 78 | 394 | 2 | No |
| contig50654 | Hyaluronidase-2 precursor (AC132917.1) | 135 | 1,465 | 21 | Yes |
| contig40220 | N/A | 181 | 1,179 | 0 | No |
Presented are the annotations with either the Swiss-Prot or NR databases, the number of reads used to assemble the contig, length of contig, the number of putative SNPs, and whether male-specific expression was confirmed with PCR.
RNA-seq results showing the number of reads after the purity filter, the number of reads aligned to our reference database with percent of total number of reads in brackets, and the number of reference sequences the reads mapped to in our database with percentage of total number of sequences in brackets.
| Sample | Number of reads (after purity filter) | Number of reads mapped (%) | Number of reference sequences matched (%) |
|---|---|---|---|
| Guppy pred - 1 | 32,054,094 | 16,229,906 (51%) | 42,501 (73%) |
| Guppy pred - 2 | 31,238,411 | 16,158,014 (51%) | 42,256 (72%) |
| Guppy pred + 1 | 30,811,092 | 15,603,608 (51%) | 42,072 (72%) |
| Guppy pred + 2 | 30,680,881 | 16,752,768 (55%) | 43,099 (74%) |
| Sailfin molly | 29,754,476 | 12,248,933 (41%) | 39,704 (68%) |
Figure 2. Differential expression in predator-exposed and non-exposed fish. The differentially expressed genes are in blue, and the others in grey. The x-axis is an estimate of the relative abundance of the transcript (a measure of the average expression level for each sequence across the two groups, A_g), and the y-axis is a measure of differential expression, M_g. The solid light-blue horizontal lines show where genes with 2-fold differences in expression would fall, so all the genes with differential expression in this analysis show more than 2-fold differences between treatments. Reference sequences with very low or very high values of M_g have their fold-change values compressed to fit within the [-10, +10] interval. The compressed values usually represent sequences with zero counts in one treatment group.
List of sequences that were differentially expressed between guppies exposed to predators and guppies that were not
| Sequence name | Log FC | p-value | FDR | Counts pred - 1 | Counts pred - 2 | Counts pred + 1 | Counts pred + 2 | Annotation (accession #) |
|---|---|---|---|---|---|---|---|---|
| contig36557 | -30.67 | 2e-08 | 0.0004 | 16 | 32 | 0 | 0 | |
| ES381343 | -4.02 | 3e-08 | 0.0004 | 175 | 443 | 22 | 16 | hepcidin-like precursor (AAS66305.1) |
| contig46890 | -30.54 | 1.1e-07 | 0.0009 | 14 | 30 | 0 | 0 | |
| contig41409 | 4.61 | 1.1e-07 | 0.0009 | 3 | 0 | 25 | 49 | |
| contig34882 | -5.98 | 1.3e-07 | 0.0009 | 19 | 44 | 0 | 1 | |
| contig40497 | 1.66 | 1.9e-07 | 0.0010 | 184 | 193 | 524 | 671 | Cerebellin-2 (Q8BGU2.1) |
| contig34536 | -3.73 | 2.4e-07 | 0.0011 | 33 | 60 | 5 | 2 | hepcidin-like precursor (AAS66305.1) |
| contig30446 | -3.9 | 5.4e-07 | 0.0022 | 41 | 94 | 6 | 3 | LINE-1 reverse transcriptase homolog (P08548.1) |
| ES374452 | 2.54 | 6.7e-07 | 0.0024 | 12 | 9 | 69 | 53 | Fibrocystin-L (Q86WI1.1) |
| ES383031 | 3.47 | 1.7e-06 | 0.0054 | 94 | 90 | 493 | 1588 | Nattectin Precursor (Q66S03.1) |
| contig06616 | 1.8 | 2.3e-06 | 0.0068 | 47 | 52 | 146 | 200 | Cerebellin-1; (P63182.2) |
| ES376890 | -1.47 | 3.7e-06 | 0.0098 | 493 | 601 | 196 | 200 | Fibronectin (P07589.4) |
| contig32843 | 2.63 | 4.3e-06 | 0.0106 | 5 | 5 | 31 | 31 | |
| contig46202 | 29.67 | 4.9e-06 | 0.0111 | 0 | 0 | 10 | 14 | |
| ES371621 | -1.56 | 5.2e-06 | 0.0111 | 345 | 419 | 135 | 124 | Proactivator polypeptide (P07602.2) |
| contig18751 | 1.66 | 8.4e-06 | 0.0168 | 70 | 64 | 176 | 249 | |
| contig36338 | -30.17 | 1.6e-05 | 0.0294 | 8 | 26 | 0 | 0 | |
| contig38849 | -1.91 | 2.1e-05 | 0.0352 | 53 | 56 | 16 | 13 | Complement C1q tumor necrosis factor-related protein (P0C862.1) |
| ES371258 | 1.68 | 2.1e-05 | 0.0352 | 132 | 87 | 312 | 392 | |
| contig40097 | 2.14 | 2.2e-05 | 0.0352 | 10 | 9 | 36 | 48 | Granzyme A (P11032.2) |
| contig13556 | 1.44 | 3.2e-05 | 0.0494 | 97 | 92 | 216 | 299 | |
| contig37489 | -1.98 | 3.5e-05 | 0.0494 | 47 | 48 | 14 | 10 | |
| contig33977 | 4.64 | 3.6e-05 | 0.0494 | 1 | 0 | 10 | 15 | |
| ES383122 | -4.06 | 3.7e-05 | 0.0494 | 17 | 50 | 3 | 1 | 60S ribosomal protein L11 (Q5RC11) |
Shown are the log fold change (Log FC) of the number of counts, the p-value, the corrected p-value (FDR), and the counts for samples not exposed to predators (pred - 1 and pred - 2) and exposed to predators (pred + 1 and pred + 2). Annotation is from either the Swiss-Prot or the NR database, with the accession number in brackets.
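To make the Log FC column easier to read, the sketch below recomputes an approximate log2 fold change from the raw counts as the ratio of summed predator-exposed counts to summed non-exposed counts. This is a simplification: it ignores any library-size normalisation the original analysis may have applied, so it should agree with the table only approximately. The function name and the handling of all-zero groups are illustrative choices, not the authors' method.

```python
import math

def log2_fold_change(counts_minus, counts_plus):
    """log2(sum of pred+ counts / sum of pred- counts), without normalisation."""
    minus, plus = sum(counts_minus), sum(counts_plus)
    if minus == 0 and plus == 0:
        return float("nan")
    if minus == 0:
        return math.inf   # expressed only in predator-exposed samples
    if plus == 0:
        return -math.inf  # expressed only in non-exposed samples
    return math.log2(plus / minus)

# (pred - 1, pred - 2), (pred + 1, pred + 2) counts copied from three rows of the table.
rows = {
    "ES381343": ((175, 443), (22, 16)),
    "contig40497": ((184, 193), (524, 671)),
    "contig36557": ((16, 32), (0, 0)),
}

log_fc = {name: log2_fold_change(m, p) for name, (m, p) in rows.items()}
for name, value in log_fc.items():
    print(f"{name}: {value:.2f}")
```

Under this convention, negative values mean lower expression in predator-exposed fish. Rows with zero counts in one treatment give an infinite ratio here, which corresponds to the rows reporting Log FC values near ±30.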