Jeong-Hyeon Choi, Teiya Kijimoto, Emilie Snell-Rood, Hongseok Tae, Youngik Yang, Armin P Moczek, Justen Andrews.
Abstract
BACKGROUND: Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes.
Year: 2010 PMID: 21156066 PMCID: PMC3019233 DOI: 10.1186/1471-2164-11-703
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sequencing and assembly statistics
| Category | Value |
|---|---|
| Total number of reads | 1,366,749 |
| Total length of reads (bp) | 625,825,203 |
| Total number of reads cleaned | 1,361,424 |
| Total length of reads cleaned (bp) | 598,655,879 |
| Number of reads placed | 1,302,023 |
| Number of singletons | 10,992 |
| Total length of singletons (bp) | 3,714,066 |
| Average length of singletons (bp) | 337 |
| Largest singleton (bp) | 692 |
| Number of contigs | 39,088 |
| Total length of contigs (bp) | 22,806,009 |
| Average length of contigs (bp) | 583 |
| Largest contig (bp) | 6,401 |
| Average read coverage of contigs | 24 |
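The average lengths in the table follow from the totals and counts. The short sketch below recomputes them from the tabulated values; the variable names are illustrative, and truncating the mean to a whole base pair is an assumption made here because it reproduces the reported figures.

```python
# Values copied from the sequencing and assembly statistics table.
stats = {
    "reads_total": 1_366_749,
    "reads_cleaned": 1_361_424,
    "reads_placed": 1_302_023,
    "singletons": 10_992,
    "singleton_bp": 3_714_066,
    "contigs": 39_088,
    "contig_bp": 22_806_009,
}


def mean_length(total_bp, count):
    """Mean sequence length in bp, truncated to a whole base pair (assumption)."""
    return total_bp // count


avg_contig = mean_length(stats["contig_bp"], stats["contigs"])
avg_singleton = mean_length(stats["singleton_bp"], stats["singletons"])

# Fractions of reads surviving each step of the pipeline.
frac_cleaned = stats["reads_cleaned"] / stats["reads_total"]
frac_placed = stats["reads_placed"] / stats["reads_cleaned"]

print(avg_contig, avg_singleton)
print(f"{frac_cleaned:.1%} of reads kept after cleaning")
print(f"{frac_placed:.1%} of cleaned reads placed")
```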
Sequence matches against public databases
| Database | E-value | Contigs (query) | Contigs (subject) | Singletons (query) | Singletons (subject) | Total (query) | Total (subject) |
|---|---|---|---|---|---|---|---|
| NCBI NR | 10⁻⁵ | 21,275 | 12,739 | 2,715 | 2,371 | 23,990 | 14,223 |
| NCBI NR | 10⁻²⁰ | 14,359 | 9,604 | 1,357 | 1,261 | 15,716 | 10,394 |
| NCBI NR | 10⁻⁵⁰ | 8,068 | 6,350 | 373 | 356 | 8,441 | 6,574 |
| Tribolium UniGene | 10⁻⁵ | 14,807 | 5,158 | 3,448 | 1,760 | 18,255 | 5,322 |
| Tribolium UniGene | 10⁻²⁰ | 10,260 | 4,671 | 2,236 | 1,259 | 12,946 | 4,799 |
| Tribolium UniGene | 10⁻⁵⁰ | 5,945 | 3,791 | 1,011 | 624 | 6,956 | 3,880 |
| Tribolium proteins | 10⁻⁵ | 20,560 | 8,911 | 2,185 | 1,614 | 22,745 | 9,303 |
| Tribolium proteins | 10⁻²⁰ | 14,497 | 7,888 | 1,124 | 951 | 15,621 | 8,203 |
| Tribolium proteins | 10⁻⁵⁰ | 8,100 | 5,798 | 316 | 284 | 8,416 | 5,907 |
The total numbers of Onthophagus sequences with matches against public databases at the indicated E-value cut-off. Databases: NCBI NR [52], Tribolium UniGene [91], and Tribolium proteins [92]. "Query" denotes the total number of Onthophagus sequences with matches against sequences from the database at the indicated cut-off. "Subject" denotes the total number of sequences from the indicated database with matches against Onthophagus sequences at the indicated E-value cut-off.
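The "Query" and "Subject" counts defined above can be illustrated with a minimal sketch. It assumes hits are available as (query ID, subject ID, E-value) triples and that a hit counts when its E-value is at or below the cut-off; the function name and the example hits are hypothetical and do not come from the study.

```python
def count_matches(hits, cutoffs):
    """For each E-value cut-off, count distinct query IDs and distinct
    subject IDs among the hits whose E-value is at or below the cut-off."""
    counts = {}
    for cutoff in cutoffs:
        queries = {q for q, s, e in hits if e <= cutoff}
        subjects = {s for q, s, e in hits if e <= cutoff}
        counts[cutoff] = (len(queries), len(subjects))
    return counts


# Hypothetical hits: (query ID, subject ID, E-value).
hits = [
    ("contigA", "prot1", 1e-60),
    ("contigB", "prot1", 1e-25),
    ("contigC", "prot2", 1e-8),
    ("singletonD", "prot3", 1e-3),
]

result = count_matches(hits, [1e-5, 1e-20, 1e-50])
for cutoff, (n_query, n_subject) in result.items():
    print(cutoff, n_query, n_subject)
```

Because several queries can hit the same subject, the subject count is usually smaller than the query count, and the subject totals for contigs and singletons need not add up.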
Figure 1. Sequence matches to the NR protein database and Tribolium. Venn diagram showing the number of Onthophagus contigs and singletons (in parentheses) with sequence matches against the NCBI NR database [52], the Tribolium genome sequence [92], and Tribolium annotated proteins [92]. The numbers of sequence matches at E-value cut-offs of 1 × 10⁻⁵, 1 × 10⁻²⁰, and 1 × 10⁻⁵⁰ are shown in black, red, and blue, respectively.
Figure 2. Taxonomic distribution of sequence matches. Phylogenetic tree showing the number of O. taurus non-redundant sequences assigned to branches. The MEGAN algorithm used in this analysis assigns each sequence to the lowest common ancestor of the set of taxa with corresponding sequence matches. The total numbers of O. taurus sequences assigned to each branch are indicated numerically and by pie chart area (log scale). Pie chart colors indicate the proportion of contigs (red) and singletons (blue) assigned to each branch.
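The lowest-common-ancestor assignment described in the Figure 2 caption can be sketched as follows. This is a simplified illustration, not MEGAN itself: the toy taxonomy, the function names, and the child-to-parent dictionary representation are assumptions, and MEGAN's score and support thresholds are not modelled.

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Deepest node that lies on the root path of every taxon in `taxa`."""
    paths = [lineage(t, parent) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking one path from the leaf upward, the first shared node
    # is the deepest ancestor common to all taxa.
    for node in paths[0]:
        if node in shared:
            return node
    return None


# Toy, heavily simplified taxonomy (child -> parent); for illustration only.
parent = {
    "Tribolium castaneum": "Coleoptera",
    "Coleoptera": "Insecta",
    "Drosophila melanogaster": "Diptera",
    "Diptera": "Insecta",
    "Insecta": "Arthropoda",
    "Arthropoda": "root",
}

# A sequence matching only Tribolium stays at the species node; one matching
# both a beetle and a fly is placed at their shared ancestor.
only_beetle = lowest_common_ancestor(["Tribolium castaneum"], parent)
beetle_and_fly = lowest_common_ancestor(
    ["Tribolium castaneum", "Drosophila melanogaster"], parent
)
print(only_beetle, "|", beetle_and_fly)
```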
Figure 3. Bi-connected components and alternative splicing. A. An example of a bi-connected component (BCC) structure. A BCC composed of three contigs (28477, 25928, and 04341) that share three independent sets of 31, 28, and 8 broken reads, respectively (indicated by dashed lines), shown in relation to the homologous T. castaneum transcript (Tc XM963744). Our analysis suggests that this pattern reflects two alternative splice variants present in the Onthophagus transcriptome. B. The three conceptual polypeptide sequences from these contigs align to a contiguous region of the Disabled protein from Tribolium, supporting this hypothesis. Shown are, from top to bottom, two alternative Drosophila Disabled transcripts (dark blue lines; thin light blue lines indicate first methionine (M) and stop codon (*)), the homologous Tribolium sequence (green; no alternative transcripts are known from Tribolium), and the relative positions of contigs 28477, 25928, and 04341. Note that contig 28477 (light orange), which based on our analysis is a putatively alternatively spliced exon, does not share similarity with the exon that is alternatively spliced in Drosophila.
Clustering using sequence matches to HomoloGene
| E-value | Contigs (query) | Contigs (subject) | Singletons (query) | Singletons (subject) | Total (query) | Total (subject) |
|---|---|---|---|---|---|---|
| 1 × 10⁻⁵ | 17,160 | 11,504 | 1,807 | 2,934 | 18,967 | 12,464 |
| 1 × 10⁻²⁰ | 11,032 | 8,084 | 845 | 1,145 | 11,877 | 8,524 |
| 1 × 10⁻⁵⁰ | 5,711 | 4,767 | 183 | 190 | 5,894 | 4,846 |
The total numbers of non-redundant Onthophagus sequences with matches against the HomoloGene [54] database at the indicated E-value cut-offs. "Query" denotes the total number of Onthophagus sequences with matches against HomoloGene sequences, and "Subject" denotes the total number of sequences from the HomoloGene database with matches against Onthophagus. The numbers of cases where all Onthophagus sequences in a cluster have the best match to the same HomoloGene sequence are shown in parentheses.
Figure 4. GO categories. GO annotations associated with 8,504 HomoloGene sequence groups. The distributions of the second and third levels of the GO term annotations of the sampled O. taurus sequences (left charts) were remarkably similar to those of the complete T. castaneum proteome (right charts).
Candidate developmental genes
| Sequence ID | Accession | E-value | Description | GO |
|---|---|---|---|---|
| contig19562 | NP_001034532.1 | 9.00E-99 | brachyury | 1 |
| FQTIJGT01DV55X | NP_001034527.2 | 2.00E-53 | Kruppel | 1 |
| contig27749 | XP_970831.2 | 1.00E-23 | PREDICTED: similar to fibroblast growth factor receptor | 1 |
| contig14082 | XP_001602830.1 | 1.00E-137 | PREDICTED: similar to epidermal growth factor receptor | 1, 2, 4, 21 |
| contig13096 | XP_001654153.1 | 1.00E-108 | decapentaplegic | 1, 4, 21 |
| contig18654 | XP_975017.2 | 0 | PREDICTED: similar to ets | 1, 8 |
| contig18756 | BAD00045.1 | 0 | armadillo protein | 1, 9, 16 |
| contig13865 | XP_970668.2 | 1.00E-58 | PREDICTED: similar to Homeobox protein cut | 1, 10 |
| contig04562 | NP_001107765.1 | 3.00E-45 | hairy | 1, 10, 21 |
| FQTIJGT02F4ASF | NP_001034490.1 | 4.00E-23 | pangolin | 1, 15 |
| contig17509 | XP_968516.2 | 0 | PREDICTED: similar to par-1 CG8201-PA | 1, 15 |
| contig00028 | XP_967537.1 | 1.00E-180 | PREDICTED: similar to COUP-TF/Svp nuclear hormone receptor | 1, 17 |
| contig14224 | XP_970678.1 | 1.00E-156 | PREDICTED: similar to thickveins CG14026-PA | 1, 21 |
| contig08201 | XP_974235.1 | 6.00E-37 | PREDICTED: similar to DNA cytosine-5 methyltransferase | 3 |
| FQTIJGT01BFBE3 | XP_974854.1 | 8.00E-69 | PREDICTED: similar to cornichon protein, putative | 2 |
| contig19544 | XP_966833.1 | 0 | PREDICTED: similar to extracellular signal-regulated kinase | 2 |
| contig25027 | XP_968594.2 | 1.00E-115 | PREDICTED: similar to Ecdysone-induced protein 63E CG10579-PK | 5 |
| contig04278 | XP_396527.3 | 1.00E-176 | PREDICTED: similar to Ecdysone-induced protein 78C CG18023-PA, | 5, 17 |
| contig32340 | XP_001847468.1 | 5.00E-89 | ras | 5 |
| FQTIJGT01E6R2M | NP_001116500.1 | 2.00E-12 | matrix metalloproteinase 1 isoform 2 | 5, 9 |
| contig20604 | XP_001663781.1 | 3.00E-17 | phosphatidylinositol 3-kinase regulatory subunit | 6 |
| contig33698 | XP_001952079.1 | 1.00E-19 | PREDICTED: similar to insulin receptor | 6 |
| contig26215 | XP_974994.1 | 4E-70 | PREDICTED: similar to Phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase and dual-specificity protein phosphatase PTEN | 6, 8 |
| contig36880 | NP_001128399.1 | 2.00E-14 | epoxide hydrolase 1 | 7 |
| contig20325 | NP_001034501.1 | 0 | extradenticle | 4 |
| contig04888 | XP_969771.2 | 6.00E-45 | PREDICTED: dachshund | 4 |
| contig29998 | XP_001944887.1 | 9.00E-75 | PREDICTED: similar to BarH1 CG5529-PA | 4 |
| contig07732 | XP_969484.2 | 3.00E-30 | PREDICTED: similar to LIM homeobox 1b | 4 |
| contig01196 | NP_001034489.1 | 1.00E-152 | homothorax | 4, 18 |
| FQTIJGT02HBUI7 | NP_001107853.1 | 1.00E-38 | Notch | 11 |
| contig26747 | XP_975449.2 | 4E-20 | PREDICTED: similar to FAS-associated factor 1, putative | 8 |
| contig14846 | NP_001034510.1 | 1E-126 | transcription factor deformed | 8, 18 |
| contig03161 | AAO16241.1 | 3.00E-86 | effector caspase; Sl-caspase-1 | 9 |
| mira_c460 | XP_001810562.1 | 3.00E-31 | PREDICTED: similar to caspase | 9 |
| contig02560 | XP_966617.2 | 2.00E-77 | PREDICTED: similar to E74 | 9 |
| contig02035 | XP_967068.2 | 2.00E-19 | PREDICTED: similar to NAD-dependent deacetylase sirtuin-1 | 9, 12 |
| contig22079 | XP_970822.2 | 0 | PREDICTED: similar to Darkener of apricot CG33553-PG | 9, 19 |
| contig20962 | NP_001107840.1 | 2.00E-67 | Dicer-2 | 12 |
| contig04521 | XP_971295.2 | 0 | PREDICTED: Argonaute-1 | 12 |
| contig35987 | XP_624270.2 | 1.00E-109 | PREDICTED: similar to brahma CG5942-PA, isoform A, partial | 12 |
| contig03878 | XP_975376.1 | 1.00E-70 | PREDICTED: similar to Headcase protein | 12 |
| contig04464 | XP_966633.1 | 0 | PREDICTED: similar to histone deacetylase | 12 |
| contig35227 | NP_001107838.1 | 5.00E-40 | aristaless | 10 |
| contig36981 | XP_001814382.1 | 1.00E-156 | PREDICTED: similar to fringe CG10580-PA | 10, 11 |
| contig09571 | XP_975412.2 | 2.00E-53 | PREDICTED: similar to suppressor of fused | 13 |
| contig04709 | XP_975408.1 | 0 | PREDICTED: similar to supernumerary limbs CG3412-PA | 13, 15 |
| FQTIJGT02G66FW | EEB10664.1 | 6.00E-37 | Antennapedia, putative | 18 |
| contig06860 | AAK96031.1 | 3.00E-82 | homeodomain transcription factor Prothoraxless | 18 |
| contig04152 | NP_001107807.1 | 0 | maxillopedia | 18 |
| contig08220 | XP_971065.1 | 5.00E-96 | PREDICTED: similar to rotated abdomen CG6097-PA | 18 |
| contig05318 | NP_001034497.1 | 1.00E-107 | ultrabithorax | 18 |
| contig02060 | XP_971671.2 | 4.00E-44 | PREDICTED: similar to fruitless | 20, 19 |
| contig25669 | XP_001807448.1 | 1.00E-58 | PREDICTED: similar to BmDSX-F | 19 |
| contig14519 | XP_971676.1 | 2.00E-70 | PREDICTED: similar to iroquois-class homeodomain protein irx | 14 |
| contig15982 | XP_968422.1 | 1.00E-117 | PREDICTED: similar to cadherin | 14, 21 |
| contig22068 | NP_001127850.1 | 2.00E-88 | smoothened | 14, 21 |
| contig31931 | NP_001107650.1 | 2.00E-94 | ecdysone receptor isoform A | 17 |
| FQTIJGT02HPUTA | XP_001845875.1 | 6.00E-67 | nuclear hormone receptor FTZ-F1 beta | 17, 22 |
| FQTIJGT02GV771 | XP_971362.2 | 7.00E-18 | PREDICTED: similar to ecdysone inducible protein 75 | 17 |
| contig03369 | CAH69897.1 | 1.00E-162 | retinoid X receptor | 17 |
| contig04903 | NP_001107813.1 | 1.00E-122 | glass bottom boat protein | 21 |
| contig05941 | XP_971286.2 | 0 | PREDICTED: similar to mothers against dpp protein | 21 |
| contig07923 | EEB19343.1 | 8.00E-19 | porcupine, putative | 16 |
| contig01101 | XP_968118.1 | 5.00E-45 | PREDICTED: similar to frizzled | 16 |
| contig08319 | XP_623523.1 | 4.00E-75 | PREDICTED: similar to frizzled 7 | 16 |
| contig03739 | XP_974963.2 | 1.00E-172 | PREDICTED: similar to jnk | 16 |
| contig13446 | XP_973551.1 | 1.00E-150 | PREDICTED: similar to legless CG2041-PA | 16 |
| FQTIJGT02I9RJB | XP_969261.1 | 3.00E-13 | PREDICTED: similar to Wnt11 protein | 16 |
| contig29078 | XP_968055.2 | 1.00E-77 | PREDICTED: similar to Wnt6 | 16 |
| contig19316 | XP_974084.1 | 1.00E-70 | PREDICTED: similar to wntless CG6210-PB | 16 |
| contig19245 | XP_001847858.1 | 1.00E-168 | wingless protein | 16 |
Examples of contigs representing genes putatively involved in Onthophagus development. Contig/singleton ID, accession number from the NCBI NR dataset, E-value, and gene description are shown. Specific GO terms shown here are: 1. cell fate determination, 2. epidermal growth factor receptor signaling pathway, 3. DNA methylation, 4. leg disc pattern formation, 5. instar larval or pupal development, 6. insulin receptor signaling pathway, 7. juvenile hormone metabolic process, 8. positive regulation of programmed cell death, 9. programmed cell death, 10. regulation of Notch signaling pathway, 11. Notch signaling pathway, 12. regulation of gene expression, epigenetic, 13. regulation of smoothened signaling pathway, 14. smoothened signaling pathway, 15. regulation of Wnt receptor signaling pathway, 16. Wnt receptor signaling pathway, 17. steroid hormone receptor activity, 18. segment specification, 19. sex differentiation, 20. sex determination, 21. wing disc pattern formation, 22. response to ecdysone.
Effect of contig length and coverage on detection of SNPs and Indels
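| Predictor | Total detected SNPs: estimate | Total detected SNPs: F | Total detected SNPs: P | Total detected indels: estimate | Total detected indels: F | Total detected indels: P |
|---|---|---|---|---|---|---|
| Contig Length | 1.73 | 4371 | 0.0000 | 0.128 | 138.8 | <0.0001 |
| Number Reads | 0.96 | 5417 | 0.0000 | 0.313 | 3304 | 0.0000 |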
Both contig length (transformed) and number of reads (transformed) were related to total detected SNPs and total detected indels in standard least squares linear models. Residuals from these models were used in subsequent analyses to estimate levels of genetic variation in a contig controlling for sampling differences between contigs.
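The residual step described above can be illustrated with a minimal sketch: fit a least squares model of variant counts on the two (already transformed) predictors, then keep the residuals as the length- and coverage-corrected measure of variation. The data values and function names below are hypothetical, and the transformation applied to the predictors is not specified here, so none is performed in the code.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_residuals(predictors, response):
    """Fit response ~ intercept + predictors by least squares.

    Returns (coefficients, residuals); the first coefficient is the intercept.
    """
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return beta, [y - f for y, f in zip(response, fitted)]


# Hypothetical contigs: (transformed contig length, transformed read number).
predictors = [(2.5, 1.0), (2.8, 1.5), (3.0, 1.2), (3.2, 2.0), (3.5, 2.4), (2.6, 1.8)]
snp_counts = [1, 3, 2, 6, 7, 4]

beta, residual_snp = ols_residuals(predictors, snp_counts)
print([round(r, 3) for r in residual_snp])
```

A positive residual marks a contig with more detected SNPs than its length and read depth would predict, and a negative residual marks one with fewer.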
Correlations between patterns of gene expression and estimated levels of variation
| Expression measure | Residual SNP frequency: estimate | Residual SNP frequency: F | Residual SNP frequency: P | Residual indel frequency: estimate | Residual indel frequency: F | Residual indel frequency: P |
|---|---|---|---|---|---|---|
| Average Expression | -0.565 | 46.5 | <0.0001 | -0.083 | 8.19 | 0.004 |
| Number of Tissues | -0.364 | 13.1 | 0.0003 | -0.063 | 3.23 | 0.07 |
| Morph-biased Exp. | -0.127 | 0.42 | 0.52 | -0.032 | 0.22 | 0.64 |
| Sex-biased Exp. | 0.108 | 0.30 | 0.58 | -0.009 | 0.02 | 0.89 |
Shown are results from standard least squares linear models relating measures of gene expression from a previous experiment, namely average expression levels (A), sex-biased gene expression, alternate mating morph-biased gene expression, and expression detected across up to four different tissue types, to estimates of genetic variation (see Table 5).
Figure 5. Gene expression patterns are correlated with patterns of genetic variation. Residual SNP frequency (number of SNPs in a contig controlling for contig length and read number) was negatively related to overall expression levels ("A") and the number of tissues (head horn epidermis, thoracic horn epidermis, legs and central brain) in which differential expression was detected in a previous microarray study (N = 48 arrays, reported in Snell-Rood et al. 2010). Statistics are presented in Table 6.
Figure 6. Flow diagram of sequence assembly and annotation. Flow diagram illustrating the steps involved in the sequence analysis. Computational steps are indicated by purple boxes, sequences by blue boxes, and analyses against sequence or annotation databases by green cylinders.