Biao Wang, Robert Ekblom, Todd A Castoe, Eleanor P Jones, Radoslav Kozma, Erik Bongcam-Rudloff, David D Pollock, Jacob Höglund.
Abstract
The black grouse (Tetrao tetrix) is a galliform bird species that is important for both ecological studies and conservation genetics. Here, we report the sequencing of the spleen transcriptome of black grouse using 454 GS FLX Titanium sequencing. We performed a large-scale gene discovery analysis with a focus on genes that might be related to fitness in this species and also identified a large set of microsatellites. In total, we obtained 182 179 quality-filtered sequencing reads that we assembled into 9035 contigs. Using these contigs and 15 794 length-filtered (greater than 200 bp) singletons, we identified 7762 transcripts that appear to be homologues of chicken genes. A specific BLAST search with an emphasis on immune genes found 308 homologous chicken genes that have immune function, including ten major histocompatibility complex-related genes located on chicken chromosome 16. We also identified 1300 expressed sequence tag microsatellites and were able to design suitable flanking primers for 526 of these. A preliminary test of the polymorphism of the microsatellites found 10 polymorphic microsatellites of the 102 tested. Genomic resources generated in this study should greatly benefit future ecological, evolutionary and conservation genetic studies on this species.
Keywords: bird, spleen, RNA-seq, immune genes, major histocompatibility complex, microsatellites
Year: 2012 PMID: 22724064 PMCID: PMC3376728 DOI: 10.1098/rsob.120054
Source DB: PubMed Journal: Open Biol ISSN: 2046-2441 Impact factor: 6.411
Summary of sequencing and assembly results.
| metric | value |
|---|---|
| number of reads | 182 179 |
| read length (bp) | 320 ± 140 |
| number of reads assembled | 153 065 |
| percentage of reads assembled | 84 |
| number of contigs | 9035 |
| contig length (bp) | 470 ± 250 |
| reads per contig | 18.81 |
| coverage per nucleotide site | 10.01 |
| number of singletons | 15 794 |
| singleton length (bp) | 370 ± 90 |
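
The percentage row in the table follows directly from the two read counts. A minimal check, using only the numbers reported above (the variable names are illustrative):

```python
# Counts taken from the summary table above.
total_reads = 182_179       # quality-filtered reads
assembled_reads = 153_065   # reads placed in a contig

# Share of quality-filtered reads that were assembled into contigs.
percent_assembled = 100 * assembled_reads / total_reads
print(f"{percent_assembled:.1f}% of reads assembled")

# Reads that were not placed in any contig.
unassembled_reads = total_reads - assembled_reads
print(unassembled_reads)
```

The result rounds to the 84 per cent given in the table.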
Figure 1. A summary of sequencing and contig assembly results. (a) Length distribution of the pre-process 454 quality-filter-pass reads. (b) Length distribution of assembled contigs. Contigs larger than 2000 bp are binned at the end of the x-axis. (c) Distribution of reads per contig (blue) and coverage per nucleotide site (red). Contigs with more than 30 reads are binned at the end of the x-axis. (d) Density scatterplot showing relationship between reads per contig and contig length. The black line represents the trend of the contig length with increasing reads per contig. Both the x- and y-axes are presented on a log scale.
Summary of annotation results.
| homology search using Ensembl chicken proteins | |
|---|---|
| number of transcripts (contigs + singletons) | 24 829 |
| number of transcripts used | 12 593 |
| number of genes discovered | 6852 |
| additional homology search using NCBI bird proteins | |
| number of remaining transcripts | 12 236 |
| number of transcripts used | 1150 |
| number of genes discovered | 910 |
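
The counts in this table can be cross-checked against the assembly summary and the abstract. The sketch below only redoes the arithmetic with the reported figures; the variable names are illustrative and nothing here reproduces the homology search itself.

```python
# Reported counts from the assembly and annotation tables.
contigs = 9_035
singletons = 15_794
transcripts = contigs + singletons           # input to the first search

used_first_search = 12_593                   # "transcripts used", first search
remaining = transcripts - used_first_search  # passed to the second search

used_second_search = 1_150                   # "transcripts used", second search
genes_first_search = 6_852
genes_second_search = 910
total_genes = genes_first_search + genes_second_search

# Fraction of all transcripts used across both searches.
percent_used = 100 * (used_first_search + used_second_search) / transcripts

print(transcripts, remaining, total_genes)
print(f"{percent_used:.1f}% of transcripts used")
```

The total of 7762 agrees with the number of chicken-gene homologues stated in the abstract.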
Figure 2. Distribution of the Gene Ontology (GO) functional categories. The transcripts of the black grouse spleen were classified into GO categories of (a) biological process (BP), (b) molecular function (MF) and (c) cellular component (CC) on the basis of GO second level terms.
Figure 3. Distribution of the immune-related GO terms. The transcripts with immune functions were assigned to all levels of immune-related GO terms. The top 17 of the represented terms are shown and the rest are binned at the bottom.
Chromosome 16 MHC genes identified.
| transcript ID | transcript length (bp) | Ensembl protein ID | gene symbol | gene description |
|---|---|---|---|---|
| contig01538 | 1384 | ENSGALP00000000233 | | MHC class I antigen B-F major heavy chain |
| contig08853 | 249 | ENSGALP00000000193 | | MHC class II beta chain |
| contig08938 | 353 | | | |
| contig08968 | 402 | | | |
| contig02821 | 491 | ENSGALP00000000213 | | MHC class II M alpha chain |
| contig02432 | 490 | ENSGALP00000040419 | | MHC class II M beta chain 2 |
| contig02454 | 2014 | ENSGALP00000000211 | | bromodomain containing 2 |
| FZYUT3M04XU4W0 | 441 | ENSGALP00000000182 | | B-lec C-type lectin-like receptor |
| contig01331 | 603 | ENSGALP00000000202 | | tapasin precursor |
| contig08153 | 323 | | | |
| contig02591 | 807 | ENSGALP00000040428 | | transporter associated with antigen processing 2 fragment |
| FZYUT3M04XTXK7 | 252 | | | |
| contig01293 | 1697 | ENSGALP00000000170 | | guanine nucleotide-binding protein subunit beta-2-like 1 |
| FZYUT3M04YN4L0 | 434 | ENSGALP00000019549 | | tripartite motif protein 7 |
(a) Genes that are curated based on a double-check of the NCBI RefSeq database.
A summary of microsatellites identified.
| repeat type | number of loci identified | number of loci with primers designed | number of loci with annotated information | number of loci tested |
|---|---|---|---|---|
| di-nucleotides | 337 | 91 | 34 | 30 |
| tri-nucleotides | 805 | 388 | 258 | 70 |
| tetra-nucleotides | 111 | 33 | 8 | 2 |
| penta-nucleotides | 47 | 14 | 6 | 0 |
| total | 1300 | 526 | 306 | 102 |
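
The record does not say which tool or repeat thresholds were used to find these loci. As an illustration only, a simple scan for perfect di- to penta-nucleotide repeats in a transcript could look like the sketch below. The function name `find_microsatellites` and the minimum repeat counts in `MIN_REPEATS` are assumptions made for this example, not the settings of the study.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumption, not from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AA or ACAC)."""
    n = len(motif)
    return any(
        n % d == 0 and motif == motif[:d] * (n // d)
        for d in range(1, n)
    )


def find_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # Group 1 captures a candidate motif; \1{n,} demands n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound(motif):
                continue  # already covered by a shorter motif class
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Toy sequence with one di-nucleotide and one tri-nucleotide repeat.
example = "GG" + "AC" * 7 + "TTG" + "AGC" * 5 + "T"
print(find_microsatellites(example))
```

Skipping compound motifs keeps a long (AC) run from also being reported as an (ACAC) tetra-nucleotide repeat, so each locus is counted in one class only.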
A summary of polymorphic microsatellites.
| locus | repeat motif | primer sequence (5′–3′) | number of alleles | allele size (bp) | related gene |
|---|---|---|---|---|---|
| BG03 | (AGC) | F: GCACTCTTCACTAGCAGCCC | 3 | 146–164 | DNA repair protein RAD52 homolog |
| | | R: CAAGCAGGGTCAGAGCATTG | | | |
| BG04 | (AC) | F: GGGTCTCTTGCTTCCTTGAC | 2 | 219–221 | ATP-binding cassette, sub-family C, member 9 isoform SUR2A |
| | | R: TTAAACTTCATGCTCACACGC | | | |
| BG07 | (AT) | F: CAGTTACAGCAAGGACAGAGC | 2 | 127–141 | putative uncharacterized protein, UniProtKB/TrEMBL Acc. Q5ZM27 |
| | | R: GGGAGCCAACAAGAATAAACTG | | | |
| BG14 | (AT) | F: ACAGCGCCTTCCCTATATCC | 2 | 146–149 | claudin domain containing 1 |
| | | R: TGACCAAACTTTGCCGGAAG | | | |
| BG15 | (AG) | F: ACAGACACAGAAAGCATCCC | 3 | 312–316 | amyloid beta A4 protein |
| | | R: TGCTGTAACACAAGTAGATGCC | | | |
| BG21 | (ACG) | F: AACATCACGCCGTTTCACTG | 2 | 124–127 | probable ATP-dependent RNA helicase DDX10 |
| | | R: AAGCCGCGTTCCAAACAC | | | |
| BG26 | (AC) | F: TGACAGCCTGGGAAGTATGC | 2 | 264–268 | C-type lectin domain family 3, member B |
| | | R: CACCAGTGGCTCTTTGATGC | | | |
| BG29 | (AGG) | F: CCAGCTTTCATGACCACGTC | 3 | 136–142 | alpha-L RNA-binding motif superfamily |
| | | R: TCAGTACTCTCTCTGCGGAAC | | | |
| BG78 | (AGG) | F: TCTTCAGGGCTTTCTCAGGG | 2 | 234–240 | ABC transporter-like |
| | | R: CATGAAACCTGTCAGCGTGG | | | |
| BG94 | (AC) | F: TGAACCTGAGAAGGCAAAGG | 3 | 130–148 | sarcoplasmic/endoplasmic reticulum calcium ATPase 3 |
| | | R: AGCATCAGGGTGAGGTGTC | | | |
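
When reusing primers from the table, it can help to check basic properties such as length, GC content and an approximate melting temperature. The sketch below does this for the BG03 pair. It uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides; the record does not state how the authors evaluated their primers, and the helper names are illustrative.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# BG03 primer pair, copied from the table above.
bg03 = {
    "F": "GCACTCTTCACTAGCAGCCC",
    "R": "CAAGCAGGGTCAGAGCATTG",
}

for direction, seq in bg03.items():
    print(direction, len(seq), gc_percent(seq), wallace_tm(seq))
```

A small difference in estimated melting temperature between the forward and reverse primer is usually what one looks for in a pair; more accurate nearest-neighbour estimates would need a dedicated tool.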