Xiaohong Duan, Emily Schmidt, Pei Li, Douglas Lenox, Lin Liu, Changlong Shu, Jie Zhang, Chun Liang.
Abstract
BACKGROUND: The peanut (Arachis hypogaea) is an important crop cultivated worldwide as a source of oil and food. Its complex genetic architecture (a large tetraploid genome, 2n = 4x = 40, AABB, 2800 Mb, possibly arising from a unique cross of wild diploid relatives followed by chromosome duplication) presents a major challenge for genome sequencing and has made peanut a less-studied crop. Transcriptome sequencing is therefore the most effective way to explore the genome structure and gene expression dynamics of this non-model species, for which genomic resources are limited.
DESCRIPTION: With the development of next-generation sequencing technologies such as 454 pyrosequencing and Illumina sequencing by synthesis, peanut transcriptomics data are rapidly accumulating in both public databases and the private sector. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly, which contains 32,619 contigs. We provide EC, KEGG and GO functional annotations for these contigs and detected SSRs, SNPs and other genetic polymorphisms in each contig. Built on both open-source and in-house tools, PeanutDB presents many seamlessly integrated web interfaces that allow users to easily search, filter, navigate and visualize the whole transcript assembly, its annotations, and the detected polymorphisms and simple sequence repeats. For each contig, the sequence alignment is presented in both bird's-eye view and nucleotide-level resolution, with color-highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors.
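The read and base counts quoted in the abstract imply a mean read length for each sequencing platform. The following is a minimal sketch of that arithmetic; the figures are copied from the abstract, while the dictionary and function names are illustrative assumptions and not part of PeanutDB.

```python
# Read and base counts copied from the abstract.
# All names below are hypothetical, chosen only for this illustration.
READ_SETS = {
    "Sanger": (187_636, 103_685_419),
    "Roche 454": (1_165_168, 333_862_593),
    "Illumina": (57_135_995, 4_073_740_115),
}


def mean_read_length(reads: int, bases: int) -> float:
    """Average number of bases per read."""
    return bases / reads


def summarize(read_sets: dict) -> tuple:
    """Return total reads, total bases, and the mean read length per platform."""
    total_reads = sum(reads for reads, _ in read_sets.values())
    total_bases = sum(bases for _, bases in read_sets.values())
    per_platform = {
        name: mean_read_length(reads, bases)
        for name, (reads, bases) in read_sets.items()
    }
    return total_reads, total_bases, per_platform


total_reads, total_bases, per_platform = summarize(READ_SETS)
for name, mean_len in per_platform.items():
    print(f"{name}: mean read length of about {mean_len:.1f} bases")
print(f"Total: {total_reads:,} reads, {total_bases:,} bases")
```

On these figures, the Illumina reads are the shortest on average but contribute roughly 90% of all input bases.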
Year: 2012 PMID: 22712730 PMCID: PMC3444431 DOI: 10.1186/1471-2229-12-94
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Major characteristics of the PeanutDB transcriptome assembly
| Feature | Number of contigs | Percentage (%)² | Length (bp) |
| --- | --- | --- | --- |
| Total contigs in the assembly | 32,619 | 100.0 | |
| Longest contig | | | 29,281 |
| Shortest contig | | | 91 |
| N50 length³ | | | 1,167 |
| Contigs with a length ≥ 500 | 32,319 | 99.1 | |
| Contigs with a length ≥ 1,000 | 12,225 | 37.5 | |
| Contigs with a length ≥ 5,000 | 44 | 0.1 | |
| Contigs with valid GO annotation | 19,529 | 59.9 | |
| Contigs with valid EC annotation | 7,744 | 23.7 | |
| Contigs with valid KEGG annotation | 9,604 | 29.4 | |
1 PeanutDB Transcriptome Assembly release: PeanutDB Version 1.0, Apr-18-2012.
2 The percentage is calculated using the total contig number (i.e., 32,619) in the assembly as the denominator.
3 N50 length is a statistical indicator of contig length for a given assembly (a length-weighted median rather than an arithmetic mean), defined as the length N for which 50% of all bases in the sequences are in a sequence of length L < N [4].
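The N50 definition in footnote 3 can be illustrated with a short sketch. It assumes the common convention of sorting contig lengths in descending order and reporting the length at which the running total first reaches half of all bases; this is not necessarily the exact procedure used for PeanutDB, and the toy lengths below are made up for illustration.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (in bases).

    Assumed convention: walk the contigs from longest to shortest and
    return the length of the contig at which the cumulative number of
    bases first reaches at least half of the assembly total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


# Hypothetical toy assembly: 290 bases in total, so the halfway mark is 145.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))  # 80 + 70 = 150 >= 145, so this prints 70
```

Because long contigs carry more bases, N50 is usually larger than the arithmetic mean contig length, which is why it is reported separately from simple length counts.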
Figure 1. Snapshots of the PeanutDB web interfaces. Panel A: the major web portal. Panel B1: the data grid view showing EC annotation within Contig Annotation. Panel B2: the data filter for data grid views. Panel C: the sequence viewer, which offers multiple functions for sequence manipulation. Panel D1: the alignment view at bird's-eye resolution. Panel D2: the alignment view at nucleotide resolution, which displays and highlights the differences between the contig sequence and individual sequence reads.