Hamid Ashrafi, Theresa Hill, Kevin Stoffel, Alexander Kozik, Jiqiang Yao, Sebastian Reyes Chin-Wo, Allen Van Deynze.
Abstract
BACKGROUND: Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes.
Year: 2012 PMID: 23110314 PMCID: PMC3545863 DOI: 10.1186/1471-2164-13-571
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
A summary of sequences included in the pepper Sanger-EST assembly
| | 123,489 | |
| 125,320 | | |
| 372 | | |
| | 465 | |
| 318 | | |
| 28 | | |
| 31 | | |
| 27 | | |
| Others | 61 | |
| 515 | | |
| 427 | | |
| 76 | | |
| Others | 12 | |
| 642 | 642 | |
Figure 1. Distribution of contig lengths in a) the pepper Sanger-EST assembly and b) the pepper IGA transcriptome assembly.
Comparison of assembly of pepper Sanger ESTs versus assembly of IGA reads
| Statistic | Sanger-EST assembly | IGA transcriptome assembly |
| Number of Unigenes | 31,196 | 123,261 |
| Total assembled nucleotides | 21,665,127 | 135,019,787 |
| Average GC content (%) | 41 | 39 |
| Longest Contig Length | 3,989 | 19,089 |
| Average Contig Length | 694.5 | 1,095 |
| Median Contig Length | 651 | 697 |
| N50 | 702 | 1,647 |
| Number of contigs by size: < 1 KB | 27,248 | 78,433 |
| 1–2 KB | 3,634 | 27,436 |
| 2–3 KB | 288 | 10,616 |
| 3–4 KB | 26 | 3,955 |
| 4–5 KB | 0 | 1,559 |
| 5–10 KB | 0 | 1,184 |
| 10–20 KB | 0 | 78 |
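The summary statistics in this table (average, median, N50 and size-class counts) can all be derived from a list of contig lengths. The following is a minimal sketch of that arithmetic, not the authors' tooling, which is not described here. The function names, and the convention that each size class includes its lower bound and excludes its upper bound, are assumptions made for illustration.

```python
from statistics import mean, median


def n50(lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled nucleotides."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


def size_bins(lengths, edges=(1000, 2000, 3000, 4000, 5000, 10000, 20000)):
    """Count contigs per size class (<1 KB, 1-2 KB, ..., 10-20 KB).

    Assumption: a contig is counted in the first class whose upper edge is
    strictly greater than its length; contigs of 20 KB or more are ignored.
    """
    counts = [0] * len(edges)
    for length in lengths:
        for i, edge in enumerate(edges):
            if length < edge:
                counts[i] += 1
                break
    return counts


def summarize(lengths):
    """Collect the statistics reported in the comparison table."""
    return {
        "number": len(lengths),
        "total_nt": sum(lengths),
        "longest": max(lengths),
        "average": mean(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
        "bins": size_bins(lengths),
    }
```

As a consistency check on the table itself, the average contig length is the total assembled nucleotides divided by the number of unigenes: 21,665,127 / 31,196 ≈ 694.5 for the Sanger-EST assembly.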
The effect of trimming the reads and k-mer length on the number of contigs and N50 in IGA transcriptome assembly
| Untrimmed/K31 | 65,337 | 52,179 | 80 | 603 | 62,570 | 50,306 | 80 | 589 | 68,737 | 57,077 | 83 | 497 |
| Untrimmed/K35 | 64,096 | 49,875 | 78 | 680 | 61,561 | 48,345 | 79 | 660 | 68,237 | 55,564 | 81 | 562 |
| Untrimmed/K41 | 52,099 | 36,770 | 71 | 947 | 50,290 | 35,900 | 71 | 926 | 58,431 | 44,045 | 75 | 777 |
| Trimmed 25–70/K31 | 42,310 | 30,628 | 72 | 864 | 36,173 | 24,871 | 69 | 995 | 39,956 | 28,711 | 72 | 870 |
| Trimmed 25–70/K35 | 34,525 | 22,627 | 66 | 1,109 | 30,202 | 18,859 | 62 | 1,205 | 34,497 | 23,039 | 67 | 1,057 |
| Trimmed 25–70/K41 | 27,439 | 16,728 | 61 | 1,239 | 26,885 | 16,660 | 62 | 1,223 | 28,588 | 18,162 | 64 | 1,165 |
* The untrimmed reads were 40–80 nt long. The same reads were trimmed by 5 nt from the 5' end and 10 nt from the 3' end to eliminate possible sequencing errors at the beginning and end of each read. The numbers were selected arbitrarily, but K=31 is the Velvet recommended value.
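The fixed-length trimming described in the footnote can be sketched as below. This is an illustrative example only: `trim_read` and `kmers_per_read` are hypothetical helper names, and the actual trimming software used for the assemblies is not stated here. The second helper uses the standard fact that a read of length L contains L - k + 1 overlapping k-mers, which shows how trimming interacts with the choice of k.

```python
def trim_read(seq, trim5=5, trim3=10):
    """Remove trim5 bases from the 5' end and trim3 bases from the 3' end.

    Reads too short to survive trimming are returned as an empty string.
    """
    if len(seq) <= trim5 + trim3:
        return ""
    return seq[trim5:len(seq) - trim3]


def kmers_per_read(read_length, k):
    """Number of overlapping k-mers in a read of the given length."""
    return max(0, read_length - k + 1)


if __name__ == "__main__":
    read = "A" * 5 + "C" * 25 + "G" * 10  # a 40 nt read, the shortest untrimmed length
    trimmed = trim_read(read)
    for k in (31, 35, 41):
        print(k, kmers_per_read(len(read), k), kmers_per_read(len(trimmed), k))
```

Under these assumptions, a 40 nt read trimmed to 25 nt is shorter than every k value in the table and therefore contributes no k-mers at K31, K35 or K41.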
Summary statistics of transcriptome assembly of three pepper lines using Velvet, CLC and CAP3 assemblers
| Assembly | Assembler | No. of contigs | N50 | Total nucleotides | No. of contigs | N50 | Total nucleotides | No. of contigs | N50 | Total nucleotides |
| Super Assembly | Velvet (b) | 75,853 | 1,287 | 71,903,681 | 70,459 | 1,303 | 67,210,074 | 81,973 | 1,198 | 73,865,962 |
| | CLC (c) | 83,187 | 1,357 | 79,564,926 | 76,542 | 1,389 | 74,367,265 | 81,528 | 1,347 | 78,144,374 |
| Mega Assembly (Velvet+CLC) | CAP3 | 83,113 | 1,488 | 84,792,180 | 76,375 | 1,526 | 79,383,673 | 82,614 | 1,488 | 84,973,865 |
| Meta Assembly (3 lines) | CAP3 | 123,261 | 1,647 | 135,019,787 | | | | | | |
a Only contigs longer than 300 nucleotides were included in all assemblies.
b Six Velvet iterations, comprising three k-mers of normally trimmed data and three k-mers of stringently trimmed data, were assembled using the CAP3 program.
c Two CLC iterations, comprising one of normally trimmed data and one of stringently trimmed data, were assembled using the CAP3 program.
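The 300-nucleotide cutoff in footnote a amounts to a length filter applied to contigs before they are counted or merged. A minimal sketch of such a filter over FASTA-formatted contigs follows. The helper names are hypothetical, and the choice of a strict "longer than" comparison follows the footnote's wording rather than any documented setting of the assemblers.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=300):
    """Keep contigs strictly longer than min_length nucleotides."""
    return [(h, s) for h, s in records if len(s) > min_length]


if __name__ == "__main__":
    example = [
        ">contig_1", "ACGT" * 50, "ACGT" * 50,  # 400 nt, split over two lines
        ">contig_2", "ACGT" * 75,               # exactly 300 nt
    ]
    for name, seq in filter_contigs(read_fasta(example)):
        print(name, len(seq))
```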
Figure 2. Distribution of the three Blast2GO steps (BLASTX, mapping and annotation) for a) the Sanger-EST assembly and b) the IGA transcriptome assembly.
Figure 3. a) Species distribution based on all BLASTX hits in the Sanger-EST assembly. b) Top-hit species distribution based on BLASTX alignments in the Sanger-EST assembly. c) Species distribution based on all BLASTX hits in the IGA transcriptome assembly. d) Top-hit species distribution based on BLASTX alignments in the IGA transcriptome assembly. Hits to cultivated Solanum species are more frequent than hits to wild species (S. habrochaites or S. bulbocastanum). Within Capsicum, there are more hits to C. annuum than to C. chinense or to more distantly related Capsicum species such as C. chacoense.
Figure 4. An example of a KEGG map, shown for the pyrimidine metabolism pathway. Each box represents the enzyme code involved in that section of the pathway. The colored boxes depict the enzymes identified in a) the Sanger-EST assembly and b) the transcriptome assembly. The KEGG files can be downloaded from the Pepper GeneChip website (https://pepper.ucdavis.edu).