Pratibha Kottapalli, Mauricio Ulloa, Kameswara Rao Kottapalli, Paxton Payton, John Burke.
Abstract
The objective of this study was to explore the known narrow genetic diversity and discover single-nucleotide polymorphic (SNP) markers for marker-assisted breeding within Pima cotton (Gossypium barbadense L.) leaf transcriptomes. cDNA from 25-day-old plants of three diverse cotton genotypes [Pima S6 (PS6), Pima S7 (PS7), and Pima 3-79 (P3-79)] was sequenced on an Illumina sequencing platform. A total of 28.9 million reads (average read length of 138 bp) were generated by sequencing cDNA libraries of these three genotypes. The de novo assembly of reads generated transcriptome sets of 26,369 contigs for PS6, 25,870 contigs for PS7, and 24,796 contigs for P3-79. A Pima leaf reference transcriptome consisting of 42,695 contigs was generated. More than 10,000 single-nucleotide polymorphisms (SNPs) were identified between the genotypes, with 100% SNP frequency and a minimum of eight sequencing reads. The most prevalent SNP substitutions in these cotton genotypes were C-T and A-G. The putative SNPs identified can be used for characterizing genetic diversity, for genotyping, and eventually in Pima cotton breeding through marker-assisted selection.
Keywords: Pima cotton; marker-assisted selection; next-generation sequencing; single-nucleotide polymorphism
Year: 2016; PMID: 27721653; PMCID: PMC5049682; DOI: 10.4137/GEI.S40377
Source DB: PubMed; Journal: Genomics Insights; ISSN: 1178-6310
Summary of cDNA library sequencing of three Pima (G. barbadense) genotypes and de novo assembly statistics of Pima leaf transcriptomes.
| | PIMA S6 | PIMA S7 | PIMA 3-79 |
|---|---|---|---|
| Total number of reads | 9,946,184 | 9,960,492 | 8,963,102 |
| Average read length (bp) | 139 | 138 | 136 |
| Average quality of all reads (Q score) | 36 | 32 | 32 |
| Contig N50 (bases) | 1,379 | 1,392 | 1,393 |
| Average contig length (bases) | 1,035 | 1,079 | 1,073 |
| Average reads per contig | 319 | 323 | 292 |
| Number of contigs | 26,369 | 25,870 | 24,796 |
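
The contig N50 reported in the table is a standard assembly statistic: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembled bases. The following is a minimal sketch of that calculation in Python. The function name `n50` and the toy contig lengths are assumptions for illustration only; they are not taken from the study or from its assembly software.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the study.
toy_contigs = [2000, 1500, 1200, 800, 500, 300]
print(n50(toy_contigs))  # 1500
```

Because N50 weights long contigs more heavily than a plain mean does, it is typically larger than the average contig length, which is consistent with the values in the table.
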
Figure 1. Functional categorization of the Pima leaf reference transcriptome consisting of 42,695 contigs using the Mercator tool (http://mapman.gabipd.org/web/guest/app/mercator).
Number of SNPs and contigs identified between Pima G. barbadense genotypes, Pima S6 (PS6—26,369 contigs), Pima 3-79 (P3-79), and Pima S7 (PS7—25,870 contigs) using SeqMan Pro 11.2.
| COMPARISON | SNP % | NO. OF SNPs (MIN. 4 READS) | NO. OF CONTIGS (MIN. 4 READS) | NO. OF SNPs (MIN. 8 READS) | NO. OF CONTIGS (MIN. 8 READS) |
|---|---|---|---|---|---|
| PS6 as reference | 25 | 369,030 | 24,590 | 301,487 | 23,047 |
| | 50 | 151,824 | 21,458 | 117,425 | 19,545 |
| | 75 | 33,977 | 12,927 | 20,756 | 9,558 |
| | 95 | 10,922 | 5,989 | 4,005 | 2,708 |
| | 100 | 10,520 | 5,796 | 3,603 | 2,489 |
| PS6 as reference | 25 | 369,022 | 24,590 | 301,489 | 23,046 |
| | 50 | 151,822 | 21,457 | 117,423 | 19,544 |
| | 75 | 33,976 | 12,926 | 20,755 | 9,557 |
| | 95 | 10,921 | 5,988 | 4,004 | 2,707 |
| | 100 | 10,520 | 5,796 | 3,603 | 2,489 |
| PS7 as reference | 25 | 364,625 | 24,099 | 284,254 | 22,072 |
| | 50 | 153,803 | 21,016 | 113,215 | 18,530 |
| | 75 | 35,655 | 12,793 | 20,286 | 8,982 |
| | 95 | 11,219 | 5,948 | 3,587 | 2,347 |
| | 100 | 10,790 | 5,746 | 3,158 | 2,129 |
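
The table shows how the number of retained SNPs falls as the SNP percentage threshold and the minimum read depth are raised. A minimal sketch of that kind of two-parameter filter is given below. It is not the SeqMan Pro implementation: the `SnpCall` record, its field names, the `filter_snps` function, and the toy calls are all hypothetical and serve only to illustrate the thresholds named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical record for one candidate SNP position."""
    contig: str
    position: int
    ref_base: str
    alt_base: str
    depth: int      # reads covering the position
    alt_reads: int  # reads carrying the alternate base

    @property
    def snp_percent(self) -> float:
        # Share of covering reads that carry the alternate base.
        return 100.0 * self.alt_reads / self.depth if self.depth else 0.0


def filter_snps(calls, min_percent=100.0, min_depth=8):
    """Keep calls that meet both the SNP percentage and the read-depth threshold."""
    return [
        c for c in calls
        if c.depth >= min_depth and c.snp_percent >= min_percent
    ]


# Toy calls, invented for illustration (not data from the study).
toy_calls = [
    SnpCall("toy_contig_1", 226, "C", "T", depth=30, alt_reads=30),
    SnpCall("toy_contig_2", 10, "A", "G", depth=6, alt_reads=6),
    SnpCall("toy_contig_3", 50, "G", "T", depth=20, alt_reads=15),
]

strict = filter_snps(toy_calls, min_percent=100.0, min_depth=8)
relaxed_depth = filter_snps(toy_calls, min_percent=100.0, min_depth=4)
print(len(strict), len(relaxed_depth))  # 1 2
```

Lowering either threshold can only keep the same calls or admit more, which matches the monotonic pattern in the table.
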
Figure 2. Nucleotide substitution percentage from 10,447 contigs of different comparisons, with SNPs identified using a high-stringency filter of 100% match and a minimum of eight reads in the SeqMan Pro module of Lasergene software 11.2.
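
The two substitutions reported as most prevalent, C-T and A-G, are both transitions (pyrimidine to pyrimidine, or purine to purine). As a sketch of how substitution percentages like those in Figure 2 could be tallied and classified, the Python below counts unordered base pairs and labels each as a transition or transversion. The function names and the toy SNP list are assumptions for illustration and do not reproduce the study's data or software.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify a base substitution as 'transition' or 'transversion'."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid or a == b:
        raise ValueError(f"not a valid substitution: {a}/{b}")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def substitution_percentages(pairs):
    """Return the percentage of each unordered substitution, e.g. 'C-T'."""
    counts = Counter("-".join(sorted((a.upper(), b.upper()))) for a, b in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy substitutions (reference base, alternate base); invented for illustration.
toy_snps = [("C", "T"), ("T", "C"), ("A", "G"), ("A", "C")]
percentages = substitution_percentages(toy_snps)
print(percentages["C-T"], substitution_type("C", "T"))  # 50.0 transition
```
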
Figure 3. Alignment view of the reads of Pima 3-79 (P3-79) against Contig 573 (675 bp) of Pima S6 (PS6). A C ↔ T substitution was detected at position 226 bp with parameters of 100% SNP identity and a read depth greater than 25 (C: black; T: blue). SNPs were detected between the PS6 transcriptome (26,369 contigs; reference set) and P3-79 (9.94 million reads used for SNP calling) using the SeqMan Pro module of Lasergene software 11.2.
Figure 4. Normalized melting peaks identified in SNPs between the three genotypes PS6, PS7, and P3-79 using HRM analysis. Each color represents an SNP genotype.
Mapping of the total number of reads from the tetraploid AD2 Pima G. barbadense genotypes, Pima S6 (PS6), Pima 3-79 (P3-79), and Pima S7 (PS7), to predicted genes of tetraploid AD1 G. hirsutum Upland TM-1 (Zhang et al., ref. 26; Li et al., ref. 25), diploid A2 G. arboreum (Li et al., ref. 22), and D5 G. raimondii using CLC Genomics software 5.5.1.
| | COUNT | PERCENT OF READS | AVERAGE LENGTH | NO. OF BASES | PERCENT OF BASES |
|---|---|---|---|---|---|
| References | 70,478 | – | 1,179.63 | 83,137,743 | – |
| Mapped reads | 22,064,533 | 80.45% | 144.30 | 3,183,967,241 | 84.14% |
| Not mapped reads | 5,360,342 | 19.55% | 111.95 | 600,077,721 | 15.86% |
| Total reads | 27,424,875 | 100.00% | 137.98 | 3,784,044,962 | 100.00% |
| References | 76,943 | – | 1,040.74 | 80,077,659 | – |
| Mapped reads | 21,928,801 | 79.96% | 144.24 | 3,162,966,654 | 83.59% |
| Not mapped reads | 5,496,074 | 20.04% | 113 | 621,078,308 | 16.41% |
| Total reads | 27,424,875 | 100.00% | 137.98 | 3,784,044,962 | 100.00% |
| References | 40,134 | – | 1,088.95 | 43,703,865 | – |
| Mapped reads | 21,898,463 | 79.85% | 144.27 | 3,159,324,677 | 83.49% |
| Not mapped reads | 5,526,412 | 20.15% | 113.04 | 624,720,285 | 16.51% |
| Total reads | 27,424,875 | 100.00% | 137.98 | 3,784,044,962 | 100.00% |
| References | 77,267 | – | 1,852.05 | 143,102,382 | – |
| Mapped reads | 24,553,580 | 89.53% | 144.27 | 3,542,322,665 | 93.61% |
| Not mapped reads | 2,871,295 | 10.47% | 84.19 | 241,722,297 | 6.39% |
| Total reads | 27,424,875 | 100.00% | 137.98 | 3,784,044,962 | 100.00% |
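
The percentage and average-length columns follow directly from the read and base counts. As a quick arithmetic check, the sketch below recomputes them for the first block of the table (70,478 references). The counts are copied from the table; the function and variable names are illustrative assumptions.

```python
def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts copied from the first block of the mapping table.
total_reads = 27_424_875
mapped_reads = 22_064_533
total_bases = 3_784_044_962
mapped_bases = 3_183_967_241

unmapped_reads = total_reads - mapped_reads
unmapped_bases = total_bases - mapped_bases

print(percent(mapped_reads, total_reads))    # 80.45
print(percent(mapped_bases, total_bases))    # 84.14
print(round(mapped_bases / mapped_reads, 2)) # 144.3 (average mapped read length)
print(unmapped_reads, unmapped_bases)        # 5360342 600077721
```

The share of mapped bases exceeds the share of mapped reads because the mapped reads are on average longer than the unmapped ones.
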