Jorge Duitama¹, Lina Kafuri², Daniel Tello², Ana María Leiva³, Bernhard Hofinger⁴, Sneha Datta⁴, Zaida Lentini⁵, Ericson Aranzales³, Bradley Till⁴, Hernán Ceballos³.
Abstract
Cassava is one of the most important food security crops in tropical countries, and a competitive resource for the starch, food, feed and ethanol industries. However, genomics research in this crop is much less developed than in other economically important crops such as rice or maize. The International Center for Tropical Agriculture (CIAT) maintains the largest cassava germplasm collection in the world. Unfortunately, the genetic potential of this diversity for breeding programs remains underexploited due to the difficulty of phenotypic screening and the lack of deep genomic information about the different accessions. A chromosome-level assembly of the cassava reference genome was released this year, and only a handful of studies have been conducted, mainly to identify quantitative trait loci (QTL) in breeding populations with limited variability. This work presents the results of pooled targeted resequencing of more than 1500 cassava accessions from the CIAT germplasm collection, yielding a dataset of more than 2000 variants within genes related to starch functional properties and herbicide tolerance. Results of twelve bioinformatic pipelines for variant detection in pooled samples were compared to ensure the quality of the variant calling process. Functional impact was predicted with two separate methods to prioritize variants of interest for genotyping and cultivar selection. Targeted resequencing, whether of pooled samples or through similar approaches such as Ecotilling or sequence capture, emerges as a cost-effective alternative to whole-genome sequencing for identifying interesting alleles of genes related to relevant traits within large germplasm collections.
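Comparing the output of several variant-calling pipelines reduces to set operations on variant keys. The following is a minimal sketch, assuming each pipeline's calls have already been parsed into (chromosome, position, reference, alternative) tuples; the pipeline names and toy calls are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations

# A variant key: (chromosome, position, reference allele, alternative allele)
Variant = tuple[str, int, str, str]


def support_counts(calls: dict[str, set[Variant]]) -> Counter:
    """Count how many pipelines reported each variant."""
    counts: Counter = Counter()
    for variants in calls.values():
        counts.update(variants)
    return counts


def pairwise_overlap(calls: dict[str, set[Variant]]) -> dict[tuple[str, str], int]:
    """Number of calls shared by every pair of pipelines."""
    return {
        (a, b): len(calls[a] & calls[b])
        for a, b in combinations(sorted(calls), 2)
    }


def unique_to(calls: dict[str, set[Variant]], name: str) -> set[Variant]:
    """Variants reported by pipeline `name` and by no other pipeline."""
    others: set[Variant] = set()
    for other, variants in calls.items():
        if other != name:
            others |= variants
    return calls[name] - others


# Hypothetical toy call sets (not data from the study)
calls = {
    "pipeline_A": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
    "pipeline_B": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A")},
    "pipeline_C": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr3", 7, "T", "C")},
}

print(support_counts(calls).most_common(1))  # [(('chr1', 100, 'A', 'G'), 3)]
print(unique_to(calls, "pipeline_C"))        # {('chr3', 7, 'T', 'C')}
```

Discarding the output of `unique_to` for one caller is one simple way to build a curated set like the one described for Fig. 4a, which excludes SNPs called only by vipR; the exact filtering rules used in the study are not stated in this record.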
Keywords: Cassava; Herbicide tolerance; Pooled targeted resequencing; SNP detection; Starch biosynthesis
Year: 2017 PMID: 28179981 PMCID: PMC5295625 DOI: 10.1016/j.csbj.2017.01.002
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Metabolic reactions related to starch biosynthesis. Arrows indicate reactions catalyzed by the enzymes listed next to each arrow.
Fig. 2. Read alignment statistics per pool. a) Number of fragments sequenced as paired-end reads for each pool. Counts are broken down into fragments aligning with the expected distance and orientation (proper pair) to a unique region of the genome, fragments aligning as a proper pair to multiple regions, and fragments that were not aligned or not aligned as a proper pair. The line indicates the percentage of fragments that could be uniquely assigned to a target region defined by the coordinates of its corresponding primer pair. b) Distribution of the number of fragments assigned to each target region within each pool.
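Assigning a fragment to the target region defined by its primer pair can be sketched as an interval test. This is a minimal sketch, not the study's procedure: it assumes a proper-pair fragment belongs to a target when it lies within that target's coordinates widened by a hypothetical `slack` tolerance, and counts it as uniquely assigned only when exactly one target matches. Target names and coordinates are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A targeted region bounded by its primer pair (0-based, half-open)."""
    name: str
    chrom: str
    start: int  # first base of the forward primer
    end: int    # one past the last base of the reverse primer


def matching_targets(chrom: str, frag_start: int, frag_end: int,
                     targets: list[Target], slack: int = 5) -> list[str]:
    """Targets whose coordinates (widened by `slack` bp) contain the fragment."""
    return [
        t.name for t in targets
        if t.chrom == chrom
        and frag_start >= t.start - slack
        and frag_end <= t.end + slack
    ]


def assign_fragments(fragments, targets, slack=5):
    """Return per-target counts and the percentage of uniquely assigned fragments."""
    counts: Counter = Counter()
    for chrom, start, end in fragments:
        hits = matching_targets(chrom, start, end, targets, slack)
        if len(hits) == 1:  # ambiguous or unmatched fragments are not counted
            counts[hits[0]] += 1
    percent = 100.0 * sum(counts.values()) / len(fragments) if fragments else 0.0
    return counts, percent


# Hypothetical amplicons; amp2 and amp3 overlap, so some fragments are ambiguous
targets = [
    Target("amp1", "chr2", 1000, 1400),
    Target("amp2", "chr17", 5000, 5350),
    Target("amp3", "chr17", 5000, 5360),
]
# Hypothetical proper-pair fragments as (chromosome, start, end)
fragments = [
    ("chr2", 1002, 1398),   # inside amp1 only
    ("chr17", 5001, 5340),  # fits amp2 and amp3: ambiguous
    ("chr5", 10, 300),      # no target on this chromosome
    ("chr2", 1000, 1600),   # extends past amp1
]
counts, percent = assign_fragments(fragments, targets)
print(counts, percent)  # Counter({'amp1': 1}) 25.0
```

The per-target `counts` correspond to the kind of quantity summarized in panel b, and `percent` to the line in panel a, under the stated containment rule.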
Fig. 3. Comparison of variant calls across pipelines. a) Total number of variants detected by each variant caller; b) Comparison of the number of SNPs called by each SNP discovery tool on alignments obtained with bowtie2 and with BWA; c) Comparison of SNP calls between different SNP calling tools on bowtie2 alignments; d) Comparison of SNP calls between different SNP calling tools on BWA alignments; e) Distribution of differences in predicted alternative allele frequency between pools for the curated SNP dataset; f) Distribution of minor allele frequency for SNPs identified only by vipR, distinguishing SNPs also found in a dataset of variants obtained from whole-genome sequencing (WGS) data. The line indicates the percentage of such SNPs within each category.
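For pooled samples, the simplest estimate of the alternative allele frequency at a site is the fraction of reads carrying the alternative allele; the minor allele frequency is then the smaller of that value and its complement. A minimal sketch, assuming per-pool reference and alternative read counts are already available; the variant callers compared in the study use their own, more elaborate models, and the pool names and counts below are hypothetical.

```python
from itertools import combinations


def alt_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that carry the alternative allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def minor_allele_frequency(alt_freq: float) -> float:
    """Frequency of the less common of two alleles."""
    return min(alt_freq, 1.0 - alt_freq)


def pairwise_differences(freqs: dict[str, float]) -> dict[tuple[str, str], float]:
    """Absolute difference in alternative allele frequency for each pair of pools."""
    return {(a, b): abs(freqs[a] - freqs[b]) for a, b in combinations(sorted(freqs), 2)}


# Hypothetical (reference reads, alternative reads) at one SNP in three pools
counts = {"pool1": (90, 10), "pool2": (60, 40), "pool3": (100, 0)}
freqs = {pool: alt_allele_frequency(r, a) for pool, (r, a) in counts.items()}
print(freqs)                                  # {'pool1': 0.1, 'pool2': 0.4, 'pool3': 0.0}
print(round(minor_allele_frequency(0.9), 3))  # 0.1
```

Collecting `pairwise_differences` over all SNPs would give a distribution of the kind shown in panel e, under this read-fraction estimator.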
Fig. 4. Functional analysis of variants. a) Distribution of alternative allele frequencies observed over the 8 pools for the dataset obtained after removing SNPs called only by vipR. b) Distribution of SNPs within coding regions of the genes sequenced in this study. The line represents the number of SNPs per kilobase pair. c) Reads supporting a 1 bp deletion that shifts the open reading frame and generates an early stop codon in an allele of the AHAS gene on chromosome 17. The upper panel is a visualization, using the Integrative Genomics Viewer (IGV), of the reads spanning the region (gray rectangles). Colors other than gray indicate base calls that differ from the reference allele. The highlighted column shows reads reporting a 1 bp deletion. The lower panel shows the highlighted subregion in the JBrowse visualizer available in Phytozome, including the nucleotide sequence and the six possible amino acid translations. The arrow indicates the location of the frameshift deletion.
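The effect shown in panel c, a 1 bp deletion that shifts the reading frame and creates an early stop codon, can be reproduced on a toy sequence. A minimal sketch using the standard genetic code; the 18 bp coding sequence is hypothetical and is not the AHAS sequence, and the code assumes an uppercase A/C/G/T sequence that begins at the start codon.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate frame 0 up to and including the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def delete_base(cds: str, pos: int) -> str:
    """Remove the single base at 0-based position `pos`."""
    return cds[:pos] + cds[pos + 1:]


reference = "ATGGCCGTAAGCGCTTGA"   # hypothetical: ATG GCC GTA AGC GCT TGA
mutant = delete_base(reference, 4)  # drop the second base of codon 2
print(translate(reference))  # MAVSA*
print(translate(mutant))     # MA*  (the shifted frame reads TAA across GTA|AGC)
```

Any downstream codon can turn into a stop after a frameshift, which is why a single-base deletion is usually predicted to be highly disruptive.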