Xu Chi, Yingchun Zhang, Zheyong Xue, Laibao Feng, Huaqing Liu, Feng Wang, Xiaoquan Qi.
Abstract
Chemical mutagenesis is routinely used to create large numbers of rare mutations in plant and animal populations. These populations can subsequently be subjected to selection for beneficial traits and phenotypes that enable the characterization of gene functions. Several next-generation sequencing (NGS)-based target enrichment methods have been developed to detect mutations in target DNA regions. However, most of these methods are designed to sequence a large number of target regions from a small number of individuals. Here, we demonstrate an effective and affordable strategy for discovering rare mutations in a large sodium azide-induced mutant rice population (F2). Integrating multiplex, semi-nested PCR with NGS library construction allowed multiple target DNA fragments to be amplified for sequencing. The 8 × 8 × 8 tridimensional DNA sample pooling strategy enabled us to obtain DNA sequences from 512 individuals while sequencing only 24 samples. A stepwise filtering procedure was then developed to eliminate most of the false positives expected to arise from sequencing errors, and applying a simple Student's t-test against position-prone errors allowed the discovery of 16 mutations in 36 enriched target DNA fragments from 1024 mutagenized rice plants, with no false calls.
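The abstract does not spell out how the t-test against position-prone errors is set up. A minimal sketch, assuming the test compares the substitution rate observed at one position in a candidate pool with the background (sequencing-error) rates observed at the same position in pools that should not carry the mutation. The function name `position_t_statistic` and all input values are hypothetical, not taken from the study.

```python
import math
import statistics

def position_t_statistic(candidate_rate, background_rates):
    """One-sample Student's t statistic comparing the background error rates
    at one position (hypothetical non-carrier pools) with a candidate pool's
    substitution rate. The sign is chosen so that a positive t means the
    candidate rate lies above the background mean.
    Returns (t, degrees_of_freedom)."""
    n = len(background_rates)
    if n < 2:
        raise ValueError("need at least two background rates")
    mean = statistics.mean(background_rates)
    sd = statistics.stdev(background_rates)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("background rates have zero variance")
    t = (candidate_rate - mean) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical rates (not from the study): seven background pools vs. one candidate
t, df = position_t_statistic(8.0e-3, [1.1e-3, 0.9e-3, 1.3e-3, 1.0e-3, 1.2e-3, 0.8e-3, 1.1e-3])
print(df)  # 6
```

The resulting t would then be compared with the critical value for `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_1samp(background_rates, candidate_rate)` gives the same magnitude with the opposite sign, together with a p-value.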
Keywords: NGS; induced mutation detection; large samples pooling; multiplexed target enrichment
Year: 2014 PMID: 24602056 PMCID: PMC4265296 DOI: 10.1111/pbi.12174
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 9.803
Figure 1. Schematic illustration of the semi-nested PCR-based multiple target enrichment method, based on tridimensional pooling of DNA. Two independent sets of 512 DNA samples were arranged as 8 × 8 × 8 arrays. From each array, 24 DNA bulks, each comprising 64 individuals, were created by pooling equimolar amounts of DNA from the samples in each plane along all three orthogonal directions. The DNA was then sheared, ligated to adapters and amplified by three rounds of PCR. The first PCR used a target-specific forward primer and an adapter-derived reverse primer. The second PCR used a target-specific nested primer with a 5′ overhang and the nested adapter-derived reverse primer. In the third round of PCR, the overhang and the adapter served as anchors for the NGS-compatible forward and reverse primers. Barcodes were embedded within the adapters. The resulting NGS library consisted of a population of fragments of variable length, each with one fixed and one variable end.
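The pooling layout can be sketched as follows. This is a minimal illustration, assuming individuals are indexed 0–511 and placed at coordinates (x, y, z) in row-major order; the caption does not specify the index-to-coordinate mapping, and the names `build_pools` and `decode` are hypothetical. Each of the 24 bulks collects one plane of the array, so every individual appears in exactly three bulks, and a mutation seen in one X-, one Y- and one Z-bulk points to a single individual.

```python
from itertools import product

SIZE = 8  # 8 x 8 x 8 array -> 512 individuals

def build_pools(size=SIZE):
    """Map each pool label ('x', i), ('y', j) or ('z', k) to the individuals in that plane."""
    pools = {}
    for x, y, z in product(range(size), repeat=3):
        ind = x * size * size + y * size + z  # assumed row-major index
        for axis, coord in (("x", x), ("y", y), ("z", z)):
            pools.setdefault((axis, coord), []).append(ind)
    return pools

def decode(x_pool, y_pool, z_pool, size=SIZE):
    """Return the individual at the intersection of one positive pool per axis."""
    return x_pool * size * size + y_pool * size + z_pool

pools = build_pools()
print(len(pools))                        # 24 bulks
print({len(m) for m in pools.values()})  # {64}: each bulk holds 64 individuals
print(decode(2, 5, 7))                   # 2*64 + 5*8 + 7 = 175
```

The decoding step assumes a single carrier per mutation; if two individuals shared a mutation, more than one pool per axis would be positive and the intersection would no longer be unique.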
Figure 2. Summary of the pilot experiment. (a) The number of enriched target fragments and the enrichment efficiency. The pie chart at the top shows the total number of reads and the maximum and minimum proportions of reads in the five enriched libraries produced by varying the number of target-specific semi-nested PCR primers. The red line indicates the number of enriched target fragments when 3–13 target-specific semi-nested PCR primers were used to amplify templates of the corresponding subpools. The cyan histogram indicates the proportion of on-target sequences. (b) Sequencing depth of the enriched target DNA fragments. (c) A box plot showing the rates of both sequencing errors and genuine mutations in the five subpools. The mean rate of each nucleotide substitution type is shown above the box plot. (d) The sequencing error rate in the vicinity of GGT/ACC trinucleotides. (e) The detected C-to-T or G-to-A base substitution rates plotted against sequencing depth. (f) The background sequencing error rate plotted against sequencing depth. The dotted lines indicate the expected allele frequencies of 15.6 × 10⁻³ and 7.8 × 10⁻³ in a pooled sample comprising a single homozygous or a single heterozygous mutant, respectively, together with 63 wild-type individuals.
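The two dotted-line values follow from allele counting, assuming diploid plants and equimolar DNA contributions (as stated for the pooling in Figure 1), so that a bulk of 64 individuals contains 128 allele copies:

```latex
f_{\text{hom}} = \frac{2}{2 \times 64} = \frac{1}{64} \approx 15.6 \times 10^{-3},
\qquad
f_{\text{het}} = \frac{1}{2 \times 64} = \frac{1}{128} \approx 7.8 \times 10^{-3}
```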
Figure 3. Probability density functions of the normal distributions of the DNA proportions of genuine mutations in the bulked samples (red area) and of the ratio of base substitution numbers to total sequencing depth (the m/n ratio, blue area). The red and blue arrows indicate the one-tailed left-hand borders corresponding to 97.5% of the DNA proportion and of the m/n ratio, respectively.
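A one-tailed left-hand border of this kind can be computed from a fitted normal distribution. A minimal sketch, assuming the distribution is fitted by the sample mean and sample standard deviation of observed values; the study's exact fitting procedure is not described here, and the function name `left_border` is hypothetical. With `coverage=0.975` the border leaves 97.5% of the distribution above it (as in this figure); `coverage=0.95` corresponds to the 95% border used in Figure 4(b).

```python
from statistics import NormalDist

def left_border(samples, coverage=0.975):
    """One-tailed left-hand border: the value above which `coverage` of a
    normal distribution fitted to `samples` lies."""
    dist = NormalDist.from_samples(samples)  # sample mean and sample SD; needs >= 2 values
    return dist.inv_cdf(1 - coverage)

# For a standard normal, the 97.5% left-hand border is about -1.96
print(round(NormalDist().inv_cdf(1 - 0.975), 2))  # -1.96
```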
Figure 4. Putative and confirmed candidate mutations detected in the tridimensionally pooled DNA samples. (a) The log10 of the number of presumptive mutations remaining after each filtering step (the original numbers are shown above the relevant columns). (b) Validation of the candidate mutations. The grey dotted lines indicate the one-tailed left-hand border corresponding to 95% of the base substitution rate of ‘true mutations’ (3.02 × 10⁻³). Confirmed genuine mutations are marked by red dots; sequencing errors are marked by blue circles.
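The caption presents the 3.02 × 10⁻³ border as a plotted reference line. A minimal sketch of using it as a hard cutoff to separate candidates, which is an assumption for illustration rather than the study's stated validation rule; the function name `split_candidates` and the candidate values are hypothetical.

```python
BORDER = 3.02e-3  # one-tailed 95% border from Figure 4(b)

def split_candidates(candidates, border=BORDER):
    """Split (name, substitution_rate) pairs into those at or above the border
    (retained) and those below it (treated as likely sequencing errors)."""
    kept = [c for c in candidates if c[1] >= border]
    rejected = [c for c in candidates if c[1] < border]
    return kept, rejected

# Hypothetical candidates, not data from the study
kept, rejected = split_candidates([("cand1", 8.1e-3), ("cand2", 1.2e-3), ("cand3", 3.5e-3)])
print([name for name, _ in kept])      # ['cand1', 'cand3']
print([name for name, _ in rejected])  # ['cand2']
```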
Numbers of genuine and false-positive mutations identified using different MaCBD values
| MaCBD | Pool A: genuine mutations | Pool A: false-positive mutations | Pool A: false-positive rate (%) | Pool B: genuine mutations | Pool B: false-positive mutations | Pool B: false-positive rate (%) |
|---|---|---|---|---|---|---|
| 4 | 6 | 4 | 40 | 12 | 3 | 20 |
| 5 | 6 | 6 | 50 | 13 | 6 | 32 |
| 6 | 6 | 10 | 62 | 13 | 11 | 46 |
MaCBD, Maximum cumulative bulk data.
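The tabulated false-positive rates are consistent with false positives divided by all calls (genuine plus false positive). A short check using the counts transcribed from the table; the formula is inferred from the numbers rather than stated in the source, and the name `false_positive_rate` is hypothetical.

```python
# (MaCBD, genuine, false positives) transcribed from the table above
POOL_A = [(4, 6, 4), (5, 6, 6), (6, 6, 10)]
POOL_B = [(4, 12, 3), (5, 13, 6), (6, 13, 11)]

def false_positive_rate(genuine, false_pos):
    """Percentage of called mutations that are false positives."""
    return 100 * false_pos / (genuine + false_pos)

for name, rows in (("A", POOL_A), ("B", POOL_B)):
    for macbd, genuine, fp in rows:
        print(name, macbd, round(false_positive_rate(genuine, fp)))
```

Rounded to whole percentages, this reproduces every rate in the table. For Pool A at MaCBD 6 the exact value is 10/16 = 62.5%; Python's `round()` rounds halves to even, which gives the tabulated 62.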