Samuel Daniel Lup, David Wilson-Sánchez, Sergio Andreu-Sánchez, José Luis Micol
Abstract
Mapping-by-sequencing strategies combine next-generation sequencing (NGS) with classical linkage analysis, allowing rapid identification of the mutations that cause the phenotypes of mutants isolated in a genetic screen. Computer programs are available that analyze NGS data from a mapping population derived from a mutant of interest to identify the causal mutation; however, installing and using such programs requires bioinformatics skills, modifying or combining pieces of existing software, or purchasing licenses. To ease this process, we developed Easymap, an open-source program that simplifies the data analysis workflow from raw NGS reads to candidate mutations. Easymap can perform bulked segregant mapping of point mutations induced by ethyl methanesulfonate (EMS) using DNA-seq or RNA-seq datasets, as well as tagged-sequence mapping of large insertions, such as transposons or T-DNAs. The mapping analyses implemented in Easymap have been validated with experimental and simulated datasets from different plant and animal model species. Easymap was designed to be accessible to all users regardless of their bioinformatics skills: it provides a user-friendly graphical interface, a simple universal installation script, and detailed mapping reports that include informative images and complementary data for assessing the mapping results. Easymap is available at http://genetics.edu.umh.es/resources/easymap; its Quickstart Installation Guide details the recommended installation procedure.
Keywords: NGS; bioinformatics; bulked segregant analysis; candidate mutations; forward genetics; linkage analysis mapping; mapping-by-sequencing
Year: 2021 PMID: 34040621 PMCID: PMC8143052 DOI: 10.3389/fpls.2021.655286
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
FIGURE 1. Overview of two typical mapping-by-sequencing experiments with Easymap in Arabidopsis. (A) Experimental design. For EMS-induced mutants, an outcross or backcross is performed first. The F1 plants derived from the cross are selfed, and the resulting F2 population is screened for the mutant phenotype to create a phenotypically mutant mapping population. The mapping analysis requires a control sample, which can be either one of the crossed parental individuals or a pool of phenotypically wild-type F2 individuals. For mapping of large insertions, the DNA of different insertional mutant lines can be sequenced individually or pooled, and no control sample is required. (B) Input files. Easymap takes paired-end or single-end NGS short reads as input. The remaining mandatory input files are available in public databases for each model species. (C) Easymap workflows. The user selects the experimental design used for mutation mapping from a variety of options for both EMS mutation mapping (backcross and outcross strategies, alternative control samples) and tagged-sequence mapping (paired-end and single-end reads). (D) Output. Easymap produces comprehensive mapping reports with organized tabular data to ease interpretation of the results. As an example of EMS-induced mutation mapping, data from the Arabidopsis suppressor of overexpression of CONSTANS 1-2 (soc1-2) mutant (Sun and Schneeberger, 2015) were used for this figure. Allele frequency (AF) versus position plots are drawn for each chromosome containing the polymorphisms used for the analysis. A candidate region is highlighted in pink; all putative EMS-type mutations within this region are regarded as candidates, and their positions and other relevant information, such as the gene affected by each mutation, are presented in a table.
For each gene affected by a candidate mutation, a gene plot shows the position of the mutation, followed by further information (genotyping primers, flanking sequences, functional annotation, etc.). As an example of large-insertion mapping, the figure includes data from an unpublished mapping experiment performed in our laboratory (see Table 2). A genomic overview shows the positions of the insertions found. For each read cluster pointing to an insertion site, a read depth (RD) histogram is generated that shows the evidence supporting the insertion. Finally, a gene plot is drawn for each gene interrupted by an insertion.
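The AF-versus-position plots in panel (D) of Figure 1 rest on a simple quantity: at each SNP, the fraction of reads carrying the alternative allele. In a phenotypically mutant pool, SNPs tightly linked to the causal mutation approach AF = 1. The sketch below computes AF from per-SNP read counts and reports the chromosome bin with the highest mean AF. It is a minimal illustration under stated assumptions (read counts already extracted from the VCF, fixed non-overlapping bins, hypothetical function names and counts), not Easymap's own peak-detection algorithm.

```python
from typing import Dict, List, Tuple


def allele_frequency(ref_count: int, alt_count: int) -> float:
    """Fraction of reads supporting the alternative allele at one SNP."""
    total = ref_count + alt_count
    return alt_count / total if total else 0.0


def af_peak(snps: List[Tuple[int, int, int]], window: int) -> Tuple[int, int, float]:
    """Return (start, end, mean_af) of the bin with the highest mean AF.

    snps: (position, ref_count, alt_count) tuples for one chromosome.
    Bins are non-overlapping intervals of `window` bp, a simplifying
    assumption of this sketch.
    """
    if not snps:
        raise ValueError("no SNPs on this chromosome")
    bins: Dict[int, List[float]] = {}
    for pos, ref, alt in snps:
        bins.setdefault(pos // window, []).append(allele_frequency(ref, alt))
    best = max(bins, key=lambda b: sum(bins[b]) / len(bins[b]))
    values = bins[best]
    return best * window, (best + 1) * window, sum(values) / len(values)


# Hypothetical counts in which AF rises toward 1 between 2 and 3 kb
snps = [(100, 10, 0), (1500, 5, 5), (2500, 1, 9), (2800, 0, 10), (4100, 6, 4)]
print(af_peak(snps, window=1000))
```

Bin size trades resolution against noise: smaller bins localize the peak more precisely but average over fewer SNPs.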
Validation of large-insertion mapping strategies with real experimental data.
| Reference | Sample | Result |
|---|---|---|
| | Pool of 10 SALK mutants | 17 of 19 known insertions were detected by Easymap; the remaining 2 were filtered out as false positives because they had very few supporting reads (average RD per sample was 4.5×, below the recommended 10×) |
| Lup and Micol, unpublished | Pool of 6 SALK mutants | 9 insertions detected |
| | Mutagenized T1c-19 rice | 2 of 2 insertions detected; 2 clearly distinguishable false positives |
| | Mutagenized TT51-1 rice | 2 of 2 insertions detected; 47 clearly distinguishable false positives caused by an endogenous sequence present in the insertion sequence |
| | Mutant T027 | 2 of 2 insertions detected; 1 false positive common to all lines from this article (omitted in the rows below) |
| | Mutant T182 | 1 of 1 insertion detected; 1 false positive |
| | Mutant T204 | 1 of 1 insertion detected |
| | Mutant T273 | 1 of 1 insertion detected, split into two clusters due to a large deletion in the mutagenized genome |
| | Mutant T400 | 1 of 1 insertion detected |
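The 4.5× figure in the first row of this table can be put in context with the standard Lander-Waterman estimate of expected coverage, C = N × L / G, for N reads of length L over a genome of size G. In an equimolar pool of k lines, each line contributes roughly 1/k of the reads, so the depth available for an insertion carried by a single line is about C/k. The sketch below assumes equal representation of the pooled lines and counts every sequenced read (both mates of a pair); the read numbers, genome size, and function names are hypothetical.

```python
RECOMMENDED_DEPTH = 10.0  # recommended per-sample read depth quoted in the table


def expected_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman expected coverage C = N * L / G."""
    return n_reads * read_length / genome_size


def per_line_depth(n_reads: int, read_length: int, genome_size: int,
                   lines_in_pool: int) -> float:
    """Expected depth contributed by one line, assuming an equimolar pool."""
    return expected_depth(n_reads, read_length, genome_size) / lines_in_pool


def meets_recommendation(depth: float) -> bool:
    """True if the depth reaches the recommended per-sample threshold."""
    return depth >= RECOMMENDED_DEPTH


# Hypothetical run: 54 million 100-bp reads over a 135-Mb genome, 10 pooled lines
d = per_line_depth(54_000_000, 100, 135_000_000, 10)
print(d, meets_recommendation(d))  # 4.0 False
```

In this hypothetical case the pool as a whole is sequenced at 40×, yet each line receives only about 4×, which is why pooled designs need proportionally more reads.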
Validation of point-mutation mapping strategies using published, real experimental data.
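| Mutant background | Mapping population | Control sample | Data | Result |
|---|---|---|---|---|
| Same as reference sequence | Backcross/M2 | Wild-type parental strain | DNA | CM |
| Same as reference sequence | Backcross/M2 | Wild-type parental strain | DNA | CM and 2 more candidates within the CI |
| Same as reference sequence | Backcross/M2 | Wild-type parental strain | DNA | CM and one more candidate within the CI |
| Same as reference sequence | Backcross/M2 | Wild-type parental strain | RNA | Correct CI, CM undetectable with our methods |
| Same as reference sequence | Backcross/M2 | Phenotypically wild-type siblings | DNA | CM and 3 more candidates within the CI |
| Same as reference sequence | Backcross/M2 | Phenotypically wild-type siblings | DNA | CM and 2 more candidates within the CI |
| Same as reference sequence | Backcross/M2 | Phenotypically wild-type siblings | RNA | Correct CI, CM unknown in the original paper |
| Same as reference sequence | Outcross | Wild-type parental strain | DNA | Correct CI, CM unknown in the original paper |
| Same as reference sequence | Outcross | Genetically distant strain crossed to the mutant | DNA | CM and 6 more candidates within the CI |
| Same as reference sequence | Outcross | Genetically distant strain crossed to the mutant | DNA | Correct CI, CM unknown in the original paper |
| Different from reference sequence | Backcross/M2 | Phenotypically wild-type siblings | DNA | CM is the only candidate within the CI |
| Different from reference sequence | Backcross/M2 | Phenotypically wild-type siblings | DNA | Correct CI, CM lost due to very low coverage |
| Different from reference sequence | Backcross/M2 | Phenotypically wild-type siblings | RNA | Correct CI, CM fine-mapped in later experiments |
| Different from reference sequence | Outcross | Wild-type parental strain | DNA | Correct CI, CM lost due to experimental design |
| Different from reference sequence | Outcross | Wild-type parental strain | DNA | CM and 70 more candidates within the CI |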
FIGURE 2. Some strategies for EMS-induced mutation mapping implemented in Easymap. (A–D) The input reads are processed into control and test SNP lists. The lists are compared to determine which SNPs are informative for mapping, and these SNPs are subjected to an allele frequency (AF) analysis to find the mapping region. A candidate region is defined around the center of the mapping region, and the potentially causal SNPs within it are collected as candidate SNPs. (A) For a mutant obtained in the reference genetic background, a backcross is performed to obtain the mapping population, and the control sample is the parent of the mutagenized line. (B) For a mutant obtained in the reference genetic background, an outcross is performed to obtain the mapping population, and the control sample is the polymorphic wild-type parent. (C) For a mutant obtained in a non-reference strain, a backcross is performed to obtain the mapping population, and the control sample is a pool of phenotypically wild-type F2 individuals. (D) For a mutant obtained in a non-reference strain, an outcross is performed to obtain the mapping population, and the control sample is the parent of the mutagenized line. (E–H) Selection of the experimental designs corresponding to panels (A–D) in the multiple-choice selectors of the Easymap graphical interface.
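For the backcross design in panel (A) of Figure 2, comparing the two SNP lists amounts to a set difference: variants already present in the parental control are background polymorphisms, and the remainder are newly induced mutations. The sketch below shows that step together with the final candidate selection, which keeps only canonical EMS-type G>A and C>T changes within a window around the mapping-region center. The SNP record layout, the function names, and the fixed half-width window are assumptions for illustration; the AF analysis between the two steps is omitted, and the designs in panels (B–D) use the control list differently.

```python
from typing import Iterable, List, NamedTuple


class SNP(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Canonical EMS-induced G:C>A:T transitions, as seen on the reference strand
EMS_CHANGES = {("G", "A"), ("C", "T")}


def informative_snps(test: Iterable[SNP], control: Iterable[SNP]) -> List[SNP]:
    """Design (A): drop variants already present in the parental control,
    keeping the newly induced mutations found in the mutant pool."""
    in_control = set(control)
    return [s for s in test if s not in in_control]


def is_ems_type(snp: SNP) -> bool:
    """True for the G>A and C>T substitutions typical of EMS."""
    return (snp.ref, snp.alt) in EMS_CHANGES


def candidate_snps(snps: Iterable[SNP], chrom: str, center: int,
                   half_width: int) -> List[SNP]:
    """EMS-type SNPs within center +/- half_width bp on one chromosome."""
    return [s for s in snps
            if s.chrom == chrom and abs(s.pos - center) <= half_width
            and is_ems_type(s)]
```

For example, with a hypothetical mapping-region center at position 5,100 on chromosome 1 and a 500-bp half-width, a C>T change at 5,000 is kept, whereas an A>G change at 5,200 is discarded as non-EMS-type.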
FIGURE 3. Large-insertion mapping with Easymap. (A–C) Local alignment analysis. (A) The DNA insert is shown in blue and genomic DNA in gray. Individual reads are taken from the mutant genome. (B) The reads are aligned to the insertion sequence. Locally aligned reads (e.g., read 1) are selected and sorted according to which end is truncated (blue and green). (C) The selected reads are aligned to the genomic reference sequence. The blue triangle indicates the position of the insertion in the mutant genome. (D–F) Paired-read analysis. (D) Paired reads are taken from the mutant genome. (E) The reads are aligned to the insertion sequence. Unaligned reads whose mates are aligned (e.g., read 2) are selected and sorted according to their position relative to the insertion (blue and green). (F) The selected reads are aligned to the reference sequence, delimiting a candidate region for the insertion site. (G,H) Read depth histograms for examples of local alignment (G) and paired-read (H) analyses. (G) A false-positive insertion, characterized by low overall read depth and disorganized data. (H) A true-positive insertion, characterized by high read depth and organized data.
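The two read-selection rules in panels (B) and (E) of Figure 3, and the grouping of the selected reads into putative insertion sites, can be sketched in plain Python. The sketch assumes that alignments have already been reduced to simple records: a CIGAR string for each read locally aligned to the insertion (with no hard clipping), per-pair booleans indicating which mate aligned to the insertion, and (chromosome, position) hits on the genomic reference. Function names, the 100-bp gap, and the minimum of three supporting reads are hypothetical choices, not Easymap's parameters; the minimum-read filter mirrors the idea that poorly supported clusters are treated as false positives.

```python
import re
from typing import Iterable, List, Optional, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_end(cigar: str) -> Optional[str]:
    """Panel (B): report which end of a read locally aligned to the insertion
    is soft-clipped ('start' or 'end'). Reads clipped at both ends or at
    neither end do not span a single junction and return None."""
    ops = [op for _, op in CIGAR_OP.findall(cigar)]
    if not ops:
        return None
    clip_start, clip_end = ops[0] == "S", ops[-1] == "S"
    if clip_start != clip_end:
        return "start" if clip_start else "end"
    return None


def mates_of_insert_hits(pairs: Iterable[Tuple[str, bool, bool]]) -> List[Tuple[str, int]]:
    """Panel (E): pairs are (pair_id, mate1_on_insert, mate2_on_insert).
    Return (pair_id, mate_number) for each read that did not align to the
    insertion while its mate did; such reads flank the insertion site."""
    selected = []
    for pair_id, m1, m2 in pairs:
        if m1 and not m2:
            selected.append((pair_id, 2))
        elif m2 and not m1:
            selected.append((pair_id, 1))
    return selected


def cluster_hits(hits: Iterable[Tuple[str, int]], max_gap: int = 100,
                 min_reads: int = 3) -> List[Tuple[str, int, int, int]]:
    """Panels (C) and (F): group genomic (chrom, pos) hits of selected reads
    into clusters (chrom, start, end, n_reads); clusters supported by fewer
    than min_reads reads are discarded as likely false positives."""
    clusters: List[Tuple[str, int, int, int]] = []
    for chrom, pos in sorted(hits):
        if clusters and clusters[-1][0] == chrom and pos - clusters[-1][2] <= max_gap:
            c_chrom, start, _, n = clusters[-1]
            clusters[-1] = (c_chrom, start, pos, n + 1)
        else:
            clusters.append((chrom, pos, pos, 1))
    return [c for c in clusters if c[3] >= min_reads]
```

Each cluster's read count is the kind of support summarized by the read depth histograms in panels (G) and (H).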
Third-party software packages included in Easymap.
| Workflow | Software | Usage |
|---|---|---|
| Linkage analysis mapping | HISAT2 | hisat2-build for genome indexing; hisat2 with default options for paired-end or single-end read alignment |
| Linkage analysis mapping | SAMtools | samtools sort to sort alignments and convert SAM files to BAM; samtools mpileup for the first step of variant calling, with the arguments -t DP,ADF,ADR to include these fields in the VCF output and -C50 to downgrade the overestimated mapping quality of reads with excessive mismatches (high-stringency mode) |
| Linkage analysis mapping | BCFtools | bcftools call for the second step of variant calling, with the argument -mv to report only polymorphic sites |
| Linkage analysis mapping | HTSlib | A dependency of BCFtools |
| Tagged sequence mapping | Bowtie 2 | bowtie2-build for genome indexing; bowtie2 with default options for paired-end read alignment; bowtie2 --local for local alignment of paired-end or single-end reads |
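The command lines summarized in this table can be assembled programmatically. The sketch below builds them as argument lists (suitable for subprocess.run) without executing anything; the file names and function names are placeholders, and Easymap's own scripts may pass additional arguments. The VCF/BCF output of samtools mpileup used here belongs to older SAMtools releases; current releases provide this step as bcftools mpileup instead.

```python
import shlex
from typing import List, Optional

Command = List[str]


def hisat2_commands(ref_fa: str, index: str, reads1: str,
                    reads2: Optional[str] = None, sam: str = "aln.sam") -> List[Command]:
    """Index the genome, then align single-end (-U) or paired-end (-1/-2) reads."""
    align = ["hisat2", "-x", index]
    align += ["-1", reads1, "-2", reads2] if reads2 else ["-U", reads1]
    return [["hisat2-build", ref_fa, index], align + ["-S", sam]]


def variant_calling_commands(ref_fa: str, sam: str,
                             bam: str = "aln.sorted.bam") -> List[Command]:
    """Sort SAM into BAM, then two-step variant calling (mpileup | call)."""
    return [
        ["samtools", "sort", "-o", bam, sam],
        # -t adds DP/ADF/ADR tags; -C50 downgrades mapping quality of reads
        # with excessive mismatches; -u uncompressed output; -f reference FASTA
        ["samtools", "mpileup", "-t", "DP,ADF,ADR", "-C50", "-uf", ref_fa, bam],
        # -m multiallelic caller, -v report variant sites only
        ["bcftools", "call", "-mv"],
    ]


def bowtie2_commands(target_fa: str, index: str, reads1: str,
                     reads2: Optional[str] = None, local: bool = False,
                     sam: str = "aln.sam") -> List[Command]:
    """Index a target sequence and align reads end-to-end or with --local."""
    align = ["bowtie2"] + (["--local"] if local else []) + ["-x", index]
    align += ["-1", reads1, "-2", reads2] if reads2 else ["-U", reads1]
    return [["bowtie2-build", target_fa, index], align + ["-S", sam]]


sort, pileup, call = variant_calling_commands("genome.fa", "aln.sam")
print(shlex.join(pileup) + " | " + shlex.join(call))
```

Because the mpileup output is streamed into bcftools call, the two variant-calling steps are printed as a single shell pipeline; shlex.join requires Python 3.8 or later.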
Experimental designs supported by different open-source programs used for mapping-by-sequencing.
| Point mutations | Backcross | Parental line | D | D | D/R | ||
| Phenotypically wild-type F2 or M2 | D | D/R | |||||
| Outcross | Parental line | D | D/R | ||||
| Phenotypically wild-type F2 or M2 | D | D/R | |||||
| Large insertions | – | – | D | D | |||