High-performance pipeline for MutMap and QTL-seq.

Yu Sugihara^1,2, Lester Young^3, Hiroki Yaegashi^1, Satoshi Natsume^1, Daniel J Shea^1, Hiroki Takagi^4, Helen Booker^3,5, Hideki Innan^6, Ryohei Terauchi^1,2, Akira Abe^1.

Abstract

Summary: Bulked segregant analysis implemented in MutMap and QTL-seq is a powerful and efficient method to identify loci contributing to important phenotypic traits. However, the previous pipelines were not user-friendly to install and run. Here, we describe new pipelines for MutMap and QTL-seq. These updated pipelines are approximately 5-8 times faster than the previous pipelines, are easier for novice users to use, and can be easily installed through bioconda with all dependencies.

Availability: The new pipelines of MutMap and QTL-seq are written in Python and can be installed via bioconda. The source code and manuals are available online (MutMap: https://github.com/YuSugihara/MutMap, QTL-seq: https://github.com/YuSugihara/QTL-seq).

©2022 Sugihara et al.

Keywords: Agricultural science; Bioinformatics; Bulked-segregant analysis; Mutation mapping; QTL analysis

Year: 2022; PMID: 35321412; PMCID: PMC8935991; DOI: 10.7717/peerj.13170

Source DB: PubMed; Journal: PeerJ; ISSN: 2167-8359; Impact factor: 2.984

Introduction

Bulked segregant analysis (Michelmore, Paran & Kesseli, 1991; Giovannoni et al., 1991; Li & Xu, 2021), as implemented in MutMap (Abe et al., 2012) and QTL-seq (Takagi et al., 2013), is a powerful and efficient method to identify loci contributing to important phenotypic traits. MutMap requires whole-genome resequencing of a single individual from the original cultivar and the pooled sequences of F2 progeny from a cross between the original cultivar and the mutant. MutMap uses the sequence of the original cultivar to polarize the site frequencies of neighboring markers and identifies loci with an unexpected site frequency, simulating the genotype of F2 progeny.

QTL-seq was adapted from MutMap to identify single or multiple loci contributing to important phenotypic traits. It utilizes sequences pooled from two segregating progeny populations with extreme opposite traits (e.g., resistant vs. susceptible to a pathogen) and single whole-genome resequencing of either of the parental cultivars. The original QTL-seq algorithm assumes that loci controlling phenotypic traits fix in opposite directions in the two bulked populations through self-fertilization. Therefore, QTL-seq is usually applied to the homozygous genomes of self-fertilizing plants but not to the heterozygous genomes of obligate outcrossers.

Despite their usefulness, these programs are not user-friendly to install or run and require multiple user inputs. Another problem is that the programs require Coval (Kosugi et al., 2013) for variant calling, which relies on older versions of SAMtools (before 0.1.8). Updated software, including PyBSASeq (Zhang & Panthee, 2020) and QTLseqr (Mansfeld & Grumet, 2018), has been developed (Li & Xu, 2021). In this study, we describe newly developed pipelines for MutMap and QTL-seq with updated features.

Implementation

The new pipelines support read trimming by Trimmomatic (Bolger, Lohse & Usadel, 2014), replacing fastx-toolkit in the previous pipeline. Trimmed reads are aligned by BWA-MEM (Li & Durbin, 2009), replacing BWA-SAMPE, BWA-ALN and Coval. Improperly paired reads and PCR duplicates are filtered by SAMtools (Li et al., 2009). Subsequently, a VCF file is generated by the “mpileup” command implemented in BCFtools (Li, 2011). The user can start the analysis from any point in the process, e.g., from raw FASTQs, trimmed FASTQs, BAM files, or a VCF file. MutPlot and QTL-plot, which are standalone programs, were developed for postprocessing of VCF files. Low-quality variants in a VCF file are filtered out based on mapping quality and strand bias, and the actual and expected SNP-indices are calculated from the AD (allele depth) value of each sample pool (Abe et al., 2012). In QTL-seq, a ΔSNP-index is calculated by subtracting the SNP-index of one bulk from that of the other (Takagi et al., 2013). As an option, multiple testing correction (Huang et al., 2020) can also be applied to the simulation. Both pipelines ignore SNPs that are missing in the parental sample. Candidate causal mutations in the VCF file are shown graphically after optionally executing SnpEff (Cingolani et al., 2012), which assesses the impact of located mutations on putatively expressed genes. The procedures are connected by a Python script.
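The SNP-index and ΔSNP-index calculations described above can be illustrated with a minimal Python sketch. It is not the pipelines' actual code: the function names, the representation of AD as a pair of read counts already polarized against the parental allele, and the subtraction order (bulk 2 minus bulk 1) are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical representation of the AD (allele depth) field after polarization:
# (reads matching the parental allele, reads carrying the other allele).
AlleleDepth = Tuple[int, int]


def snp_index(ad: AlleleDepth) -> Optional[float]:
    """Fraction of reads in a bulk that differ from the parental allele."""
    parental_reads, other_reads = ad
    depth = parental_reads + other_reads
    if depth == 0:
        return None  # no coverage, so the index is undefined
    return other_reads / depth


def delta_snp_index(
    ad_bulk1: AlleleDepth,
    ad_bulk2: AlleleDepth,
    parent_genotype: Optional[str],
) -> Optional[float]:
    """ΔSNP-index, assumed here to be SNP-index(bulk 2) - SNP-index(bulk 1)."""
    if parent_genotype is None:
        return None  # SNPs missing in the parental sample are ignored
    index1 = snp_index(ad_bulk1)
    index2 = snp_index(ad_bulk2)
    if index1 is None or index2 is None:
        return None
    return index2 - index1


if __name__ == "__main__":
    # Bulk 1 carries mostly the non-parental allele, bulk 2 mostly the parental one.
    print(snp_index((5, 15)))
    print(delta_snp_index((5, 15), (15, 5), "0/0"))
```

A ΔSNP-index far from zero at a run of neighboring SNPs is what marks a candidate region; the sign depends only on which bulk is subtracted from which.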

Methods

To compare the performance of the new and old pipelines, we ran MutMap and QTL-seq on an AMD EPYC 7501 processor (Base 2.0 GHz) with 48 GB RAM and 12 threads (located at ROIS National Institute of Genetics in Japan).

MutMap was run on two datasets, dataset 1 and dataset 2, used in a previous study (Abe et al., 2012). The original rice cultivar of both datasets was Hitomebore. The mutant bulks for dataset 1 and dataset 2 were Hit1917-pl and Hit1917-sd, respectively. These datasets can be downloaded as DRR004451 (Hitomebore), DRR001785 (Hit1917-pl), and DRR001787 (Hit1917-sd). MutMap v2.3.2 was run with the option “-n 20” because both mutant bulks contain 20 lines. The other parameters of MutMap v2.3.2 were left at their default values. For both datasets, “IRGSP-1.0” was used as the reference genome.

QTL-seq was run on two datasets, dataset 3 and dataset 4, used in a previous study (Takagi et al., 2013). Dataset 3 was obtained from recombinant inbred lines (RILs) derived from a cross between Hitomebore and Nortai. Dataset 4 was obtained from F2 progeny derived from a cross between Hitomebore and WRC57. We used the rice cultivar Hitomebore as the parental sequence for both datasets. These datasets can be downloaded as DRR004451 (Hitomebore), DRR003237 and DRR003238 (RILs derived from F1 between Hitomebore and Nortai), and DRR003341 and DRR003342 (F2 progeny derived from F1 between Hitomebore and WRC57). For dataset 3, we ran QTL-seq v2.2.2 with the options “-n1 20 -n2 20 -F 6” because both bulks contain 20 F6 RILs. For dataset 4, we ran QTL-seq v2.2.2 with the options “-n1 50 -n2 50 -F 2” because both bulks contain 50 F2 progeny. The remaining parameters of QTL-seq v2.2.2 were left at their default values. For both datasets, “IRGSP-1.0” was used as the reference genome.
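To see why the filial generation passed with “-F” matters, it helps to recall the standard expectation under repeated self-fertilization: starting from a fully heterozygous F1, the heterozygous fraction at a locus halves each generation. The sketch below only illustrates that textbook expectation; how QTL-seq uses the value internally is not described here, and the function name is hypothetical.

```python
from typing import Dict


def expected_genotype_frequencies(filial_generation: int) -> Dict[str, float]:
    """Expected genotype frequencies at an unlinked locus after selfing from F1.

    Assumes strict self-fertilization and no selection at the locus.
    """
    if filial_generation < 1:
        raise ValueError("filial generation must be 1 or greater")
    # F1 is fully heterozygous; each round of selfing halves heterozygosity.
    heterozygous = 0.5 ** (filial_generation - 1)
    homozygous_each = (1.0 - heterozygous) / 2.0
    return {"AA": homozygous_each, "Aa": heterozygous, "aa": homozygous_each}


if __name__ == "__main__":
    print(expected_genotype_frequencies(2))  # F2 progeny, as in dataset 4
    print(expected_genotype_frequencies(6))  # F6 RILs, as in dataset 3
```

With F6 RILs almost every individual is homozygous at a given locus, whereas half of the F2 individuals are heterozygous, so the spread of SNP-index values expected by chance differs between the two designs.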

Figure 1

Pipeline workflow and performance of MutMap and QTL-seq.

(A) The pipeline workflow of MutMap. (B) The pipeline workflow of QTL-seq. (C) Speed comparison between the new (v2.3.2) and old (v1.4.5) pipelines of MutMap. (D) Speed comparison between the new (v2.2.2) and old (v1.4.5) pipelines of QTL-seq. The values are mean ± SD (n = 3).

Results and Conclusions

The new MutMap and QTL-seq pipelines are approximately 5–8 times faster than the previous pipelines. The reduced processing time was achieved by using more applications that support parallel processing (Trimmomatic, SAMtools, and BCFtools) for different steps, including SNP calling, and by omitting the creation of a consensus FASTA file that the previous pipelines performed (Fig. 1). The ability of the updated pipelines to accept a wider range of input file formats reduces the time required for file management and data handling and makes the software easier to use. Further time savings were achieved by removing user interactions that were required in the previous version. Although the number of SNPs plotted differed slightly, the results from the new version were similar to those from the old version or had slightly better confidence index values (Fig. S1).

The simulation-based statistical test was adopted as the default because it can address substantial heterogeneity in read depth among SNPs without any assumptions about the statistical distributions of SNP-indices. We also implemented multiple testing correction following the parameters in a previous study (Huang et al., 2020). However, the method described in Huang et al. (2020) requires biological information such as the number of chromosomes, genome size, and total map length in centimorgans, which is not available for the majority of organisms, severely restricting its applicability. As stated by Li & Xu (2021), the role of bulked segregant analysis is to map the target QTLs as a primary test, regardless of the statistical threshold criteria. Currently, these new pipelines can be installed through bioconda with all dependencies. The new pipelines of MutMap and QTL-seq have improved performance and are more user-friendly to install and run, making them useful for genetic studies.
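The idea behind a simulation-based threshold can be sketched in a few lines of Python. This is a simplified illustration for an F2 bulk, not the pipelines' implementation: it assumes that, at a locus unlinked to the trait, each of the 2n alleles in a bulk of n individuals is non-parental with probability 0.5, and that reads sample those alleles at random. The function names and the default number of replicates are hypothetical.

```python
import random
from typing import List


def simulate_null_snp_indices(
    n_individuals: int, depth: int, n_replicates: int = 10000, seed: int = 0
) -> List[float]:
    """Simulate SNP-indices at an unlinked locus for a given bulk size and read depth."""
    rng = random.Random(seed)
    n_alleles = 2 * n_individuals
    indices = []
    for _ in range(n_replicates):
        # Step 1: sample the alleles present in the bulk (1:1 segregation assumed).
        non_parental = sum(rng.random() < 0.5 for _ in range(n_alleles))
        allele_frequency = non_parental / n_alleles
        # Step 2: sample sequencing reads from the bulk at this depth.
        alt_reads = sum(rng.random() < allele_frequency for _ in range(depth))
        indices.append(alt_reads / depth)
    indices.sort()
    return indices


def upper_threshold(sorted_indices: List[float], level: float = 0.95) -> float:
    """Empirical upper quantile of the simulated null distribution."""
    position = min(len(sorted_indices) - 1, int(level * len(sorted_indices)))
    return sorted_indices[position]


if __name__ == "__main__":
    for depth in (10, 100):
        null = simulate_null_snp_indices(n_individuals=20, depth=depth, n_replicates=2000)
        print(depth, upper_threshold(null, 0.95))
```

Because the null distribution is simulated separately for each read depth, a SNP covered by few reads gets a wider threshold than a deeply covered one, which is how heterogeneity in depth is handled without assuming a particular distribution.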

Figure S1

Comparison of the results between the new and old pipelines for MutMap and QTL-seq

(A) MutMap plot of Hit1917-pl from MutMap v1.4.5. (B) MutMap plot of Hit1917-pl from MutMap v2.3.2. (C) MutMap plot of Hit1917-sd from MutMap v1.4.5. (D) MutMap plot of Hit1917-sd from MutMap v2.3.2. (E) QTL-seq plot of RILs from QTL-seq v1.4.5. (F) QTL-seq plot of RILs from QTL-seq v2.2.2. (G) QTL-seq plot of F2 progeny from QTL-seq v1.4.5. (H) QTL-seq plot of F2 progeny from QTL-seq v2.2.2.

References

1. Genome sequencing reveals agronomically important loci in rice using MutMap.

Authors: Akira Abe; Shunichi Kosugi; Kentaro Yoshida; Satoshi Natsume; Hiroki Takagi; Hiroyuki Kanzaki; Hideo Matsumura; Kakoto Yoshida; Chikako Mitsuoka; Muluneh Tamiru; Hideki Innan; Liliana Cano; Sophien Kamoun; Ryohei Terauchi
Journal: Nat Biotechnol Date: 2012-01-22 Impact factor: 54.908

2. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

Authors: Pablo Cingolani; Adrian Platts; Le Lily Wang; Melissa Coon; Tung Nguyen; Luan Wang; Susan J Land; Xiangyi Lu; Douglas M Ruden
Journal: Fly (Austin) Date: 2012 Apr-Jun Impact factor: 2.160

3. Isolation of molecular markers from specific chromosomal intervals using DNA pools from existing mapping populations.

Authors: J J Giovannoni; R A Wing; M W Ganal; S D Tanksley
Journal: Nucleic Acids Res Date: 1991-12-11 Impact factor: 16.971

4. QTLseqr: An R Package for Bulk Segregant Analysis with Next-Generation Sequencing.

Authors: Ben N Mansfeld; Rebecca Grumet
Journal: Plant Genome Date: 2018-07 Impact factor: 4.089

5. QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations.

Authors: Hiroki Takagi; Akira Abe; Kentaro Yoshida; Shunichi Kosugi; Satoshi Natsume; Chikako Mitsuoka; Aiko Uemura; Hiroe Utsushi; Muluneh Tamiru; Shohei Takuno; Hideki Innan; Liliana M Cano; Sophien Kamoun; Ryohei Terauchi
Journal: Plant J Date: 2013-02-18 Impact factor: 6.417

6. BRM: a statistical method for QTL mapping based on bulked segregant analysis by deep sequencing.

Authors: Likun Huang; Weiqi Tang; Suhong Bu; Weiren Wu
Journal: Bioinformatics Date: 2020-04-01 Impact factor: 6.937

7. Bulk segregation analysis in the NGS era: a review of its teenage years.

Authors: Zhiqiang Li; Yuhui Xu
Journal: Plant J Date: 2022-02-14 Impact factor: 6.417

8. Coval: improving alignment quality and variant calling accuracy for next-generation sequencing data.

Authors: Shunichi Kosugi; Satoshi Natsume; Kentaro Yoshida; Daniel MacLean; Liliana Cano; Sophien Kamoun; Ryohei Terauchi
Journal: PLoS One Date: 2013-10-08 Impact factor: 3.240

9. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

10. Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics Date: 2014-04-01 Impact factor: 6.937
