MuWU: Mutant-seq library analysis and annotation.

Tyll Stöcker¹, Lena Altrogge¹, Caroline Marcon², Yan Naing Win², Frank Hochholdinger², Heiko Schoof¹.

Abstract

MOTIVATION: Insertional mutagenesis allows for the creation of loss-of-function mutations on a genome-wide scale. In theory, every gene can be "knocked out" via the insertion of an additional DNA sequence. Resources of sequence-indexed mutants of plant and animal model organisms are instrumental for functional genomics studies. Such repositories significantly speed up the acquisition of interesting genotypes and allow for the validation of hypotheses regarding phenotypic consequences in reverse genetics. To create such resources, comprehensive sequencing of flanking sequence tags using protocols such as Mutant-seq requires various downstream computational tasks, and these need to be performed in an efficient and reproducible manner.
RESULTS: Here we present MuWU, an automated Mutant-seq workflow utility initially created for the identification of Mutator insertion sites of the BonnMu resource, representing a reverse genetics mutant collection for functional genetics in maize (Zea mays). MuWU functions as a fast, one-stop downstream processing pipeline of Mutant-seq reads. It takes care of all complex bioinformatic tasks, such as identifying tagged genes and differentiating between germinal and somatic mutations/insertions. Furthermore, MuWU automatically assigns insertions to the corresponding mutated seed stocks. We discuss the implementation and how parameters can easily be adapted to use MuWU for other species/transposable elements.
AVAILABILITY AND IMPLEMENTATION: MuWU is a Snakemake-based workflow and is freely available at https://github.com/tgstoecker/MuWU.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2021 PMID: 34586393 PMCID: PMC8756183 DOI: 10.1093/bioinformatics/btab679

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

In both forward and reverse genetic studies, sequence-indexed insertional libraries are essential resources for researchers around the globe. Notable examples are transgenic T-DNA insertion lines in Arabidopsis (Alonso et al.) and several mutant collections based on transposable elements (TEs) in rice (Hirochika et al.) and maize (Liang et al.; Marcon et al.; McCarty et al.). To screen large mutant collections efficiently, targeted sequencing approaches such as Mutant-seq (Mu-seq; McCarty et al.; Supplementary Fig. S1) are used. Mutagenized families are pooled according to a grid design (Supplementary Fig. S2; McCarty et al.; Urbański et al.). Mu-seq takes this one step further by distinguishing between somatic and germinal (heritable) mutations. If row and column pools are taken from independent somatic cell lineages, heritable insertions have to appear in both axes of the grid and can thus be singled out for further analyses.

To our knowledge, MuWU is the first openly available tool for the analysis of insertional libraries generated using a Mu-seq approach. MuWU efficiently combines the bioinformatics analyses needed to realize the benefits of the Mu-seq approach: filtering of pre-existing insertion sites as well as of somatic insertions. In contrast to other approaches, MuWU does not require paired-end sequencing. MuWU was created as a solution to the recurring processing of Mu insertional mutagenesis sequencing libraries in maize as part of the newly created and expanding BonnMu resource (Marcon et al.), the first European mutant resource of its kind. BonnMu insertions are continuously being integrated into the MaizeGDB.org genome browser (https://www.maizegdb.org; Portwood et al.).

2 Software description

2.1 Input files and data preparation

All software and dependencies are either installed at runtime or run inside a Singularity container, allowing for completely automated detection and annotation of insertion sites. The grid design of a Mu-seq experiment with grid size n results in 2n samples (n row pools and n column pools). In our experiments, n is usually 24, so each pool contains 24 F2 families. This results in the sequencing of 576 families in 48 sequencing libraries (24 row and 24 column pools). In addition to sequencing reads, MuWU requires a genome sequence FASTA file as well as a suitable annotation file. A library-specific stock matrix table should also be supplied, indicating the position of each mutagenized maize family in the row x column grid design, so that germinal insertions can be inferred (Supplementary Fig. S2). The first steps of MuWU deal with quality control and alignment of the reads to the reference genome sequence. Most notable is the removal of transposon terminal inverted repeat (TIR) sequences and adapters from both ends of the raw Mu-seq reads. A summary of all statistics is generated in one HTML report.
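The pooling arithmetic above can be illustrated with a minimal Python sketch. It is not part of MuWU: the function names, pool names and family identifiers are hypothetical placeholders, and the stock matrix is simply modelled as a nested list.

```python
# Illustrative sketch only; all names and the identifier format are hypothetical.

def build_grid(n):
    """Return an n x n stock matrix filled with placeholder family identifiers."""
    return [[f"family_r{r + 1}_c{c + 1}" for c in range(n)] for r in range(n)]


def build_pools(stock_matrix):
    """Map each row pool and each column pool to the families it contains."""
    n = len(stock_matrix)
    row_pools = {f"Row_{r + 1}": list(stock_matrix[r]) for r in range(n)}
    col_pools = {
        f"Col_{c + 1}": [stock_matrix[r][c] for r in range(n)] for c in range(n)
    }
    return row_pools, col_pools


n = 24
grid = build_grid(n)
row_pools, col_pools = build_pools(grid)

n_families = n * n                             # families in the grid
n_libraries = len(row_pools) + len(col_pools)  # one library per pool, 2n in total
print(n_families, n_libraries)
```

Each family is a member of exactly one row pool and one column pool, which is why a specific row/column combination identifies a single family in the stock matrix.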

2.2 Annotation procedure

Insertion sites are identified with our Python tool (insertions.py), which takes advantage of the TE-specific target site duplication (TSD) at the insertion flanking region. In this first step, at least two sequencing reads on each side of the TSD are required to support an insertion. In a second step, germinal insertion sites of one row and one column pool of the grid layout are filtered to allow assignment to a distinct mutagenized family. Heritable mutations are thus identified by keeping insertions whose genomic coordinates are shared by exactly one row sample and one column sample. Both steps of the algorithm are parallelized at a granularity of one thread per sample, which circumvents this initial computational bottleneck and reduces runtime significantly. In MuWU's subsequent annotation, we analyze insertion sites inside or near gene models. By identifying the specific combination of row and column pool for each germinal insertion event, we can perform a matrix lookup in the library-specific seed stock table and add stock information to the output. The final outputs are a table of germinal insertions with their read coverage and corresponding stock, together with the IDs, lengths and coordinates of the genes to which the insertions were assigned, and a similar table for all insertions. The complete workflow is shown in Supplementary Figure S3.
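The two steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code of insertions.py: alignments are assumed to be given as (chromosome, start, end) tuples with 1-based inclusive coordinates, flanking reads on one side are assumed to end at the last TSD base while reads on the other side start at the first TSD base, and all function and variable names are hypothetical. The default TSD length of 9 bp corresponds to Mutator elements.

```python
from collections import Counter, defaultdict


def call_insertions(alignments, tsd_len=9, min_reads=2):
    """Call putative insertion sites from read alignments of one pool.

    A site is reported when at least `min_reads` reads start at the first
    TSD base and at least `min_reads` reads end at the last TSD base.
    """
    alignments = list(alignments)
    starts = Counter((chrom, start) for chrom, start, _ in alignments)
    ends = Counter((chrom, end) for chrom, _, end in alignments)
    sites = set()
    for (chrom, tsd_start), n_start in starts.items():
        tsd_end = tsd_start + tsd_len - 1
        if n_start >= min_reads and ends[(chrom, tsd_end)] >= min_reads:
            sites.add((chrom, tsd_start, tsd_end))
    return sites


def germinal_insertions(row_sites, col_sites, stock_matrix):
    """Keep sites seen in exactly one row pool and exactly one column pool,
    and look up the corresponding family in the stock matrix."""
    rows_by_site = defaultdict(list)
    cols_by_site = defaultdict(list)
    for row, sites in row_sites.items():
        for site in sites:
            rows_by_site[site].append(row)
    for col, sites in col_sites.items():
        for site in sites:
            cols_by_site[site].append(col)
    germinal = {}
    for site, rows in rows_by_site.items():
        cols = cols_by_site.get(site, [])
        if len(rows) == 1 and len(cols) == 1:
            germinal[site] = stock_matrix[rows[0]][cols[0]]
    return germinal


# Toy 2 x 2 grid with made-up stock names and coordinates.
stock_matrix = [["S1", "S2"], ["S3", "S4"]]
reads = [("chr1", 60, 108), ("chr1", 71, 108), ("chr1", 100, 150), ("chr1", 100, 142)]
called = call_insertions(reads)   # one site supported from both sides of the TSD
shared = ("chr3", 900, 908)       # present in every pool, e.g. pre-existing
somatic = ("chr2", 500, 508)      # present in a single row pool only

row_sites = {0: called | {shared}, 1: {somatic, shared}}
col_sites = {0: {shared}, 1: called | {shared}}
result = germinal_insertions(row_sites, col_sites, stock_matrix)
print(result)
```

In this toy example only the chr1 site is retained and assigned to the family at row 0, column 1; the site found in all pools and the site found in a single row pool are both discarded.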

3 Conclusion

MuWU is an efficient workflow solution that reduces the bioinformatics steps of the BonnMu database resource to a one-command job finished in <1 h per library (Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60 GHz; 24 cores). It implements the bioinformatics part of Mu-seq, an improved strategy to distinguish between germinal and somatic insertions that combines the advantages of an experimental layout in grid design with the analysis of TSDs (Liang et al.; McCarty et al.). Automation of library annotation is essential for reproducibility and consistency as we continue to expand our database effort with upcoming libraries. While MuWU is not the only bioinformatics tool for TE insertion detection, it is to our knowledge the only openly available tool suitable for Mu-seq data. In contrast to other tools for TE insertion detection, it works with single-end sequencing, as it does not rely on the analysis of discordant read pairs. MuWU is not specific to Mutator insertions or maize but can detect any insertion event that causes TSDs. The user can configure TE-specific and adapter sequences, TSD length and the read support threshold. We have implemented a secondary mode ('GENERIC') which does not require an experimental design that allows germinal insertion events to be inferred and is thus more widely applicable. As the workflow is built on Snakemake (Köster and Rahmann, 2012), further analyses can be added in a modular fashion to the automatic handling of Mu-seq libraries.

References

1. Genome-wide LORE1 retrotransposon mutagenesis and high-throughput insertion detection in Lotus japonicus.

Authors: Dorian Fabian Urbański; Anna Małolepszy; Jens Stougaard; Stig Uggerhøj Andersen
Journal: Plant J Date: 2011-12-01 Impact factor: 6.417

2. Steady-state transposon mutagenesis in inbred maize.

Authors: Donald R McCarty; Andrew Mark Settles; Masaharu Suzuki; Bao Cai Tan; Susan Latshaw; Tim Porch; Kevin Robin; John Baier; Wayne Avigne; Jinsheng Lai; Joachim Messing; Karen E Koch; L Curtis Hannah
Journal: Plant J Date: 2005-10 Impact factor: 6.417

3. Snakemake--a scalable bioinformatics workflow engine.

Authors: Johannes Köster; Sven Rahmann
Journal: Bioinformatics Date: 2012-08-20 Impact factor: 6.937

4. A Sequence-Indexed Mutator Insertional Library for Maize Functional Genomics Study.

Authors: Lei Liang; Ling Zhou; Yuanping Tang; Niankui Li; Teng Song; Wen Shao; Ziru Zhang; Peng Cai; Fan Feng; Yafei Ma; Dongsheng Yao; Yang Feng; Zeyang Ma; Han Zhao; Rentao Song
Journal: Plant Physiol Date: 2019-10-21 Impact factor: 8.340

5. Genome-wide insertional mutagenesis of Arabidopsis thaliana.

Authors: José M Alonso; Anna N Stepanova; Thomas J Leisse; Christopher J Kim; Huaming Chen; Paul Shinn; Denise K Stevenson; Justin Zimmerman; Pascual Barajas; Rosa Cheuk; Carmelita Gadrinab; Collen Heller; Albert Jeske; Eric Koesema; Cristina C Meyers; Holly Parker; Lance Prednis; Yasser Ansari; Nathan Choy; Hashim Deen; Michael Geralt; Nisha Hazari; Emily Hom; Meagan Karnes; Celene Mulholland; Ral Ndubaku; Ian Schmidt; Plinio Guzman; Laura Aguilar-Henonin; Markus Schmid; Detlef Weigel; David E Carter; Trudy Marchand; Eddy Risseeuw; Debra Brogden; Albana Zeko; William L Crosby; Charles C Berry; Joseph R Ecker
Journal: Science Date: 2003-08-01 Impact factor: 47.728

6. Rice mutant resources for gene discovery.

Authors: Hirohiko Hirochika; Emmanuel Guiderdoni; Gynheung An; Yue-Ie Hsing; Moo Young Eun; Chang-Deok Han; Narayana Upadhyaya; Srinivasan Ramachandran; Qifa Zhang; Andy Pereira; Venkatesan Sundaresan; Hei Leung
Journal: Plant Mol Biol Date: 2004-02 Impact factor: 4.076

7. BonnMu: A Sequence-Indexed Resource of Transposon-Induced Maize Mutations for Functional Genomics Studies.

Authors: Caroline Marcon; Lena Altrogge; Yan Naing Win; Tyll Stöcker; Jack M Gardiner; John L Portwood; Nina Opitz; Annika Kortz; Jutta A Baldauf; Charles T Hunter; Donald R McCarty; Karen E Koch; Heiko Schoof; Frank Hochholdinger
Journal: Plant Physiol Date: 2020-08-07 Impact factor: 8.340

8. MaizeGDB 2018: the maize multi-genome genetics and genomics database.

Authors: John L Portwood; Margaret R Woodhouse; Ethalinda K Cannon; Jack M Gardiner; Lisa C Harper; Mary L Schaeffer; Jesse R Walsh; Taner Z Sen; Kyoung Tak Cho; David A Schott; Bremen L Braun; Miranda Dietze; Brittney Dunfee; Christine G Elsik; Nancy Manchanda; Ed Coe; Marty Sachs; Philip Stinard; Josh Tolbert; Shane Zimmerman; Carson M Andorf
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

9. Mu-seq: sequence-based mapping and identification of transposon induced mutations.

Authors: Donald R McCarty; Sue Latshaw; Shan Wu; Masaharu Suzuki; Charles T Hunter; Wayne T Avigne; Karen E Koch
Journal: PLoS One Date: 2013-10-23 Impact factor: 3.240