Zhi Zhang, Paul P Jung, Valentin Grouès, Patrick May, Carole Linster, Enrico Glaab.
Abstract
BACKGROUND: Quantitative trait locus (QTL) mapping using bulk segregants is an effective approach for identifying genetic variants associated with phenotypes of interest in model organisms. Next-generation sequencing can significantly improve QTL mapping accuracy, providing a valuable means to annotate new genetic variants. However, setting up a comprehensive analysis framework for this purpose is a time-consuming and error-prone task, posing many challenges for scientists with limited experience in this domain.
Keywords: BSA; QTLs; mapping
Year: 2019 PMID: 31141611 PMCID: PMC6571488 DOI: 10.1093/gigascience/giz060
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. Overview of the analysis workflow for the BSA4Yeast web application. The experimental design and other parameters can be specified on the web interface. Representative results shown at the bottom include (from left to right): allele frequency, G′ statistic values, and functional annotations. SacCer3: the reference genome of Saccharomyces cerevisiae; SNV: single-nucleotide variant.
Figure 2. Overview of the 3 main phases of the software workflow and the individual analysis steps they include. From left to right, the 3 phases cover the following tasks: (i) pre-processing and alignment of the short reads against the yeast genome, (ii) identifying genetic markers between 2 parental lines, and (iii) performing QTL analyses and comprehensively annotating the results.
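Phase (ii), identifying markers between the 2 parental lines, can be illustrated with a minimal sketch. It assumes that each parent has already been genotyped at every site and that a usable marker is a site where both parents carry a single (homozygous) allele and those alleles differ. The genotype dictionaries and the `find_markers` function are hypothetical and do not reflect BSA4Yeast's actual code or file formats.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (chromosome, position) -> called genotype,
# e.g. ("A",) for a haploid call, ("A", "A") for a homozygous diploid call,
# or ("C", "T") for a heterozygous call.
Site = Tuple[str, int]
Genotype = Tuple[str, ...]


def find_markers(
    p1: Dict[Site, Genotype], p2: Dict[Site, Genotype]
) -> List[Tuple[Site, str, str]]:
    """Return (site, P1 allele, P2 allele) for every site genotyped in both
    parents where each parent carries a single allele and the alleles differ."""
    markers = []
    for site in sorted(p1.keys() & p2.keys()):
        g1, g2 = p1[site], p2[site]
        # Skip heterozygous or empty calls: they do not identify one parental allele.
        if len(set(g1)) != 1 or len(set(g2)) != 1:
            continue
        a1, a2 = g1[0], g2[0]
        if a1 != a2:
            markers.append((site, a1, a2))
    return markers
```

Sites shared by both parents are visited in sorted order so the marker list comes out grouped by chromosome and position.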
Figure 3. The software framework behind BSA4Yeast. The web application runs on a virtual machine using Flask and Nginx, with Gunicorn as the Web Server Gateway Interface (WSGI) HTTP server. Celery serves as the asynchronous task/job queue, with Redis as its message broker. Metadata for the output files are recorded in an SQLite database.
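The metadata bookkeeping mentioned in the caption can be sketched with Python's built-in `sqlite3` module. The table layout (`output_files`) and the helper functions are hypothetical. In the described setup, a Celery worker would presumably record each result file after writing it, but the actual schema used by BSA4Yeast is not given here.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table with one row per result file.

    Columns: job_id (analysis job that produced the file), path (file location),
    kind (e.g. 'plot' or 'table'), created_at (UTC timestamp in ISO 8601).
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS output_files (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               job_id TEXT NOT NULL,
               path TEXT NOT NULL,
               kind TEXT NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )
    conn.commit()


def record_output(conn: sqlite3.Connection, job_id: str, path: str, kind: str) -> None:
    """Insert one metadata row; parameters are bound with '?' placeholders."""
    conn.execute(
        "INSERT INTO output_files (job_id, path, kind, created_at) VALUES (?, ?, ?, ?)",
        (job_id, path, kind, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def outputs_for_job(conn: sqlite3.Connection, job_id: str) -> List[Tuple[str, str]]:
    """Return (path, kind) pairs for one job in insertion order."""
    cur = conn.execute(
        "SELECT path, kind FROM output_files WHERE job_id = ? ORDER BY id", (job_id,)
    )
    return cur.fetchall()
```

For example, after `conn = sqlite3.connect(":memory:")` and `init_db(conn)`, a call such as `record_output(conn, "job-1", "myresult/qtl_map.png", "plot")` makes that file retrievable with `outputs_for_job(conn, "job-1")`.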
Figure 4. Example visualization of QTL results: (A) annotation table for the 2 parental lines (first 25 entries); (B) QTL map; (C) allele frequency on chromosome 14; (D) allele frequency on chromosome 15.
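Allele-frequency tracks like those in panels C and D are typically computed per marker as the fraction of reads in a bulk that carry one parent's allele. The following is a minimal sketch, assuming per-marker read counts for the P1 and P2 alleles are already available. The tuple layout, the `min_depth` filter, and the function names are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-marker record: (chromosome, position, P1-allele reads, P2-allele reads)
Count = Tuple[str, int, int, int]


def p1_allele_frequency(p1_reads: int, p2_reads: int, min_depth: int = 1) -> Optional[float]:
    """Fraction of reads supporting the P1 allele, or None if depth is too low."""
    depth = p1_reads + p2_reads
    if depth == 0 or depth < min_depth:
        return None
    return p1_reads / depth


def chromosome_track(
    counts: Sequence[Count], chrom: str, min_depth: int = 1
) -> List[Tuple[int, float]]:
    """(position, P1 allele frequency) pairs for one chromosome, sorted by position."""
    track = []
    for c, pos, p1_reads, p2_reads in counts:
        if c != chrom:
            continue
        af = p1_allele_frequency(p1_reads, p2_reads, min_depth)
        if af is not None:  # markers below the depth filter are dropped
            track.append((pos, af))
    return sorted(track)
```

Running `chromosome_track` once per bulk gives the per-chromosome curves; in a 2-tailed design the high and low bulks would be expected to diverge near a QTL.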
Figure 5. Analysis results for the fastq test files. (A) Partial annotation of QTL regions. (B) Summary of G′ values for each chromosome. (C) Summary of mutation types in the 2 parental lines.
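A per-chromosome summary of G′ values, as in panel B, could be produced along these lines. This sketch assumes G′ has already been computed per marker as (chromosome, position, value) tuples. The `ChromSummary` fields chosen here (marker count, mean, maximum) are hypothetical and may differ from what the figure reports.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ChromSummary(NamedTuple):
    n_markers: int
    mean_gprime: float
    max_gprime: float


def summarize_gprime(values: Iterable[Tuple[str, int, float]]) -> Dict[str, ChromSummary]:
    """Group per-marker G' values by chromosome and summarize each group."""
    per_chrom: Dict[str, List[float]] = defaultdict(list)
    for chrom, _pos, g in values:
        per_chrom[chrom].append(g)
    # Chromosomes appear in the order they were first seen in the input.
    return {
        chrom: ChromSummary(len(gs), sum(gs) / len(gs), max(gs))
        for chrom, gs in per_chrom.items()
    }
```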
Summary statistics for the 3 input file formats used in the example analyses
| Format | Category | P1 (sake_strain) | P2 (white_tecc_strain) | H_bulk | L_bulk |
|---|---|---|---|---|---|
| fastq | No. of raw reads | 5,669,323 (×2) | 3,476,744 (×2) | 10,580,579 | 11,915,417 |
| bam | No. of reads aligned to reference | 8,826,270 | 5,528,628 | 10,123,086 | 11,323,447 |
| | Mapping rate | 95.67% | 94.76% | 86.67% | 95.70% |
| | Average depth of coverage | 35.3 | 22.1 | 40.5 | 45.3 |
| map | No. of markers | 47,770 | 47,770 | | |
Note: “ref” refers to the reference genome for baker’s yeast (S. cerevisiae).
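The mapping rate and average depth in the table are commonly defined as the percentage of reads aligned to the reference and as aligned bases divided by genome length. The following is a minimal sketch of those common definitions. The source does not state exactly how BSA4Yeast computed these values (for example, how paired or filtered reads were counted), so it may not reproduce the table's numbers, and the read length is an input the table does not report.

```python
def mapping_rate(aligned_reads: int, total_reads: int) -> float:
    """Percentage of reads aligned to the reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * aligned_reads / total_reads


def mean_depth(aligned_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate mean depth of coverage: aligned bases / genome length (bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return aligned_reads * read_length / genome_size
```

The depth estimate ignores duplicates, soft clipping, and uneven coverage; a per-base count from the alignments would be more precise.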
Input parameters for analysis of fastq files
| Parameter | Value |
|---|---|
| Input file type | Fastq |
| No. of biological replicates | 1 |
| Title for the result | myresult |
| 1- or 2-tailed bulk design | 2 |
| P1 fastq files | ["sake_strain_P1_1.fastq", "sake_strain_P1_2.fastq"] |
| P2 fastq files | ["white_tecc_strain_P2_1.fastq", "white_tecc_strain_P2_2.fastq"] |
| Bulk H fastq file | [["Bulk_H.fastq"]] |
| Bulk L fastq file | [["Bulk_L.fastq"]] |
| Depth of coverage | 10 |
| Type of smoothing kernel | Tricube |
| Width of the smoothing kernel (bp) | 33,750 |
| Chromosomes to plot | All |
| Draw raw G′ values | No |
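The smoothing parameters above (tricube kernel, 33,750 bp width) fit the G′ statistic introduced by Magwene et al. (2011): a G-test statistic is computed at each marker from the 2×2 table of allele read counts (bulk × parental allele), then averaged over neighboring markers with tricube weights. The following is a minimal sketch assuming that definition, with the kernel width treated as the half-width of the window on each side of a marker. Whether BSA4Yeast interprets the width this way, and how it handles replicates, is not stated here. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def g_statistic(h_p1: int, h_p2: int, l_p1: int, l_p2: int) -> float:
    """G-test statistic for the 2x2 table of read counts (H/L bulk x P1/P2 allele)."""
    observed = [[h_p1, h_p2], [l_p1, l_p2]]
    total = h_p1 + h_p2 + l_p1 + l_p2
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in observed]
    col_sums = [h_p1 + l_p1, h_p2 + l_p2]
    g = 0.0
    for i in range(2):
        for j in range(2):
            n = observed[i][j]
            if n == 0:
                continue  # n * ln(n / expected) tends to 0 as n -> 0
            expected = row_sums[i] * col_sums[j] / total
            g += n * math.log(n / expected)
    return 2.0 * g


def tricube(distance: float, half_width: float) -> float:
    """Tricube kernel weight (1 - |u|^3)^3 with u = distance / half_width; 0 outside."""
    u = abs(distance) / half_width
    return (1.0 - u**3) ** 3 if u < 1.0 else 0.0


def g_prime(
    positions: Sequence[int], g_values: Sequence[float], half_width: float
) -> List[float]:
    """Tricube-weighted (Nadaraya-Watson) average of G at each marker.

    Assumes all positions lie on the same chromosome.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    smoothed = []
    for x in positions:
        num = den = 0.0
        for xj, gj in zip(positions, g_values):
            w = tricube(x - xj, half_width)
            num += w * gj
            den += w
        smoothed.append(num / den)  # den >= 1: each marker gives itself weight 1
    return smoothed
```

The double loop is O(n²) in the number of markers, which is acceptable for illustration; a production implementation would restrict the inner sum to markers inside the window, for example by scanning sorted positions.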