Graham J Etherington, Jacqueline Monaghan, Cyril Zipfel, Dan MacLean.
Abstract
BACKGROUND: Analysis of mutants isolated from forward-genetic screens has revealed key components of several plant signalling pathways. Mapping mutations by position, either using classical methods or whole genome high-throughput sequencing (HTS), largely relies on the analysis of genome-wide polymorphisms in F2 recombinant populations. Combining bulk segregant analysis with HTS has accelerated the identification of causative mutations and has been widely adopted in many research programmes. A major advantage of HTS is the ability to perform bulk segregant analysis after back-crossing to the parental line rather than out-crossing to a polymorphic ecotype, which reduces genetic complexity and avoids issues with phenotype penetrance in different ecotypes. Plotting the positions of homozygous polymorphisms in a mutant genome identifies areas of low recombination and is an effective way to detect molecular linkage to a phenotype of interest.
Keywords: Forward-genetics; High-throughput sequencing; Mapping; Single nucleotide polymorphisms; Web application
Year: 2014 PMID: 25610492 PMCID: PMC4301057 DOI: 10.1186/s13007-014-0041-7
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Figure 1. Screenshot of the CandiSNP web application. CandiSNP is openly accessible online at http://candisnp.tsl.ac.uk. The application is laid out so that users can work through it in numbered steps. Users choose which genome they would like to use for comparison (the program currently supports Arabidopsis, rice, tomato, grape, maize and soybean genomes). The option of filtering SNPs concentrated around centromeres is also provided. Users then upload their SNP data file, indicate their preferred allele frequency cut-off, and choose from a number of different palettes for SNP visualization.
Figure 2. Bioinformatics pipeline for sequence analysis. Pipeline indicating the preparatory steps required by the user (A) prior to running the CandiSNP web application (B).
Figure 3. Pipeline for bulking segregants and identification of unique SNPs. (A) The recessive bak1-5 mob1 and bak1-5 mob2 mutants were back-crossed to the parent bak1-5, allowed to self-fertilize in the F1, and phenotypically scored in the F2 for the mob phenotype. Positive segregants were bulk harvested and genomic DNA was prepared and sequenced using the Illumina HiSeq platform. For comparison, the bak1-5 genome was also sequenced. A similar genetics pipeline could be employed for dominant mutants, but material would need to be bulked from segregants that were phenotypically verified as homozygous in the F3 generation. (B) A three-way comparison between the bak1-5, bak1-5 mob1, and bak1-5 mob2 genomes identified the total number of unique SNPs in each genome.
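The three-way comparison in Figure 3B can be thought of as a set difference: a SNP is unique to a genome if it appears in none of the others. The following is a minimal sketch of that idea, not the authors' implementation. The SNP key (chromosome, position, alternate base), the function name `unique_snps`, and the toy data are all assumptions made for illustration. The sketch applies the same rule to every genome, whereas the paper reports the parental bak1-5 SNPs relative to the Col-0 TAIR10 reference.

```python
from typing import Dict, Set, Tuple

# Hypothetical SNP key: (chromosome, position, alternate base).
Snp = Tuple[str, int, str]


def unique_snps(genomes: Dict[str, Set[Snp]]) -> Dict[str, Set[Snp]]:
    """For each genome, keep only the SNPs found in no other genome."""
    unique = {}
    for name, snps in genomes.items():
        # Union of the SNP sets of every other genome.
        others = set().union(*(s for n, s in genomes.items() if n != name))
        unique[name] = snps - others
    return unique


# Toy data for illustration only (not taken from the paper).
genomes = {
    "bak1-5": {("1", 100, "T"), ("2", 200, "A")},
    "bak1-5 mob1": {("1", 100, "T"), ("5", 300, "A")},
    "bak1-5 mob2": {("1", 100, "T"), ("2", 200, "A"), ("5", 400, "T")},
}

result = unique_snps(genomes)
for name, snps in result.items():
    print(name, sorted(snps))
```

Keying on the alternate base as well as the position means two genomes carrying different substitutions at the same site are not treated as sharing a SNP; whether CandiSNP makes the same choice is not stated in the text.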
Table 1. Identification of unique and candidate SNPs in the parental and mutant genomes

| | bak1-5 | bak1-5 mob1 | bak1-5 mob2 |
|---|---|---|---|
| Total SNPs compared to Col-0 TAIR10 | 2639 | 4188 | 3581 |
| Unique SNPs compared to the parent | 2639 | 1633 | 1006 |
| Unique SNPs, AF >75% | 785 | 88 | 143 |
| Unique SNPs, AF >75%, annotated coding | 240 | 16 | 41 |
| Unique SNPs, AF >75%, annotated coding, non-synon | 168 | 9ᵃ | 31ᵇ |

To identify SNPs unique to each genome, the parental and mutant genomes were compared and filtered. Unique SNPs in bak1-5 refer to those that are not found in the Col-0 TAIR10 genome. For the bak1-5 mob1 and bak1-5 mob2 datasets, SNPs shared between any of the three genomes (bak1-5, bak1-5 mob1 and bak1-5 mob2; Figure 3B) were removed, resulting in SNPs uniquely found in each of those genomes. Filtering for SNPs with an allele frequency higher than 75% that cause non-synonymous (‘non-synon’) changes in annotated coding regions resulted in a list of candidate causative mutations.
ᵃ Candidate causative SNPs for bak1-5 mob1 are listed in Table 2.
ᵇ Candidate causative SNPs for bak1-5 mob2 are listed in Table 3.
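The successive filters described above (allele frequency above 75%, then annotated coding, then non-synonymous) can be sketched as a simple cascade that reports the count remaining at each stage. This is an illustrative sketch only: the `Snp` record, its field names, the function `filter_cascade`, and the toy data are hypothetical, and the strict `>` comparison at the cut-off is an assumption rather than documented CandiSNP behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Snp:
    """Hypothetical annotated SNP record."""
    chrom: str
    pos: int
    allele_freq: float    # percent of reads supporting the alternate base
    in_coding: bool       # falls in an annotated coding region
    non_synonymous: bool  # changes the encoded amino acid


def filter_cascade(
    snps: List[Snp], af_cutoff: float = 75.0
) -> Tuple[Dict[str, int], List[Snp]]:
    """Apply the three filters in order and count survivors at each stage."""
    high_af = [s for s in snps if s.allele_freq > af_cutoff]  # assumed strict
    coding = [s for s in high_af if s.in_coding]
    candidates = [s for s in coding if s.non_synonymous]
    counts = {
        "AF": len(high_af),
        "coding": len(coding),
        "non_synon": len(candidates),
    }
    return counts, candidates


# Toy data for illustration only (not taken from the paper).
snps = [
    Snp("5", 1000, 93.1, True, True),
    Snp("5", 2000, 80.0, True, False),
    Snp("1", 3000, 90.0, False, False),
    Snp("1", 4000, 40.0, True, True),
]

counts, candidates = filter_cascade(snps)
print(counts)
print([(s.chrom, s.pos) for s in candidates])
```

Returning the per-stage counts alongside the final list mirrors the layout of the table, where each row narrows the previous one.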
Table 2. Candidate causative SNPs in bak1-5 mob1

| Chr | Position | Ref/Alt | AF (%) | AGI | Gene ID | AA change | Sanger sequencing result |
|---|---|---|---|---|---|---|---|
| 1 | 11892068 | C/T | 100 | At1g32830 | Transposable element | n.a. | Absent |
| 1 | 11892070 | G/T | 100 | At1g32830 | Transposable element | n.a. | Homᵃ |
| 1 | 11892252 | T/G | 100 | At1g32830 | Transposable element | n.a. | Homᵃ |
| 1 | 16516501 | T/C | 83.3 | At1g43745 | Transposable element | n.a. | Absent |
| 1 | 16525522 | T/C | 77.8 | At1g43755 | Transposable element | n.a. | Homᵃ |
| 1 | 24243231 | G/A | 80.9 | At1g65270 | Unknown protein | G > S | Hom |
| 5 | 26457834 | G/A | 85.0 | At5g66210 | CPK28 | A > V | Homᵇ |
| 5 | 26458077 | G/A | 78.5 | At5g66210 | CPK28 | S > L | Homᵇ |
| 5 | 26474069 | G/A | 76.5 | At5g66270 | Zn-finger family protein | P > L | Hom |

Unique SNPs in annotated coding regions with allele frequencies (AF) over 75% identified by CandiSNP for bak1-5 mob1, listing the Chromosome number (Chr), position, reference base (Ref), sequenced alternate base (Alt), locus number (AGI), gene identification (Gene ID), amino acid change (AA change; ‘n.a.’ is not applicable). All SNPs were confirmed in at least three independent back-crossed lines (F3 generation) by Sanger sequencing compared to bak1-5. SNPs that were homozygous (Hom) or not present (Absent) are listed.
ᵃ These SNPs were also identified in bak1-5 by Sanger sequencing (however, not by Illumina sequencing) and are therefore not unique to bak1-5 mob1.
ᵇ These SNPs are the causative mutations for bak1-5 mob1 [13].
Table 3. Candidate causative SNPs in bak1-5 mob2

| Chr | Position | Ref/Alt | AF (%) | AGI | Gene ID | AA change | Sanger sequencing result |
|---|---|---|---|---|---|---|---|
| 1 | 7446564 | C/T | 76.2 | At1g21270 | WAK2 | P > L | Seg |
| 1 | 11892984 | C/T | 100 | At1g32830 | Transposable element | n.a. | Homᵃ,ᵇ |
| 1 | 16513961 | T/G | 77.1 | At1g43740 | Transposable element | n.a. | Homᵃ,ᵇ |
| 1 | 17757465 | C/T | 76.0 | At1g48090 | Calcium-dependent lipid-binding protein | D > N | Seg |
| 1 | 18192647 | C/T | 76.0 | At1g49190 | ARR19 | R > W | Seg |
| 1 | 22178447 | C/T | 82.6 | At1g60140 | TPS10 | D > N | Hom |
| 2 | 2568811 | C/T | 83.3 | At2g06470 | Transposable element | n.a. | Homᵃ |
| 2 | 5277241 | T/G | 100 | At2g12850 | Transposable element | n.a. | Not tested |
| 4 | 2362567 | C/A | 100 | At4g04655 | Transposable element | n.a. | Segᵃ |
| 5 | 5569896 | G/A | 94.1 | At5g16930 | AAA-type ATPase family protein | W > stop | Seg |
| 5 | 14285189 | G/A | 81.3 | At5g36260 | Eukaryotic aspartyl protease family protein | S > L | Absent |
| 5 | 14579245 | G/A | 75.0 | At5g36935 | Transposable element | n.a. | Not tested |
| 5 | 15751875 | G/A | 77.2 | At5g39350 | Tetratricopeptide repeat-like superfamily protein | R > K | Seg |
| 5 | 17503318 | G/A | 82.7 | At5g43560 | TRAF-like superfamily protein | E > K | Seg |
| 5 | 17597830 | G/A | 80.0 | At5g43800 | Transposable element | n.a. | Not tested |
| 5 | 17820568 | G/A | 93.3 | At5g44240 | ALA2 | A > T | Seg |
| 5 | 18251689 | G/A | 83.3 | At5g45140 | Nuclear RNAP2 | P > S | Seg |
| 5 | 18261108 | G/A | 75.0 | At5g45150 | RTL3 | D > N | Seg |
| 5 | 18399206 | G/A | 80.9 | At5g45400 | RPA70C | V > M | Seg |
| 5 | 21859555 | G/A | 90.0 | At5g53840 | F-box/RNI-like/FBD-like domains-containing protein | S > F | Seg |
| 5 | 21939106 | G/A | 80.9 | At5g54062 | Unknown protein | E > K | Seg |
| 5 | 22002355 | G/A | 88.9 | At5g54203 | Transposable element | n.a. | Absent |
| 5 | 22066915 | G/A | 79.3 | At5g54340 | C2H2 and C2HC zinc-finger superfamily protein | V > I | Seg |
| 5 | 22430866 | G/A | 90.0 | At5g55310 | TOP1β | A > V | Seg |
| 5 | 22565056 | G/A | 76.5 | At5g55750 | Hydroxyproline-rich glycoprotein family protein | P > S | Seg |
| 5 | 26458017 | C/T | 93.1 | At5g66210 | CPK28 | W > stop | Homᶜ |
| 5 | 26560691 | C/T | 95.8 | At5g66550 | Maf-like protein | G > R | Hom |
| 5 | 26626055 | C/T | 78.9 | At5g66690 | UGT72E2 | P > S | Hom |
| 5 | 26710709 | C/T | 75.0 | At5g66880 | SNRK2.3 | P > S | Hom |
| 5 | 26716839 | C/T | 80.9 | At5g66900 | CC-NB-LRR family protein | D > N | Hom |
| 5 | 26935248 | C/T | 75.0 | At5g67500 | VDAC2 | T > I | Hom |

Unique SNPs in annotated coding regions with allele frequencies (AF) over 75% identified by CandiSNP for bak1-5 mob2, listing the Chromosome number (Chr), position, reference base (Ref), sequenced alternate base (Alt), locus number (AGI), gene identification (Gene ID), amino acid change (AA change; ‘n.a.’ is not applicable). All SNPs were confirmed in at least three independent back-crossed lines (F3 generation) by Sanger sequencing compared to bak1-5. SNPs that were homozygous (Hom), segregating (Seg), not identified (Absent), or not tested are listed.
ᵃ These SNPs were also identified in bak1-5 by Sanger sequencing (however, not by Illumina sequencing) and are therefore not unique to bak1-5 mob2.
ᵇ These are single base deletion mutations.
ᶜ This SNP is the causative mutation for bak1-5 mob2 [13].
Figure 4. Chromosome 5 SNP density plots for bak1-5 mob1 and bak1-5 mob2. All SNPs with allele frequencies >75% are plotted in grey, while candidate causative SNPs (defined as those causing non-synonymous changes in gene-coding regions) are plotted in red. The position of CPK28/At5g66210 is indicated.
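One simple way to see where candidate SNPs cluster along a chromosome is to count them in fixed-size windows. The sketch below does this for the chromosome 5 positions listed for bak1-5 mob2 in the candidate SNP table above. It is a rough illustration under stated assumptions, not a description of how CandiSNP computes or draws its density plots: the 1 Mb window size and the function name `window_counts` are arbitrary choices made here.

```python
from collections import Counter
from typing import Iterable

# Chromosome 5 candidate SNP positions for bak1-5 mob2, copied from the table.
CHR5_POSITIONS = [
    5569896, 14285189, 14579245, 15751875, 17503318, 17597830,
    17820568, 18251689, 18261108, 18399206, 21859555, 21939106,
    22002355, 22066915, 22430866, 22565056, 26458017, 26560691,
    26626055, 26710709, 26716839, 26935248,
]


def window_counts(positions: Iterable[int], window: int = 1_000_000) -> Counter:
    """Count SNPs per fixed-size window; keys are window indices."""
    return Counter(p // window for p in positions)


counts = window_counts(CHR5_POSITIONS)

# The window holding the most candidate SNPs.
densest_window, n = max(counts.items(), key=lambda kv: kv[1])

for w in sorted(counts):
    print(f"{w:>2} Mb to {w + 1:>2} Mb: {'#' * counts[w]}")
```

With these positions the densest 1 Mb window is the one starting at 26 Mb, which contains the CPK28/At5g66210 SNP at position 26458017.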