Jianbo Zhang, Dilip R Panthee.
Abstract
BACKGROUND: Bulked segregant analysis (BSA) coupled with next-generation sequencing, referred to here as BSA-Seq, allows the rapid identification of both qualitative and quantitative trait loci (QTL). The current SNP index and G-statistic methods for BSA-Seq data analysis require relatively high sequencing coverage to detect significant single nucleotide polymorphism (SNP)-trait associations, which leads to high sequencing cost.
Keywords: Bulked segregant analysis; BSA-Seq; PyBSASeq; QTL; SNP-trait association
Year: 2020 PMID: 32143574 PMCID: PMC7060572 DOI: 10.1186/s12859-020-3435-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Chromosomal distribution of SNPs
| Chromosome | sSNP | totalSNP | sSNP/totalSNP |
|---|---|---|---|
| 1 | 52,093 | 160,780 | 0.324 |
| 2 | 48,912 | 125,059 | 0.391 |
| 3 | 3,502 | 45,927 | 0.076 |
| 4 | 3,743 | 62,317 | 0.060 |
| 5 | 15,482 | 102,474 | 0.151 |
| 6 | 7,653 | 159,857 | 0.048 |
| 7 | 12,679 | 128,658 | 0.099 |
| 8 | 54,372 | 132,646 | 0.410 |
| 9 | 1,709 | 57,971 | 0.029 |
| 10 | 28,711 | 98,646 | 0.291 |
| 11 | 5,235 | 180,319 | 0.029 |
| 12 | 6,260 | 48,430 | 0.129 |
| Genome-wide | 240,351 | 1,303,084 | 0.184 |
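The ratio column can be reproduced directly from the two count columns. The Python sketch below recomputes each chromosome's sSNP/totalSNP ratio and the genome-wide value from the counts in the table. The names `counts`, `ratios`, and `above_genome_wide` are illustrative assumptions, and the comparison against the genome-wide ratio is only a rough screen; it is not the resampling-based threshold shown in the figures.

```python
# (sSNP, totalSNP) counts per chromosome, copied from the table above.
counts = {
    "1": (52093, 160780),
    "2": (48912, 125059),
    "3": (3502, 45927),
    "4": (3743, 62317),
    "5": (15482, 102474),
    "6": (7653, 159857),
    "7": (12679, 128658),
    "8": (54372, 132646),
    "9": (1709, 57971),
    "10": (28711, 98646),
    "11": (5235, 180319),
    "12": (6260, 48430),
}

# Per-chromosome ratio, rounded to three decimals as in the table.
ratios = {chrom: round(s / t, 3) for chrom, (s, t) in counts.items()}

# Genome-wide totals and ratio (the last table row).
genome_ssnp = sum(s for s, _ in counts.values())
genome_total = sum(t for _, t in counts.values())
genome_ratio = round(genome_ssnp / genome_total, 3)

# Illustrative screen (an assumption, not the paper's threshold):
# chromosomes whose ratio exceeds the genome-wide ratio.
above_genome_wide = [
    chrom for chrom, (s, t) in counts.items()
    if s / t > genome_ssnp / genome_total
]

for chrom, ratio in ratios.items():
    print(f"chr{chrom}: {ratio:.3f}")
print(f"genome-wide: {genome_ratio:.3f}")
```

Summing the count columns in this way also serves as a quick consistency check on the genome-wide row.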
Fig. 1 Genomic distributions of SNPs and sSNP/totalSNP ratios. The red horizontal lines are the thresholds obtained via resampling. (a) The sSNPs (black) and total SNPs (blue). (b) The ratio of sSNPs to total SNPs
Fig. 2 Genomic distribution of sSNP/totalSNP ratios at different sequencing coverage levels. The red horizontal lines are the thresholds obtained via resampling. (a) 40% of the original sequence reads. (b) 30% of the original sequence reads. (c) 20% of the original sequence reads
Fig. 3 Genomic distribution of ∆(SNP index) at different sequencing coverage levels. The red curves indicate 99% confidence intervals obtained via simulation. (a) The original sequence reads. (b) 40% of the original sequence reads. (c) 30% of the original sequence reads. (d) 20% of the original sequence reads
Fig. 4 Genomic distribution of G-statistic at different sequencing coverage levels. The red curves are the G-statistic thresholds obtained via simulation. (a) The original sequence reads. (b) 40% of the original sequence reads. (c) 30% of the original sequence reads. (d) 20% of the original sequence reads
The first five rows of the GATK4 output file
| CHROM (a) | POS (b) | REF (c) | ALT (d) | 834927.AD (e) | 834927.GQ (f) | 834931.AD (e) | 834931.GQ (f) |
|---|---|---|---|---|---|---|---|
| 1 | 29,759 | C | G | 0,2 | 6 | 0,2 | 6 |
| 1 | 31,071 | A | G | 25,39 | 99 | 33,29 | 99 |
| 1 | 31,478 | C | T | 27,38 | 99 | 48,32 | 99 |
| 1 | 33,667 | A | G | 21,46 | 99 | 39,32 | 99 |
| 1 | 34,057 | C | T | 29,37 | 99 | 32,31 | 99 |
(a) The chromosome on which the SNP is located.
(b) The position of the SNP on the chromosome.
(c) The reference allele: the base at the SNP position that matches the reference genome.
(d) The alternate allele: the base that differs from REF.
(e) The allele depths (AD) of the SNP in the first bulk (ID: 834927) or the second bulk (ID: 834931). This column contains two numbers: the first is the REF read count (AD_REF) and the second is the ALT read count (AD_ALT).
(f) The genotype quality (GQ) of the SNP in the first bulk (ID: 834927) or the second bulk (ID: 834931).
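To make the column layout concrete, the Python sketch below parses the five rows shown in the table and splits each AD field into its REF and ALT read counts. Several things here are assumptions rather than details from the source: the tab delimiter, the removal of thousands separators from POS, the commonly used definition of the SNP index as AD_ALT / (AD_REF + AD_ALT), the sign convention for ∆(SNP index) (second bulk minus first bulk), and the hypothetical `MIN_GQ` cutoff of 20. It is not PyBSASeq's implementation.

```python
import csv
import io

# The five rows from the table, re-typed as tab-separated text.
# Assumptions: tab delimiter, POS without thousands separators.
TSV = "\n".join([
    "CHROM\tPOS\tREF\tALT\t834927.AD\t834927.GQ\t834931.AD\t834931.GQ",
    "1\t29759\tC\tG\t0,2\t6\t0,2\t6",
    "1\t31071\tA\tG\t25,39\t99\t33,29\t99",
    "1\t31478\tC\tT\t27,38\t99\t48,32\t99",
    "1\t33667\tA\tG\t21,46\t99\t39,32\t99",
    "1\t34057\tC\tT\t29,37\t99\t32,31\t99",
])

MIN_GQ = 20  # hypothetical quality cutoff, chosen only for illustration


def parse_ad(field):
    """Split an AD field such as '25,39' into (AD_REF, AD_ALT)."""
    ad_ref, ad_alt = (int(x) for x in field.split(","))
    return ad_ref, ad_alt


def snp_index(ad_ref, ad_alt):
    """ALT read fraction (assumed definition); None if there are no reads."""
    depth = ad_ref + ad_alt
    return ad_alt / depth if depth else None


rows = []
for rec in csv.DictReader(io.StringIO(TSV), delimiter="\t"):
    idx1 = snp_index(*parse_ad(rec["834927.AD"]))  # first bulk
    idx2 = snp_index(*parse_ad(rec["834931.AD"]))  # second bulk
    if idx1 is None or idx2 is None:
        continue  # skip SNPs with zero depth in either bulk
    rows.append({
        "pos": int(rec["POS"]),
        "min_gq": min(int(rec["834927.GQ"]), int(rec["834931.GQ"])),
        "delta": idx2 - idx1,  # assumed sign: second bulk minus first bulk
    })

# Keep only SNPs whose genotype quality passes the cutoff in both bulks.
kept = [r for r in rows if r["min_gq"] >= MIN_GQ]

for r in kept:
    print(r["pos"], f"{r['delta']:+.3f}")
```

With this cutoff, the first row (two reads per bulk, GQ of 6) is dropped, which illustrates why low-depth SNPs carry little information about allele frequency differences between the bulks.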