Fengyu He1,2,3, Shuangcheng Ding1,2,3, Hongwei Wang1,2,3, Feng Qin4.
Abstract
Genome-wide association study (GWAS), which exploits historical and evolutionary recombination at the population level, is a major method for identifying quantitative trait loci (QTL) underlying complex traits. However, summarizing GWAS results, gene structure, and linkage disequilibrium (LD) in a single view requires multiple tools. Generating these three results separately and combining them by hand is tedious and can introduce inaccuracies. In addition, genotype markers are usually detected by DNA-Seq and/or RNA-Seq. For GWAS based on RNA-Seq markers, markers from DNA-Seq provide more genetic information when displaying LD. Currently released software packages do not support an integrated analysis in which LD is computed from genotypic markers different from those used in the association analysis. Here, we present an R package, IntAssoPlot, which provides an integrated visual display of GWAS results, together with LD and gene structure information, in a publication-ready format. The main panel of an IntAssoPlot plot uses connecting lines to link the genome-wide association P-values, on the -log10 scale, with the gene structure and the LD matrix. Importantly, IntAssoPlot is designed to plot GWAS results with LD calculated from genotypes different from those used in the GWAS analysis. IntAssoPlot thus provides a visualization tool for gaining integrated insight into GWAS results, and its functions increase efficiency by presenting GWAS results in a publication-ready format. Inspection of the output image can provide important biological information, including the loci that passed the genome-wide significance threshold, genes located at or near the significant loci, and the extent of LD within the selected region.
Keywords: LD; gene structure; genome-wide association study; integration; linking line; visualization
Year: 2020 PMID: 32265990 PMCID: PMC7100855 DOI: 10.3389/fgene.2020.00260
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
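The abstract refers to association P-values on the -log10 scale and to loci passing a significance threshold. The following base-R sketch illustrates that transformation on invented toy data. The marker names, positions, P-values, and the cutoff of 5 are assumptions for illustration only; the code does not use or reproduce IntAssoPlot itself.

```r
# Toy association results; all values are invented for illustration.
assoc <- data.frame(
  marker = c("m1", "m2", "m3", "m4"),
  pos    = c(100, 250, 400, 900),
  p      = c(1e-3, 1e-9, 5e-6, 0.2),
  stringsAsFactors = FALSE
)

# Transform P-values to the -log10 scale used on the y-axis of association plots.
assoc$neg_log10_p <- -log10(assoc$p)

# Assumed cutoff on the -log10 scale (equivalent to P < 1e-5).
threshold <- 5

# Markers passing the threshold, and the most significant ("lead") marker.
significant <- assoc[assoc$neg_log10_p > threshold, ]
lead <- assoc[which.max(assoc$neg_log10_p), ]

print(significant)
print(lead$marker)
```

Working on the -log10 scale means that smaller P-values appear higher in the plot, so a horizontal line at the cutoff separates significant from non-significant markers.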
Summary of functions implemented in IntAssoPlot.
| Function | Layer | Shared modules | Differed modules |
| IntGenicPlot | Upper layer: a scatter plot displaying -log10( | Scatter plot of -log10( | Gene structure |
| IntRegionalPlot | | | Arrowed line reflecting gene length and strand |
Comparison of features provided in GWAS visualization tools.
| Features in a single view | LocusZoom | cgmisc | LDlink | Assocplots | IntAssoPlot |
| Scatter plot of marker-trait association | ✓ | ✓ | ✓ | ✓ | ✓ |
| Lead SNP LD heatmap | ✓ | ✓ | ✓ | ||
| LD matrix heatmap | ✓ | ✓ | |||
| Linking line from association to LD matrix | ✓ | ||||
| LD matrix from another genotype dataset | ✓ | ||||
| Gene structure | ✓ | ✓ | |||
| Integrated display of association, gene structure, and LD matrix | ✓ |
FIGURE 1. Regional marker-trait association plots (A: regional plot; B: LD color scale; C: LD matrix from another genotype dataset; and D: zoom-out) and single-gene marker-trait association plots (E: single genic plot; F: adjusting flanking sequence; G: highlighting markers; and H: linking markers), using previously published data (Wang et al., 2016). For (A–D), the upper layer shows the marker-trait associations and the LD of the most significant locus with the other loci; the middle layer shows the filtered gene models, indicated by arrows (MaizeGDB release 5b.60); the bottom layer shows the LD matrix. In (E–H), the main transcript structure of ZmVPP1 and the LD matrix are shown. Genotypes used to generate these plots were derived from Chia et al. (2012), Li et al. (2013), and Wang et al. (2016).
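The figure shows two LD views: the LD of the lead marker with every other marker, and a pairwise LD matrix. As a rough sketch of what those quantities are, the base-R example below computes pairwise r² as the squared Pearson correlation of genotype dosages (0/1/2), a common approximation for unphased genotypes. The genotype matrix is invented, and this is not necessarily how IntAssoPlot computes LD internally.

```r
# Toy genotype matrix: rows are individuals, columns are markers,
# values are allele counts (0/1/2). All data are invented for illustration.
geno <- cbind(
  m1 = c(0, 0, 1, 1, 2, 2, 0, 2),
  m2 = c(0, 0, 1, 1, 2, 2, 0, 2),  # identical to m1, so r^2 with m1 is 1
  m3 = c(0, 1, 1, 0, 2, 2, 0, 1),
  m4 = c(0, 1, 0, 1, 0, 1, 2, 1)
)

# Pairwise LD approximated as the squared correlation of dosages.
ld_r2 <- cor(geno)^2

# LD of an assumed lead marker (here m2) with all markers.
lead_marker <- "m2"
lead_ld <- ld_r2[lead_marker, ]

print(round(ld_r2, 2))
print(round(lead_ld, 2))
```

Because the matrix is symmetric with ones on the diagonal, only one triangle needs to be drawn, which is why LD heatmaps are usually shown as a triangle beneath the association panel.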
R commands used to generate Figure 1.
| Scale | Operation | Command | Output |
| Regional plot | Lead SNP LD + triangle LD + directional gene length + linking line | IntRegionalPlot(chr = 9, left = 94178074 - 200000, right = 94178074 + 200000, gtf = gtf, association = association, hapmap = hapmap_am368, hapmap_ld = hapmap_am368, threshold = 5, leadsnp_size = 2) | |
| | LD color scale | IntRegionalPlot(chr = 9, left = 94178074 - 200000, right = 94178074 + 200000, gtf = gtf, association = association, hapmap = hapmap_am368, hapmap_ld = hapmap_am368, threshold = 5, leadsnp_size = 2, color02 = "gray81", color04 = "gray61", color06 = "gray41", color08 = "gray11", color10 = "gray1") | |
| | Triangle LD from another set of genotype data | IntRegionalPlot(chr = 9, left = 94178074 - 200000, right = 94178074 + 200000, gtf = gtf, association = association, hapmap = hapmap_am368, hapmap_ld = hapmap2, threshold = 5, leadsnp_size = 2) | |
| | Zoom-in or -out | IntRegionalPlot(chr = 9, left = 94178074 - 2000, right = 94178074 + 6000, gtf = gtf, association = association, hapmap = hapmap_am368, hapmap_ld = hapmap_am368, threshold = 5, leadsnp_size = 2) | |
| Single genic plot | Triangle LD + gene structure + linking line | IntGenicPlot('GRMZM2G170927_T01', gtf, association = zmvpp1_association, hapmap = zmvpp1_hapmap, leadsnp = FALSE, triangleLD = TRUE, threshold = 8) | |
| | Flanking sequence | IntGenicPlot('GRMZM2G170927_T01', gtf, association = zmvpp1_association, hapmap = zmvpp1_hapmap, leadsnp = FALSE, triangleLD = TRUE, threshold = 8, up = 500, down = 600) | |
| | Highlight marker by shape and/or color | IntGenicPlot('GRMZM2G170927_T01', gtf, association = zmvpp1_association, hapmap = zmvpp1_hapmap, leadsnp = FALSE, triangleLD = TRUE, threshold = 8, up = 500, down = 600, marker2highlight = marker2highlight) | |
| | Selected markers linking from association to LD | IntGenicPlot('GRMZM2G170927_T01', gtf, association = zmvpp1_association, hapmap = zmvpp1_hapmap, leadsnp = FALSE, triangleLD = TRUE, threshold = 8, up = 500, down = 600, marker2highlight = marker2highlight, link2gene = marker2link, link2LD = marker2link) | |
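The regional commands above differ mainly in the window placed around position 94178074 on chromosome 9. The base-R sketch below shows that window arithmetic in isolation. The helper `region_window` and its argument names are hypothetical and are not part of IntAssoPlot; the commented-out call only repeats arguments that appear in the table and assumes the objects `gtf`, `association`, and `hapmap_am368` already exist.

```r
# Hypothetical helper (not part of IntAssoPlot): compute the plotting window
# around a focal position from left and right flank sizes in base pairs.
region_window <- function(focal_pos, left_flank, right_flank = left_flank) {
  stopifnot(left_flank >= 0, right_flank >= 0)
  list(left = focal_pos - left_flank, right = focal_pos + right_flank)
}

focal_pos <- 94178074  # position used in the regional commands above

# Symmetric 200-kb window (regional plot) and a narrower asymmetric window (zoom).
wide <- region_window(focal_pos, 200000)
zoom <- region_window(focal_pos, 2000, 6000)

print(unlist(wide))
print(unlist(zoom))

# With IntAssoPlot loaded and the input objects available, the window could be
# passed on as in the table above, for example:
# IntRegionalPlot(chr = 9, left = zoom$left, right = zoom$right, gtf = gtf,
#                 association = association, hapmap = hapmap_am368,
#                 hapmap_ld = hapmap_am368, threshold = 5, leadsnp_size = 2)
```

Keeping the focal position in one variable means that zooming in or out only requires changing the flank sizes.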