| Literature DB >> 29860504 |
Jonathan R Belyeu1,2, Thomas J Nicholas1,2, Brent S Pedersen1,2, Thomas A Sasani1,2, James M Havrilla1,2, Stephanie N Kravitz1,2, Megan E Conway1, Brian K Lohman1,2, Aaron R Quinlan1,2,3, Ryan M Layer1,2.
Abstract
SV-plaudit is a framework for rapidly curating structural variant (SV) predictions. For each SV, we generate an image that visualizes the coverage and alignment signals from a set of samples. Images are uploaded to our cloud framework where users assess the quality of each image using a client-side web application. Reports can then be generated as a tab-delimited file or annotated Variant Call Format (VCF) file. As a proof of principle, nine researchers collaborated for 1 hour to evaluate 1,350 SVs each. We anticipate that SV-plaudit will become a standard step in variant calling pipelines and the crowd-sourced curation of other biological results.Code available at https://github.com/jbelyeu/SV-plauditDemonstration video available at https://www.youtube.com/watch?v=ono8kHMKxDs.Entities:
Mesh:
Year: 2018 PMID: 29860504 PMCID: PMC6030999 DOI: 10.1093/gigascience/giy064
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1:Example samplot images of putative deletion calls that were scored as (A) unanimously GOOD, (B) unanimously BAD, and (C) ambiguous with a mix of GOOD and BAD scores with respect to the top sample (NA12878) in each plot. The black bar at the top of the figure indicates the genomic position of the predicted SV, and the following subfigures visualize the alignments and sequence coverage of each sample. Subplots report paired-end (square ends connected by a solid line, annotated as concordant and discordant paired-end reads in A) and split-read (circle ends connected by a dashed line, annotated in A) alignments by their genomic position (x-axis) and the distance between mapped ends (insert size, left y-axis). Colors indicate the type of event the alignment supports (black for deletion, red for duplication, and blue and green for inversion) and intensity indicates the concentration of alignments. The grey filled shapes report the sequence coverage distribution in the locus for each sample (right y-axis, annotated in A). The samples in each panel are a trio of father (NA12891), mother (NA12892), and daughter (NA12878).
Figure 2:(A) The distribution of the time elapsed from when an image was presented to when it was scored. (B) The distribution of curation scores. (C) The SV size distribution for all, unanimous (score 0 or 1), unambiguous (score <0.2 or >0.8), and ambiguous (score >= 0.2 and <= 0.8) variants. (D) A comparison of predictions for deletions between CNVNATOR copy number calls (y-axis), SVTYPER genotypes (color, “Ref.” is homozygous reference and “Non-ref.” is heterozygous or homozygous alternate), and curation scores (x-axis). This demonstrates a general agreement between all methods with a concentration of reference genotypes and copy number 2 (no evidence for a deletion) at curation score <0.2, and non-reference and copy number one or zero events (evidence for a deletion) at curation score >0.8. There are also false positives for CNVNATOR (copy number <2 at score = 0) and false negatives for SVTYPER (reference genotype at score = 1).
Figure 3:The SV-plaudit process. (A) Samplot generates an image for each SV from VCF considering a set of alignment (BAM or CRAM) files. (B) PlotCritic uploads the images to an Amazon S3 bucket and prepares DynamoDB tables. Users select a curation answer (“GOOD,” “BAD,” or “DE NOVO”) for each SV image. DynamoDB logs user responses and generates reports. Within a report, a curation score function can be specified by mapping answer options to values and selecting an aggregation function. Here “GOOD” and “DE NOVO” were mapped to 1, “BAD” to 0, and the mean was used. An especially useful output option is a VCF annotated with the curation scores (shown here in bold as a SVP).