Literature DB >> 31222198

Duphold: scalable, depth-based annotation and curation of high-confidence structural variant calls.

Brent S Pedersen^1,2, Aaron R Quinlan^1,2,3.

Abstract

Most structural variant (SV) detection methods use clusters of discordant read-pair and split-read alignments to identify variants yet do not integrate depth of sequence coverage as an additional means to support or refute putative events. Here, we present "duphold," a new method to efficiently annotate SV calls with sequence depth information that can add (or remove) confidence to SVs that are predicted to affect copy number. Duphold indicates not only the change in depth across the event but also the presence of a rapid change in depth relative to the regions surrounding the break-points. It uses a unique algorithm that allows the run time to be nearly independent of the number of variants. This performance is important for large, jointly called projects with many samples, each of which must be evaluated at thousands of sites. We show that filtering on duphold annotations can greatly improve the specificity of SV calls. Duphold can annotate SV predictions made from both short-read and long-read sequencing datasets. It is available under the MIT license at https://github.com/brentp/duphold.

Entities: Chemical Disease Gene Species

Keywords: algorithm; genomics; structural variation

Mesh：

Year: 2019 PMID： 31222198 PMCID： PMC6479422 DOI： 10.1093/gigascience/giz040

Source DB: PubMed Journal: Gigascience ISSN： 2047-217X Impact factor: 6.524

Findings

Motivation

Structural variants (SVs) are a broad class of genetic variation including duplications, deletions, inversions, insertions, and translocations. SVs are known to be more difficult to detect with high accuracy than single-nucleotide and insertion-deletion variants. As such, the false-positive rate can be high. The most commonly used SV callers [1-5] use 2 types of sequence alignments to discover structural variation: paired-end reads having an unusual orientation or insert size (so-called “discordant pairs”), and split reads, where the sequence is aligned to different parts of the genome. These methods work well, and while some make use of coverage information at the break-points, they do not directly integrate the aligned sequence depth within and around an event to detect or filter SV calls. This is an important limitation because, for example, we expect a true hemizygous deletion to exhibit 50% of the sequence coverage of flanking diploid regions. On the basis of our experience in evaluating the veracity of thousands of candidate SVs with SV-plaudit [6], we noted 2 consistent patterns that distinguished confident deletion and duplication calls from apparent false-positive results. First, events without an obvious reduction or increase in coverage are much less likely to appear as “real” events to the human eye. Second, events with a rapid change in depth at (or near) the break-points are more plausible. Obvious false-positive calls lack either or both of those signals. We therefore developed duphold to enforce the observations we made through manual inspection and rapidly annotate SV calls to prioritize high-quality variant predictions.

Implementation

The duphold software uses hts-nim [7] to quickly extract coverage information from a BAM or CRAM file into an array using the methodology described in mosdepth [8]. Once in array format, it can be queried very rapidly. The depth profiles are used to quickly annotate a variant call format (VCF) [9] file of structural SVs with coverage calculated from a BAM or CRAM file of alignments. Briefly, duphold operates on each chromosome sequentially; it allocates an (int16) array whose size is the length of the current chromosome (this array uses ∼500 megabytes of memory for the 249 megabase human chromosome 1), iterates over each read in a BAM or CRAM for that chromosome, and increments any bases where an aligned read (or segment of a read) starts and decrements any bases where an aligned read (or part of a read) ends. A segment of a read is defined by the SAM [10] CIGAR operations. Once duphold has processed all segments for all alignments in a chromosome, it performs a cumulative sum that results in a per-base coverage value in the array. A 64-bit integer is used to track the actual depth, but the depth stored on the arrays is capped at the maximum value for a 16-bit integer (32,767) to prevent integer overflow. This algorithm is fully detailed in Pedersen and Quinlan [8]. Once the coverage array is filled, all remaining steps are independent of the number of alignments. Owing to the speed of in-memory array operations, subsequent depth calculations are nearly independent of the number of variants annotated in the VCF file. For each SV, duphold annotates the VCF sample format field of the variant with both the change in depth relative to the surrounding 1,000 bases on either side of the event and the fold-change in coverage in the event relative to other regions in the genome with similar guanine-cytosine (GC) content. We have evaluated different flanking distances, and 1,000 is sufficient to achieve an accurate estimate of coverage but small enough to avoid commonly unsequenced regions or gaps in coverage. To compare the coverage observed for each variant with genomic bins of similar GC content, duphold calculates the GC content in each non-overlapping, 250-base window in the chromosome along with the median depth in that window. This requires 0.55 CPU-seconds for chromosome 1. These per-window depth and GC values are used as a distribution against which to compare incoming variants. Once the depths and the GC windows are calculated, duphold uses them to annotate SV calls in VCF format. For each variant, the GC content is calculated for the genome interval defined by the variant, and the median depth inside the event is compared to the window values with a similar GC content to calculate a fold-change value (duphold bin fold-change [DHBFC]). Duphold then compares the median depth in the event to the median depth from the 1,000 bases on either side; this measure (named duphold flank fold-change [DHFFC]) captures the change in depth one would observe by eye upon visual inspection. The depth fold-change values are added to the sample's format information in the variant's VCF entry. Using the median for each metric makes the value more robust even when the reported break-points are inexact or shifted. Duphold is run on a single sample at a time, but it has options to facilitate parallelization across samples. It can run on a 25× whole-genome CRAM in <15 CPU-minutes and run-time will increase linearly with coverage.

Evaluation

Deletions

We evaluated duphold by annotating the LUMPY [1] calls and svtyper [11] genotypes we produced for the HG002 sample sequenced by the Genome in a Bottle [12] (GiaB). We compared these with the GiaB truth set of deletions for the same sample. We used the duphold annotations to filter to more stringent call sets and evaluate both precision and recall. Because duphold does not add any new variants, it can only improve precision, not recall. The duphold depth annotations enable simple filters that reduce the number of false-positive results while retaining most true-positive results (Table 1). For example, requiring that the fold-change of the deletion relative to the 1,000 bases flanking the deletion be <0.7 (DHFFC < 0.7) removes 61% [(83 – 32)/83] of the false-positive calls while retaining 99% (1,483/1,496) of the true-positive calls. The DHBFC metric measures the depth fold-change relative to bins with a similar GC content and performs similarly. Using more stringent filtering can further reduce the false-positive rate at the expense of the recall. The information used in this filtering is independent of the values reported by LUMPY and svtyper, which do not look at sequence depth metrics.

Table 1:

Evaluating the accuracy of deletion calls filtered by duphold annotations

Method	False discovery rate	False negative	False positive	True positive	Precision	Recall	F1 score
Unfiltered	0.053	276	83	1,496	0.947	0.844	0.893
DHBFC < 0.7	0.018	298	27	1,474	0.982	0.832	0.901
DHFFC < 0.7	0.021	289	32	1,483	0.979	0.837	0.902

We evaluated deletion calls from LUMPY+svtyper using truvari.py [13] with the GiaB v0.6 truth set. DHBFC: duphold bin fold-change, which compares to regions (bins) of similar GC content. DHFFC: duphold flank fold-change (with 1,000 base flank). This shows that using either DHBFC < 0.7 or DHFFC < 0.7 as a filtering criterion for deletions increases precision, removing 61% [(83 – 32)/83] of false-positive calls while retaining >99% (1,483/1,496) of true-positive calls in the case of using DHFFC.

Evaluating the accuracy of deletion calls filtered by duphold annotations We evaluated deletion calls from LUMPY+svtyper using truvari.py [13] with the GiaB v0.6 truth set. DHBFC: duphold bin fold-change, which compares to regions (bins) of similar GC content. DHFFC: duphold flank fold-change (with 1,000 base flank). This shows that using either DHBFC < 0.7 or DHFFC < 0.7 as a filtering criterion for deletions increases precision, removing 61% [(83 – 32)/83] of false-positive calls while retaining >99% (1,483/1,496) of true-positive calls in the case of using DHFFC. We examined each of the false-positive calls that remained after duphold filtering. These included a mixture of complex regions that had a loss of coverage, and some that looked as if they could be real variants, but with minimal alignment support. We also visually inspected each of the 13 (i.e., 1,496 – 1,483) true-positive results that duphold marked as low confidence owing to a flank fold-change >0.7 (DHFFC > 0.7). Most of these had a minimal change in coverage that did not meet our threshold, and many looked as if they did not have strong evidence for a call. We even noted 1 variant that looked like a duplication within a deletion, resulting in a copy-neutral event. While these highlight the limitations of a purely depth-based approach, we find the >2-fold reduction in false-positive results, in concert with a retention of 99% of true-positive results, to be a convincing demonstration of duphold’s power to remove the abundant false-positive SV predictions common to most analyses.

Duplications

Because LUMPY called only a single duplication in HG002 that was not found in GiaB, we were not able to evaluate the performance of duphold on duplications using that approach. Because the GiaB SV call set does not differentiate insertion events from duplications, we first classified any GiaB insertion as a duplication if the entirety of the reported insertion sequence was mapped by bwa-mem [14] with <5 mismatches to within 3 bases (start and end) of the variant. This resulted in 805 duplications for the truth set. To evaluate the specificity and sensitivity of duphold, we had to create homozygous reference variants. Specifically, for each heterozygous (0/1) or homozygous alternate (1/1) variant, we simulated a homozygous reference variant of the same size and type (e.g., for a heterozygous duplication, we simulated a homozygous reference duplication) and inserted it into the VCF. We limited the simulated variants to the high-confidence regions provided by GiaB and then retried any variant where >10% of the reference nucleotide sequence inside the simulated event was unknown (“N”). This approach provided a reasonable set of homozygous reference variants of a similar size distribution within the high-confidence GiaB regions. We evaluated the sensitivity and specificity of duphold using both the real and simulated deletions and duplications in Figs 1 and 2. While duphold is better able to differentiate deletions from random, copy-neutral locations, it still has an area under the curve (AUC) of 0.74 for heterozygous duplications and 0.73 for homozygous duplications. The dots in the receiver operating characteristic curves show the sensitivity and specificity of duphold at a cutoff of 0.7 for deletions and 1.3 for duplications. The reduced performance on duplications relative to deletions is expected because a heterozygous deletion results in a 2-fold change in depth while a heterozygous duplication results in only a 1.5-fold change. In addition, it could be that a subset of duplications in GiaB, which was created with a combination of technologies, cannot be detected with short-read Illumina data. While the performance shown in Figs 1 and 2 reflects all event sizes, when deletions are restricted to those >1 kilobase, duphold achieves AUCs of 0.97 and 1.0 for heterozygous and homozygous alternate genotypes, respectively. At that size, the number of duplications is too low to properly evaluate, but we expect that larger events will enable duphold to more accurately evaluate the depth inside the event, and therefore further improve performance.

Figure 1:

Figure 2:

Duphold scalability. The time to annotate (or genotype) for duphold and svtyper is shown (y-axis) as a function of the number of variants tested (x-axis). While svtyper (blue) exhibits a linear increase in type with the number of variants, duphold is relatively independent of the number of variants. There is an initial cost that makes the duphold strategy less efficient for few (less than ∼10,000) variants, but it scales well to annotating thousands of variants as we expect for large cohorts.

Evaluation of duphold on duplications and deletions of any size. We annotated 805 GiaB insertion calls as duplications and simulated homozygous reference (Hom. Ref.) events of similar size in order to evaluate the specificity and sensitivity of duphold. We show the distribution of DHFFC (duphold flank fold-change) for each genotype (homozygous reference [0/0] is blue, heterozygous [Het.] [0/1] is orange, and homozygous alternate [Hom. Alt.] [1/1] is green), for both duplications (A) and deletions (C). We then used those distributions to create receiver operating characteristic curves (B and D) and calculate AUCs that indicate the ability of duphold to differentiate 0/0 from 0/1 (orange) and 1/1 (green). The dots on the curves indicate a cutoff of 1.3 for duplications and 0.7 for deletions. Duphold scalability. The time to annotate (or genotype) for duphold and svtyper is shown (y-axis) as a function of the number of variants tested (x-axis). While svtyper (blue) exhibits a linear increase in type with the number of variants, duphold is relatively independent of the number of variants. There is an initial cost that makes the duphold strategy less efficient for few (less than ∼10,000) variants, but it scales well to annotating thousands of variants as we expect for large cohorts.

Scaling

We designed duphold with the expectation that it would be used on large datasets where both specificity and run time are critical. For this reason, we optimized it for situations in which it would be used to evaluate many thousands of variants. In an effort to measure scaling performance, we compared the times of both svtyper and duphold on subsets of the 1,000 Genomes phase 3 SVs [15] (Figs. 1 and 2). We note that we are not interested in the direct time comparison with svtyper because svtyper does more work to genotype the variants. Instead, the relevant pattern is the trajectory in order to demonstrate how well duphold scales. Whereas svtyper follows a linear increase in run time with the number of variants, duphold's performance is nearly independent of the number of variants, using either 1 or 3 threads. This performance is driven by the fact that all of the alignment data are read into efficient data structures that can be queried thousands of times per second. This strategy incurs a large initial cost to construct the data structure and therefore makes duphold less efficient for small variant sets. We have intentionally chosen to optimize for larger variant sets because this context is where efficiency is most important.

Methods

To evaluate the ability of duphold to prioritize SV calls, we used data from the GiaB project for sample HG002. We downloaded all fastqs from [16], aligned with bwa-mem [14], and marked duplicates with samblaster [17] to generate a CRAM file with ∼25× median sequence coverage. We used the GiaB SV calls and tier 1 regions from [18] as our truth set. We ran LUMPY (LUMPY, RRID:SCR_003253) [1] and svtyper [11] via smoove [19] to create and genotype SV calls. We evaluated the precision and recall before and after applying various filtering on the duphold-annotated variants using truvari [13]. Specifically, we used a modified version of truvari here: [20] to allow “.” filters to be considered as PASS. We used samplot [21] to look at individual variants that were called as true positive, false positive, and false negative. The truvari command used was as follows: truvari.py –s 300 –S 270 –b HG002_SVs_Tier1_v0.6.DEL.vcf.gz –c $lumpy_vcf –o eval-no-support –passonly –pctsim = 0 –r 20 –giabreport –f $fasta –no-ref –includebed HG002_SVs_Tier1_v0.6.bed –O 0.6 To demonstrate the utility of duphold on duplication calls, we annotated some GiaB insertion calls as duplications, using [22], and then simulated homozgyous reference calls of the same size and genomic distribution as the existing calls using [23]. To evaluate the scaling on realistic sites, we used duphold to annotate the same HG002 file, but on the 68,818 variants from the 1,000 Genomes SV calls at [24]. We limited those calls to the variants that could be genotyped by svtyper (excluding insertions). We then randomly chose 100, 1,000, 10k, 20k, 35k, and 50k variants and ran svtyper and duphold on each set. We also ran duphold with 3 threads to evaluate the benefit of parallelization. We downloaded the HG002 single-nucleotide polymorphism/insertion and deletion calls from [25].

Conclusions

Duphold enables rapid annotation of existing SV calls with sequence depth information that facilitates the distinction between high- and low-confidence deletions and duplications. Using the GiaB truth set, we have shown that we can exclude nearly 61% of false-positive SV predictions while retaining >99% of true-positive variants using a simple filter on a duphold-annotated VCF. Given the minimal additional run time of as few as 25 minutes for a 30× genome, this is a substantial improvement for the overall accuracy of SV call sets.

Availability of supporting source code and requirements

Project name: duphold Project home page: https://github.com/brentp/duphold Operating system(s): binary available for Linux (can be built on OSX and Windows) Programming language: nim Other requirements: htslib.so ≥ 1.8 License: MIT RRID:SCR_016938

Availability of supporting data

An archival copy of the code is available in the GigaScience GigaDB repository [26].

Abbreviations

AUC: area under the curve; DHBFC: duphold bin fold-change; DHFFC: duphold flank fold-change; GC: guanine-cytosine; GiaB: Genome in a Bottle; SV: structural variant; VCF: variant call format.

Competing interests

The authors declare that they have no competing interests.

Funding

B. Pedersen and A. Quinlan were supported by US National Institutes of Health National grants from the National Human Genome Research Institute (R01HG006693 and R01HG009141), the National Institute of General Medical Sciences (R01GM124355), and the National Cancer Institute (U24CA209999).

Authors' contributions

B.S.P. designed and wrote the software, performed the analyses, and co-wrote the manuscript. A.R.Q. co-wrote the manuscript. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. 12/20/2018 Reviewed Click here for additional data file. 2/14/2019 Reviewed Click here for additional data file. 1/13/2019 Reviewed Click here for additional data file. 2/24/2019 Reviewed Click here for additional data file. Click here for additional data file.

14 in total

1. Extensive sequencing of seven human genomes to characterize benchmark reference materials.

Authors: Justin M Zook; David Catoe; Jennifer McDaniel; Lindsay Vang; Noah Spies; Arend Sidow; Ziming Weng; Yuling Liu; Christopher E Mason; Noah Alexander; Elizabeth Henaff; Alexa B R McIntyre; Dhruva Chandramohan; Feng Chen; Erich Jaeger; Ali Moshrefi; Khoa Pham; William Stedman; Tiffany Liang; Michael Saghbini; Zeljko Dzakula; Alex Hastie; Han Cao; Gintaras Deikus; Eric Schadt; Robert Sebra; Ali Bashir; Rebecca M Truty; Christopher C Chang; Natali Gulbahce; Keyan Zhao; Srinka Ghosh; Fiona Hyland; Yutao Fu; Mark Chaisson; Chunlin Xiao; Jonathan Trow; Stephen T Sherry; Alexander W Zaranek; Madeleine Ball; Jason Bobe; Preston Estep; George M Church; Patrick Marks; Sofia Kyriazopoulou-Panagiotopoulou; Grace X Y Zheng; Michael Schnall-Levin; Heather S Ordonez; Patrice A Mudivarti; Kristina Giorda; Ying Sheng; Karoline Bjarnesdatter Rypdal; Marc Salit
Journal: Sci Data Date: 2016-06-07 Impact factor: 6.444

2. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

3. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation.

Authors: Ken Chen; John W Wallis; Michael D McLellan; David E Larson; Joelle M Kalicki; Craig S Pohl; Sean D McGrath; Michael C Wendl; Qunyuan Zhang; Devin P Locke; Xiaoqi Shi; Robert S Fulton; Timothy J Ley; Richard K Wilson; Li Ding; Elaine R Mardis
Journal: Nat Methods Date: 2009-08-09 Impact factor: 28.547

4. The variant call format and VCFtools.

Authors: Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal: Bioinformatics Date: 2011-06-07 Impact factor: 6.937

5. An integrated map of structural variation in 2,504 human genomes.

Authors: Peter H Sudmant; Tobias Rausch; Eugene J Gardner; Robert E Handsaker; Alexej Abyzov; John Huddleston; Yan Zhang; Kai Ye; Goo Jun; Markus Hsi-Yang Fritz; Miriam K Konkel; Ankit Malhotra; Adrian M Stütz; Xinghua Shi; Francesco Paolo Casale; Jieming Chen; Fereydoun Hormozdiari; Gargi Dayama; Ken Chen; Maika Malig; Mark J P Chaisson; Klaudia Walter; Sascha Meiers; Seva Kashin; Erik Garrison; Adam Auton; Hugo Y K Lam; Xinmeng Jasmine Mu; Can Alkan; Danny Antaki; Taejeong Bae; Eliza Cerveira; Peter Chines; Zechen Chong; Laura Clarke; Elif Dal; Li Ding; Sarah Emery; Xian Fan; Madhusudan Gujral; Fatma Kahveci; Jeffrey M Kidd; Yu Kong; Eric-Wubbo Lameijer; Shane McCarthy; Paul Flicek; Richard A Gibbs; Gabor Marth; Christopher E Mason; Androniki Menelaou; Donna M Muzny; Bradley J Nelson; Amina Noor; Nicholas F Parrish; Matthew Pendleton; Andrew Quitadamo; Benjamin Raeder; Eric E Schadt; Mallory Romanovitch; Andreas Schlattl; Robert Sebra; Andrey A Shabalin; Andreas Untergasser; Jerilyn A Walker; Min Wang; Fuli Yu; Chengsheng Zhang; Jing Zhang; Xiangqun Zheng-Bradley; Wanding Zhou; Thomas Zichner; Jonathan Sebat; Mark A Batzer; Steven A McCarroll; Ryan E Mills; Mark B Gerstein; Ali Bashir; Oliver Stegle; Scott E Devine; Charles Lee; Evan E Eichler; Jan O Korbel
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

6. Wham: Identifying Structural Variants of Biological Consequence.

Authors: Zev N Kronenberg; Edward J Osborne; Kelsey R Cone; Brett J Kennedy; Eric T Domyan; Michael D Shapiro; Nels C Elde; Mark Yandell
Journal: PLoS Comput Biol Date: 2015-12-01 Impact factor: 4.475

7. DELLY: structural variant discovery by integrated paired-end and split-read analysis.

Authors: Tobias Rausch; Thomas Zichner; Andreas Schlattl; Adrian M Stütz; Vladimir Benes; Jan O Korbel
Journal: Bioinformatics Date: 2012-09-15 Impact factor: 6.937

8. SpeedSeq: ultra-fast personal genome analysis and interpretation.

Authors: Colby Chiang; Ryan M Layer; Gregory G Faust; Michael R Lindberg; David B Rose; Erik P Garrison; Gabor T Marth; Aaron R Quinlan; Ira M Hall
Journal: Nat Methods Date: 2015-08-10 Impact factor: 28.547

9. LUMPY: a probabilistic framework for structural variant discovery.

Authors: Ryan M Layer; Colby Chiang; Aaron R Quinlan; Ira M Hall
Journal: Genome Biol Date: 2014-06-26 Impact factor: 13.583

10. SV-plaudit: A cloud-based framework for manually curating thousands of structural variants.

Authors: Jonathan R Belyeu; Thomas J Nicholas; Brent S Pedersen; Thomas A Sasani; James M Havrilla; Stephanie N Kravitz; Megan E Conway; Brian K Lohman; Aaron R Quinlan; Ryan M Layer
Journal: Gigascience Date: 2018-07-01 Impact factor: 6.524

15 in total

1. Dysgu: efficient structural variant calling using short or long reads.

Authors: Kez Cleal; Duncan M Baird
Journal: Nucleic Acids Res Date: 2022-05-20 Impact factor: 19.160

2. Parallel Genetic Origin of Foot Feathering in Birds.

Authors: Chiara Bortoluzzi; Hendrik-Jan Megens; Mirte Bosse; Martijn F L Derks; Bert Dibbits; Kimberly Laport; Steffen Weigend; Martien A M Groenen; Richard P M A Crooijmans
Journal: Mol Biol Evol Date: 2020-09-01 Impact factor: 16.240

3. De novo structural mutation rates and gamete-of-origin biases revealed through genome sequencing of 2,396 families.

Authors: Jonathan R Belyeu; Harrison Brand; Harold Wang; Xuefang Zhao; Brent S Pedersen; Julie Feusier; Meenal Gupta; Thomas J Nicholas; Joseph Brown; Lisa Baird; Bernie Devlin; Stephan J Sanders; Lynn B Jorde; Michael E Talkowski; Aaron R Quinlan
Journal: Am J Hum Genet Date: 2021-03-05 Impact factor: 11.043

4. Identification of High-Confidence Structural Variants in Domesticated Rainbow Trout Using Whole-Genome Sequencing.

Authors: Sixin Liu; Guangtu Gao; Ryan M Layer; Gary H Thorgaard; Gregory D Wiens; Timothy D Leeds; Kyle E Martin; Yniv Palti
Journal: Front Genet Date: 2021-02-25 Impact factor: 4.599

5. Enhancing breakpoint resolution with deep segmentation model: A general refinement method for read-depth based structural variant callers.

Authors: Yao-Zhong Zhang; Seiya Imoto; Satoru Miyano; Rui Yamaguchi
Journal: PLoS Comput Biol Date: 2021-10-11 Impact factor: 4.475

6. Samplot: a platform for structural variant visual validation and automated filtering.

Authors: Jonathan R Belyeu; Murad Chowdhury; Joseph Brown; Brent S Pedersen; Michael J Cormier; Aaron R Quinlan; Ryan M Layer
Journal: Genome Biol Date: 2021-05-25 Impact factor: 13.583

7. NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data.

Authors: Michael D Linderman; Crystal Paudyal; Musab Shakeel; William Kelley; Ali Bashir; Bruce D Gelb
Journal: Gigascience Date: 2021-07-01 Impact factor: 6.524

8. Comprehensive variant calling from whole-genome sequencing identifies a complex inversion that disrupts ZFPM2 in familial congenital diaphragmatic hernia.

Authors: Thomas J Nicholas; Najla Al-Sweel; Andrew Farrell; Rong Mao; Pinar Bayrak-Toydemir; Christine E Miller; Dawn Bentley; Rachel Palmquist; Barry Moore; Edgar J Hernandez; Michael J Cormier; Eric Fredrickson; Katherine Noble; Shawn Rynearson; Carson Holt; Mary Anne Karren; Joshua L Bonkowsky; Martin Tristani-Firouzi; Mark Yandell; Gabor Marth; Aaron R Quinlan; Luca Brunelli; Reha M Toydemir; Brian J Shayota; John C Carey; Steven E Boyden; Sabrina Malone Jenkins
Journal: Mol Genet Genomic Med Date: 2022-02-04 Impact factor: 2.183

9. A 12 kb multi-allelic copy number variation encompassing a GC gene enhancer is associated with mastitis resistance in dairy cattle.

Authors: Young-Lim Lee; Haruko Takeda; Gabriel Costa Monteiro Moreira; Latifa Karim; Erik Mullaart; Wouter Coppieters; Ruth Appeltant; Roel F Veerkamp; Martien A M Groenen; Michel Georges; Mirte Bosse; Tom Druet; Aniek C Bouwman; Carole Charlier
Journal: PLoS Genet Date: 2021-07-21 Impact factor: 5.917

10. A Comprehensive Map of Mycobacterium tuberculosis Complex Regions of Difference.

Authors: D Bespiatykh; J Bespyatykh; I Mokrousov; E Shitikov
Journal: mSphere Date: 2021-07-21 Impact factor: 4.389