Surajit Bhattacharya1, Hayk Barseghyan1,2,3, Emmanuèle C Délot1,2, Eric Vilain4,5. 1. Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA. 2. Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, 20052, USA. 3. Bionano Genomics Inc, San Diego, CA, 92121, USA. 4. Center for Genetic Medicine Research, Children's Research Institute, Children's National Hospital, Washington, DC, 20010, USA. evilain@gwu.edu. 5. Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC, 20052, USA. evilain@gwu.edu.
Abstract
BACKGROUND: Whole genome sequencing is effective at identifying small variants, but because it is based on short reads, its assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which uses long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has increased the sensitivity and specificity of SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information to determine variant pathogenicity.
RESULTS: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances to the nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient's phenotype and used to filter genes overlapping SVs, giving the analyst an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset), allowing the user to assess the effects of SVs on the transcriptome. Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. We evaluated nanotatoR's annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM.
CONCLUSIONS: The extensive annotation enables users to rapidly identify potentially pathogenic SVs, a critical step toward use of OGM in the clinical setting.
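The gene-overlap step described in the abstract (annotating an SV with overlapping, upstream, and downstream genes, with overlap percentages and distances) can be illustrated with a minimal sketch. This is not nanotatoR's code or API: it is written in Python rather than R, and the names `Interval`, `overlap_percent`, `annotate_sv`, and the `window` parameter are hypothetical. It also assumes BED-style half-open coordinates, defines "upstream" simply as the lower-coordinate side (ignoring strand), and reports overlap as the fraction of the gene covered by the SV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval in BED convention: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int
    name: str = ""


def overlap_percent(sv: Interval, gene: Interval) -> float:
    """Percentage of the gene's length that is covered by the SV."""
    if sv.chrom != gene.chrom:
        return 0.0
    shared = min(sv.end, gene.end) - max(sv.start, gene.start)
    if shared <= 0:
        return 0.0
    return 100.0 * shared / (gene.end - gene.start)


def annotate_sv(sv: Interval, genes: list, window: int = 3000) -> dict:
    """Classify genes as overlapping, upstream, or downstream of an SV.

    `window` (hypothetical default) is the maximum distance in base pairs
    at which a non-overlapping gene is still reported.
    """
    result = {"overlapping": [], "upstream": [], "downstream": []}
    for gene in genes:
        if gene.chrom != sv.chrom:
            continue
        pct = overlap_percent(sv, gene)
        if pct > 0:
            result["overlapping"].append((gene.name, round(pct, 1)))
        elif gene.end <= sv.start and sv.start - gene.end <= window:
            # Gene lies entirely on the lower-coordinate side of the SV.
            result["upstream"].append((gene.name, sv.start - gene.end))
        elif gene.start >= sv.end and gene.start - sv.end <= window:
            # Gene lies entirely on the higher-coordinate side of the SV.
            result["downstream"].append((gene.name, gene.start - sv.end))
    return result


# Toy data, invented purely for illustration.
sv = Interval("chr1", 1000, 2000, "deletion_1")
genes = [
    Interval("chr1", 500, 900, "GENE_A"),
    Interval("chr1", 1500, 2500, "GENE_B"),
    Interval("chr1", 2100, 3000, "GENE_C"),
    Interval("chr1", 10000, 11000, "GENE_D"),
    Interval("chr2", 1200, 1800, "GENE_E"),
]
annotation = annotate_sv(sv, genes)
print(annotation)
```

In this toy example, `GENE_B` is reported as overlapping, `GENE_A` and `GENE_C` as the nearby upstream and downstream genes, and `GENE_D` and `GENE_E` are dropped because they fall outside the window or on another chromosome. A user could then filter on the overlap percentage or distance, as the abstract describes.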