Satomi Mitsuhashi (1,2), Martin C Frith (3,4,5), Naomichi Matsumoto (6)
1. Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan. satomits.gfd@mri.tmd.ac.jp.
2. Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, M&D Tower 24F, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan. satomits.gfd@mri.tmd.ac.jp.
3. Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan.
4. Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan.
5. Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo, Japan.
6. Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan. naomat@yokohama-cu.ac.jp.
Abstract
BACKGROUND: Tandem repeats are highly mutable and contribute to the development of human disease by a variety of mechanisms. It is difficult to predict which tandem repeats may cause a disease. One hypothesis is that changeable tandem repeats are the source of genetic diseases, because disease-causing repeats are polymorphic in healthy individuals. However, it is not clear whether disease-causing repeats are more polymorphic than other repeats.
METHODS: We performed a genome-wide survey of the millions of human tandem repeats using publicly available long-read genome sequencing data from 21 humans. We measured tandem repeat copy number changes using tandem-genotypes. Length variation of known disease-associated repeats was compared to that of other repeat loci.
RESULTS: We found that known Mendelian disease-causing or disease-associated repeats, especially CAG and 5'UTR GGC repeats, are relatively long and polymorphic in the general population. We also show that repeat lengths of two disease-causing tandem repeats, in ATXN3 and GLS, are correlated with nearby GWAS SNP genotypes.
CONCLUSIONS: We provide a catalog of polymorphic tandem repeats across a variety of repeat unit lengths and sequences, from long-read sequencing data. This method, especially if used in genome-wide association studies, may indicate possible new candidates for pathogenic or biologically important tandem repeats in human genomes.
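The comparison described in METHODS can be illustrated with a minimal sketch. It is not the authors' pipeline and does not parse tandem-genotypes output. It assumes per-locus copy-number changes relative to the reference have already been collected into a dictionary, uses the population standard deviation as one possible polymorphism measure, and uses made-up placeholder loci (`locus_A`, `locus_B`, `locus_C`) with invented values purely for illustration.

```python
from statistics import median, pstdev

# Hypothetical input: copy-number change relative to the reference genome,
# one value per observed allele, keyed by a placeholder locus name.
copy_number_changes = {
    "locus_A": [0, 0, 1, -1, 3, 0, 5, -2],
    "locus_B": [0, 0, 0, 0, 0, 1, 0, 0],
    "locus_C": [0, 0, 0, 0, 0, 0, 0, 0],
}
# Hypothetical labels marking which loci are "disease-associated".
disease_loci = {"locus_A"}


def polymorphism_score(changes):
    """Spread of copy-number changes at one locus (population std. dev.)."""
    return pstdev(changes)


def compare_groups(changes_by_locus, disease):
    """Return the median score of disease loci and of all other loci."""
    scores = {name: polymorphism_score(v) for name, v in changes_by_locus.items()}
    disease_scores = [s for name, s in scores.items() if name in disease]
    other_scores = [s for name, s in scores.items() if name not in disease]
    return median(disease_scores), median(other_scores)


disease_median, other_median = compare_groups(copy_number_changes, disease_loci)
print(f"disease loci median spread: {disease_median:.3f}")
print(f"other loci median spread:   {other_median:.3f}")
```

With real data, a rank-based test across many loci would be a more appropriate comparison than two medians; the sketch only shows the shape of the calculation.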
Keywords:
Genome-wide analysis; Nanopore long read sequencing; Tandem repeats; Triplet repeat disease