Lian Deng1, Haiyi Lou1, Xiaoxi Zhang1,2, Bhooma Thiruvahindrapuram3, Dongsheng Lu1, Christian R Marshall3,4,5, Chang Liu1, Bo Xie1, Wanxing Xu1,2, Lai-Ping Wong6, Chee-Wei Yew7, Aghakhanian Farhang8,9, Rick Twee-Hee Ong6, Mohammad Zahirul Hoque10, Abdul Rahman Thuhairah11, Bhak Jong12,13,14, Maude E Phipps9, Stephen W Scherer3,4,15,16, Yik-Ying Teo6,17,18,19,20, Subbiah Vijay Kumar21, Boon-Peng Hoh22,23, Shuhua Xu24,25,26,27,28.
1. Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
2. School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
3. The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada.
4. Genome Diagnostics, Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada.
5. Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada.
6. Saw Swee Hock School of Public Health, National University of Singapore, Singapore, 117597, Singapore.
7. Biotechnology Research Institute, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia.
8. Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Jalan Lagoon Selatan, Sunway, 46150, Subang Jaya, Selangor, Malaysia.
9. Tropical Medicine and Biology Platform, Monash University Malaysia, Jalan Lagoon Selatan, 46150 Sunway, Subang Jaya, Selangor, Malaysia.
10. Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia.
11. Clinical Pathology Diagnostic Centre Research Laboratory, Faculty of Medicine, Universiti Teknologi MARA, Sungai Buloh Campus, 47000 Sg Buloh, Subang Jaya, Selangor, Malaysia.
12. Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea.
13. Geromics, Ulsan, 44919, Republic of Korea.
14. Biomedical Engineering Department, The Genomics Institute, UNIST, Ulsan, Republic of Korea.
15. Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada.
16. Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.
17. NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore, 117456, Singapore.
18. Life Sciences Institute, National University of Singapore, Singapore, Singapore.
19. Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore.
20. Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672, Singapore.
21. Biotechnology Research Institute, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia. vijay@ums.edu.my.
22. Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China. hoh.boopeng@gmail.com.
23. Faculty of Medicine and Health Sciences, UCSI University, Jalan Menara Gading, Taman Connaught, Cheras, 56000, Kuala Lumpur, Malaysia. hoh.boopeng@gmail.com.
24. Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China. xushua@picb.ac.cn.
25. School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China. xushua@picb.ac.cn.
26. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China. xushua@picb.ac.cn.
27. Collaborative Innovation Center of Genetics and Development, Shanghai, 200438, China. xushua@picb.ac.cn.
28. Human Phenome Institute, Fudan University, Shanghai, 201203, China. xushua@picb.ac.cn.
Abstract
BACKGROUND: Recent advances in genomic technologies have facilitated genome-wide investigation of human genetic variation. However, most efforts have focused on major populations, and trio genomes of indigenous populations from Southeast Asia remain under-investigated.
RESULTS: We analyzed whole-genome deep sequencing data (~30×) of five native trios from Peninsular Malaysia and North Borneo and characterized their genomic variants, including single nucleotide variants (SNVs), small insertions and deletions (indels), and copy number variants (CNVs). We discovered approximately 6.9 million SNVs, 1.2 million indels, and 9000 CNVs in the 15 samples, of which 2.7% of SNVs, 2.3% of indels, and 22% of CNVs were novel, implying that existing databases cover population diversity insufficiently. We identified a higher proportion of novel variants in the Orang Asli (OA) samples, i.e., the indigenous people of Peninsular Malaysia, than in the North Bornean (NB) samples, likely due to the more complex demographic history and long-term isolation of the OA groups. We used the pedigree information to identify de novo variants and estimated the autosomal mutation rates to be 0.81 × 10⁻⁸ to 1.33 × 10⁻⁸, 1.0 × 10⁻⁹ to 2.9 × 10⁻⁹, and ~0.001 per site per generation for SNVs, indels, and CNVs, respectively. The trio genomes also allowed haplotype phasing with high accuracy, providing a reference for future genomic studies of OA and NB populations. In addition, high-frequency inherited CNVs specific to OA or NB were identified. One example is a 50-kb duplication in DEFA1B detected only in the Negrito trios, suggesting plausible effects on host defense against the diverse microbes to which these hunter-gatherers are exposed in the tropical rainforest environment. Far fewer CNVs were shared between the OA and NB groups than were specific to each group. Nevertheless, we identified a 142-kb duplication in AMY1A, a gene associated with a high-starch diet, in all 15 samples. Moreover, novel insertions shared with archaic hominids were identified in our samples.
CONCLUSION: Our study presents a full catalogue of the genome variants of the native Malaysian populations, complementing existing knowledge of genome diversity in Southeast Asians. It points to a specific population history of the native inhabitants and demonstrates the need for more genome sequencing efforts on the multi-ethnic native groups of Malaysia and Southeast Asia.
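The abstract reports per-site, per-generation mutation rates estimated from de novo variants in trios but does not state the estimator. A common convention, assumed here rather than taken from the paper, divides the number of de novo mutations found in a child by twice the number of callable autosomal sites, since the child carries two haploid genomes. The function name and the input values below are hypothetical and chosen only for illustration.

```python
def per_generation_mutation_rate(n_de_novo: int, callable_sites: int) -> float:
    """Estimate the mutation rate per site per generation from one trio.

    Assumption (not stated in the abstract): rate = n / (2 * L), where n is
    the number of de novo mutations in the child and L is the number of
    callable autosomal sites in the haploid genome.
    """
    if n_de_novo < 0:
        raise ValueError("n_de_novo must be non-negative")
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    # Factor of 2: mutations can arise on either parental haplotype.
    return n_de_novo / (2 * callable_sites)


# Hypothetical inputs, not data from the study.
example_rate = per_generation_mutation_rate(n_de_novo=60, callable_sites=2_600_000_000)
print(f"{example_rate:.3e}")
```

With these illustrative inputs the estimate is on the order of 10⁻⁸, the same order of magnitude as the SNV range quoted in the abstract. A real analysis would also correct for false-positive and false-negative de novo calls.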