Lidong Guo1,2,3, Mengyang Xu2,3,4,5, Wenchao Wang2, Shengqiang Gu1, Xia Zhao6, Fang Chen6, Ou Wang4,5, Xun Xu4,5, Inge Seim7,8, Guangyi Fan2,3,4,5, Li Deng9,10,11,12, Xin Liu13,14,15,16.
1. BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China.
2. BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.
3. State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China.
4. BGI-Shenzhen, Shenzhen, 518083, China.
5. China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China.
6. MGI, BGI-Shenzhen, Shenzhen, 518083, China.
7. Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, Nanjing, 210046, China.
8. School of Biology and Environmental Science, Queensland University of Technology, Brisbane, 4000, Australia.
9. BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China. dengli1@genomics.cn.
10. State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China. dengli1@genomics.cn.
11. BGI-Shenzhen, Shenzhen, 518083, China. dengli1@genomics.cn.
12. China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China. dengli1@genomics.cn.
13. BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China. liuxin@genomics.cn.
14. State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China. liuxin@genomics.cn.
15. BGI-Shenzhen, Shenzhen, 518083, China. liuxin@genomics.cn.
16. China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China. liuxin@genomics.cn.
Abstract
BACKGROUND: Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for specific SLR techniques, hybrid genome assembly still needs a robust, efficient standalone scaffolder.
RESULTS: In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also employed a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single-tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1,349-fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled from next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder.
CONCLUSIONS: SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
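As a rough illustration of the Jaccard similarity mentioned above, the sketch below scores pairs of contigs by the overlap of the barcode sets observed on them. It is a minimal sketch under stated assumptions, not the SLR-superscaffolder implementation: the names `jaccard`, `rank_neighbors`, and `contig_barcodes`, as well as the toy barcode data, are hypothetical, and the abstract does not describe how the tool builds its scaffold graph from these scores.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two barcode sets."""
    union = a | b
    if not union:
        # Two empty sets share nothing measurable; treat as 0 by convention.
        return 0.0
    return len(a & b) / len(union)


def rank_neighbors(
    target: str, barcodes: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank all other contigs by barcode-set similarity to `target`."""
    scores = [
        (other, jaccard(barcodes[target], bset))
        for other, bset in barcodes.items()
        if other != target
    ]
    # Highest similarity first: more shared barcodes suggests closer proximity.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: barcodes of reads mapped to each contig.
contig_barcodes = {
    "contig_A": {"bc1", "bc2", "bc3", "bc4"},
    "contig_B": {"bc3", "bc4", "bc5"},
    "contig_C": {"bc9"},
}

ranking = rank_neighbors("contig_A", contig_barcodes)
```

In this toy case `contig_B` shares two of five distinct barcodes with `contig_A`, so it ranks ahead of `contig_C`, which shares none. The intuition is that contigs originating from the same long DNA fragments carry the same barcodes, so a higher score hints that two contigs lie close together on the genome.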
Keywords:
Genome assembly; Next-generation sequencing; Scaffolding; Synthetic long reads