Ruifeng Hu1,2,3,4, Guibo Sun1,2,3,4, Xiaobo Sun1,2,3,4 (xbsun@implad.ac.cn). 1. Beijing Key Laboratory of Innovative Drug Discovery of Traditional Chinese Medicine (Natural Medicine) and Translational Medicine, Beijing, China. 2. Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, 151 Malianwa North Road, Haidian District, Beijing, 100193, People's Republic of China. 3. Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine, Ministry of Education, Beijing, China. 4. Key Laboratory of the Efficacy Evaluation of Chinese Medicine against Glycolipid Metabolism Disorder Disease, State Administration of Traditional Chinese Medicine, Beijing, China.
Abstract
BACKGROUND: The single-molecule, real-time (SMRT) sequencing technology of Pacific Biosciences can capture transcripts from end to end because it produces extraordinarily long reads (>10 kb). This method of transcriptome sequencing has been applied to several projects on humans and model organisms. However, the raw data from SMRT sequencing are of relatively low quality, with a random error rate of approximately 15%, so error correction using next-generation sequencing (NGS) short reads is typically necessary. Few tools have been designed for a hybrid sequencing approach that combines NGS and SMRT data, and the most popular existing tool for error correction, LSC, demands more computing resources than most laboratories and research groups have available. These shortcomings severely limit the application of SMRT long reads to transcriptome analysis. RESULTS: Here, we report an improved error correction tool, LSCplus, developed with the LSC program as a reference. LSCplus reduces the long running time of LSC and improves quality. Compared with LSC, LSCplus requires only 1/3-1/4 of the time and 1/20-1/25 of the error correction time. CONCLUSIONS: LSCplus is freely available at http://www.herbbol.org:8001/lscplus/ . Sample calculations are provided to illustrate the precision and efficiency of this method for error correction and isoform detection.
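To put the figures quoted in the abstract in perspective, the short Python sketch below works through the arithmetic. It is only an illustration and is not part of LSCplus: the function names are hypothetical, the 10 kb read length is taken as a round example from the ">10 kb" figure, and the time fractions are read as ratios of LSCplus time to LSC time, which is an assumption about the intended meaning.

```python
def expected_errors(read_length_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases in a read, assuming errors
    occur independently at a uniform per-base rate (a simplification)."""
    return read_length_bp * error_rate


def speedup_from_fraction(fraction: float) -> float:
    """Convert a 'fraction of the original time' into a speedup factor.

    For example, needing 1/4 of the time corresponds to a 4x speedup.
    """
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return 1.0 / fraction


if __name__ == "__main__":
    # Illustrative 10 kb read at the ~15% raw error rate from the abstract.
    print("expected errors in a 10 kb read:", expected_errors(10_000, 0.15))

    # Time fractions quoted in the abstract, interpreted as LSCplus/LSC ratios.
    for label, fractions in [
        ("time", (1 / 3, 1 / 4)),
        ("error correction time", (1 / 20, 1 / 25)),
    ]:
        low, high = (speedup_from_fraction(f) for f in fractions)
        print(f"{label}: roughly {low:.0f}x to {high:.0f}x faster")
```

Under these assumptions, a raw read of this length would be expected to carry on the order of a thousand or more errors, which is why correction against short reads matters, and the quoted fractions correspond to single-digit and double-digit speedup factors respectively.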
Keywords:
Error correction; RNA-seq; SMRT sequencing; Time-consumption