Billy T Lau (1,2), Dmitri Pavlichin (1), Anna C Hooker (1), Alison Almeda (1), Giwon Shin (1), Jiamin Chen (1), Malaya K Sahoo (3), Chun Hong Huang (3), Benjamin A Pinsky (3,4), Ho Joon Lee (5), Hanlee P Ji (6,7).
1. Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, CCSR 1120, Stanford, CA, 94305-5151, USA.
2. Stanford Genome Technology Center West, Stanford University, Palo Alto, CA, 94304, USA.
3. Department of Pathology, Stanford University School of Medicine, Stanford, CA, 94305, USA.
4. Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA.
5. Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, CCSR 1120, Stanford, CA, 94305-5151, USA. hojoon@stanford.edu.
6. Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, CCSR 1120, Stanford, CA, 94305-5151, USA. genomics_ji@stanford.edu.
7. Stanford Genome Technology Center West, Stanford University, Palo Alto, CA, 94304, USA. genomics_ji@stanford.edu.
Abstract
BACKGROUND: The genome of SARS-CoV-2 is susceptible to mutations during viral replication because of errors generated by RNA-dependent RNA polymerases. These mutations enable SARS-CoV-2 to evolve into new strains. Viral quasispecies emerge from de novo mutations that occur in individual patients. In combination, these sets of viral mutations provide distinct genetic fingerprints that reveal patterns of transmission and have utility in contact tracing.
METHODS: Leveraging thousands of sequenced SARS-CoV-2 genomes, we performed a viral pangenome analysis to identify conserved genomic sequences. We used a rapid and highly efficient computational approach that relies on k-mers, short tracts of sequence, instead of conventional sequence alignment. Using this method, we annotated viral mutation signatures that were associated with specific strains. Based on the highly conserved viral sequences, we developed a rapid and highly scalable targeted sequencing assay to identify mutations, detect quasispecies variants, and identify mutation signatures from patients. These results were compared to the pangenome genetic fingerprints.
RESULTS: We built a k-mer index for thousands of SARS-CoV-2 genomes and identified conserved genomic regions and the landscape of mutations across these genomes. We delineated mutation profiles spanning common genetic fingerprints (the combination of mutations in a viral assembly) and combinations of mutations that appear in only a small number of patients. We developed a targeted sequencing assay by selecting primers from the conserved viral genome regions to flank frequent mutations. Using a cohort of 100 SARS-CoV-2 clinical samples, we identified genetic fingerprints consisting of strain-specific mutations seen across populations and de novo quasispecies mutations localized to individual infections. We compared the mutation profiles of the viral samples undergoing analysis with the features of the pangenome.
CONCLUSIONS: We conducted an analysis of viral mutation profiles that provide the basis of genetic fingerprints. Our study linked pangenome analysis with targeted deep sequencing of SARS-CoV-2 clinical samples. We identified quasispecies mutations occurring within individual patients and determined their general prevalence when compared to over 70,000 other strains. Analysis of these genetic fingerprints may provide a way of conducting molecular contact tracing.
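The alignment-free idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_kmer_index`, `conserved_kmers`, `novel_kmers`), the toy sequences, and the choice of k = 4 are all assumptions made for illustration. The sketch indexes every k-mer by the genomes that contain it, reports k-mers shared by all genomes as conserved, and lists the k-mers in a sample that are absent from a reference, which is one simple way a substitution shows up without alignment.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq, in order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_index(genomes, k):
    """Map each k-mer to the set of genome IDs in which it occurs."""
    index = defaultdict(set)
    for genome_id, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(genome_id)
    return index


def conserved_kmers(index, n_genomes, min_fraction=1.0):
    """Return k-mers present in at least min_fraction of the genomes."""
    return {
        kmer for kmer, ids in index.items()
        if len(ids) / n_genomes >= min_fraction
    }


def novel_kmers(sample_seq, reference_seq, k):
    """Return k-mers found in the sample but not in the reference."""
    return set(kmers(sample_seq, k)) - set(kmers(reference_seq, k))


# Toy data (hypothetical): genome "B" differs from "A" and "C"
# by a single substitution (A -> T at 0-based position 4).
genomes = {
    "A": "ACGTACGTGA",
    "B": "ACGTTCGTGA",
    "C": "ACGTACGTGA",
}
K = 4

index = build_kmer_index(genomes, K)
conserved = conserved_kmers(index, len(genomes))
variant_signature = novel_kmers(genomes["B"], genomes["A"], K)
```

In this toy case a single substitution changes every k-mer that overlaps it, so up to k novel k-mers mark one mutation. Conserved k-mers are the kind of sequence from which primers flanking such sites could be chosen, although real primer design involves further constraints not modeled here.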