
DSMNC: a database of somatic mutations in normal cells.

Xuexia Miao, Xi Li, Lifei Wang, Caihong Zheng, Jun Cai.

Abstract

Numerous non-inherited somatic mutations, distinct from those of germ-line origin, arise in somatic cells during the DNA replication that accompanies each cell division. These somatic mutations record the unique genetic cell-lineage 'history' of each proliferating normal cell. They are important but remain under-investigated because they occur at ultra-low frequency and are hidden in the genetic background of heterogeneous cells. The recent development of single-cell genomics technologies enables the screening and collection of somatic mutations, especially single nucleotide variations (SNVs), occurring in normal cells. Here, we established DSMNC, a database of somatic mutations in normal cells (http://dsmnc.big.ac.cn/), which provides the most comprehensive catalogue of somatic SNVs in single cells from various normal tissues. In the current version, the database collects ∼0.8 million SNVs accumulated in ∼600 single normal cells (579 human cells and 39 mouse cells). The database interface supports user-friendly browsing and searching of the SNVs and their annotation information. DSMNC serves as a timely and valuable collection of somatic mutations in individual normal cells and makes it possible to analyze the burdens and signatures of somatic mutations in various types of heterogeneous normal cells. DSMNC should therefore improve our understanding of the characteristics of somatic mutations in normal cells.


Year:  2019        PMID: 30380071      PMCID: PMC6323907          DOI: 10.1093/nar/gky1045

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Numerous non-inherited somatic mutations occur and accumulate with cell divisions, recording the unique genetic features of each proliferating somatic cell from the zygote onward. A better understanding of the characteristics of these somatic mutations and of their potential roles in cell-lineage determination, aging and disease occurrence is extremely important (1–3). Somatic mutations that contribute to the rapid proliferation of abnormal cells are observable because of tumor clonality and have therefore been the focus of previous tumor genomic studies (4,5). However, somatic mutations in heterogeneous cells remain largely unexplored, and their signatures in most healthy cells are not well known. Advances in single-cell genomics now enable the screening of somatic mutations hidden in the genetic background of heterogeneous normal cells. These technologies have advanced rapidly along the two most common trace-DNA amplification strategies. The first strategy, for example linear amplification via transposon insertion (LIANTI), directly extracts and amplifies the pg-level nucleic acids of a single cell using reaction reagents (6–8). The second strategy is single-cell-derived clonal culture, for example organoid formation, in which the genome of the ancestor cell undergoes high-fidelity expansion through mitotic cell divisions (9,10). In recent studies, investigators have successfully surveyed somatic mutations in various types of single cells using these technologies. The number of somatic mutations, especially single nucleotide variations (SNVs), identified in normal cells has grown steadily over the past two years. However, to date, no database has been designed to capture these data for subsequent analysis.
We gathered single-cell DNA amplification and sequencing data and developed the DSMNC database (a Database of Somatic Mutations in Normal Cells). DSMNC collects somatic SNVs occurring in single normal cells into a high-quality, comprehensive resource. All of the SNVs are supported by advanced single-cell DNA amplification strategies and reliable deep sequencing data. We expect that this database will serve as an important catalyst for biologists in broad research areas to understand the signatures of somatic mutations occurring in normal cells.

DATA COLLECTION AND DATABASE CONTENT

We gathered single-cell DNA amplification and sequencing data from 12 published studies and our in-house studies (1,2,9,11–19) (Figure 1, Supplementary Table S1). The single-cell DNA amplification strategies comprise the amplification of pg-level nucleic acids using reaction reagents and single-cell-derived clonal culture. The specific amplification methods, namely multiple displacement amplification (MDA), multiple annealing and looping-based amplification cycles (MALBAC), linear amplification via transposon insertion (LIANTI), single-stem-cell organoid formation and cell-reprogramming-based single-cell clonal culture, have proved effective in the published literature (6–10). Raw sequencing reads were aligned to the human and mouse genome assemblies (hg19 and mm9, respectively). To exclude germ-line variants from the sequencing data of each single cell (or cell clone), we used as controls the bulk DNA sequencing data of tissue samples from the same human (or mouse) from which the single cells (or cell clones) were obtained. Reliable somatic SNVs in each cell were then screened and filtered from the single-cell genomic sequencing data with the MuTect algorithm (20) under strict criteria, for example: (a) variant sites had a minimum coverage of 15 and a Phred-scaled base quality above 15; (b) the mutant allele frequency was in the range 0.3–0.7 in the single cell, whereas it was 0 or 1 in the corresponding bulk DNA sample; (c) the mutant allele was supported by at least two reads on both the forward and reverse strands (Figure 1).
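
Criteria (a) to (c) can be read as a per-site predicate applied to each candidate call. The following is a minimal sketch of that reading, not the authors' pipeline and not the MuTect interface: the `Candidate` record and all of its field names are hypothetical, and criterion (c) is assumed to mean at least two supporting reads on each strand.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate SNV call; every field name here is hypothetical."""
    depth: int            # total read coverage at the site in the single cell
    base_quality: float   # Phred-scaled base quality at the site
    alt_fwd: int          # mutant-allele reads on the forward strand
    alt_rev: int          # mutant-allele reads on the reverse strand
    cell_vaf: float       # mutant allele frequency in the single cell
    bulk_vaf: float       # mutant allele frequency in the matched bulk sample


def passes_filters(c: Candidate) -> bool:
    """Return True if the candidate satisfies criteria (a)-(c) from the text."""
    covered = c.depth >= 15 and c.base_quality > 15       # (a)
    in_cell_range = 0.3 <= c.cell_vaf <= 0.7              # (b) single cell
    bulk_zero_or_one = c.bulk_vaf in (0.0, 1.0)           # (b) bulk control
    # (c) assumed to require two or more reads on EACH strand
    both_strands = c.alt_fwd >= 2 and c.alt_rev >= 2
    return covered and in_cell_range and bulk_zero_or_one and both_strands
```

For instance, a site with depth 30, base quality 30, five forward and six reverse mutant reads, a single-cell frequency of 0.45 and a bulk frequency of 0 passes, whereas the same site at depth 10 fails criterion (a).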

Figure 1. Schematic illustration of the general workflow and features of the database.

In summary, DSMNC currently contains a catalogue of ∼0.77 million somatic SNVs in 579 human cells and ∼0.014 million somatic SNVs in 39 mouse cells, drawn from ∼300 individuals overall (Figure 1; see Table 1 for details). These single cells cover various cell types from blood, brain, colon, liver, skin, stomach, bowel and other tissues (Table 1) (1,2,9,11–19). All somatic SNVs and related information were loaded into a MySQL 5.7.19 database server. Each row of the main database table contains the somatic SNV ID as the key, together with its annotation: chromosomal locus, reference and mutant alleles, supporting read depths in the genomic sequencing data, associated gene symbol, cell type, single-cell DNA amplification and sequencing method, and control bulk DNA sample (Figure 1). We will continue to collect somatic SNVs in normal cells and update the database. In addition, other genomic mutation data, including germ-line SNPs from dbSNP137/dbSNP128 and tumor SNVs from COSMIC, were gathered and built into the database for comparative study of mutational signatures (21,22).
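
To make the row layout described above concrete, here is a small sketch of how such a table might look. It is an illustration under stated assumptions only: the paper does not publish its schema, so the table name `somatic_snv` and every column name are hypothetical, an in-memory SQLite database stands in for the MySQL server, and the single inserted row is made-up demonstration data rather than a DSMNC record.

```python
import sqlite3

# Hypothetical schema: one column per kind of annotation listed in the text.
SCHEMA = """
CREATE TABLE somatic_snv (
    snv_id       TEXT PRIMARY KEY,   -- key item: somatic SNV ID
    chrom        TEXT NOT NULL,      -- chromosome
    pos          INTEGER NOT NULL,   -- position on the chromosome
    ref_allele   TEXT NOT NULL,      -- nucleotide in the reference
    alt_allele   TEXT NOT NULL,      -- nucleotide in the mutant allele
    ref_depth    INTEGER,            -- reads supporting the reference allele
    alt_depth    INTEGER,            -- reads supporting the mutant allele
    gene_symbol  TEXT,               -- associated gene symbol
    cell_type    TEXT,               -- tissue or cell type
    amp_method   TEXT,               -- amplification and sequencing method
    bulk_control TEXT                -- matched control bulk DNA sample
)
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the hypothetical table and load one made-up example row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO somatic_snv VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        ("demo_snv_1", "chr1", 12345, "C", "T", 18, 16, "GENE_A", "skin",
         "clonal culture", "demo_bulk_1"),
    )
    conn.commit()
    return conn


def snvs_in_region(conn, chrom, start, end):
    """Return (snv_id, pos, ref, alt) for SNVs in chrom:start-end, inclusive."""
    cur = conn.execute(
        "SELECT snv_id, pos, ref_allele, alt_allele FROM somatic_snv "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return cur.fetchall()
```

Keying each row by SNV ID and storing the locus as separate chromosome and position columns keeps region lookups to a single range condition.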
Table 1. Statistics of DSMNC database content

Category                                                Single cells/Individuals  SNVs
HUMAN
Cell type
  blood                                                 24/10                     9171
  brain                                                 186/21                    86 838
  colon                                                 21/6                      44 404
  liver                                                 10/5                      13 915
  skin                                                  324/204                   594 790
  small intestine                                       14/9                      21 261
Single-cell DNA amplification method
  Multiple displacement amplification                   155/18                    79 750
  Single-stem-cell clonal culture                       51/12                     10 402
  Organoid formation                                    45/19                     79 580
  Cell reprogramming based single-cell clonal culture   328/205                   600 647
Total                                                   579/254                   ∼770 000
MOUSE
Cell type
  brain                                                 7/5                       1646
  large bowel                                           7/2                       1625
  prostate                                              4/1                       590
  small bowel                                           8/2                       3477
  stomach                                               6/2                       1022
  mouse embryonic fibroblasts                           2/2                       1950
  adipocyte progenitor cells                            5/3                       3893
Single-cell DNA amplification method
  Organoid formation                                    25/2                      6714
  Cell reprogramming based single-cell clonal culture   14/10                     7489
Total                                                   39/12                     ∼14 000

WEB INTERFACE

The webserver of the DSMNC database was built using Apache 2.4.27. The web interface was implemented in PHP and JavaScript, and the Search and Browse webpages were built with JBrowse 1.13.1. A fully functional version of DSMNC is freely available at http://dsmnc.big.ac.cn/. We recommend Google Chrome for visiting the database. The interface supports browsing, searching and downloading all DSMNC data without login or registration (Figure 2).

Figure 2. Screenshots of the web interfaces in DSMNC. The web interface of DSMNC comprises four main functional components: the Home, Browse, Search and Download webpages. The Home webpage gives a summary of the DSMNC database. Users can browse detailed genomic information on a selected group of somatic SNVs in text format or as a visualized image. The Search webpage allows users to retrieve the list of somatic SNVs indexed by gene symbols or chromosome regions. Somatic SNVs indexed by the accession ID of each single cell can be downloaded from the Download webpage.

Five main webpages are included in the database: Home, Browse, Search, Download and Help (Figure 2). Users can browse detailed genomic information on somatic mutations in text format or as a visualized image, according to their needs. In the table text format of the Browse webpage, clicking the accession ID of a single cell (e.g. ID_7_individual_1_single-cell_1) returns the list of somatic SNVs in that cell together with their annotation information. Additionally, users can select keywords under ‘organism’, ‘organ’ or ‘single-cell amplification method’ in the left toolbar menu to view a subset of the somatic SNVs in normal cells. Clicking the menu ‘Browse → Browse by chromosome’ in the Browse webpage switches from the table text format to the visualized image format. Somatic SNVs, as well as germ-line SNPs for comparison, within an adjustable chromosomal region are visualized in JBrowse. Control elements such as zoom in/out, box select and track check are available for creating landscape maps of somatic SNVs in a genomic region. A more detailed description of an SNV appears in a popup sub-window when the user clicks its track in JBrowse. Selectable tracks of SNP density in the genomic region, calculated with two sliding windows of 1 bp and 1 kb respectively, can also be browsed.
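
As a rough illustration of what a density track summarizes, the sketch below counts variant positions per window on one chromosome. It is a simplification under stated assumptions: it uses fixed, non-overlapping bins rather than true sliding windows, the function name `window_density` is hypothetical, and it does not reproduce how the JBrowse tracks in DSMNC are actually computed.

```python
from collections import Counter


def window_density(positions, window_size):
    """Count variants per fixed, non-overlapping window on one chromosome.

    positions: iterable of 1-based variant coordinates.
    Returns {window_start: count}, where window_start is the 1-based first
    coordinate of each window that contains at least one variant.
    """
    counts = Counter(
        ((pos - 1) // window_size) * window_size + 1 for pos in positions
    )
    return dict(sorted(counts.items()))
```

With a 1 kb window, positions 1, 500 and 1000 fall into the window starting at 1, position 1001 into the window starting at 1001, and position 2500 into the window starting at 2001.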
DSMNC also provides an option in the Search webpage that allows users to retrieve the list of somatic SNVs by gene symbols or chromosome regions. Besides the SNV list, the result of ‘Search by region’ contains an additional statistics sheet with the synonymous/non-synonymous ratio, mutation-type signatures and mutation density for the SNVs within the user-specified genomic region. ‘Search by gene’ supports both exact and fuzzy search by gene symbol, and accepts multiple gene symbols separated by semicolons. All data in the database, indexed by the accession ID of each single cell, can be downloaded from the Download webpage. Finally, a detailed tutorial on how to use DSMNC is available on the Help webpage.
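
The semicolon-separated input and the exact/fuzzy distinction of ‘Search by gene’ can be sketched as follows. This is an assumption-laden illustration, not the site's PHP implementation: the function names are hypothetical, matching is assumed to be case-insensitive, and fuzzy search is assumed to mean substring matching, which the paper does not specify.

```python
from typing import Iterable, List


def parse_gene_query(query: str) -> List[str]:
    """Split a semicolon-separated query into cleaned, de-duplicated symbols."""
    symbols: List[str] = []
    for token in query.split(";"):
        symbol = token.strip().upper()
        if symbol and symbol not in symbols:
            symbols.append(symbol)
    return symbols


def match_genes(query: str, known_symbols: Iterable[str], fuzzy: bool = False) -> List[str]:
    """Return the known symbols matched by any symbol in the query.

    Exact mode requires equality; fuzzy mode (an assumption) accepts any
    known symbol that contains a queried symbol as a substring.
    """
    wanted = parse_gene_query(query)
    hits = []
    for symbol in known_symbols:
        upper = symbol.upper()
        if any((w in upper) if fuzzy else (w == upper) for w in wanted):
            hits.append(symbol)
    return hits
```

Under these assumptions, an exact search for TP53 against the symbols TP53, TP53BP1 and KRAS returns only TP53, while the fuzzy search also returns TP53BP1.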

PROSPECTIVE ANALYSIS ON SNVs IN DSMNC

Because it collects somatic mutations in individual normal cells, DSMNC makes it possible to analyze the burdens and signatures of somatic mutations in various types of heterogeneous normal cells. We performed a preliminary analysis of the somatic SNVs collected in DSMNC. Figure 3A summarizes the mutation loads of normal cells from different tissues. The data suggest that hundreds of somatic SNVs per cell have accumulated since the zygote, and that different cell types carry distinct mutation loads. We further compared the SNV spectra of normal cells with the germ-line SNP spectra (Figure 3B). The mutation signatures of somatic mutations differ markedly from those of germ-line mutations. Beyond this prospective analysis, we believe that DSMNC is important and of general interest to biologists for further investigation of the characteristics of somatic mutations occurring in normal cells.
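
A spectrum comparison of this kind is commonly built by collapsing each substitution onto its pyrimidine reference base, which yields six classes. The sketch below follows that common convention as an assumption; the paper does not state how its spectra in Figure 3B were computed, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a substitution onto the pyrimidine strand, e.g. G>A becomes C>T."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: report the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs):
    """Return the fraction of SNVs in each of the six substitution classes.

    snvs: iterable of (ref, alt) pairs.
    """
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CLASSES}
```

Computing the same six fractions for somatic SNVs and for germ-line SNPs gives two vectors that can be compared class by class.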

Figure 3. Prospective analysis of SNV signatures in DSMNC. (A) Mutation loads in normal single cells from different types of human tissues. (B) The somatic SNV spectra in normal cells compared with the SNP spectra. The human and mouse SNP information was retrieved from dbSNP137 and dbSNP128, respectively.

CONCLUSION AND DISCUSSION

Recent advances in single-cell DNA amplification technologies enabled the construction of DSMNC, a collection of somatic mutations in heterogeneous normal cells. We implemented a user-friendly database interface for browsing, searching and downloading detailed SNV information at http://dsmnc.big.ac.cn/. To our knowledge, DSMNC is unique in collecting data on somatic mutations occurring in single normal cells. The data in DSMNC should facilitate the understanding of mutation signatures in normal cells. DSMNC is of broad appeal and utility for biologists in research areas such as genetics, evolution, development and cancer. For example, the large number of somatic SNVs in the database enables comparisons of mutation loads and hotspot regions during mitosis among various normal cell types, a question of interest to geneticists. Moreover, the single-cell profiles of somatic mutations from age-spanning donors collected in DSMNC are a necessary resource for studies of aging. Natural selection acting on somatic mutations, for or against somatic normal cells, should attract the attention of evolutionary biologists. Lastly, the somatic mutations observable in a clonal tumor mass comprise those accumulated in normal cells and the tumor-driver mutations that occur during tumorigenesis. DSMNC gives cancer biologists a chance to re-define the true driver mutations in cancer, namely those that did not occur before tumorigenesis.
  22 in total

1.  dbSNP: a database of single nucleotide polymorphisms.

Authors:  E M Smigielski; K Sirotkin; M Ward; S T Sherry
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification.

Authors:  F B Dean; J R Nelson; T L Giesler; R S Lasken
Journal:  Genome Res       Date:  2001-06       Impact factor: 9.043

3.  Rapid growth of a hepatocellular carcinoma and the driving mutations revealed by cell-population genetic analysis of whole-genome data.

Authors:  Yong Tao; Jue Ruan; Shiou-Hwei Yeh; Xuemei Lu; Yu Wang; Weiwei Zhai; Jun Cai; Shaoping Ling; Qiang Gong; Zecheng Chong; Zhengzhong Qu; Qianqian Li; Jiang Liu; Jin Yang; Caihong Zheng; Changqing Zeng; Hurng-Yi Wang; Jing Zhang; Sheng-Han Wang; Lingtong Hao; Lili Dong; Wenjie Li; Min Sun; Wei Zou; Caixia Yu; Chaohua Li; Guojing Liu; Lan Jiang; Jin Xu; Huanwei Huang; Chunyan Li; Shuangli Mi; Bing Zhang; Baoxian Chen; Wenming Zhao; Songnian Hu; Shi-Mei Zhuang; Yang Shen; Suhua Shi; Christopher Brown; Kevin P White; Ding-Shinn Chen; Pei-Jer Chen; Chung-I Wu
Journal:  Proc Natl Acad Sci U S A       Date:  2011-07-05       Impact factor: 11.205

4.  Low incidence of DNA sequence variation in human induced pluripotent stem cells generated by nonintegrating plasmid expression.

Authors:  Linzhao Cheng; Nancy F Hansen; Ling Zhao; Yutao Du; Chunlin Zou; Frank X Donovan; Bin-Kuan Chou; Guangyu Zhou; Shijie Li; Sarah N Dowey; Zhaohui Ye; Settara C Chandrasekharappa; Huanming Yang; James C Mullikin; P Paul Liu
Journal:  Cell Stem Cell       Date:  2012-03-02       Impact factor: 24.633

5.  The origin and evolution of mutations in acute myeloid leukemia.

Authors:  John S Welch; Timothy J Ley; Daniel C Link; Christopher A Miller; David E Larson; Daniel C Koboldt; Lukas D Wartman; Tamara L Lamprecht; Fulu Liu; Jun Xia; Cyriac Kandoth; Robert S Fulton; Michael D McLellan; David J Dooling; John W Wallis; Ken Chen; Christopher C Harris; Heather K Schmidt; Joelle M Kalicki-Veizer; Charles Lu; Qunyuan Zhang; Ling Lin; Michelle D O'Laughlin; Joshua F McMichael; Kim D Delehaunty; Lucinda A Fulton; Vincent J Magrini; Sean D McGrath; Ryan T Demeter; Tammi L Vickery; Jasreet Hundal; Lisa L Cook; Gary W Swift; Jerry P Reed; Patricia A Alldredge; Todd N Wylie; Jason R Walker; Mark A Watson; Sharon E Heath; William D Shannon; Nobish Varghese; Rakesh Nagarajan; Jacqueline E Payton; Jack D Baty; Shashikant Kulkarni; Jeffery M Klco; Michael H Tomasson; Peter Westervelt; Matthew J Walter; Timothy A Graubert; John F DiPersio; Li Ding; Elaine R Mardis; Richard K Wilson
Journal:  Cell       Date:  2012-07-20       Impact factor: 41.582

Review 6.  Somatic mutation, genomic variation, and neurological disease.

Authors:  Annapurna Poduri; Gilad D Evrony; Xuyu Cai; Christopher A Walsh
Journal:  Science       Date:  2013-07-05       Impact factor: 47.728

7.  Genome-wide detection of single-nucleotide and copy-number variations of a single human cell.

Authors:  Chenghang Zong; Sijia Lu; Alec R Chapman; X Sunney Xie
Journal:  Science       Date:  2012-12-21       Impact factor: 47.728

8.  Mutational heterogeneity in cancer and the search for new cancer-associated genes.

Authors:  Michael S Lawrence; Petar Stojanov; Paz Polak; Gregory V Kryukov; Kristian Cibulskis; Andrey Sivachenko; Scott L Carter; Chip Stewart; Craig H Mermel; Steven A Roberts; Adam Kiezun; Peter S Hammerman; Aaron McKenna; Yotam Drier; Lihua Zou; Alex H Ramos; Trevor J Pugh; Nicolas Stransky; Elena Helman; Jaegil Kim; Carrie Sougnez; Lauren Ambrogio; Elizabeth Nickerson; Erica Shefler; Maria L Cortés; Daniel Auclair; Gordon Saksena; Douglas Voet; Michael Noble; Daniel DiCara; Pei Lin; Lee Lichtenstein; David I Heiman; Timothy Fennell; Marcin Imielinski; Bryan Hernandez; Eran Hodis; Sylvan Baca; Austin M Dulak; Jens Lohr; Dan-Avi Landau; Catherine J Wu; Jorge Melendez-Zajgla; Alfredo Hidalgo-Miranda; Amnon Koren; Steven A McCarroll; Jaume Mora; Brian Crompton; Robert Onofrio; Melissa Parkin; Wendy Winckler; Kristin Ardlie; Stacey B Gabriel; Charles W M Roberts; Jaclyn A Biegel; Kimberly Stegmaier; Adam J Bass; Levi A Garraway; Matthew Meyerson; Todd R Golub; Dmitry A Gordenin; Shamil Sunyaev; Eric S Lander; Gad Getz
Journal:  Nature       Date:  2013-06-16       Impact factor: 49.962

9.  Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples.

Authors:  Kristian Cibulskis; Michael S Lawrence; Scott L Carter; Andrey Sivachenko; David Jaffe; Carrie Sougnez; Stacey Gabriel; Matthew Meyerson; Eric S Lander; Gad Getz
Journal:  Nat Biotechnol       Date:  2013-02-10       Impact factor: 54.908

10.  Hypermutation of the inactive X chromosome is a frequent event in cancer.

Authors:  Natalie Jäger; Matthias Schlesner; David T W Jones; Simon Raffel; Jan-Philipp Mallm; Kristin M Junge; Dieter Weichenhan; Tobias Bauer; Naveed Ishaque; Marcel Kool; Paul A Northcott; Andrey Korshunov; Ruben M Drews; Jan Koster; Rogier Versteeg; Julia Richter; Michael Hummel; Stephen C Mack; Michael D Taylor; Hendrik Witt; Benedict Swartman; Dietrich Schulte-Bockholt; Marc Sultan; Marie-Laure Yaspo; Hans Lehrach; Barbara Hutter; Benedikt Brors; Stephan Wolf; Christoph Plass; Reiner Siebert; Andreas Trumpp; Karsten Rippe; Irina Lehmann; Peter Lichter; Stefan M Pfister; Roland Eils
Journal:  Cell       Date:  2013-10-17       Impact factor: 41.582
