Oliver Eickelberg (1), Alexander V Misharin (2), Patricia J Sime (3).
(1) Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania.
(2) Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, Illinois.
(3) Department of Internal Medicine, Virginia Commonwealth University, Richmond, Virginia.

Genomic technologies, specifically microarrays and
high-throughput DNA and RNA sequencing (RNA-seq), have had an enormous impact on basic
and translational research in lung biology. The findings from some of these studies have
now been translated into diagnostic tests, which are rapidly changing clinical practice
(1). Advances in high-throughput single-cell
genomics and multiomics approaches enable investigators to study organs and tissues at
the level of fundamental units of life—cells. Single-cell technologies allow
investigators to overcome “the averaging problem” inherent to analysis
at the whole-tissue level or of pooled cell populations selected on a handful
of markers, which are subject to composition bias. Hence, single-cell technologies
have rapidly become a logical method of choice to study lung biology and disease (2). Specifically, single-cell RNA-seq has been
applied to create reference atlases of normal human lung tissue (3–6), to study
cellular crosstalk in multicellular lung cancer niches (7, 8), to discover novel cell types
in the normal lung (9) or in pulmonary fibrosis
(10–16), to evaluate remodeling of the airway epithelium in asthma
(17) or as a result of smoking (18), and to study the immune response in cystic
fibrosis (19) and in coronavirus disease
(COVID-19) (20–22). By applying the power of single-cell techniques to causal
experiments in model organisms or in vitro systems, investigators were
able to provide mechanistic insights about the role of specific cell types in disease
pathogenesis (23–25).

In line with previous studies, in this issue of the Journal, Carraro,
Mulay, Yao, and colleagues (pp. 1540–1550) applied
single-cell RNA-seq to evaluate the hierarchical relationship between airway epithelial
cells of patients with idiopathic pulmonary fibrosis (IPF) and those of control subjects
(26). Performing single-cell RNA-seq on
epithelial cells from airways of six control subjects and seven patients with IPF, the
authors used computational approaches to resolve previously reported major subsets of
airway epithelial cells, such as basal, club, goblet, ciliated, mucous, and serous
cells, and ionocytes (4, 9). The authors focused on the analysis of basal epithelial cells,
which play an important role as a progenitor population in the airways, contributing to
normal maintenance and repair after injury (27,
28). Several transcriptionally distinct
subsets were identified within basal cells and were named multipotent basal,
proliferating basal, activated basal, and secretory-primed basal cells. Previous
lineage-tracing studies in animal models identified basal cells as local progenitor
cells capable of giving rise to secretory and ciliated cells in the airways after
injury and revealed the crucial role of Wnt and Notch pathways in this process (27, 28).
In agreement with previous studies (4, 18), this work from Carraro and colleagues
confirmed the existence of multiple transitional cell types between basal and
secretory cells. The authors identified transcriptionally similar cell types in patients
with IPF and reported a substantial expansion of the secretory-primed basal cell type in
IPF. Moreover, guided by the results of the single-cell transcriptomic profiling, a
screen of commercially available antibodies to antigens expressed on the surface of
these basal cell subsets identified an anti-CD66 antibody that separates
bona fide basal cells
(EPCAM+NGFR+CD66−) from secretory-primed basal cells
(EPCAM+NGFR+CD66+). This allowed the sorting of the live
basal and secretory-primed basal cells and the evaluation of their differentiation
potential in a series of in vitro assays. In contrast to true basal
cells, secretory-primed basal cells had limited capacity for self-renewal. Using
specific blocking antibodies against NOTCH1, NOTCH2, or NOTCH3, the authors validated
their distinct roles in the maintenance of the basal cells and secretory-primed basal
cells. Thus, the work by Carraro and colleagues provides an example of how single-cell
transcriptomic analyses, together with orthogonal validation techniques, uncover novel
disease mechanisms and reconcile findings from model organisms and human subjects.

The wealth and depth of data generated by single-cell genomic techniques demand a
different approach to manuscript submission, evaluation, and publishing, with increased
responsibility for authors, reviewers, editors, and publishers, among others. Will the
reader be able to find the same reported cell in several different papers using similar
tissues yet different analytical approaches? Interestingly, Deprez and colleagues (6) performed single-cell RNA-seq of airway
biopsies from healthy volunteers in another manuscript recently published in the
Journal. The authors specifically resolved several
transcriptionally distinct subsets of basal cells, which they referred to as basal,
cycling basal, and suprabasal, among others. Although cluster labels and marker genes
between these two studies do not directly match, this does not imply that one group is
correct and the other wrong. Slight differences in nomenclature, methodology, and tissue
sources and, especially, differences in computational approaches explain why cell types
and clusters in those datasets differ. An integrated analysis of these and other
existing datasets in which single-cell RNA-seq data are uniformly processed will reveal
stable cell types and states reproducibly observed in human airways, and new
computational approaches (transfer learning) will enable rapid iterative validation in
newly generated datasets and reconcile differences in nomenclature or annotations (29).

Such integrative analysis, as well as the validation of existing cellular populations and
discovery of novel cellular populations, demands appropriate sharing of single-cell
genomic data and accompanying metadata. Genomics data-sharing is mandated by American
Thoracic Society Journal guidelines and funding agencies, including the
NIH. Authors should strive to share as much data, metadata, and protocols as possible
and to do so in a manner that facilitates retrieval and reuse of the data by the
community. For an in-depth review of the current challenges in genomic data-sharing and
paths for mitigating these challenges, we recommend an outstanding recent review by Byrd
and colleagues (30). Peer review of manuscripts
reporting findings from single-cell genomics assays also brings unique challenges for
the reviewer and editor alike. Authors can mitigate these challenges by taking simple
actions, such as providing a statement about the availability of the raw data and
metadata on controlled repositories, including the Sequence Read Archive via the Database
of Genotypes and Phenotypes, the European Genome-Phenome Archive, or the Chinese Genome
Sequence Archive for Human. Authors should ensure that reviewers have access to
processed data via public repositories (Gene Expression Omnibus or The European
Bioinformatics Institute Archive) at the time of the manuscript submission for review.
Reviewers can then be provided with access to detailed and well-annotated code, which
allows reproduction of the analysis performed by the authors. This is usually achieved
on platforms such as GitHub (https://github.com/) or Code Ocean
(https://codeocean.com/); the latter takes reproducible analysis one step
further and allows fully reproducible reanalysis in the preserved computational
environment in the cloud (“containers”), thus alleviating potential issues
related to outdated software packages and dependencies. Finally, authors can facilitate
peer review and data dissemination upon publication by providing an interactive web tool
for easy and intuitive dataset exploration by reviewers and readers without
computational expertise. Setting up such tools is not a complicated task, and the cost
of maintaining such websites is low. Moreover, two popular platforms, CellBrowser
(https://cells.ucsc.edu/) and Cellxgene (https://github.com/chanzuckerberg/cellxgene), offer dataset hosting on
their websites, thus alleviating concerns about preserving the anonymity of the peer
review.

Undoubtedly, sharing the code and presenting the analysis of single-cell genomics
datasets in this manner requires substantial effort from the authors. On the positive
side, this comes with the benefit of fast and transparent peer review, in which
reviewers do not have to possess specialized skills or knowledge of software packages
but can rather fully focus on the authors’ interpretation of the results and
their relevance to pulmonary biology or disease. It is worth mentioning that peer review
is a lengthy process, and sharing data, code, or analysis results up front poses the
risk of “being scooped.” These risks are usually mitigated by depositing
the manuscript linked to a specific dataset and analysis to one of the available and
reliable preprint servers, such as arXiv.org or bioRxiv.org. A great example of such an
approach is recent work from Habermann and colleagues, who made their high-quality,
well-annotated dataset and accompanying code publicly available at the time of
publishing the preprint, making it an invaluable resource for the community 10 months
before its publication in a peer-reviewed journal (11).

In conclusion, single-cell genomics is rapidly becoming a standard tool for basic,
clinical, and translational research. These changes call for updates to guidelines for
sharing genomics and other big data. Because even publicly shared data and code could be
challenging to reanalyze (data could be mislabeled, deposited in an unusual format,
or missing or removed after upload), journals may eventually assume the role of a
“data guardian” and clone a version of the authors’ code to a
specific GitHub repository and also provide a snapshot (via checksum) of the files
deposited to the public repositories. Data and code integrity during and after the
publication process would therefore be ensured. In this editorial, we have focused on a
specific aspect related to single-cell genomics data. These ideas and suggestions,
however, can and should be applied to other types of high-content data, such as imaging,
metabolomics, proteomics, or mass cytometry, in the near future.
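
The checksum snapshot suggested above could be implemented in many ways. The following is only a minimal sketch, assuming the deposited files have been downloaded to a local directory and that SHA-256 is an acceptable digest. The function names (`sha256_of_file`, `build_manifest`, `verify_manifest`) are hypothetical and do not correspond to any journal's or repository's actual tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under `root` (as a relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Compare files under `root` with a stored manifest.

    Returns a dict of problems: file name -> "missing" or "changed".
    An empty dict means every recorded file is present and unmodified.
    Files added after the snapshot are not reported (an assumption of this sketch).
    """
    root = Path(root)
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_of_file(target) != expected:
            problems[name] = "changed"
    return problems
```

In such a scheme, the manifest would be recorded once at the time of acceptance and stored alongside the article. Rerunning `verify_manifest` against a fresh download would then flag files that were removed or altered after the upload.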