The mammary gland consists of cells with gene expression patterns reflecting their cellular origins, function, and spatiotemporal context. However, knowledge of developmental kinetics and mechanisms of lineage specification is lacking. We address this significant knowledge gap by generating a single-cell transcriptome atlas encompassing embryonic, postnatal, and adult mouse mammary development. From these data, we map the chronology of transcriptionally and epigenetically distinct cell states and distinguish fetal mammary stem cells (fMaSCs) from their precursors and progeny. fMaSCs show balanced co-expression of factors associated with discrete adult lineages and a metabolic gene signature that subsides during maturation but reemerges in some human breast cancers and metastases. These data provide a useful resource for illuminating mammary cell heterogeneity, the kinetics of differentiation, and developmental correlates of tumorigenesis.
The mammary gland consists of cells with gene expression patterns reflecting their cellular origins, function, and spatiotemporal context. However, knowledge of developmental kinetics and mechanisms of lineage specification is lacking. We address this significant knowledge gap by generating a single-cell transcriptome atlas encompassing embryonic, postnatal, and adult mouse mammary development. From these data, we map the chronology of transcriptionally and epigenetically distinct cell states and distinguish fetal mammary stem cells (fMaSCs) from their precursors and progeny. fMaSCs show balanced co-expression of factors associated with discrete adult lineages and a metabolic gene signature that subsides during maturation but reemerges in some humanbreast cancers and metastases. These data provide a useful resource for illuminating mammary cell heterogeneity, the kinetics of differentiation, and developmental correlates of tumorigenesis.
A deep understanding of complex tissues requires knowledge of the integrated
molecular circuitry of each of the tissue’s constituent cells. Prior work
used surface markers to fractionate the luminal, basal, and alveolar cells of the
mouse mammary gland, and their lineage-restricted progenitors and stem cells (Shackleton et al., 2006; Shehata et al 2012; Sleeman et al., 2006; Stingl et al.,
2006; Villadsen et al., 2007).
Delineating how the ratios and molecular profiles of these cell types change over
development can give valuable insights into the organization of the tissue and the
regulators of differentiation and homeostasis. It should also provide insight into
subversion of this organization by maladies such as cancer and identify cell states
that are susceptible to tumorigenesis and therapeutic targets to prevent or revert
tumorigenic phenotypes. We and others have previously reported relationships between
the expression profiles of mouse mammary stem/progenitor cell populations and humanbreast cancers (Lim et al., 2009; Pfefferle et al., 2015; Prat et al., 2010; Spike
et al., 2012). In particular, mouse fetal mammary stem cell
(fMVaSC)-containing isolates show significant relatedness to aggressive human breast
cancers (Pfefferle et al., 2015; Spike et al. 2012). However, it has been
challenging to distill critical molecular regulators and cell type-specific
biomarkers from bulk profiles since the cell type of interest often constitutes a
small fraction of the cell population. For example, transplantation assays show
adult mouse mammary stem cells comprise ~2% of sorted cell populations (Shackleton et al., 2006; Spike et al., 2012; Stingl et al, 2006; Wang et al.,
2015). While the stem cell fraction is much higher during fetal mammary
organogenesis, even the most enriched populations exhibit heterogeneity (Dravis et al., 2015; Spike et al., 2012; Spike
et al., 2014).Single-cell RNA sequencing (scRNA-seq) reveals the cellular and
transcriptional heterogeneity of complex tissues (Kumar et al., 2017). For example, expression profiles have recently been
obtained for single adult mouse mammary cells (Bach
et al., 2017; Pal et al., 2017).
However, these studies reveal neither the transcriptional programs that generate
mature cell types from primitive embryonic antecedents nor the timing with which
developmental transitions occur.Mouse mammary organogenesis occurs with stereotyped structures at
reproducible times (Veltmaat et al., 2003),
and with dramatic changes in stem cell function (Spike et al., 2012; Makarem et al.,
2013a). fMaSCs are the earliest cells shown by in vitro
mammosphere formation, in vivo lineage tracing, and transplantation
to fulfill all criteria for bipotent mammary stem cells (Makarem et al., 2013a; Spike et al., 2012; Van Keymeulen et
al., 2011). They become measurable on embryonic day 16 (E16), increase
dramatically to E18 (Spike et al., 2012), and
then decline immediately after birth to produce the architecturally simple mature
mammary epithelium (Giraddi et al., 2015;
Makarem et al., 2013b; Prater et al., 2014; Spike et al., 2012). Luminal and basal compartments appear to be
sustained by uni-potent cells in adults (Van
Keymeulen et al., 2011; Giraddi et al.,
2015; Wang et al., 2017; Wuidart et al., 2016), although rare
bipotential adult mammary cells may also exist (Rios
et al., 2014; Wang et al.,
2015).Here, we elucidate biological programs that distinguish fMaSCs from
differentiating cells. We generate a scRNA-seq dataset encompassing fetal,
postnatal, and adult mouse mammary epithelia, paying special attention to the
perinatal interval, over which the prevalent, multipotent fMaSC phenotype declines
and differentiation ensues (Makarem et al.,
2013a; Spike et al., 2012). The
data establish the chronology of emerging cell types in the mammary epithelium, and
underlying changes in chromatin accessibility. We also identify fMaSC gene
signatures related to chromatin architecture and metabolism that have significance
for fMaSC function and relevance for humanbreast cancers. Mapping the relationships
between single-cell tran-scriptomes as cells differentiate reveals two patterns of
gene expression that revise classic models of transcriptionally discrete cell types.
First, individual fMaSCs co-express genes associated with differentiating mammary
lineages. This supports a model where opposing lineage differentiation factors aid
in specifying the uncommitted stem cell state (Loh
and Lim, 2011). Second, the fMaSC-containing population constitutes a
single distribution of heterogeneous transcriptional states without discrete
subclusters corresponding to their measured stem cell fraction. This suggests that
stem cell capacity is distributed across heterogeneous cell profiles. The
observation has implications for the generation of stem cell activity during tissue
repair and cancer progression, and could explain historical difficulties in
purifying stem cells using limited markers.
RESULTS
Single-Cell Transcriptomes from Developing Mouse Mammary Glands Distinguish
Cell States
We used two approaches to generate scRNA-seq data from embryonic,
postnatal, and adult mammary cells sorted for surface Epcam, an epithelial
specific marker present throughout development on stem, progenitor, and
differentiated mouse mammary epithelial cells (Figures 1A, S1A, and S1B) (Shehata et al.,
2012; Spike et al., 2012). We
used the Chromium Drop-Seq platform (10x Genomics) to obtain transcriptomes from
E16 (n = 690), E18 (n = 1,047), postnatal day 4 (P4) (n = 849), and adult
mammary cells (n = 3,838). This includes 786 Epcam+ adult cells
sorted for coexpression of Cd49f, ~2% of which are inferred to be
functional stem cells based on mammary reconstitution assays (Prater et al., 2014; Shackleton et al., 2006; Spike et
al., 2012; Stingl et al.,
2006). We constructed a normalized gene expression matrix spanning these
developmental stages composed of 6,060 cells with 500–2,000 expressed
genes per cell, and a total of 22,184 genes expressed in five or more cells
(Andrews, 2010; Dillies et al., 2013; Dobin et al., 2013; Hu and Smyth,
2009; Katayama et al., 2013;
Li and Dewey, 2011; Li et al., 2009; Lin
et al., 2016; Marinov et al.,
2014) (Table
S1). We also used the C1 system (Fluidigm) to focus in greater depth
on the transition from the stem cell-rich, late embryonic stage to the immediate
postnatal stage. With C1, we sequenced 262 cells, which included differentiated
adult comparators with >1.5 million reads, 4,000–9,000 genes per
cell, and 13,355 genes expressed in five or more cells (Figures 1A and S1C–S1I; Table S2).
Figure 1.
Derivation and Clustering of Mouse Mammary Epithelial Single-Cell
Transcriptomes from Embryonic Development to Adulthood
(A) Isolation and sequencing of mammary cells from different
developmental stages. Two strategies are shown, Chromium Drop-Seq (10×
Genomics) and C1-microfluidic capture (Fluidigm), with differential output (gray
box).
(B) tSNE plot of single-cell transcriptomes from indicated developmental
stages.
(C) The plot in (B), overlaid with relative expression levels of mammary
lineage markers.
(D) NMF clustering of single-cell expression profiles (n = 6,059), shown
by the white-to-black correlation scale. Colored bars correspond to
developmental context (right; y axis) or NMF group (x axis).
(E) Projection of NMF groups identified in (D) onto the tSNE plot from
(B).
(F) tSNE plots for adult basal (Epcam,
Cd49f, left panel) and E18 cells (right panel)
isolated, processed, and plotted separately with colors corresponding to the NMF
grouping in (D).
We first visualized the Chromium-derived data using t-distributed
stochastic neighbor embedding (tSNE), a commonly used dimensionality reduction
approach that plots cells with similar profiles as nearby points (Figures 1B, 1C, 1E, and 1F) (van der Maaten and Hinton, 2008). This analysis
identified one small group of cells from multiple developmental stages, and
seven other groups in which all cells derive from a single stage (Figure 1B). Based on their lack of Epcam RNA, and high
levels of Vi-mentin RNA and other non-epithelial markers, the small mixed cell
group likely represents contaminating stroma (Figures 1C and S2A). Apart from these cells, adult
cells comprised three major clusters, P4 cells comprised two clusters, and E16
and E18 cells comprised one cluster each (Figure
1B). An analysis of known lineage markers indicates the adult groups
correspond to basal cells (Itgb1+, Krt14+), mature luminal
cells (Ly6a+, Krt8+), and alveolar precursor cells
(Cd14+, Csn3+, Krt8+) (Figure 1C). However, we note that lineage-associated
markers are often imperfect in that they show sporadic expression in other adult
groups (Figure 1C). The data also show that
individual cells from earlier developmental stages often express multiple
lineage markers (Figure 1C). For example,
while a minority of P4 cells exhibit a basal profile, the majority manifest
luminal and alveolar features (e.g., Krt8, Csn3, and Cd14 expression) in
addition to Krt14 (Figure 1). This
contrasts with the interpretation that prepubertal cells are largely basal
(Pal et al., 2017).We next applied non-negative matrix factorization (NMF) as an
alternative approach to cluster samples into transcriptional cell types based on
the 1,000 most variably expressed genes in the data matrix (Figures 1D and S1L) (Brunet et al., 2004; Lee and Seung, 1999; Saeed et al.,
2003; Shao and Höfer,
2017). This analysis included sorted adult basal cells
(Epcam+, Cd49f+). We observed a small cluster of
mixed-stage cells corresponding to the mixed tSNE cluster and three major adult
cell clusters: (1) a basal cluster including the vast majority of presorted
adult basal cells, (2) the presumptive luminal alveolar group, and (3) the
mature luminal group. Some P4 cells from the smaller of the two P4 tSNE clusters
were grouped with the adult basal compartment in this analysis (Figures 1D and 1E). Importantly, embryonic epithelial
cells again comprise a single cluster in this analysis.Functional assays indicate that up to 2% of adult basal cells, and
10%−50% of E18 cells possess mammary stem cell attributes, depending on
the markers used for isolation (Dravis et al.,
2015; Shackleton et al., 2006;
Spike et al., 2012; Stingl et al 2006; Wang et al., 2015). We examined these populations alone by tSNE to
attempt to identify minority stem cell clusters (Figure 1F). However, other than stromal or luminal alveolar
contaminants, neither group showed obvious subclusters (Figure 1F). The contaminating cell types likely
reflect the technical limitations of cell sorting, the biological inaccuracy of
marker-based purification, or both. Similar to Pal et al. (2017), we did identify occasional basal cells expressing
atypical markers such as Elf5, Muc1, and Kit, but these genes did not identify
the same cells or a coherent subgroup, and mixed phenotype adult profiles did
not occur at a rate exceeding the expected doublet frequency in these assays,
~1% for 10× derived profiles (see STAR Methods; 10× Chromium V2 Guide). These data suggest that
either the functional stem cell state can be generated by different
transcriptional programs or that multiple types of cells can act as facultative
stem cells under the assay conditions employed for functional testing.
Discrete Mammary Epithelial Lineages Arise Postnatally with Loss of Balanced
Lineage Factor Co-expression
As only the relationships between the most similar cells are well
represented in tSNE, it can be difficult to interpret cellular relationship more
globally across a diverse dataset. As an alternate approach to investigate the
relationships between single-cell expression profiles, differentiation states,
and developmental context, we plotted the single-cell data according to
diffusion components (DCs), a noise-tolerant, non-linear dimensionality
reduction method that reveals a global topology for the data based on local
similarities between points (Coifman et al.,
2005; Haghverdi et al., 2015)
(Figures 2A and 2E). The resulting
graph produced an intuitive developmental picture in which primitive (E16 and
E18) and adult cells occupied opposite ends of the DC2 axis, while P4 cells
localized to intermediate positions (Figure
2A). The regions occupied by E16 and E18 cells largely overlap, while
P4 cells form a continuum between the more primitive cells and those of the
adult. Importantly, while adult cells occupy discrete distributions at various
extremes along the DC1 axis, E16 and E18 cells do not bifurcate and are instead
positioned midway between the two DC1 extremes. P4 cells distribute in a unique
pattern along DC1, with a small group extending toward a tightly grouped set of
adult cells (Figure 2A, bottom right
quadrant) and a second larger group of cells extending as a single group toward
the two major adult groups (upper right quadrant). We provide a web tool for the
interactive visualization of gene expression across this dataset (http://uofuhealth.utah.edu/huntsman/labs/spike/d3.php).
Figure 2.
Diversification of Cell Types in the Mouse Mammary Epithelium
(A) Diffusion map of single-cell transcriptomes annotated by stage of
collection.
(B) A color gradient is used to depict the ratio of Krt8 and Krt14 on
the same diffusion map. Cells with much higher levels of Krt8 levels than Krt14
are green, and cells with the opposite configuration are red. Orange/yellow
color indicates balanced coexpression. Sphere radius assigned by maximum value
of Krt8 or Krt14. The right three panels represent immuno-staining of mouse
mammary epithelium for Krt8 (green) and Krt14 (red) at the given stages of
development.
(C) Placement of NMF groups in the diffusion map from (A) and their
correspondence to markers of known cell types as indicated. Sphere radius
represents normalized expression values for the given factors. Color, NMF
group.
(D) Multiplex fluorescent in situ hybridization for
select lineage markers shown in (C), in adult and embryonic mammary tissue.
Insets in second lower panel, Extreme digital zoom showing close-proximity red,
green, and blue signals.
(E) Pseudotemporal vectors through the diffusion map from (A)
color-coded to represent lineage branch points as shown in (F).
(F)Heatmap of regulon scores from SCENIC analysis. Rows, Individual
regulons. Columns, Cells organized according to pseudotemporal trajectories as
indicated below the heatmap. a, Adult lineage oriented regulons; b, primitive
regulons; *onset of balanced lineage regulon activity. Color coding is also
given for NMF groups and developmental stage as in Figure 1D.
Graphing cells according to the ratio of transcripts for the well-known
luminal and basal markers, Krt8 and Krt14, respectively, is concordant with
in situ staining for the products of these
lineage-associated genes over the same developmental window (Figure 2B). That is, we observed the following: (1)
cells with mixed keratin expression predominating early (E18.5); (2) emergence
of a minority Krt14-expressing P4 population that correlates with a
well-defined, elongating myoepithelial cell layer lining the P4 epithelium; (3)
a majority of P4 cells harboring a mixed lineage
Krt8+Krt14+ phenotype but positioned proximal to adult
luminal cell types in the diffusion map; and (4) the resolution of this group
into cells expressing Krt8 but lacking Krt14 in the adult gland, as would be
expected for mature luminal cells lining the ducts (Figure 2B). In addition, each NMF-derived cluster
represented a single region of the diffusion map in the first two DCs with the
exception of NMF group 1. NMF1 corresponds to the group of cells from mixed
developmental stages with stromal expression patterns and was largely
distinguishable from epithelial cells along a higher DC (DCS) (Figure 2C). NMF-VII and NMF-II localize to the
different extremes of DC1 and represent basal cells and Esr1+ luminal
cells expressing Ly6a (the gene for Seal), respectively (Figures 1C, 2C, and S2C). NMF-IV is
positioned intermediately and contains cells expressing Wfdc18, Csn3, Csn2, Kit,
and Itga2, indicative of an alveolar luminal phenotype (Figures 2C and S2B) (Pal et al., 2017).These distributions imply that embryonic cells are distinguished from
adult cells by expression patterns that change gradually over developmental time
along DC2, and by an intermediate phenotype with respect to lineage
differentiation (DC1). We chose a set of markers of these differences for
cross-validation in situ. Consistent with the predicted pattern
of expression from scRNA-seq, multiplex in situ hybridization
distinguished adult cells expressing Krt14, both Wfdc18 and Krt8, or Krt8 alone
(Figure 2D). By contrast, most cells in
the fetal mammary epithelium co-express all three markers (Figure 2D). Similarly, targets identified in scRNA-seq
data as being commonly expressed in fetal mammary cells, but not adult cells,
e.g., Sostdc1, were identified in situ in fetal but not adult
tissue (Figure 2D).These in-depth analyses using tSNE, NMF, and diffusion mapping were
concordant in that none was able to identify cellular subpopulations
corresponding to the percentage of stem cells estimated by functional assays.
Furthermore, they point to a predominance of mixed lineage phenotypes,
corroborated in situ, in early development that is lost as
development progresses.
The above expression patterns suggest that, as development progresses,
E18 cells resolve into two distinct cell types by P4. The first represents a
basal population, while the second tends toward luminal specification in spite
of persistent co-expression of some basal markers, such as Krt14. This
population then resolves into the two distinct adult luminal populations. This
model is consistent with the multi-potential state of fMaSCs, as well as recent
observations from lineage-tracing experiments suggesting that independent
Esr1+ and Esrl− populations of luminal cells
are established by adulthood (Van Keymeulen et
al., 2011; Wang et al.,
2017).We constructed pseudotemporal trajectories for this simple lineage
bifurcation model using principal curves through proximal populations in the
first two components of the diffusion map, although the data do not rule out
other differentiation trajectories (Figure
2E) (Hastie, 1989). We then
combined pseudotemporal ordering of cells with SCENIC analysis to analyze the
chronology of transcription program activation over mammary development (Figures
2F and S2C) (Aibar et al., 2017).Among the differentially activated regulons identified by this approach
were correlated sets that were highly expressed in one of the three adult cell
types as well as regulons that typify embryonic and/or P4 cells and are
downregulated as development progresses (Figure
2F). Each major adult cell type was also characterized by certain
repressed regulons, such as Ehf for basal, Creb3l2 for alveolar, and Atf4,
Cebpd, Cebpb, and YY1 for ER+ luminal cells (Figure 2F). While E16 cells were largely negative for
adult-associated regulon activity, there was a marked and balanced activation of
regulons corresponding to adult cell types at E18 and this balanced pattern
persisted into P4 cells (Figure 2F,
asterisk). Primitive cells positioned after the branch points of the
pseudotemporal trajectories grossly retained a high degree of similarity to
their pre-branchpoint counterparts. However, differential regulation of
lineage-associated factors could be observed. For instance, elevated Trp63
activity is evident in cells trending toward the basal branch, while Esr1 and
Spdef activity is upregulated in P4/NMF-IV cells that trend toward the
Esr1+ luminal group (Figure
2F). The luminal determinant FoxA1 was upregulated in most P4 luminal
cells but was subsequently downregulated in the Esrl− luminal
cells of the adult (Figure 2F).
A Continuum of Gene Expression Profiles Defines Early Stem Cell State
Transitions in the Mammary Gland
To gain a more highly detailed view of changing gene expression patterns
over the critical period in early development where multipotent fMaSC generate
more committed cells in the postnatal mammary epithelium (Figures 2 and S3) (Makarem et al., 2013a; Spike et al., 2012), we obtained deeper
transcriptomes of single cells across the E18-to-P4 transition using the C1
microfluidics platform (Fluidigm), with inclusion of adult cells for reference
(Figure 1A; Table S2). For this dataset, we
clustered samples by NMF into putative cell types based on their expression
patterns of the 1,500 most variably expressed genes (Figure S1J). We found that six
clusters provide a suitable division of the data (Figures 3A and S1K). NMF clusters 1–4
almost exclusively comprise cells from a single developmental stage, with
cluster 1 predominantly comprising P1 cells, while clusters 2–4 contain
adult cells (Figure 3A; Table S2). In contrast, the
compositions of clusters 5 and 6 were mixed among E18, P1, and P4 cells,
suggesting the presence of cell states that vary in their prevalence across
organismal development rather than being strictly defined by the time at which
the samples were isolated.
Figure 3.
Identification of Cell Types and Signatures across Early Mammary Epithelial
Development
(A) NMF correlation matrix of single cells (clusters/rank = 6) with NMF
sample groups 1–6 numbered and color-coded. Also shown and color-coded
are the developmental stages of isolation for each cell, and subclusters (a and
b) derived for adult cells and NMF groups 5 and 6 independently of the
correlation matrix shown.
(B) Relationships between single-cell profiles graphed according to the
first three diffusion components, and color-coded by stage of origin,
σ = 129.
(C) DC1 and two positions for the epithelial NMF groups,
σ = 55.
(D)Violin plots showing expression levels of luminal-associated Krt8
and basal-associated Krt14 among cells of NMF groups 2 and 3.
(E) A simplified schematic of the approximate positions of NMF
subclusters in the two-dimensional (2D) diffusion map from (C).
(F) A schematic describing the rank product-based procedure for
defining group-specific signatures.
(G) Heatmap of signature expression with cells sorted by NMF group with
annotation of select genes. Rows, Gene expression values. Columns, Samples
(i.e., single cells). Mean signature enrichment per cell is also given
(bottom).
(H) Graphical representation of GO categories for the fMASC
signature.
(I) Mean expression in individual cells of a luminal balancer signature
and a basal balancer signature. Also shown is the mean expression of
differentially expressed gene lists delineating the prominent subdivision of
MMPr cells. Brackets, Pearson correlation.
(J) Mean expression of the differentially expressed gene lists for
NMF5a versus NMF5b (i.e., fMaSC subdivision). Ordering of cells from (G) is
maintained in (I) and (J).
Similar to our Chromium-based analysis, adult cells and E18 cells
occupied opposite ends of the resulting graph along one DC axis (DC1), with P1
and P4 cells occupying intermediate positions (Figure 3B). The regions occupied by E18, P1, and P4 cells overlapped
significantly in a continuum of transcriptional changes related to advancing
organismal age. Adult subsets again occupied the extremes of a second DC axis
(DC2), corresponding to luminal and basal lineages based on their expression of
lineage-associated keratins (Figures
3B–3D), and could be further subdivided to reveal a population
of presumptive alveolar cells despite their lower numbers in this analysis
(Figures 3A, 3C, 3E, S4D, and S4E) (Visvader and Stingl, 2014). However, as in Chromium
data, the lineage bifurcation noted for adult cells was largely absent from E18,
P1, and P4 cells, and again we did not observe discrete subpopulations
correlating in number to the predicted stem cell frequencies (Figures 3B–3E). However, we could identify
other minority cell types as NMF1 and −4 segregated from the major
epithelial clusters along DC5 and DC3, respectively (Figure S4A). In contrast, most of
the variance in the remaining four NMF groups was directed along the first two
DCs (Figures 3C and 3E).We next found signature genes distinguishing each major predicted cell
type (i.e., NMF1 to −6) using a pairwise, non-para-metric, rank-product
(RP) approach (Breitling et al., 2004). We
took the genes that were over-represented in each NMF group versus every other
NMF group as the group-specific gene signature (Figures 3F and 3G). This unsupervised differential expression
analysis led us to denote the small cluster NMF1 as
“Matrix”-expressing cells based on their expression of
mesenchymal/matrix-related transcripts (e.g., Coll, 3, 5, 6, 12, 15, 18; Fn1;
Vim), and to denote NMF4 as “lmmune”-related cells based on their
expression of class II transcripts (e.g., B2m; Fcerlg; H2-Aa, Ab1, Eb1, M2, Q6,
Q7, T22) (Figure 3G). NMF1 and −4 in
this analysis may represent subsets of the non-epithe-lial groups identified
from Chromium data (Figures 1 and 2). While the origins and functions of these
minority populations remain to be determined, we note that MHC-expressing
mammary cells have been described (Elliott et
al., 1988: Forero et al..
2016). The analysis also confirmed the adult basal and luminal phenotypes
of NMF2 and −3, respectively, as it re-identified a known basal and
luminal markers for each group (Skibinski et
al., 2014) (Figure 3G; Table S2).
Identification of fMaSC Expression Programs from scRNA-Seq
The data indicate that NMF5 corresponds to the fMaSC transcriptional
state, as (1) DC analysis localizes NMF5 to a distal position relative to adult
cells, (2) most NMF5 cells derive from a developmental time point with very high
mammary stem cell activity, and (3) the NMF5 signature (857 genes, hereafter
“fMaSC signature”) includes Eya2, Itag6, Nrg1, Sostdcl, Sox10,
Myb, Lsr, Sfrpl, Bcl11a, Pthlh, Sema3B, Slitrk2, and others that we previously
identified as highly expressed in bulk fMaSC signatures, including some we have
shown to be functionally relevant (Dravis et
al., 2015; Spike et al., 2012)
(Figures 3G and S4B). The single-cell-derived fMaSC
signature also includes many genes not identified in our prior bulk population
studies (Spike et al., 2012) (Figures
3G and S4B; Table S3). Gene ontology (GO)
enrichment analysis revealed that the fMaSC signature comprises genes involved
in a variety of cellular processes of potential importance for stem and
progenitor cells, including cellular metabolism, chromatin conformation, cell
cycle, and tissue development (Maere et al.,
2005; Shannon et al., 2003)
(Figure 3H).As the data imply that fMaSCs are defined both by the unique features
captured in the fMaSC signature and by co-expression of lineage-associated
factors, we also wanted to identify gene sets reflecting this basal-luminal
apposition. To this end, we examined the results of rank product tests between
NMF2, −3, and −5 cells (adult luminal, adult basal, and fMASC
groups, respectively), to identify a set of genes termed
“balancer” signatures (Figure
3I; Table
S3). We generated a “basal balancer signature” (i.e., genes
expressed in fMASC and basal cells more highly than luminal cells) of 937 genes
including known basal markers such as Itga6 and Krt14. Similarly, we generated a
“luminal balancer signature” (i.e., genes expressed by both
luminal cells and fMaSCs more highly than basal cells) of 477 genes, including,
for example, Cd24a and Krt8 (Figure 3I;
Table S3). The
co-expression of these balancer signatures in individual fMaSC and their
lineage-specific expression later in development is illustrated in Figure 3I. Although the specific mechanisms
by which gene products belonging to balancer signatures effect the multipotent
and uncommitted fMaSC state remain to be elucidated, we computationally
identified numerous luminal balancer genes reported to directly interact with
basal balancer genes in the MINT interaction database, suggesting that some of
the mechanisms are likely to be direct (Figure S5A). The proteins encoded
by balancer signature genes extend beyond transcription factors and represent
varied aspects of signaling, metabolism, and microenvironmental response (Figure S5A; Table S3). These data,
together with our regulon analysis, in situ detection of
lineage-associated transcripts, the pattern of lineage-associated keratin
protein expression, and our previous single-cell RT-PCR analysis support a model
in which the fMaSC state is established by balanced expression of lineage
factors and specifiers (Figures 1C, 2C, 2D, 2F, and 3I) (Villadsen et al., 2007;
Rodilla et al., 2015; Spike et al., 2012; Sun et al., 2010).
Loss of Lineage Factor Balance Shortly after Birth
NMF6 cells principally derive from postnatal mammary epithelia and have
an expression signature (234 genes) composed of transcription factors such as
Jun, Rela/b, Nfkb2, and Sox4, chromatin modifiers including Arid1a,Top2A,
Jmjdlc, and Jmjd1c, cell adhesion and cell-cell junction proteins including
claudins 1 and 6, Icam, and many others (Figure
3F; Table
S2). Although they comprise a single group in the initial analysis, NMF6
cells are more dispersed along the DC2 axis than the NMF5/fMaSC group. They also
exhibit less balanced expression of the lineager balancer signatures described
above, suggesting NMF6 cells might be divisible with regard to their relative
luminal and basal characteristics (Figures 3C and
3I). Thus, while a few early postnatal cells have already committed
to a basal fate and were grouped with NMF3, the majority of postnatal cells
comprise NMF6 and appear to represent a mixed mammary precursor/progenitor
(hereafter, MMPr) (Figures 1D, 1E, 3A–3C, 3E, and 3I; Table S2).In light of this, and as fMaSC/NMF5 and MMPr/NMF6 are relatively large
groups (76 and 66 cells, respectively), we repeated the divisive NMF procedure
on NMF5 and NMF6 independently to determine whether they show early lineage
bifurcation events. We could subdivide NMF5 and NMF6 into two subclusters each
with high cluster stability (NMF5a and -b, CC = 0.96; NMF6a and -b, CC = 0.88)
(Figure 3A). We then examined their
relative positions in the diffusion map and their differential gene expression
(Figures 3C and 3E). While these
subclusters occupied largely overlapping regions, NMF5a and -b were centered at
different points along the embryonic-adult axis (DC1) and NMF6a and 6b were
centered at different points on the basal-luminal axis (DC2) (Figures 3C and 3E).To uncover the basis of these distributions, we delineated rank-product
gene expression differences for NMF5a versus −5b and NMF6a versus
−6b, and their potential cell fate and type distinctions (Table S2). The MMPr groups are
characterized by genes indicative of lineage specification including Cdh1 and
Krt8 in NM6a (luminal) and Acta2 and Krt14 in NMF6b (basal) (Table S2). Consistent with this,
there is also a positive correlation between the expression of luminal and basal
balancer signatures and the genes subdividing these MMPr groups (Figures S4C and S4F). We therefore
designate these MMPr subclusters and their signatures as MMPr-basal and
MMPr-luminal according to their correlated balancer signatures (Figures 3G and 3I). In contrast, the data suggest that
NMF5 is composed of a transcriptionally heterogeneous population not readily
separated into luminal and basal subtypes. Rather, when subclusters from NMF5
(NMF5a and -b) were derived and contrasted, their differentially expressed genes
correlated with graded expression of the fMaSC signature derived from the
undivided NMF5 cluster (Figure 3J). We
therefore sought to investigate select features represented in the fMaSC
signature that distinguish fMaSCs from their adult counterparts.
Changes in Chromatin Regulation and Accessibility Accompany fMaSC
Differentiation
The GO enrichment for chromatin regulators was of particular interest as
it could help explain the multipotent expression pattern identified in fMaSCs
(Figure 3H). To determine whether the
fMaSC state is accompanied by altered chromatin states relative to adult
epithelia, and to determine whether lineage-associated loci were
heterochromatic, we examined global chromatin accessibility by assay for
transposase-accessible chromatin using sequencing (ATAC-seq) (Andrews, 2010; Bolstad, 2018; Buenrostro et al.,
2015; Langmead et al., 2009;
Li et al., 2009; McLean et al., 2010; Quinlan and Hall, 2010; Ramirez et
al., 2014; Zang et al., 2009;
Zhang et al., 2008). E18 cells have
uniquely accessible regions (UARs) corresponding by proximity to 1,640 genes,
and uniquely repressed regions (URRs) corresponding to 401 genes when compared
to flow-sorted adult basal (Cd49f+,
Epcaml°w/med), luminal progenitor
(Epcam+, Cd61+), and mature luminal
(Epcam+, Sca1+) cells (Figures 4A–4D and S5B; Table S3). Genes at these loci
correspond to developmental and metabolic processes and show significant overlap
with fMaSC signature genes (Figures 4C and
S5C). Overall, E18
cells show greater accessibility at fMaSC signature genes as expected (Figure S5D). Most fMaSC
signature genes, however, were not uniquely accessible in E18 cells, suggesting
other regulatory mechanisms of transcriptional control in response to changing
signals in the developmental environment (Figure S5C). Interestingly, P4
cells showed intermediate accessibility across these loci (Figures 4A, 4B, and S5B).
Figure 4.
Differential Chromatin Regulation between Primitive and Mature Mammary
Epithelial Cells
(A) Tornado plots of uniquely accessible regions (UARs) (n
= 1,222) and uniquely repressed regions (URRs) (n = 242) for fMaSCs relative to
differentiated mammary epithelia as indicated. Loci, represented by the y axes,
are held consistent between stages, while their intensity representing ATAC-seq
signal is differential.
(B) Averaged accessibility for UARs and URRs derived from adult mammary
epithelial lineage comparisons.
(C) The top 15 GO categories for fMaSC UARs with Bonferroni-corrected p
values.
(D) Proximal chromatin accessibility at three lineage-associated loci
compared across indicated differentiation states.
We also identified UARs and URRs corresponding to each major adult
lineage sorted by fluorescence-activated cell sorting (FACS) (Figures 4D and S5B) (Dravis et al., 2018). A large number of loci
distinguished each lineage from the others in terms of both UARs and URRs (Figure S5A). fMaSCs
exhibit modest and equivalent accessibility for both basal and luminal
progenitor-associated loci, although they lack significant accessibility at loci
associated with Epcam+Sca1+ luminal cells. Even more
strikingly, fMaSCs show accessibility at adult lineage URRs, suggesting that in
spite of their more universal expression of Ezh2 and associated chromatin
methylation (Figures S5E and
S5F), fMaSCs have not yet silenced chromatin regions that define
discrete lineages in the adult (Figures
4A–4C).Closer inspection of lineage-associated genes illustrates the
multilineage accessibility of fMaSCs compared to adult cells (Figure 4D). For example, while Sca1+
luminal cells show little accessibility in Krt14 and Csn3 proximal regions, Csn3
is accessible in Cd61+ luminal cells. Conversely, basal adult cells
(Cd49f+, Epcamlow/med) have lower accessibility at
Krt8 and Csn3 than either luminal population, although basal cells generally
have more open chromatin across multi-lineage markers than luminal cells
(reported in detail in Dravis et al.,
2018) (Figure 4D). Chromatin
accessibility of P4 cells indicate that the FACS-sorted basal cells show greater
accessibility at basal associated loci (e.g., Krt14) and luminal cells show a
greater accessibility at luminal associated loci (e.g., Krt8, Csn3). However,
each sorted population also shows appreciable accessibility for genes of the
opposing lineages, indicating the P4 cells remain at an intermediate stage of
lineage commitment (Figures 4B and 4D).
These chromatin patterns mirror the gradual lineage restriction implied by
diffusion maps of their associated single-cell transcriptomes (Figures 2F and 3G).
fMaSC Metabolic Profiles Are Lost during Differentiation
Metabolic factors are overwhelmingly represented among the
fMaSC-signature gene ontologies (Figure
3H). Many of the fMaSC-signature genes encode enzymes involved in
glycolytic metabolism, the Krebs cycle, and enzymes and transporters involved in
anaplerotic mechanisms such as fatty acid oxidation. Therefore, we modeled how
flux through these pathways might change as a function of development-associated
changes in gene expression (Figure 5A). We
noted elevated expression of several glycolytic enzymes in fMaSCs and MMPr cells
relative to adult cells, but also a potential for shunting the glycolytic
end-product, pyruvate, away from the mitochondria and oxidative phosphorylation,
as in the stem cell- and tumor-associated Warburg effect (Figure 5A) (ShyhChang
et al., 2013). fMaSCs have increased RNA for lactate dehydrogenase
(Ldh), and ~90% of fMaSCs express Pkm2, a splice isoform of pyruvate
kinase (Pkm). Pkm2 incorporates exon 10 rather than exon 9, reported previously
to increase lactate formation (Li and Dewey,
2011; Mazurek et al., 2005;
Robinson et al., 2011) (Figures 5A–5C). Surprisingly, all
fMaSCs co-express the Pkm1 isoform. The majority of adult cells express either
Pkm1 or Pkm2 or had Pkm levels below isoform detection limits (Figure 5C). We also noted marked elevation in
embryonic cells of transcripts for several Krebs cycle enzymes and factors
capable of providing acetyl-CoA to the mitochondria via free fatty acids. These
observations are consistent with the evolving understanding that
“Warburg-shifted” cells use alternative mechanisms to fuel Krebs
cycle reactions (ShyhChang et al., 2013)
(Figure 5A). E18 cells also have
elevated Psatl and Gpt RNA, encoding two enzymes that can balance
interconversion of pyruvate and glutamate with alanine and
α-ketoglutarate (Figure 5A) (Coloff et al., 2016). While numerous
metabolism-related transcripts associated with E18 mammary epithelium were also
elevated in E16 cells, important differences were noted. For instance, Psat1 and
Gpt transcripts were also elevated in E18 relative to E16, while E16 cells
showed elevated levels for the glutamine transporter, Slc38a1 (Figure 5A).
Figure 5.
Metabolism-Related Gene Expression Profile in fMaSC
(A) Mean expression levels (sphere size) of genes encoding glycolytic,
Krebs cycle, and fatty acid metabolism enzymes across four developmental stages.
Gray arrows, Modeled metabolic flux.
(B) Expression levels of the Pkm2 in fetal and adult mammary epithelial
cells.
(C) Aligned reads matching Pkm1 and −2 in three representative
adult and fetal epithelial cells with the percentage of similar cells at each
stage. *Occasional reads are seen in introns and non-coding regions across the
genome adjacent to encoded poly-thymine tracts, but these are not enumerated in
the quantification of transcripts.
(D) Enumeration of Krt8 and Krt14 single-positive and double-positive
cells in organoids.
(E and F) The number (E) and size (F) of organoid structures produced
under treatment with the mitochondrial pyruvate transport inhibitor, UK5099 (n =
5). *p < 0.1, Student’s t test.
The correlation between changing stem cell content in early mammary
development and the above changes in expression of metabolic factors led us to
hypothesize that differentiating cells would be more dependent on mitochondrial
pyruvate transport than stem cells, an effect described previously in other
systems (Flores et al., 2017; Schell et al., 2017). We therefore tested
the prediction that inhibiting the mitochondrial pyruvate transporters (Mpc1,
−2) should differentially affect stem and differentiating cells using
in vitro clonal organoid formation assays in the presence
and absence of a potent and specific Mpc inhibitor, UK5099 (Sigma-Aldrich)
(Gray et al., 2014). UK5099 treatment
significantly alters organoid composition in differentiation-promoting media
where presumptive lineage-committed cells contribute significantly to organoid
expansion (Spike et al., 2014) (Figures
5D, S6A, and S6B). Multicellular
fMaSC-derived organoids grown in these conditions contain significantly fewer
cells exclusively expressing Krt8 or Krt14 and an increased proportion of
Krt8+Krt14+ co-expressing cells. This effect is
associated with a partial reduction in organoid-forming efficiency and size that
is not observed in a maintenance media where stem cell capacity is preserved
during culture (Spike et al., 2014)
(Figures 5E, 5F, and S6B).
The fMaSC Metabolic Program Has Parallels in Triple-Negative Breast Cancers
and Metastases
We previously showed that bulk fMaSC signatures are related to
expression profiles from humanbreast cancerpatient samples (Pfefferle et al., 2013; Pfefferle et al.,. 2015; Spike et al., 2012). We therefore asked whether the
refined single-cell-derived fMaSC signature and its metabolic components
demarcate the same breast cancers (Cancer Genome Atlas, 2012; Ciriello et al., 2015; Forero et al., 2016; Forero-Torres et al., 2015; Parker
et al., 2009; Varley et al.,
2014). As we observed previously, fMaSCs share expression signature
similarity with basal-like breast cancers, as well as occasional cancers in
other aggressive subtypes such as luminal B and Her2tumors (Figure 6A) (Spike et
al., 2012). However, only Her2tumors and basal-like tumors showed
frequent elevation of fMaSC-like metabolic profiles, whereas the equally
proliferative luminal B tumors did not (Figure
6B). We also generated a small eight-gene metabolic signature
(Metab-8) to reflect metabolic processes proximal to the mitochondria in our
fMaSC metabolism model (Figure 5A; Table S3). This signature
was composed of the last three steps in lactate-directed glycolysis (Eno, Pkm,
Ldh), three fatty acid pathway genes (Fabp5, Hadh, Acat), and genes controlling
the balance between pyruvate, cytosolic α-ketoglutarate, alanine, and
glutamate (Psat1, Gpt) (Figure 5A). The
median expression of Metab-8 was highest in basal-like cancers, which include a
subset of the most proliferative cancers and most of the clinically designated
triple-negative breast cancers (TNBCs) (Figures 6C
and 6D). Furthermore, among TNBCs, Metab-8 was highest in TNBC
metastatic lesions (Figure 6E).
Figure 6.
Human Breast Cancers Exhibit fMaSC-Related Biology
(A-C) Expression levels of single-cell-derived fMaSC signatures in
human breast cancers of varying molecular subtype. (A) fMaSC-Signature. (B)
fMaSC-metabolism Signature. (C) Reduced 8 gene fMaSC-metabolism signature. p
value, ANOVA between groups.
(D) TCGA breast cancer expression data plotted for median expression of
a proliferations signature (PAM50, × axis) versus median expression of
the Metab-8 fMaSC subsignature.
(E) fMASC-signature genes are overexpressed in metastatic
triple-negative breast cancer. p value, Kolmogorov-Smirnov.
DISCUSSION
This work provides a comprehensive scRNA-seq resource for the developing
mammary gland. We mine the data to show that multipotent fMaSCs are typified by
co-expression of lineage-associated factors and transcriptional programs reminiscent
of opposing lineage specifiers described in embryonic stem cells (Loh and Lim, 2011). While the present work was under
review, mixed-lineage expression patterns were reported for mammary cells as early
as E14 (Wuidart et al., 2018). Although E14
cells lack stem cell activity by in vitro colony assays and
transplantation of single cells, we have previously shown that transplant of intact
rudiments at this stage does reconstitute mammary tissue, presumably through
preservation of spatial cues and acquisition of repopulating capacity in the days
subsequent to transplant (Spike et al.,
2012). This suggests that additional factors are required to confer intrinsic
mammary stem cell competence and subsequent lineage segregation.At the population level, the fMaSC transcriptional profile is reflected in
open chromatin across lineage-associated genes in spite of high PRC2 activity. It
remains to be determined whether the PRC2 machinery is only operative in these cells
at lineage-unassociated loci or, alternatively, whether there are regulatory
mechanisms directly opposing PRC2-facilitated lineage restriction. Ultimately, as
differentiation ensues a lineage bifurcation events are revealed where this balance
is lost and opposing lineage genes become restricted. Thus, a continuum of E18 cell
states resolve into two discrete populations by P4. One represents a distinct basal
(myoepithelial) phenotype, and the other represents a luminal oriented precursor
population that is itself a continuum between cells retaining basal features and
more mature, luminally oriented expression profiles. The retention of mixed basal
features in this majority population may have contributed to the apparently
incorrect interpretation that prepubertal mammary epithelium is composed of basal
oriented cells (Pal et al., 2017). As
development progresses further, these mixed mammary precursors are replaced by
discrete luminal cell types corresponding to Esr1+ and
Esrl− lineages. Although basal cells form a transcriptionally
distinct cell type in the adult, it is interesting to note (as in Dravis et al., 2018) that they show some chromatin
accessibility at luminal gene loci similar to fMaSCs. We speculate that this favors
lineage plasticity and enables acquisition of multilineage potential upon
transplantation, in wounding conditions or following ex vivo
culture (Ge et al., 2017; Prater et al., 2014; Shackleton et al., 2006; Stingl et al.,
2006).Despite the similarities between fMaSC and adult basal populations in
chromatin accessibility, fMaSCs are transcriptionally distinct from basal and other
adult cells, and this extends beyond their active co-expression of lineage-specific
genes. At the single-cell level, they show expression of factors that may relate to
their developmental plasticity and proliferative potential, as well as to their
connection to humanbreast cancers. In this regard, we examine metabolic transcript
profiles in fMaSCs and show that they are shared by humanbreast cancers. Prompted
by the changing pattern of these profiles over developmental time, we demonstrate a
changing sensitivity to UK5099 as cells differentiate in vitro. We
take these data to indicate that differentiating cells but not bipotent
stem/progenitor cells critically depend upon mitochondrial pyruvate uptake. However,
tools are not yet available to assess metabolomics directly at the resolution of
single cells, and at this time we cannot rule out the possibility that the observed
differential sensitivity relates to other effects of UK5099, although its affinity
for secondary targets is several hundred-fold lower than for mitochondrial pyruvate
carrier (MPC) (Gray et al., 2014).Importantly, although proliferation is often associated with enhanced
glycolytic rate, the fMaSC metabolic gene profile does not seem to be strictly
linked to proliferation. For instance, there are many proliferative cells and tumor
types that lack this specific metabolic profile (Figures 6, S6C,
and S6D). In single-cell-derived organoids,
Krt8+Krt14+ cells increase in number and proportion under
UK5099 treatment, while the generation of presumptive lineage committed cells is
blocked. Still, it would not be surprising if the different metabolic programs lead
to different rates of proliferation and smaller colonies in the absence of lineage
committed progenitors. Most organoids produced from adult cells express only Krt8,
but UK5099 treatment enabled production of organoids containing both basal and
luminal cells indicative of effects on plasticity beyond the possible proliferative
effects (Figure S6B). It
will be important to determine whether this fMaSC-related profile in tumors helps
pinpoint metabolic liabilities for more precise therapeutic targets than those aimed
at all proliferating cells.Our scRNA-seq analysis also led us to important observations on the nature
of transcriptional heterogeneity in uncommitted cells. Although mammary stem cell
estimates vary considerably depending upon the markers and assay used to assess
their potential, the fetal cell population from which we derived our single-cell
data exhibits 30%−50% stem cell activity measured by in
vitro sphere formation, and 10%–30% activity measured by
transplantation (Dravis et al., 2015; Spike et al., 2012; Trejo et al., 2017) (Figure S3). Despite this prevalence,
our analyses did not reveal an equivalent transcriptionally distinct stem cell
subpopulation. Similarly, although stem cell content is much lower in the adult
basal population (~2%), our transcriptional profiling of >1,000 adult
basal cells should have enabled us to identify a transcriptionally distinct stem
cell population. However, we did not. Our detection of other rare transcriptional
cell types suggests this was not due to the limitations of the sequencing and
bioinformatics strategies employed. Work by Bach et
al. (2017) also found the adult basal compartment to be relatively
indivisible at the tran-scriptome level. We acknowledge that our analysis of stem
cell content and the above scRNA-seq data would exclude any epithelial cells lacking
Epcam expression.We consider it likely that stem cell activity is probabilistically
distributed throughout the heterogeneous cell population and is likely to be highly
dependent on external cues for its maintenance (e.g., Spike et al., 2014). Performance in functional assays may
therefore critically depend on the context under which a particular cell with stem
cell potential is challenged (Spike, 2016;
Wahl and Spike, 2017). This idea may help
to refine a long-standing concept about the nature of compartmentalized functions
and rigid cell types in complex tissues, and shed light on obdurate limitations of
stem cell purification by surface markers. In our data, markers were often enriched
in a given transcriptionally defined cell type but were never perfect, differing in
their levels, being undetectable in some cells, and being expressed occasionally in
unexpected cell types. Transcriptional heterogeneity may also help explain
discrepancies in lineage tracing that have suggested lineage-restricted stem cells
or multipotent stem cells depending on the chosen promoter for tracing, context,
methodology, and interpretation.Transcriptional heterogeneity associated with variable stem cell potential
has also been identified in embryonic stem cells, intestine, lung, and skin, where
perturbations of homeostasis can promote facultative stem cell activity and create
functional states and niches (Blanpain and Fuchs,
2014; Hough et al., 2009).
Expression profiles and their discreteness could be strongly influenced by
environmental context. Indeed, it is tempting to speculate that a lack of a
homeostatic niche environment is the critical feature linking early development,
wounding, and cancer in the generation of cellular heterogeneity, mixed lineage
phenotypes, and associated cellular plasticity among cells that may already harbor
the intrinsic flexibility to respond (e.g., Ge et
al., 2017, and the present study). Transcriptional heterogeneity and
plasticity are likely to be even greater in tumor settings due to genomic
instability, therapies, and inconstant microenvironments. It may be necessary to map
the continuum of stem cell-like states to understand how they contribute to cell
fitness in particular settings. This may also enable the development of effective
therapeutic combinations directed against plastic cell states that critically
contribute to intra-tumoral heterogeneity during cancer progression.The present work provides a resource for uncovering functionally relevant
mechanisms of stem cell regulatory biology with coopted roles in tumorigenesis.
Based on our analysis of the data, it also has the potential to impact the way we
conceptualize “cell type” in the mammary gland and other complex
tissues, and to replace classical models based on rigid cell hierarchies with a more
fluid, physiologically adaptable, and robust, if experimentally challenging,
idea.KEY RESOURCES TABLE
STAR ★ METHODS
CONTACT FOR REAGENT AND RESOURCE SHARING
Requests for further information and resources may be directed to Lead
Contact Benjamin T. Spike (benjamin.spike@hci.utah.edu)
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Mice
C57BL/6 mice were obtained from Charles River as 10–16 week
old adults or as timed pregnant mothers, or were bred in house to produce
fetal, postnatal and adult tissues. Three-week old CB17 SCIDmice were also
obtained from Charles River as recipients for transplantation experiments
and 8–10 week old CB17 SCIDmice were obtained from Charles River as
recipients. All mice used in these studies were female. All animals were
handled in accordance with IACUC/AAALAC and other institutional ethics
guidelines.
METHOD DETAILS
Isolation of Single Cells
Mammary tissue was dissected from C57BL76 mice at the developmental
stages indicated. The number 4 mammary gland from 10 to 16-week-old
nulliparous C57BL/6 female mice was used for adult samples. Single cell
suspensions were obtained as described previously (Spike et al., 2012). All dissociation reagents
were purchased from Stem Cell Technologies (SCT), unless otherwise
specified. Freshly dissected tissues were placed directly in EpiCult-B basal
medium containing hydrocortisone. Following dissection, minced adult mammary
glands were digested for 8–12 hours at 37°C in EpiCult-B basal
medium containing Epicult-B proliferation supplement, collagenase and
hyaluronidase and hydrocortisone. Mammary organ fragments resulting from
overnight digestion were treated briefly with ammonium chloride solution
(4min on ice), then 0.25% trypsin (5min RT) followed by dispase (4min
37°C), with washes and resuspension after each treatment using
Hank’s Balanced Salt Solution (HBSS) supplemented with 2% FBS
(Chemicon). Cell clumps and debris were removed by passing the suspension
through a 40 μm nylon filter (BD Biosciences). For fetal and
postnatal tissues, 150 mammary rudiments on average were dissected on each
experimental day. Dissected fetal mammary glands were processed as adult
cells except that 90 minutes incubation in collagenase/hyaluronidase was
sufficient to obtain single cell suspensions and trypsin was found to be
unnecessary and was therefore omitted to avoid unnecessary stress on the
cells or alterations to their transcriptomes. Cells were immunostained with
antibodies to Epcam, Cd49f, for 20 min on ice where indicated.
Antibody-labeled cells were resuspended and incubated in HBSS with 2% FBS
containing DAPI for live/dead discrimination. Cell sorting was carried out
on a FACSDiva cell sorter to collect
Epcamlow-highCd49fmedium-high cells for further
analysis (Becton Dickinson).
Organoid Culture
Single cell suspension (as above) from mammary tissues at the
developmental stages indicated were sorted for Epcam expression and plated
in 4% Matrigel on a pre-congealed undiluted Matrigel bed in either
Maintenance media or Differentiation media (Spike et al., 2014) supplemented where indicated with UK5099 at
the doses indicated. Maintenance media was comprised of Dulbecco’s
modified Eagle’s medium/F12 with 5% horse serum, 10 μg/ml
insulin, 20 ng/ml epidermal growth factor, 100 ng/ml cholera toxin, 0.5
μg/ml hydrocortisone, and 10 μg/ml ciprofloxacin with 1
× B27 supplement (Invitrogen). Differentiation media was composed of
Epicult-B mouse media containing B supplement (SCT), recombinant humanepidermal growth factor, recombinant human basic fibroblast growth factor,
and heparin, as previously described (Spike
et al., 2012), with ciprofloxacin 10 μg/ml. Organoids were
fixed after 7 days of growth for immunofluorescent staining.
Immunofluorescence
Samples were fixed in 4% formalin (NBF) at 4°C overnight,
permeabilized using 0.1 % BSA, 0.2% Triton X-100, 0.05% Tween-20 in PBS,
blocked in 10% goat serum and stained with antibodies against Keratin 8,
Keratin 14 as previously described (Spike et
al., 2012) prior to staining with secondary antibodies and
imaging on a Zeiss LSM 880 with Airyscan FAST. Samples were counterstained
with DAPI.
In Situ RNA-FISH
RNA-FISH was performed using RNAScope Multiplex Flourescent V2 Kit
(Advanced Cell Diagnostics, catalog number 323110). The protocol was
followed as per the manufacturer’s recommendations with the target
retrieval boiling time for 15 min and Protease IV at 40 °C incubation
for 30 min. Slides were mounted with Slowfade Mountant+DAPI (Life
Technologies, S36964) and sealed. All images were captured within one week
of slide preparation. A bacterial gene probe was used as a negative control
as per kit instructions. Images were captured on Carl Zeiss 880 Airyscan
Super-Resolution microscope using 40X/1.2NA W objective with 1.8X digital
zoom. All pictures were digitally edited to enhance the color and contrast
levels using Zen (Carl Zeiss) and ImageJ (Fiji) software.
Transplantation
Single cell suspensions of primary mammary cells were obtained as
above and varying numbers of cells were transplanted into de-epithelialized
number 4 fat pads of recipient three-week old CB17 SCIDmice as previously
described (Spike et al., 2012).
Glands were subsequently dissected, mounted on glass slides and fixed at 8
weeks post injection and were stained with Carmine Alum to score outgrowth.
Frequency of the repopulating unit was estimated using Extreme Limiting
Dilution Analysis (ELDA).
Single Cell RNA-Sequencing
Production of cDNA Libraries
Microfluidic Assay (Fluidigm C1).
FACS-sorted, Epcam-positive mammary cells were resuspended
in 20–30ul of ice cold HBSS/FBS. A 2ul volume of the cell
suspension was manually counted between a coverslip and slide and
the remaining cells were diluted to to 250 cells/ul for loading onto
Fluidigm C-1 platform microfluidic chips with with 10–17um
capture well sizes. Loading and staining of cells with the
Invitrogen LIVE/DEAD kit (i.e., Calcein-AM and EtBr) to distinguish
viable from dead cells was carried out according to the
manufacturer’s instructions. Following loading and staining,
all capture wells were imaged in green and red fluorescent channels
and brightfield including z axis examination on a Zeiss 710 or 780
confocal microscope. Each well was then scored as containing a
single live cell (Calcein+), single dead cell (Calcein-), multiple
cells or no cells. Mean live, single cell capture efficiency was 73%
across E18-adult (a representative image is given in Figure S2). After
imaging, the chip was processed according to the
manufacturer’s instructions to produce cDNA libraries from
each well. Initial experiments employed ERCC RNA spike in controls
(dil. 1:40,000), but it was determined that they provided no benefit
over normalization according to cellular transcript abundance and
spiked in controls were omitted from select runs to maximize
sequencing of cellular transcriptomes (see analysis below, Figure S2,
and Dillies et al., 2013, and
Lin et al., 2016). cDNA
yield was determined by Quant-iT Picogreen fluorescence on a
FlexStation II (Molecular Devices). Libraries with sufficient yield
were diluted to 100pg/ul in water. Controls were also generated from
pellets of ~1000 cells in parallel in PCR tubes as
recommended in the Fluidigm C1 protocol. The capture efficiency of
E16 cells was low for unknown reasons and individual E16 cells were
therefore processed manually by first sorting single cells in 2ul
starting volumes in 96 well plates and following tube control ratios
for Smart-seq2 (Clontech) and Nextera XT reagents.
Drop-Seq Platform (1 Ox Genomics Chromium).
FACS-sorted, Epcam-positive mammary cells were resuspended
in 500ul of HBSS. Cells were centrifuged at 4°C for 5 minutes
at 1500rpm and resuspended in 32.5ul of HBSS and processed
immediately and loaded on the microfulidic chip together with
barcoded beads and other reagents as described in the 10X Chromium
Single Cell Reagent Kit V2 protocol (Cat# 120237,10X Genomics Inc).
Subsequent cell lysis, first strand cDNA synthesis and amplification
were carried out according to the instructions with cDNA
amplification set for 12 cycles. cDNA quality was measured using
TapeStation (Agilent Biosystems) after bead-based purification.
Production of Sequencing Libraries
ATAC-Seq Library Preparation.
The ATAC-seq transposition assay was performed as previously
described with minor modifications (Buenrostro et al., 2015). Two biological replicates from
each cell population were assayed to ensure reproducibility. For the
adult populations, 2×104 nuclei were subjected to
2 μL TDE1 digestion in 20 μL reaction mix, while for
the P4/fMaSC populations, 1×104 nuclei were
subjected to 1 μL TDE1 digestion in 10 μL reaction mix
(lllumina Nextera FC-121–1030). The equal Tn5:cell ratio is
crucial to ensure similar signal-to-noise ratio during downstream
analysis. The cell-Tn5 mix was incubated at 37°C for 30
minutes. qPCR was performed to determine the cycle number for 25%
library saturation. Typically, 10–14 total cycles were
performed. The library was purified with AMPure XP beads (Beckman
A63881), and then analyzed by Agilent TapeStation to ensure proper
digestion.
Microfluidic Library Preparation.
For C1 libraries, 125 pg of each diluted library was used as
input for Nextera tagmentation and barcoding using volumes
recommended by Fluidigm (https://www.fluidigm.com/productsupport/c1-support-hub).
Subsequently, single cell Nextera libraries were individually
purified using Ampure XP beads (Agencourt) and magnetic force at a
library:bead-suspension ratio of 10:9. Bead-pellets dried just to
the point of visible cracking were resuspended in Tris/0.1mM
EDTA/0.05% Tween-20 to elute libraries. Nextera libraries were
quantified by Quant-iT Picogreen fluorescence as above and fragment
sizes were determined on a 2100 bioanalyzer (Agilent Genomics)
(Figure
S1D). Libraries were combined at equal ratios into pools
of 50–100 uniquely bar-coded samples which were again
precipitated using ampure beads, evaluated for fragment size using
the bioanalyzer and quantified by Quant-iT Picogreen.
Drop-Seq Library Preparation.
Sequencing libraries were prepared as per the
manufacturer’s protocol (Cat# 120237, 10XGenomics Inc) with
Index-PCR set for 14 cycles. Qualities of the sequencing libraries
were measured using TapeStation (Agilent Biosystems) after
bead-based purification.
Sequencing
ATAC-Seq Library Sequencing.
The ATAC-seq libraries were sequenced with 50 or 125 bp
single- (P4) or paired-end (fMaSC and adult) lllumina HiSeq
2500.
Microfluidic Library Sequencing.
For C1 derived libraries, pooled library were loaded at
12–20pM on an Illumina HiSeq 2500 High Throughput Sequencing
System and sequenced using a paired-end, 100bp+ read protocols. We
obtained 160–260 million reads per sequencing lane totaling
1.4–14 million reads per sample. Sequencing quality was
assessed using FastQC software (Ref. 7) and all samples exhibited
acceptable sequence quality including base wise quality scores
> 30 over the majority of the read length.
Drop-Seq Library Sequencing.
Pooled libraries were sequenced on the Illumina HiSeq2500
Rapid Sequencing (Illumina Inc) System as per instructions provided
in the Drop-Seq protocol (Cat# 120237, lOXGenomics Inc). Sequencing
quality was assessed using FastQC software and all samples exhibited
acceptable sequence quality including base wise quality scores
> 30 over the majority of the read length.
Sequence Data Submission
C1 derived FastQ files and the filtered normalized expression
matrix (262 cells × 13355 genes) are available at the NCBI
sequence read archive https://trace.ncbi.nlm.nih.gov/Traces/sra/ under the
BioSample accession SAMN07138894. Chromium Sequencing data is archived
under GSE111113 (Bioproject PRJNA435951; SRA: RP133477).
Analysis
Reproducibility
Chromium data yielded coherent clusters of cells corresponding
to established cell types in the adult gland. For example, basal cells
from three independent isolates co-clustered by each of the approaches
we employed suggesting biological differences supersede technical
variation in the data and analysis. C1 derived single cell samples
similarly identified known cell types and samples run on duplicate
sequencing lanes showed an average r2 value of > 0.99
between replicates (Figure S1E). Select E18 samples were processed independently
from the cDNA stage onward and were run independently in a subsequent
sequencing run. The resulting expression profiles bore an average
r2 value of 0.965 when compared to their initial
profiles. Averaging (Geometric means of values +1 count each) of single
cell samples showed high correlation to samples processed as pools
(Figure
S1G). Potentially as a function of our dual pre- and post pooling
library cleanup procedure, barcoding mismatches were negligible as
judged by spill over of ERCC reads to samples lacking ERCC at
preparation, i.e., was determined to be <0.04% based on means for
counts in non-ERCC samples co-processed with ERCC containing
samples.
Mapping and Normalization
We mapped reads to a custom mouse transcriptome compatible with
RSEM and comprised of gene models from MM10 sequences and sequences for
the ERCC RNA spike-in set. Transcript counts were enumerated using RSEM.
This pipeline is presented in the accompanying “Scripts and
Command Line Procedures” document (Data S1) under the perl
script “RNASIuice.pl.” As we noted variable total counts
per cell, we applied inter-sample normalization. Since previous studies
showed that Upper Quantile normalization compares favorably to other
normalization approaches including RPKM, DESeq and TMM except for
sensitivity to highly overexpressed genes (Dillies et al., 2013; Lin et al., 2016), we used a derivative
approach, normalizing samples for read depth based on the
19th ventile of expressed transcripts (i.e., count
> 0; excluding ERCC spike-ins if present). We first determined
the sum of counts in the 19th ventile of expressed
transcripts for each sample (σj), and
then defined a standard value (α) near the lower end of
σ values in the dataset to which samples
were subsequently normalized (e.g., ~1.5 stdev below the mean
σ). We then calculated a coefficient
(W) for normalizing each sample (j) where,We then multiply each gene’s counts in each sample
(C) by the sample specific
coefficient and round the values to integers to obtain the normalized
expression value of each gene in the dataset (Expi,j):In addition to its relative computational simplicity and
avoidance of perturbations from very highly expressed genes, this
approach has the added advantage of avoiding complications from lowly
expressed genes that have been shown to contain many null values in
single cell RNA Seq data (Marinov et
al., 2014), and also avoids unwarranted assumptions about
total RNA content and complexity between samples that has been shown to
inflate Type 1 errors (Katayama et al.,
2013). Normalized values for all transcripts were
subsequently rounded to integers. We noted that this approach performed
as well or better than normalization by FPKM (which assumes equivalent
total RNA content per cell) or ERCC (which assumes equivalent cell lysis
and mRNA recovery efficiency per cell) when measuring minimization of
expression difference between technical replicates (Figure S1F) and when
examining cluster distinction between adult luminal and basal
candidates. Following normalization, we evaluated transcriptomic
complexity for each sample by determining the number of genes
represented by 5 or more counts in each sample following normalization.
As we noted that most samples from each developmental stage fell within
a complexity distribution unique to that stage, we removed from further
analysis outlier samples with complexities lower than 1.5 standard
deviations below the mean complexity for all samples from the same
developmental stage (Figure S1H). To produce a final filtered expression matrix
for use in identifying groups of related cells, we also manually removed
the majority of transcripts with alphanumeric gene names indicative of
uncharacterized transcript accession numbers and Riken expression tags
as well as mitochondrial genes with names beginning “mt-”
from further analysis as well as all transcripts that were present
(>0 counts) in fewer than five independent samples following
normalization.
Sample Graphing and Clustering
Relationships between samples, cell stages and clusters were
visualized using t-SNE and diffusion coordinates as implemented in the
tsne and Destiny packages for R, respectively. The interactive online
rendering of diffusion maps (see Spike Lab website link, http://uofuhealth.utah.edu/huntsman/labs/spike/d3.php)
was generated in javascript using the Plotly libraries (Plotly
Technologies). Alternative approaches (i.e., clustering by Spearman rank
correlation) showed general agreement with our diffusion mapping
(Figures 3 and S4C). Plots were generated
with ‘plot’ or ‘plot3d’ in the rgl library
in R, respectively specifying fill transparency or point radius based on
scaled count values.We clustered samples (i.e., single cell transcriptomes) into
related cell types using the 1000 genes (Chromium) or 1500 genes (C1)
with the highest local variance across the dataset. Local variance for
each gene in the filtered matrix was defined as the ratio of the
gene-specific squared coefficient of variation (CV2) of
normalized raw counts to the gene neighborhood CV2 value. The
neighborhood CV2 value was calculated as the geometric mean
of CV2 values for the neighboring 100 genes in a list ordered
by mean expression values from high to low (i.e., 50 genes in each
direction) (Figure
S1J). We then applied non-negative matrix factorization (NMF)
with minimum value subtraction and random number seed generation as
implemented in R (Chromium data) or the MeV suite (C1 data) on log2
transformed data for these 1000 or 1500 genes to cluster samples. We
tested multiple ranks (i.e., numbers of clusters, 2–15) using a
cost convergence cutoff of 1.0 for 10 runs of up to 1000 iterations per
rank and divergence update rules and cost measurements (Lee and Seung, 1999). Cophenetic correlation
coefficients, change in Residual sum of squares and visual inspection of
cluster position in DC and t-SNE graphs was used to identify stable
clusters that describe putative cell type changes in the data (Brunet et al., 2004).
SCENIC
We used the standard SCENIC pipeline (Aibar et al., 2017) with depth normalized
log2 transformed values from Chromium sequencing as input (Table S1). The
SCENIC application (version 0.1.7) uses an older mm9 genomic reference
and did not account for 6,516 genes. Per SCENIC recommendations, we
refiltered genes to exclude those absent (counts = 0) in > 99% of
cells. SCENIC identified 10206 genes that were pruned to 324 regulons
based on cis regulatory motif analysis and a threshold
of at least 10 co-regulated genes per regulon. Cell activity scores for
these regulons were layed onto the pseudotemporal cell ordering from
diffusion maps and were clustered based on covariance.
ATAC-Seq Analysis
ATAC-seq analysis was performed as described in Dravis et al. (2018). In brief, after quality
check with FastQC, sequencing reads were mapped to the mouse genome
(mm9) with Bowtie (Langmead et al.,
2009). Low quality and duplicated reads were removed and peak
calling was done with MACS2. The signal correlations between the
biological replicates were checked to ensure reproducibility (Pearson r
> 0.9) before the replicates were merged with samtools (Li et al., 2009). To normalize the
ATAC-seq signal between different samples, the genome-wide signal was
binned into 100 bp and quantile normalized with preprocessCore in R. The
average signal profile at all genes was then checked for each sample to
ensure similar signal-to-noise level. Bedgraph files generated were
converted into BigWig format and visualized on UCSC genome browser
(https://genome.ucsc.edu/). Signal
profiling, correlation analysis and clustering were performed using
deepTools (Ramirez et al., 2014).
Functional annotation of peaks and peak-gene association were done with
GREAT using the default “basal plus extension” parameter
(McLean et al 2010). To
isolate UARs and URRs, pairwise differential peaks (FC > 2 and
FDR < 1×10−30) between each cell type
were first determined using SICER-df (Zang et al 2009), and enrichment score (ES) for each peak
calculated as ES = FC × -log(FDR). Cell type specific regions
were then isolated by cross comparison of peaks using bedTools intersect
(Quinlan and Hall, 2010).
Afterward, total enrichment score (TES) was calculated by adding up cell
type specific ES. For example, the TES of fMaSC = ESfMasc-Ba
+ ESmabsc-Lp + ESfMaSC-ML Thus, cell-type
specificity of UARs and URRs can be ranked by their TES.
Identification of Differentially Expressed Genes
Gene expression differences between NMF designated cell types
(i.e., NMF clusters @ rank = 7(Chromium) or 6(C1), and subdivisions of
adult cells and clusters 5 and 6 @ rank = 2 and 3 (C1)) were identified
using two-class unpaired rank products analysis as implemented in the
TM4 suite MeV software (Saeed et al.,
2003) with 100 permutations per comparison and the proportion
of false significant genes controlled so as not to exceed 0.05. Gene
expression values for the 13355 genes in the filtered expression data
matrix were compared between each NMF designated group of cells and each
other group of NMF designated cells to generate differentially expressed
gene lists for each pairwise group-to-group comparison. Cell type
specific gene signatures for the 6 NMF-derived clusters in C1 data were
determined from the overlap of more highly expressed genes in their five
pairwise rank products analyses.
GO Enrichment
Cell type specific gene signatures and other gene lists
evaluated in the manuscript were assessed for their content related to
specific biological functions using curated GO databases. GO enrichment
was assessed and graphed using the BiNGO plugin for Cytsoscape 3.3.0
(Shannon et al., 2003; Maere et al.. 2005). We assessed
the overlap of signature gene lists with biological process ontologies
using a hypergeometric test statistic and Benjamini & Hochberg False
Discovery Rate correction (FDR significance level = 0.05). Enrichments
were corroborated with PANTHER (pantherdb.org) and DAVID (https://david.ncifcrf.gov).
Comparative Transcriptomics
Humantumor data was collected as previously described with
classification based on PAM50, consensus by a pathology committee, or
mode of sample acquisition (Parker et.
al., 2009; Cancer Genome
Atlas, 2012; Varley et al.,
2014; Cirielloet al.,
2015; Forero-Torres et al.,
2015; Foreroetal.,
2016). Differential signature expression was determined by
ANOVA or a Kolmogorov-Smirnovtest, as indicated, on the median centered
or normalized probe and transcript expression values with normalization
as in Pfefferle et al. (2013), or
the sum of scaled FPKM values.
Other Graphing
Additional graphics generation used R/Quartz (including plot,
rgl and plot3d libraries), Excel, Powerpoint and Adobe Photoshop.
QUANTIFICATION AND STATISTICAL ANALYSIS
Details of the statistical tests used in this manuscript and the number
of replicates (n) are presented in the figures and figure legends, are
reiterated in the text, and described in detail in the methods sections above.
They are also summarized below:Non-negative matrix factorization, as implemented in the TM4-MeV
Suite, employed 10 runs of up to 1000 runs each with a cost convergence
cutoff set to 1.0 with a check frequency of 40. Update rules and cost
measurements were based on divergence. Cophenetic correlation
coefficients were based on Euclidean distance. Chromium data was
processed using every fifth sample beginning with sample 1 or 2 or 3 or
4 and resulting statistical metrics were averaged for each rank across
these runs to select a suitable NMF rank (i.e., cell cluster number) for
further modeling.Rank products, as implemented TM4-MeV suite, used a two class
unpaired test statistic with 100 permutations and false discovery
control set on the proportion of false significant genes not exceeding
0.05.The correlation in expression of gene signatures across samples
were calculated in Excel using the Pearson correlation.GO enrichment, as implemented in the BinGO app for Cytoscape,
used “Biological Process” definitions and a hypergeometric
test and Benjamini-Hochberg false discovery rate correction at 0.05.Differentially represented ATAC-Seq peaks were called using
SICER-df with FC >2 and FDR <
1×10−30 and associated enrichment scores
were calculated as ES = FC × -log(FDR).Quantification of organoid cultures was conducted manually on 5
replicate wells per treatment. Replicates and significant differences
were determined using a two-tailed Student’s t test.Extreme Limiting Dilution Analysis for transplants as
implemented at the URL: http://bioinf.wehi.edu.au/software/elda/, were
calculated to include 95th percentile confidence intervals
and likelihood ratio tests to determine differential stem cell
content.Differences in signature expression among TCGA archived breast
cancers that were organized by intrinsic subtype was determined by ANOVA
with p values indicated in the corresponding figures.Differences in signature expression in primary TNBC versus TNBC
metastases was determined with a Kolmogorov-Smirnoff test with p values
indicated in the corresponding figures.
DATA AND SOFTWARE AVAILABILITY
All software is commercially available, cited to previous publications
or is included in the accompanying “ScriptsandCommandline” text
document.Single cell RNA-sequencing data files are available at the NCBI Sequence
Read Archive (SRA) (https://wwvii.ncbi.nlm.nih.gov/sra/) under BioSample accession
SAMN07138894 and the NCBI Gene Expression Omnibus (GEO) under accession
GSE111113 (Bioproject: PRJNA435951 ; SRA:RP133477).
Authors: Benjamin T Spike; Dannielle D Engle; Jennifer C Lin; Samantha K Cheung; Justin La; Geoffrey M Wahl Journal: Cell Stem Cell Date: 2012-02-03 Impact factor: 24.633
Authors: Giovanni Ciriello; Michael L Gatza; Andrew H Beck; Matthew D Wilkerson; Suhn K Rhie; Alessandro Pastore; Hailei Zhang; Michael McLellan; Christina Yau; Cyriac Kandoth; Reanne Bowlby; Hui Shen; Sikander Hayat; Robert Fieldhouse; Susan C Lester; Gary M K Tse; Rachel E Factor; Laura C Collins; Kimberly H Allison; Yunn-Yi Chen; Kristin Jensen; Nicole B Johnson; Steffi Oesterreich; Gordon B Mills; Andrew D Cherniack; Gordon Robertson; Christopher Benz; Chris Sander; Peter W Laird; Katherine A Hoadley; Tari A King; Charles M Perou Journal: Cell Date: 2015-10-08 Impact factor: 41.582
Authors: Aline Wuidart; Marielle Ousset; Steffen Rulands; Benjamin D Simons; Alexandra Van Keymeulen; Cédric Blanpain Journal: Genes Dev Date: 2016-06-09 Impact factor: 11.361
Authors: René Villadsen; Agla J Fridriksdottir; Lone Rønnov-Jessen; Thorarinn Gudjonsson; Fritz Rank; Mark A LaBarge; Mina J Bissell; Ole W Petersen Journal: J Cell Biol Date: 2007-04-09 Impact factor: 10.539
Authors: Yong Zhang; Tao Liu; Clifford A Meyer; Jérôme Eeckhoute; David S Johnson; Bradley E Bernstein; Chad Nusbaum; Richard M Myers; Myles Brown; Wei Li; X Shirley Liu Journal: Genome Biol Date: 2008-09-17 Impact factor: 13.583
Authors: Hua Wang; Dongxi Xiang; Ben Liu; Aina He; Helena J Randle; Kelvin Xi Zhang; Anushka Dongre; Norman Sachs; Allison P Clark; Luwei Tao; Qing Chen; Vladimir V Botchkarev; Ying Xie; Ning Dai; Hans Clevers; Zhe Li; David M Livingston Journal: Cell Date: 2019-06-27 Impact factor: 41.582
Authors: Abbie S Ireland; Alexi M Micinski; David W Kastner; Bingqian Guo; Sarah J Wait; Kyle B Spainhower; Christopher C Conley; Opal S Chen; Matthew R Guthrie; Danny Soltero; Yi Qiao; Xiaomeng Huang; Szabolcs Tarapcsák; Siddhartha Devarakonda; Milind D Chalishazar; Jason Gertz; Justin C Moser; Gabor Marth; Sonam Puri; Benjamin L Witt; Benjamin T Spike; Trudy G Oliver Journal: Cancer Cell Date: 2020-05-30 Impact factor: 31.743
Authors: Jayne F Martin Carli; G Devon Trahan; Kenneth L Jones; Nicole Hirsch; Kristy P Rolloff; Emily Z Dunn; Jacob E Friedman; Linda A Barbour; Teri L Hernandez; Paul S MacLean; Jenifer Monks; James L McManaman; Michael C Rudolph Journal: J Mammary Gland Biol Neoplasia Date: 2020-11-20 Impact factor: 2.673
Authors: Sirinapa Sribenja; Orla Maguire; Kristopher Attwood; Matthew F Buas; Julie R Palmer; Christine B Ambrosone; Michael J Higgins Journal: Am J Cancer Res Date: 2021-06-15 Impact factor: 6.166