Enkelejda Miho, Rok Roškar, Victor Greiff, Sai T Reddy.
Abstract
The architecture of mouse and human antibody repertoires is defined by the sequence similarity networks of the clones that compose them. The major principles that define the architecture of antibody repertoires have remained largely unknown. Here, we establish a high-performance computing platform to construct large-scale networks from comprehensive human and murine antibody repertoire sequencing datasets (>100,000 unique sequences). Leveraging a network-based statistical framework, we identify three fundamental principles of antibody repertoire architecture: reproducibility, robustness and redundancy. Antibody repertoire networks are highly reproducible across individuals despite high antibody sequence dissimilarity. The architecture of antibody repertoires is robust to the removal of up to 50–90% of randomly selected clones, but fragile to the removal of public clones shared among individuals. Finally, repertoire architecture is intrinsically redundant. Our analysis provides guidelines for the large-scale network analysis of immune repertoires and may be used in the future to define disease-associated and synthetic repertoires.
Year: 2019 PMID: 30899025 PMCID: PMC6428871 DOI: 10.1038/s41467-019-09278-8
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Large-scale network analysis reveals the architecture of antibody repertoires and its three fundamental principles. a Large-scale networks (>500,000 nodes) of antibody repertoires were constructed from the Levenshtein distance (LD, string edit distance) matrix of CDR3 clonal sequences (a.a.) using a custom high-performance computing platform (see Methods). Networks represent antibody repertoires as CDR3 nodes connected by edges when their amino acid CDR3 sequences differ by a predetermined LD. All clones of a repertoire connected at a given LD form a similarity layer (LDn). b Deconvolution of the complexity of antibody repertoire architecture was performed by quantifying (i) its reproducibility through global and clonal (local) properties or features, (ii) its robustness to clonal removal and (iii) its redundancy across similarity layers in the sequence space (Supplementary Fig. 1).
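The construction described in panel a can be illustrated on a toy scale. The following is a minimal sketch, not the authors' high-performance platform: it assumes an all-against-all comparison, which is only practical for small inputs, and it assumes that a layer LDn joins sequences at exactly distance n. The function names `levenshtein` and `similarity_layer` and the toy CDR3 strings are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity_layer(cdr3s, ld=1):
    """Edges between unique CDR3 sequences whose distance equals ld (assumed definition)."""
    unique = sorted(set(cdr3s))
    return [(s, t) for s, t in combinations(unique, 2) if levenshtein(s, t) == ld]


# Toy amino acid sequences, invented for illustration only.
toy = ["CARDYW", "CARDFW", "CARDW", "CTRGGYW"]
edges_ld1 = similarity_layer(toy, ld=1)
```

Because the pairwise loop grows quadratically with the number of clones, repertoires of the size reported here would need the kind of distributed computation the caption refers to.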
Fig. 2 Global and clonal properties of antibody repertoire networks are reproducible. a Network size of antibody repertoires. The y-axis indicates the absolute count of CDR3 nodes, CDR3 edges (similarities) and CDR3 clones in the largest component. The mean percentage of the CDR3s belonging to the largest component by B-cell development stage is shown on top of the dark blue bar. b Global properties, diameter and assortativity coefficient are shown for pre-B cells (pBC), naïve B cells (nBC) and plasma cells (PC). c The mean value of the coefficient of variation for clonal properties in pBC, nBC and PC repertoires. Wilcoxon test, p(pBC,nBC/PC) < 0.05 (see Methods). d Percentage of clones connected to at least one other clone in the repertoire at LD1, LD≤2, …, LD≤12 in pre-B cells, naïve B cells and plasma cells. e The power-law (orange), exponential (red) and Poisson (gray) distributions were fit to the cumulative degree distributions of naïve B cell and plasma cell (unimmunized) repertoires of a representative sample for similarity layers LD1, LD3 and LD7 (log-log scale). Representative clusters are shown for LD1. f Percentage of CDR3 clones (mean ± s.e.m.) that compose the maximal core. Subgraph of the maximal k-core (red), and k-1 (black), k-2 (dark gray) and k-3 (light gray) cores in a representative mouse pBC sample. g Percentage overlap of CDR3 germline V-genes in the maximal core of nBC repertoires (n = 5 mice and data sets for Unimm (unimmunized), OVA and NP-HEL; n = 4 mice and data sets for HBsAg; mean ± s.e.m.). h Normalized neighborhood size for orders n = {1–10, 15, 20, 30, 40, 50} across CDR3 clones (similarity layer LD1). For a, b, d, barplots show mean ± s.e.m.; for a–e, each B-cell stage n = 19 mice. Source data are provided as a Source Data file.
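Two of the quantities named in panels e and f, the cumulative degree distribution and the maximal k-core, can be computed from an edge list as sketched below. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names and the toy edge list are hypothetical, isolated nodes are ignored because only an edge list is given, and the k-core is found by plain iterative peeling.

```python
from collections import Counter, defaultdict


def cumulative_degree_distribution(edges):
    """Map each observed degree k to the fraction of connected nodes with degree >= k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    ccdf, remaining = {}, n_nodes
    for k in sorted(counts):
        ccdf[k] = remaining / n_nodes
        remaining -= counts[k]
    return ccdf


def maximal_core(edges):
    """Return (k, nodes) for the largest k whose k-core is non-empty."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    best_k, best_nodes = 0, set(adjacency)
    k = 1
    while True:
        core = set(best_nodes)
        while True:
            # Peel nodes that have fewer than k neighbours left inside the core.
            weak = {n for n in core if len(adjacency[n] & core) < k}
            if not weak:
                break
            core -= weak
        if not core:
            return best_k, best_nodes
        best_k, best_nodes = k, core
        k += 1


# Toy network: a triangle (a, b, c) with one pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
ccdf = cumulative_degree_distribution(toy_edges)
core_k, core_nodes = maximal_core(toy_edges)
```

In the toy network the pendant node is peeled away at k = 2, leaving the triangle as the maximal core. Fitting power-law, exponential or Poisson models to the resulting distribution is a separate statistical step that this sketch does not cover.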
Fig. 3 The architecture of antibody repertoires is robust and redundant. a CDR3 clones of an exemplary naïve B-cell repertoire (OVA-immunized mouse) have been ordered from increasing to decreasing frequency (CDR3 rank). Public clones are color-coded in red. b Bootstrapped p-values of the power-law fit are shown for complete antibody repertoires and after removing public clones. Power law is a good fit to degree distributions for p-values above the dashed red line (p-value = 0.1, Wilcoxon test). Examples of exponential (red) and power-law (gray) networks are shown on the top panel. c CDR3 clones were removed randomly at 10%, 50% and 90% from each original repertoire (20 times) and the power-law distribution was fit to the cumulative degree distributions of the remaining CDR3 clones. A p-value = 0.1 (Wilcoxon test) is indicated as a red dashed line. In PC samples a fit was not feasible after removal of 90% of CDR3 clones (NA). d Heatmaps indicate the mean prediction accuracy (Q, leave-one-out cross-validated R) of similarity layer LD1 versus similarity layers LD2–12. The scatterplot shows Q for LD1 vs. LD2 for each CDR3 clone. e Prediction accuracy (Q) for LD1 vs. LD2 and LD3. For b, c, e, barplots show mean ± s.e.m. Source data are provided as a Source Data file.
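The removal experiments in panels b and c amount to deleting a set of clones and keeping only the edges between the survivors. A minimal sketch of that step follows, assuming a node list and an edge list as inputs; the function names, the seed handling and the toy chain network are hypothetical, and the subsequent power-law fit is not shown.

```python
import random


def remove_clones(nodes, edges, removed):
    """Drop the given clones and every edge that touches one of them."""
    removed = set(removed)
    kept_nodes = [n for n in nodes if n not in removed]
    kept_edges = [(u, v) for u, v in edges
                  if u not in removed and v not in removed]
    return kept_nodes, kept_edges


def remove_random_clones(nodes, edges, fraction, seed=0):
    """Remove a random fraction of clones (e.g. 0.1, 0.5 or 0.9)."""
    rng = random.Random(seed)  # fixed seed so a repetition can be reproduced
    ordered = sorted(nodes)
    n_remove = round(fraction * len(ordered))
    return remove_clones(nodes, edges, rng.sample(ordered, n_remove))


# Toy chain network c0 - c1 - ... - c9, invented for illustration only.
toy_nodes = [f"c{i}" for i in range(10)]
toy_edges = [(f"c{i}", f"c{i + 1}") for i in range(9)]

# Twenty repetitions at 50% removal, mirroring the repetition count in the caption.
replicates = [remove_random_clones(toy_nodes, toy_edges, 0.5, seed=s)
              for s in range(20)]

# Targeted removal, e.g. of clones flagged as public (here an arbitrary choice).
kept_nodes, kept_edges = remove_clones(toy_nodes, toy_edges, {"c4", "c5"})
```

The same `remove_clones` helper serves both the random and the targeted case, so the two experiments differ only in how the removed set is chosen.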