Alberto Valdeolivas1, Aurélien Dugourd2, Daniel Dimitrov2, Dénes Türei2, Martin Garrido-Rodriguez2, Paul L Burmedi2, James S Nagai3,4, Charlotte Boys2, Ricardo O Ramirez Flores2, Hyojin Kim2, Bence Szalai5, Ivan G Costa3,4, Julio Saez-Rodriguez6.
Abstract
The growing availability of single-cell data, especially transcriptomics, has sparked an increased interest in the inference of cell-cell communication. Many computational tools have been developed for this purpose. Each of them consists of a resource of prior knowledge on intercellular interactions and a method to predict potential cell-cell communication events. Yet the impact of the choice of resource and method on the resulting predictions is largely unknown. To shed light on this, we systematically compare 16 cell-cell communication inference resources and 7 methods, plus the consensus between the methods' predictions. Among the resources, we find few unique interactions, a varying degree of overlap, and an uneven coverage of specific pathways and tissue-enriched proteins. We then examine all possible combinations of methods and resources and show that both strongly influence the predicted intercellular interactions. Finally, we assess the agreement of cell-cell communication methods with spatial colocalisation, cytokine activities, and receptor protein abundance and find that predictions are generally coherent with those data modalities. To facilitate the use of the methods and resources described in this work, we provide LIANA, a LIgand-receptor ANalysis frAmework, as an open-source interface to all the resources and methods.
Year: 2022 PMID: 35680885 PMCID: PMC9184522 DOI: 10.1038/s41467-022-30755-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Tools included in the framework.
| Tool/Method | Resource | Methods’ scoring systems |
|---|---|---|
| | CellChatDB | (1) (2) |
| | CellPhoneDB | (1) (2) |
| | Ramilowski | (1) (2) |
| | - | (1) |
| | - | (1) |
| | ConnectomeDB | (1) (2) |
| | LRdb | (1) |
| | - | (1) |
Each method considers expression at the cell cluster level, and all of the scoring systems presented here use the expression of transmitter and receiver genes in the source and target cells, respectively. In addition to the seven methods, we included their consensus.
In bold are the names of cell-cell communication inference methods and their scoring functions.
Dagger (†): Explicitly incorporates communicating cell-pair specificity in interaction predictions.
Hashtag (#): CellPhoneDB, CellChat, and SingleCellSignalR provide explicit thresholds to control for false positive interaction predictions. In the case of the former two, these are permutation-based p-values, whereas SingleCellSignalR’s LRscore has a suggested threshold of 0.5.
Methods that additionally infer intracellular processes, such as NicheNet[19], Cytotalk[22], and SoptSC[20], are not directly comparable but instead provide complementary analyses.
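The table notes describe two shared ingredients: scores computed from cluster-level expression of a transmitter gene in the source cells and a receiver gene in the target cells, and permutation-based p-values. A minimal sketch of both follows. The scoring function (the mean of the two cluster averages, set to zero if either gene is absent) and all names such as `lr_score` and `permutation_p_value` are illustrative assumptions, not the scoring function of any of the methods listed above.

```python
import random
from statistics import mean


def cluster_mean(expr, labels, gene, cluster):
    """Mean expression of `gene` over the cells labelled `cluster`."""
    return mean([v for v, lab in zip(expr[gene], labels) if lab == cluster])


def lr_score(expr, labels, ligand, receptor, source, target):
    """Hypothetical score: average of ligand expression in the source cluster
    and receptor expression in the target cluster; 0 if either is absent."""
    l = cluster_mean(expr, labels, ligand, source)
    r = cluster_mean(expr, labels, receptor, target)
    return 0.0 if min(l, r) == 0 else (l + r) / 2


def permutation_p_value(expr, labels, ligand, receptor, source, target,
                        n_perm=1000, seed=0):
    """Fraction of cluster-label shuffles whose score reaches the observed one."""
    rng = random.Random(seed)
    observed = lr_score(expr, labels, ligand, receptor, source, target)
    shuffled = list(labels)  # shuffling keeps cluster sizes unchanged
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lr_score(expr, shuffled, ligand, receptor, source, target) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Toy data (made up): six cells, two clusters, one ligand and one receptor.
expr = {"LIG": [5, 6, 5, 0, 0, 0], "REC": [0, 0, 0, 4, 5, 4]}
labels = ["A", "A", "A", "B", "B", "B"]
score = lr_score(expr, labels, "LIG", "REC", "A", "B")
p = permutation_p_value(expr, labels, "LIG", "REC", "A", "B")
```

Shuffling the labels rather than the expression values keeps each cell's expression profile intact and only breaks its association with a cluster, which is the null hypothesis a cluster-specificity test needs.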
Fig. 1 LIANA: a LIgand-receptor ANalysis frAmework.
LIANA takes any annotated single-cell RNA (scRNA) dataset as input and establishes a common interface to all the resources and methods in any combination. LIANA also provides a consensus ranking for the methods’ predictions.
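To make the idea of a consensus ranking concrete, here is a minimal sketch that aggregates several methods' scores by mean rank. The aggregation procedure LIANA actually uses is not described in this section, so mean rank is a stand-in chosen for simplicity, and the function names and toy scores are hypothetical.

```python
from statistics import mean


def ranks(scores):
    """Map each interaction to its rank (1 = highest score); ties broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: i + 1 for i, name in enumerate(ordered)}


def consensus_ranking(method_scores):
    """Order the interactions scored by every method by their mean rank."""
    per_method = [ranks(scores) for scores in method_scores.values()]
    shared = set.intersection(*(set(r) for r in per_method))
    mean_rank = {name: mean([r[name] for r in per_method]) for name in shared}
    return sorted(mean_rank, key=lambda name: (mean_rank[name], name))


# Toy scores (made up); the scales differ between methods, so only ranks are compared.
method_scores = {
    "method_1": {"a": 0.9, "b": 0.5, "c": 0.1},
    "method_2": {"a": 3.0, "b": 8.0, "c": 1.0},
    "method_3": {"a": 0.7, "b": 0.6, "c": 0.2},
}
consensus = consensus_ranking(method_scores)
```

Working on ranks rather than raw scores sidesteps the fact that the methods' scoring systems are not on a common scale.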
Fig. 2 Dependencies and overlap between CCC resources.
The lineages of CCC interaction database knowledge. General biological knowledge databases (blue), CCC-dedicated resources (magenta), manual literature curation effort (yellow), additional resources included in iTALK (cyan), and OmniPath (green). Arrows show the data transfers between resources. The yus symbol (Ѫ) indicates manually curated resources, defined as those explicitly described as ‘manually’ or ‘expert’ curated. The asterisk (*) indicates that the resource was included in the analyses presented here.
Fig. 3 Cell-cell communication resources: uniqueness and overlap.
A Shared and unique Interactions, Receivers and Transmitters for each resource. B Similarity between the different resources based on the interactions (Jaccard Index). Source data are provided as a Source Data file.
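The two quantities in this figure, unique interactions per resource and pairwise Jaccard similarity, can be sketched as follows. Interactions are represented here as (transmitter, receiver) pairs; that representation, the function names, and the toy resources are assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unique_interactions(resources, name):
    """Interactions found in `name` and in no other resource."""
    others = set().union(*(v for k, v in resources.items() if k != name))
    return set(resources[name]) - others


def pairwise_jaccard(resources):
    """Jaccard index for every unordered pair of resources."""
    return {
        (x, y): jaccard(resources[x], resources[y])
        for x, y in combinations(sorted(resources), 2)
    }


# Toy resources (made up), each a set of (transmitter, receiver) pairs.
resources = {
    "R1": {("L1", "RA"), ("L2", "RB"), ("L3", "RC")},
    "R2": {("L1", "RA"), ("L2", "RB"), ("L4", "RD")},
    "R3": {("L5", "RE")},
}
similarity = pairwise_jaccard(resources)
```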
Fig. 4 Representation of functional categories in CCC resources.
Distributions of CCC resources in terms of number of interactions (A) and relative abundance (B), matched to the SignaLink database. Relative abundance of interactions categorised by (C) CancerSEA’s cancer-related gene sets and (D) organ-enriched proteins from the Human Protein Atlas (HPA). Fisher’s exact test was used to identify differentially represented categories. Differentially represented categories (|log2(odds ratio)| > 1) were marked according to FDR-corrected p-values ≤ 0.05 (diamond, ♢), 0.01 (triangle, △), and 0.001 (8-pointed asterisk, ❋). Source data are provided as a Source Data file.
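The testing procedure named in this caption can be sketched with the standard library alone: a two-sided Fisher's exact test on a 2x2 table, followed by an FDR correction. The caption does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, as are the function names and the example numbers.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example (made up): 3 of 4 interactions in one resource fall in a
# category, against 1 of 4 in the remaining resources.
p = fisher_exact_two_sided(3, 1, 1, 3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The exact-arithmetic approach is fine for small tables; for very large counts the integer-to-float division can overflow, and a library implementation would be the safer choice.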
Fig. 5 Overlap of predictions using any combination of CCC methods and resources.
Overlap (Jaccard index) among the 1000 highest-ranked interactions (A) when using the same resource with different methods (blue; n = 7) and (B) when using the same method with different resources (red; n = 16). Boxplots represent the median pairwise Jaccard index, with hinges showing the first and third quartiles and whiskers extending 1.5 times the interquartile range above and below the hinges. The dashed lines represent the median when using different resources (red) and methods (blue); the lines overlap for the CBMCs dataset. Source data are provided as a Source Data file.
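The summary statistic in this figure, the median pairwise Jaccard index among the top-ranked predictions, can be sketched as below. The caption uses the 1000 highest-ranked interactions; the toy example uses `k = 2` so the result can be checked by hand. Function names and scores are hypothetical.

```python
from itertools import combinations
from statistics import median


def top_k(scores, k):
    """Names of the k highest-scoring interactions; ties broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ordered[:k])


def median_pairwise_jaccard(predictions, k):
    """Median Jaccard index over all pairs of top-k prediction sets.

    Assumes k >= 1 and that every prediction set is non-empty.
    """
    tops = [top_k(scores, k) for scores in predictions.values()]
    values = [len(a & b) / len(a | b) for a, b in combinations(tops, 2)]
    return median(values)


# Toy predictions (made up): one resource, three methods.
predictions = {
    "method_1": {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05},
    "method_2": {"a": 0.5, "b": 0.1, "c": 0.7, "d": 0.05},
    "method_3": {"a": 0.1, "b": 0.2, "c": 0.9, "d": 0.8},
}
overlap = median_pairwise_jaccard(predictions, k=2)
```

Here the top-2 sets are {a, b}, {a, c}, and {c, d}, giving pairwise indices of 1/3, 0, and 1/3, so the median is 1/3.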
Fig. 6 Agreement of CCC predictions with other modalities.
Odds ratios of (A) active cytokines and (B) colocalized cell types among the highest ranked interaction predictions, across a ranked range between 100 and 10,000. Odds ratios representing the association of preferentially ranked CCC predictions and (A) cytokine activities and (B) spatial adjacencies were calculated using Fisher’s exact test. Asterisk (*): Consensus represents the aggregated ranks of all interactions predicted by all the methods. The dashed horizontal line marks the baseline, an odds ratio of 1. The dashed vertical lines represent the truncated ranges of CellChat, CellPhoneDB, and LogFC Mean, arising from their relatively stricter preprocessing steps. Source data are provided as a Source Data file.