Introduction

Multicellular life relies on the coordination of cellular activities, which depend on cell–cell interactions (CCIs) across an organism’s diverse cell types and tissues1,2,3. Thus, studies on cellular functions increasingly require consideration of the community context of each cell4. CCIs leverage diverse molecules, including ions, metabolites, integrins, receptors, junction proteins, structural proteins, ligands and secreted proteins of the extracellular matrix. Some molecules support structural CCIs (for example, cell adhesion molecules), whereas ligands such as hormones, growth factors, chemokines, cytokines and neurotransmitters mediate cell–cell communication (CCC) (Fig. 1a). The signalling events behind CCC are often mediated by interactions between various types of protein, encompassing ligand–receptor, receptor–receptor and extracellular matrix–receptor interactions. Receiver cells trigger downstream signalling through cognate receptors, generally culminating in altered transcription factor activity and gene expression. With their expression thus altered, these cells in turn interact further with their microenvironment. To understand the role of each cell within its local community, one must identify the protein messages passed between cells; measuring the expressed messenger molecules and their associated pathways is fundamental to understanding the directionality, magnitude and biological relevance of CCC.

Fig. 1: Types and applications of cell–cell interactions and communication.

a | ‘Autocrine signalling’ refers to communication whereby cells secrete ligands that induce a cellular response through cognate receptors expressed on the same cell. Paracrine cell–cell communication does not require cell–cell contact, relying instead on the diffusion of signalling molecules from one cell to another after secretion. Juxtacrine, that is, contact-dependent, cell–cell communication relies on gap junctions or other structures such as membrane nanotubes to pass signalling molecules directly between cells, without secretion into the extracellular space. Endocrine cell–cell communication is intercellular communication whereby signalling molecules are secreted and travel long distances through extracellular fluids such as the blood plasma; typical mediators of this communication are hormones. b | Overview of the main applications of cell–cell interaction methods: cell development, tissue and organ homeostasis, and immune interactions in disease (for more details on each study type, see Supplementary Table 1).

Direct measurement of proteins mediating CCC requires specialized biochemical assays and extensive domain knowledge; moreover, these proteins cannot always be studied in their native microenvironment. Traditional assays of the underlying protein–protein interactions (PPIs) include yeast two-hybrid screening, co-immunoprecipitation, proximity labelling proteomics, fluorescence resonance energy transfer imaging and X-ray crystallography5,6. These techniques have identified many interactions between proteins that are secreted or displayed extracellularly to mediate intercellular communication. Proteomics and transcriptomics can further reinforce such studies, as evidence of expression supports the presence of PPIs. This approach has been applied to, for instance, the analysis of communication between 144 human primary cell types, which provided insights into pairs of cells that are more likely to interact and the specific pathways they use to communicate7. While proteomic technologies are preferable for these analyses owing to their direct measurement of protein abundances, RNA sequencing (RNA-seq) data sets are more numerous, easier to access and more straightforward to analyse. They can also be generated from bulk samples8, microdissected specimens9 or single-cell suspensions10 and thus enable studies of CCC at different resolutions, whereas single-cell proteomics is a technology still under development11. Single-cell RNA-seq has benefits over bulk analysis, chiefly in quantifying expression in rare cell types and in identifying the cell type of origin of proteins mediating CCIs12,13. Results from transcriptomics must be considered cautiously and validated to avoid misleading hypotheses; however, the ubiquity and ease of analysis of these data have enabled many recent studies to infer CCC from gene expression, generating testable hypotheses across diverse disciplines. In particular, the coordinated expression of genes encoding ligands and their receptors can be used to infer intercellular communication.

Here, we start by providing an overview of the range of fields that RNA-based CCI analyses have been applied to, illustrating the types of insight that can be gleaned. We then discuss the computational strategies adopted in those studies, detailing the PPI databases and mathematical models commonly used to decipher CCC. Additionally, we introduce the computational tools that facilitate these analyses, describing their main features as well as their strengths and weaknesses. Finally, we review approaches to validate CCI-derived results and discuss remaining challenges and future directions in the field.

Insights from RNA-based CCI analyses

The study of intercellular interactions has greatly accelerated as transcriptomics, in particular bulk and single-cell RNA-seq, has become commonplace. These approaches use transcriptional profiling to decipher CCC and can, in principle, be applied at any stage of development and to any multicellular community. Many studies focus on signals mediating cellular differentiation, interactions of cell types within tissues and organs, and immune responses (Fig. 1b; Table 1; see Supplementary Table 1 for more details). Here, we review these studies and illustrate the types of insights gained from analysing CCC.

Table 1 Illustrative studies and their strategies for deciphering cell–cell interactions and communication

Interactions drive cellular differentiation and organ development

Cellular differentiation depends on temporally and spatially precise cell communication, so inspecting intercellular communication has increased our understanding of stem cell fates and revealed ligand–receptor interactions that initiate self-renewal and differentiation14,15,16,17,18,19,20. For example, a CCC network of haematopoietic cells built using ligand–receptor pairs showed that fate decisions are regulated through precise coordination of an antagonistic feedback circuit involving megakaryocyte-derived stimulatory factors and monocyte-derived inhibitory factors14. Another analysis of CCC networks, which interrogated how differentiated cells influence haematopoietic stem cell fate, revealed that ligand production is cell type specific (with some cell types producing signals with the same function), whereas receptors are less specific15. Given this promiscuity of receptors, physical compartmentalization of cells is key to limiting ligand signalling and conferring specificity to stem cell fates.

Tissue and organ development also depends on signals that progenitor cells send and receive21. The analysis of brain CCC showed crosstalk involved in neurogenesis and identified novel mediators16,22, such as apolipoprotein E (APOE), a protein associated with Alzheimer disease. CCC analysis also elucidated how erythroblasts interact with macrophages during haematopoiesis in the fetal liver23 and was applied to liver organoid development to investigate how multilineage communication shapes the differentiation of hepatic cells24. Following similar principles, other studies have used CCC analyses to understand differentiation during epidermal regeneration25,26,27, development of the interfollicular epidermis28 and formation of the maternal–fetal interface in early pregnancy29,30.

Cell-type communication in tissue and organ homeostasis

Expression profiling of different cell types in adult tissues and organs has shown how intercellular communication contributes to organ function. To date, this approach has been applied to brain31,32, heart33,34, kidney35,36,37, liver38,39, lungs40,41,42,43, placenta29,30,44, retina45 and visual cortex46. Remarkably, this approach revealed new roles of cells within a tissue41 and helped explain how ageing32,47, diseases34,35,36,48,49,50,51, infections52 and injuries31,40,43 shape multicellular organization. Intercellular crosstalk has been investigated in healthy and diseased liver50, kidneys during homeostasis and rejected kidney transplants35,36, heart during failure and recovery conditions34, and healthy and asthmatic lungs51. RNA-based strategies for studying CCIs and CCC have helped elucidate, for instance, how cells communicate during ageing of the mouse brain. This revealed that CST3 and CXCL12 are mediators that differentiate intercellular interactions in young and old brains and may modulate ageing-related processes32. Thus, within-tissue CCC studies continue to deepen our understanding, for basic and therapeutic purposes, of how cellular communities work.

During the study of CCC, considering spatial context clarifies relationships between cells across tissues and organs. Several studies have incorporated imaging to spatially map cells and integrated transcriptomics to measure CCC. For example, researchers interrogated the communication and response of lung cells to tissue injury40,43; spatial information aided the identification of a new population of lung endothelial cells (Car4 high) and elucidated how these cells communicate with neighbouring alveolar type 1 cells through VEGF signalling given both their proximity and their expression of the cognate genes43. Spatial maps have also helped investigate the communication of T cells during their development in the human thymus13. As this process is spatially coordinated, knowing the cellular localizations was crucial to understand, for example, interactions between XCR1+ dendritic cells and T cells with high expression of XCL1, which are important to recruit dendritic cells into the medulla of the thymus. Yet another study inferred the 3D organization of bone marrow — instead of using tissue imaging to map the cells — ultimately identifying signalling between immune and non-immune cells53. These examples show how cell localization can help elucidate interactions between spatially proximal regions.

Immune interactions in disease

The immune system receives signals from multiple tissues, but only specific signals allow it to coordinate healthy immune responses. For instance, CCL2- and CX3CL1-mediated communication coordinates the recruitment and positioning of immune cells, as determined from single-cell transcriptomes37,39. Specifically, these CCCs were associated with the recruitment of monocytes that later became liver-resident macrophages39 and with the positioning of mononuclear phagocytes in the kidney37, the latter being crucial to combat ascending uropathogenic Escherichia coli infection. CCC is also involved in the response to viral infections52,54,55,56. Studies of respiratory diseases investigated the crosstalk between lung cells and T cells in Sendai virus-infected mice55 or CCC associated with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection52,56, where interactions between immune and epithelial cells correlated with COVID-19 severity. CCC-based studies have also provided more general insights. For example, they helped build a social network of immune cells by identifying communication pathways between immune cells57. Furthermore, they revealed the roles of structural cells in immune responses by elucidating how fibroblasts, endothelial cells and epithelial cells are primed for organ-specific immune gene activation through upregulation of ligands and receptors, including β2M, CD74, CXCL10, VCAM1 and TNFRSF1A58.

Tumours and their surrounding microenvironments are complex communities of cells that modify local immune cell functions. Studying CCC within these communities can reveal how cells communicate in these ecosystems54,59,60,61,62,63,64 and help guide the development of effective cancer immunotherapies, just as the inhibition of CCC through PD1 and PDL1 has revolutionized the field65,66. CCC analyses have also elucidated crosstalk between tumour and stroma67,68 and communication pathways used by tumours69,70,71. Several studies have developed statistical models to connect inferred CCC mechanisms to cancer phenotypes. One study observed clear correlations between the activity of specific ligand–receptor interactions and both the degree of regulatory T cell infiltration and tumour growth60. In another study, active ligand–receptor pairs were associated with invasiveness and proliferation of malignant cells under a partial epithelial–mesenchymal transition programme in patients with head and neck squamous cell carcinoma59. Moreover, the expression levels of key mediators of CCC were used as inputs for training a decision tree to predict prognosis for patients with glioma71; this model classified patients into high-risk and low-risk groups defined on the basis of differences in patient survival time. Thus, studying CCC within the tumour microenvironment provides opportunities to identify druggable pathways and develop new cancer therapeutics68.

Deciphering CCC

The aforementioned studies provide a glimpse of the insights attainable when studying CCC. Although the methods and tools used in these studies all infer CCC from gene expression (Fig. 2a), they apply a diverse range of strategies (Fig. 2b; Table 1). For simplicity, we refer to them as methods for studying CCC, but these strategies can decipher any type of CCI based on gene products, including proteins that participate in structural interactions between cells. Furthermore, although we focus on mammalian CCIs, the approaches apply to any prokaryotic or eukaryotic cells with a characterized interactome (for example, Drosophila melanogaster72,73 and Caenorhabditis elegans74,75).

Fig. 2: Analysis workflow for inferring cell–cell interactions and communication from gene expression.

a | Samples or cells are analysed by transcriptomics to measure the expression of genes (step 1). The data generated are then preprocessed to build a gene expression matrix, which contains the transcript levels of each gene across different samples or cells (step 2). A list of interacting proteins that are involved in intercellular communication is generated or obtained from other sources (step 3), often including interactions between secreted and membrane-bound proteins (commonly ligands and receptors, respectively). Only the genes associated with the interacting proteins are retained in the gene expression matrix (step 4). Their expression levels are used as inputs to compute a communication score for each ligand–receptor pair using a scoring function (function f(L, R), where L and R are the expression values of the ligand and the receptor, respectively). These communication scores may be aggregated into an overall state of interaction between the respective samples or cells using an aggregation function (function g(Cell 1, Cell 2), where the arguments Cell 1 and Cell 2 stand for all communication scores computed between those cells or corresponding samples) (step 5). Finally, communication and aggregated scores can be represented by, for instance, Circos plots and network visualizations to facilitate the interpretation of the results (step 6). b | Main scoring functions of communication pathways based on the expression of their components. The recommended data for use with each function and the type of resulting communication score are indicated.

Building from PPIs

Inferring CCC from transcriptomics relies on gene co-expression, whereby one gene in a given pair is expressed by one interacting cell and the other gene by the second interacting cell. Several studies focused on intercellular signalling using co-expression of all genes or specific cell markers31,76, the similarity between expression profiles77 or the properties of regulatory networks78. However, most studies rely on literature-curated lists of interacting proteins, which facilitate the biological interpretation of results (Fig. 2a). Although several studies have used interactions between any class of cell-surface protein and secreted protein32,53, the predominant class of interactions used for studying CCC is that between known ligands and their cognate receptors.

By focusing on ligands and receptors, an early study investigated autocrine signalling loops in cancer69. The authors analysed the correlation between ligand expression and cognate receptor expression. They first created a literature-curated list of ligand–receptor pairs (Database of Ligand-Receptor Partners), which was subsequently integrated with microarray-based expression data to analyse autocrine signalling in cancer cells. At the time, the Database of Ligand-Receptor Partners consisted of 451 interactions in humans; since then, many other databases have catalogued substantially more interacting ligand–receptor pairs, enabling more comprehensive testing of communication processes between cells.

Many studies have enumerated known ligand–receptor pairs using different approaches (Box 1). To facilitate further use and comparison, we collated publicly available lists into a single ligand–receptor pair repository. One extensively used database7 contains ~2,400 human ligand–receptor pairs, obtained from multiple databases and literature curation. Similar approaches allowed researchers to increase the number of known ligand–receptor pairs and to build databases for other organisms (Supplementary Table 1). However, integrating multiple sources of data is challenging and requires reconciliation of the different ways ligand–receptor pair confidence was assessed or how orthologues were determined. Furthermore, few predicted PPIs in databases have been validated, raising the concern of false positives.

While early efforts increased the number of reported ligand–receptor pairs, many lacked information about protein complexes involving multiple subunits. This omission greatly increases false-positive CCC predictions79. Some proteins, such as transforming growth factor-β receptors80 and cytokine receptors81, require multisubunit assembly for function. A lack of expression of any subunit blocks ligand–receptor interactions and the resulting communication82. Thus, more recent computational tools such as CellPhoneDB30,83, CellChat79 and ICELLNET84 include multimeric proteins and interactions between complexes of both ligands and receptors. Accounting for subunit co-expression better represents functional ligand–receptor interactions. For example, CellChat79 includes ~2,000 ligand–receptor interactions, ~48% of which represent interactions of heterodimers, a substantial increase from the ~900 ligand–receptor pairs in CellPhoneDB.
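To illustrate why subunit information matters, the following Python sketch summarizes a multimeric receptor by the expression of its least-expressed subunit, so that a single missing subunit silences the whole complex. This minimum rule is an assumption made for illustration (individual tools adopt their own conventions), the helper `complex_expression` is hypothetical, and the expression values are invented; only the gene symbols are real.

```python
# Invented expression values (arbitrary units) for a single receiver cell type.
expression = {"TGFB1": 4.0, "TGFBR1": 2.5, "TGFBR2": 0.0}


def complex_expression(subunits, expr):
    """Expression of a protein or complex given the genes encoding its subunits.

    A monomer is a one-element tuple. For a complex, the minimum across
    subunits is used (an illustrative assumption), so any unexpressed or
    unmeasured subunit makes the whole complex count as absent (0.0).
    """
    return min(expr.get(gene, 0.0) for gene in subunits)


# Treating TGFBR1 alone as the receptor suggests it is available ...
print(complex_expression(("TGFBR1",), expression))            # 2.5
# ... but the heteromeric TGF-beta receptor also needs TGFBR2, which is absent.
print(complex_expression(("TGFBR1", "TGFBR2"), expression))   # 0.0
```

With the complex-aware value, any scoring function applied downstream would report no TGFB1 signalling to this cell type, whereas a subunit-blind list of pairs would not.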

Additional efforts have incorporated information beyond ligand–receptor pairs to reveal other aspects of CCC, such as the exchange of metabolites or the activation of intracellular signalling. For example, metabolite secretion has been included in communication studies by considering the expression of the producing enzymes85. In this regard, the production of specific metabolites can be inferred from transcriptomic data86,87; however, metabolite concentrations cannot be directly predicted, so concentration-dependent interactions cannot be assessed. Among other efforts, information about downstream signalling gene products and gene regulatory networks can be included25,54 to weigh the potential use of a communication pathway by its intracellular effect on a receiver cell. For this, genes downstream of each receptor are obtained from signalling and regulatory networks; their expression then provides additional inputs for scoring the respective ligand–receptor pairs. However, including downstream genes requires more extensive information than ligand–receptor pairs alone. A limitation of this approach is that rules of gene regulation are not considered, leading to potential false positives and false negatives.

Adding information can be laborious, requiring careful curation of mediator molecules and development of algorithms that incorporate these data. Furthermore, inference methods based on PPI databases are sensitive to the quality of the underlying databases. Nevertheless, PPIs, especially ligand–receptor pairs, have been central to deciphering CCC in all strategies.

Which communication pathways are cells using?

To elucidate which biological processes interacting cells use to communicate, a score is usually estimated for each pair of interacting proteins (Box 1) using the expression of their cognate genes as input to a scoring function (Fig. 2a). Importantly, the main assumptions of these methods are that (1) gene expression reflects protein abundance and (2) protein abundance is sufficient to infer PPI strength; these assumptions ignore factors essential for binding, such as post-translational modifications or multisubunit complex assembly. Here, we focus on ligand–receptor pairs as PPIs (Fig. 2a, step 3); however, these strategies can be extended to any intercellular PPIs.

Communication scores can be binary or continuous (Fig. 2b), each providing different insights into the signalling pathways that cells use. Binary scores are simpler, whereas continuous scores enable more precise quantification of intercellular signalling. For both types of communication scores, we identified two core scoring functions in the literature. The ‘expression thresholding’ and ‘differential combinations’ methods are defined for the binary category, whereas the ‘expression product’ and ‘expression correlation’ approaches quantify continuous scores.

In binary scoring, expression thresholding is widely used because it is easy to implement and interpret (Table 1). By thresholding the expression values of both interacting partners in each ligand–receptor pair, we can enumerate the communication processes potentially used between cells. If both genes are expressed above a threshold, the ligand–receptor pair is considered ‘active’; otherwise it is ‘inactive’ (assigned a one or a zero, respectively; Fig. 3). Different thresholds can be used for the ligand and the receptor44,67. By contrast, the differential combinations method uses any approach for quantifying differential gene expression88,89 to identify ‘active’ interacting partners. Thus, this strategy measures communication pathways in a sample- or cell type-specific manner. Regarding input data sets, both binary functions are suitable for either bulk or single-cell data (Fig. 2b); however, analyses of bulk samples may miss certain interacting ligand–receptor pairs, as averaging gene expression across all cells can mask signals from low-abundance cells. Both binary methods assume that higher expression is needed for interaction and require choosing a gene expression threshold (based on either significance or expression magnitude), which can lead to false positives and/or false negatives7. General thresholds may fail because (1) some proteins have different bioactivities in a concentration-dependent manner90 and (2) mRNA–protein level correlations vary across genes82. Hence, gene-specific thresholds should be developed, as a general threshold may not properly represent the presence and activity of proteins.
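The expression thresholding function can be sketched in a few lines of Python. The sketch assumes a dictionary of average expression per cell type, uses hypothetical gene names (L1, L2, R1, R2) and invented values, and borrows the arbitrary threshold of 3.3 used for demonstration in Fig. 3; separate ligand and receptor thresholds are exposed as parameters because they need not be equal.

```python
# Hypothetical average expression per cell type (arbitrary units).
expr = {
    "A": {"L1": 5.0, "L2": 1.0, "R1": 0.5, "R2": 4.0},
    "B": {"L1": 0.2, "L2": 6.0, "R1": 4.5, "R2": 2.0},
}
# Hypothetical ligand-receptor pairs, written as (ligand, receptor).
lr_pairs = [("L1", "R1"), ("L2", "R2"), ("L1", "R2")]


def threshold_scores(pairs, expr, sender, receiver, lig_thr=3.3, rec_thr=3.3):
    """Binary communication score for each pair, from `sender` to `receiver`.

    A pair scores 1 ('active') only if the ligand exceeds lig_thr in the
    sender AND the receptor exceeds rec_thr in the receiver; otherwise 0.
    """
    scores = {}
    for ligand, receptor in pairs:
        lig = expr[sender].get(ligand, 0.0)
        rec = expr[receiver].get(receptor, 0.0)
        scores[(ligand, receptor)] = int(lig > lig_thr and rec > rec_thr)
    return scores


print(threshold_scores(lr_pairs, expr, "A", "B"))
# {('L1', 'R1'): 1, ('L2', 'R2'): 0, ('L1', 'R2'): 0}
print(threshold_scores(lr_pairs, expr, "B", "A"))
# {('L1', 'R1'): 0, ('L2', 'R2'): 1, ('L1', 'R2'): 0}
```

Lowering `rec_thr` to 1.5 would additionally mark L1–R2 as active from A to B, a small demonstration of how strongly binary results depend on the chosen threshold.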

Fig. 3: Toy examples of using core functions to compute communication scores.

Two primary inputs are used for quantifying communication scores: a preprocessed gene expression matrix (part a) and a list of interacting proteins to supervise the analysis (for example, ligand–receptor pairs) (part b). A communication score (CS) can then be computed for every ligand–receptor pair in a given pair of cells. Here, we show how to perform these calculations for four core functions (parts c–f), which are applied to elucidate paracrine (parts c,d) and autocrine (parts e,f) communication. To assess cell–cell communication, a CS can be computed for each ligand–receptor pair by accounting for the presence of both partners if their expression is greater than a given threshold, which for demonstrative purposes was set arbitrarily to a value of 3.3 (part c), or by multiplying their expression values (part d). Similarly, for autocrine communication, the CS for each ligand–receptor pair can be the correlation coefficient of their expression across all cell types (part e). To reveal non-autocrine interactions, the correlation can be computed across pairs of different cells. Particular signatures of each cell type can be extracted by analysing differentially expressed ligands and receptors. Using the cell type-specific differentially expressed genes, we can assign a binary CS and study the ligand–receptor pairs used for autocrine communication (part f). In this example, autocrine communication is evaluated for cell type A by using its differentially expressed genes with respect to cell type B (cell type A-specific genes are located in the coloured quadrant). Analogously to the correlation score, for non-autocrine communication we would need to consider differentially expressed genes in each of the cell types or samples. For a given pair of cells, a communication pathway is considered active when the ligand is differentially expressed in one cell and its cognate receptor is differentially expressed in the other. FC, fold change.
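The differential combinations rule described in the Fig. 3 legend can be sketched as follows. As a stand-in for a proper differential expression test, the sketch calls a gene ‘specific’ to a cell type when its log2 fold change against the mean of the other cell types exceeds 1, using a pseudocount of 1; both cut-offs, the helper names and the toy data are assumptions for illustration. In practice, a statistical test with multiple-testing correction would replace this fold-change rule.

```python
import math

# Hypothetical average expression per cell type (arbitrary units).
expr = {
    "A": {"L1": 5.0, "L2": 1.0, "R1": 0.5, "R2": 4.0},
    "B": {"L1": 0.2, "L2": 6.0, "R1": 4.5, "R2": 2.0},
}
lr_pairs = [("L1", "R1"), ("L2", "R2"), ("L1", "R2"), ("L2", "R1")]


def specific_genes(expr, cell, min_log2fc=1.0, pseudocount=1.0):
    """Genes whose log2 fold change in `cell` vs the other cell types exceeds min_log2fc."""
    others = [c for c in expr if c != cell]
    if not others:
        raise ValueError("at least two cell types are needed for a fold change")
    genes = set()
    for gene, value in expr[cell].items():
        background = sum(expr[o].get(gene, 0.0) for o in others) / len(others)
        log2fc = math.log2((value + pseudocount) / (background + pseudocount))
        if log2fc > min_log2fc:
            genes.add(gene)
    return genes


def differential_combination(pairs, expr, sender, receiver):
    """Binary score: 1 if the ligand is specific to the sender and the receptor to the receiver."""
    sender_genes = specific_genes(expr, sender)
    receiver_genes = specific_genes(expr, receiver)
    return {
        (ligand, receptor): int(ligand in sender_genes and receptor in receiver_genes)
        for ligand, receptor in pairs
    }


print(sorted(specific_genes(expr, "A")), sorted(specific_genes(expr, "B")))
# ['L1'] ['L2', 'R1']
print(differential_combination(lr_pairs, expr, "A", "B"))  # paracrine, A -> B
# {('L1', 'R1'): 1, ('L2', 'R2'): 0, ('L1', 'R2'): 0, ('L2', 'R1'): 0}
print(differential_combination(lr_pairs, expr, "B", "B"))  # autocrine, within B
# {('L1', 'R1'): 0, ('L2', 'R2'): 0, ('L1', 'R2'): 0, ('L2', 'R1'): 1}
```

Passing the same cell type as sender and receiver gives the autocrine case; different cell types give the paracrine case.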

Modelling interactions as non-binary processes provides biologically relevant information91 and helps infer active communication pathways. The expression product method computes a continuous value by multiplying the expression of both interacting proteins, and it has successfully found differences in ligand–receptor pair use between cells55,60. For example, communication pathways that were previously described to enhance the expansion of regulatory T cells were also identified to have communication scores linearly correlated with the infiltration of those cells in human metastatic melanoma60, which would not have been possible with binary scores. However, this approach may become problematic if interacting proteins have vastly different transcript levels wherein one protein dominates the interaction signal. Additional data preprocessing, such as cell-type normalization of gene expression using housekeeping genes60 or accounting for correlation between transcripts and proteins7, may mitigate this effect. Although this method can be applied to bulk data, microdissected or single-cell data sets are preferred as they capture the full heterogeneity of expression across cells19,48,49,55,60,62, resulting in clearer differences in expression92.
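A minimal Python sketch of the expression product function follows. With hypothetical gene names and invented numbers, it also reproduces the dominance problem mentioned above, in which a very highly expressed ligand inflates the score of a pair whose receptor is barely detected.

```python
# Hypothetical average expression per cell type (arbitrary units).
expr = {
    "A": {"L1": 100.0, "L2": 6.0},  # sender: L1 is extremely abundant
    "B": {"R1": 0.5, "R2": 6.0},    # receiver: R1 is barely detected
}
lr_pairs = [("L1", "R1"), ("L2", "R2")]


def product_scores(pairs, expr, sender, receiver):
    """Continuous score: ligand expression in sender times receptor expression in receiver."""
    return {
        (ligand, receptor): expr[sender].get(ligand, 0.0) * expr[receiver].get(receptor, 0.0)
        for ligand, receptor in pairs
    }


scores = product_scores(lr_pairs, expr, "A", "B")
print(scores)  # {('L1', 'R1'): 50.0, ('L2', 'R2'): 36.0}
# The pair with a nearly absent receptor outranks the balanced pair.
print(max(scores, key=scores.get))  # ('L1', 'R1')
```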

The second continuous scoring function is the expression correlation method. This score is the correlation between expression values of the interacting proteins across many samples. Thus, the score represents the general behaviour of each ligand–receptor pair in the evaluated groups rather than the individual importance for each pair of samples or cells. Whereas the previous methods can use any gene expression data type, the expression correlation method requires data sets containing many measurements of two populations to compute correlation for each ligand–receptor pair. A potential obstacle for this method is the sparsity of single-cell data sets, which can increase or decrease correlation coefficients in undesirable ways93,94, leading to correlation values that measure sparsity, rather than biology. In addition, the formation of subgroups of data points (representing pairs of cells) due to low expression variance of the ligand or the receptor in an interaction is known to bias correlation and can make this score inappropriate95.
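The expression correlation function can be sketched as a Pearson correlation computed, for each ligand–receptor pair, across cell types (here for autocrine communication, as in Fig. 3e). The gene names and values are hypothetical. The third pair shows how a single non-zero observation among zeros, typical of sparse single-cell data, can yield a perfect correlation that reflects sparsity rather than biology.

```python
import math

# Hypothetical average expression of each gene in four cell types (arbitrary units).
expr = {
    "A": {"L1": 1.0, "R1": 2.0, "R2": 8.0, "L3": 0.0, "R3": 0.0},
    "B": {"L1": 2.0, "R1": 4.0, "R2": 6.0, "L3": 0.0, "R3": 0.0},
    "C": {"L1": 3.0, "R1": 6.0, "R2": 4.0, "L3": 0.0, "R3": 0.0},
    "D": {"L1": 4.0, "R1": 8.0, "R2": 2.0, "L3": 5.0, "R3": 3.0},
}
lr_pairs = [("L1", "R1"), ("L1", "R2"), ("L3", "R3")]


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either vector is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def correlation_scores(pairs, expr):
    """Continuous score per pair: correlation of ligand and receptor across cell types."""
    cells = sorted(expr)
    return {
        (ligand, receptor): pearson(
            [expr[c].get(ligand, 0.0) for c in cells],
            [expr[c].get(receptor, 0.0) for c in cells],
        )
        for ligand, receptor in pairs
    }


for pair, r in correlation_scores(lr_pairs, expr).items():
    print(pair, round(r, 3))
# ('L1', 'R1') 1.0
# ('L1', 'R2') -1.0
# ('L3', 'R3') 1.0   <- driven entirely by cell type D
```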

In addition to these core scoring functions, more advanced approaches can generate interaction scores, such as RNA-Magnet53, a method using fuzzy logic to identify candidate ligands and receptors active in cell communication. As mRNAs encoding surface receptors frequently show low abundance, this strategy better accounts for the variance in receptor expression. Other advanced approaches have been wrapped into tools (see the section entitled A growing toolbox to facilitate CCC analysis). Each strategy can decipher CCC but relies on different assumptions that may influence the results. Therefore, knowing the limitations and following best practices in data set preprocessing93,96,97 are crucial to minimize false discoveries.

Distinguishing which pairs of cells are more likely to interact

Measurement of individual communication scores facilitates the study of CCC, exposing the roles of specific signalling mechanisms; however, it does not reveal the entire interaction state between cells. Thus, it may be desirable to use an aggregate score to define the interactions between pairs of cells84.

Among the few studies that aggregated communication scores into an overall score (Fig. 2a, step 5), the most common approach quantifies the number of active ligand–receptor pairs between cells (that is, the sum of binary communication scores). This score suggests which cells interact more strongly and enables the building of CCC networks for graph-based analyses. However, different cell pairs could have similar counts but completely different communication circuits7, so counts alone can be misleading. Newer methods have attempted to address this issue by weighting CCIs and CCC on the basis of additional data, which enables interactions considered active to be compared according to their importance. One approach computes a probability of intercellular communication through a given ligand–receptor pair between a sender and a receiver cell and then aggregates these probabilities into a global CCC score25. Another study proposed a statistical model to evaluate the expression variation of individual genes given a cell’s CCIs in its neighbourhood98. This method yields coefficients interpretable as an overall CCI score. Here, there is no aggregation of communication scores, as the coefficients are computed from the covariance of different factors, such as the distance between cells, the cellular composition of the neighbourhood and transcriptomes including all genes rather than only ligands and receptors. Although these approaches seem more biologically relevant than simply counting the expressed ligand–receptor pairs, they require more detailed information that might not be available in all experimental settings.
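The most common aggregation, counting active ligand–receptor pairs per sender–receiver pair, and the resulting weighted CCC network can be sketched as below. The binary scores are invented and the gene and cell-type names are hypothetical. The example also illustrates the caveat above: A→B and B→A obtain the same count through largely different ligand–receptor pairs.

```python
# Invented binary communication scores for each (sender, receiver) pair of cell types.
binary_scores = {
    ("A", "B"): {("L1", "R1"): 1, ("L2", "R2"): 1, ("L3", "R3"): 0},
    ("B", "A"): {("L1", "R1"): 0, ("L2", "R2"): 1, ("L3", "R3"): 1},
    ("A", "A"): {("L1", "R1"): 0, ("L2", "R2"): 0, ("L3", "R3"): 0},
}


def aggregate_counts(scores_by_cells):
    """Aggregation g(Cell 1, Cell 2): number of active ligand-receptor pairs."""
    return {cells: sum(scores.values()) for cells, scores in scores_by_cells.items()}


def ccc_network(counts):
    """Directed, weighted edge list (sender, receiver, weight), omitting zero weights."""
    return sorted((s, r, w) for (s, r), w in counts.items() if w > 0)


counts = aggregate_counts(binary_scores)
print(ccc_network(counts))  # [('A', 'B', 2), ('B', 'A', 2)]

# Same weight, different circuits: the sets of active pairs share only one pair.
active = {cells: {p for p, v in s.items() if v} for cells, s in binary_scores.items()}
print(active[("A", "B")] & active[("B", "A")])  # {('L2', 'R2')}
```

The edge list can be passed to any graph library for network analyses, but the identical weights hide that the two directions use different circuits.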

Alternative strategies to quantify CCC include Euclidean-like or Jaccard-like distance metrics that relate the number of active ligand–receptor pairs to all ligands and receptors in the interacting cells. Furthermore, spatial transcriptomics technologies99,100,101,102,103,104 may help train machine learning models to predict a physical distance-dependent potential of CCIs105, using the expression of ligands and receptors as inputs and distances between cells as outputs.
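One possible Jaccard-like formulation, assumed here for illustration rather than taken from a specific tool, treats two sets of candidate ligand–receptor pairs: those whose ligand is expressed by the sender and those whose receptor is expressed by the receiver. Their Jaccard index is the fraction of candidate pairs that are actually active, and one minus it gives a distance. The helper name, gene names and expressed-gene sets are hypothetical.

```python
# Hypothetical ligand-receptor pairs and expressed genes (e.g. after thresholding).
lr_pairs = [("L1", "R1"), ("L2", "R2"), ("L3", "R3"), ("L1", "R2")]
sender_expressed = {"L1", "L2"}
receiver_expressed = {"R1", "R3"}


def jaccard_like(pairs, sender_genes, receiver_genes):
    """Jaccard index between ligand-side and receptor-side candidate pairs.

    The intersection holds the active pairs (both partners expressed); the
    union holds every pair with at least one partner expressed.
    """
    ligand_side = {(lig, rec) for lig, rec in pairs if lig in sender_genes}
    receptor_side = {(lig, rec) for lig, rec in pairs if rec in receiver_genes}
    union = ligand_side | receptor_side
    if not union:
        return 0.0
    return len(ligand_side & receptor_side) / len(union)


similarity = jaccard_like(lr_pairs, sender_expressed, receiver_expressed)
print(similarity, 1 - similarity)  # 0.25 0.75
```

Unlike a raw count, this score is normalized by how many ligands and receptors the two cells express, so cell pairs with many expressed genes are not automatically favoured.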

Although there are many approaches, it remains unclear which metric best captures the underlying biology. Moreover, any strategy relying on gene expression will remain limited to signals captured in the ligand–receptor list, which may miss ways that cells can communicate. Thus, there remains a need for strategies that determine an aggregate CCC potential and functional–spatial relationships of cells.

A growing toolbox to facilitate CCC analysis

In addition to the core scoring functions (Fig. 2b), many tools use advanced statistical methods to identify intercellular communication (Table 2). Existing computational tools can be grouped into one of four categories on the basis of the mathematical models used for identifying ligand–receptor interactions, classified here as (1) differential combination-based, (2) network-based, (3) expression permutation-based and (4) array-based tools.

Table 2 Existing tools for measuring cell–cell communication

In differential combination-based tools, genes significantly differentially expressed between cell clusters in single-cell RNA-seq data are identified, and these lists are searched for ligand–receptor pairs. Two such tools, iTALK106 and CellTalker61, use slightly different downstream analysis methods to curate the final list of significantly interacting ligand–receptor pairs. Similarly, PyMINEr107 annotates interacting cell types but additionally labels interactions as ‘activating’ or ‘inhibitory’ according to public interaction databases. Another tool, CCCExplorer67, uses differential gene expression and PPI database-guided network analysis of downstream targets and transcription factors to determine activated or deactivated signalling events between two groups of samples or cells. These methods are powerful for discerning ligand–receptor interactions against the background of the rest of the data set but are blind to interactions that are common to all groups.
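The core step of differential combination-based tools, intersecting differential expression (DE) results with a ligand–receptor list, can be sketched as below. The DE gene sets are hypothetical and assumed to come from an upstream test; `LIG_SHARED` and `REC_SHARED` are placeholder names for a pair expressed uniformly in all clusters.

```python
def de_lr_pairs(de_genes, lr_pairs, sender, receiver):
    """Keep LR pairs whose ligand is DE in the sender cluster and whose
    receptor is DE in the receiver cluster."""
    return [(lig, rec) for lig, rec in lr_pairs
            if lig in de_genes[sender] and rec in de_genes[receiver]]

# Hypothetical DE results: genes up-regulated in each cluster vs all others.
de_genes = {
    "T cell": {"IFNG", "CD28"},
    "myeloid": {"IFNGR1", "CD86"},
}
lr_pairs = [("IFNG", "IFNGR1"), ("CD86", "CD28"),
            ("LIG_SHARED", "REC_SHARED")]  # placeholder pair expressed everywhere

print(de_lr_pairs(de_genes, lr_pairs, "T cell", "myeloid"))  # [('IFNG', 'IFNGR1')]
print(de_lr_pairs(de_genes, lr_pairs, "myeloid", "T cell"))  # [('CD86', 'CD28')]
```

The placeholder pair is never reported because it is not differentially expressed anywhere, which illustrates why these tools are blind to interactions common to all groups.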

Network methods are used by several tools and exploit properties of connections between genes. For example, CCCExplorer uses them to identify concerted movements in the expression levels of genes involved in ligand–receptor signalling67. The tools SoptSC25 and NicheNet54 use known interactions between ligands, receptors and downstream targets to build a network of ligand–receptor relationships. The former computes a probability, whereas the latter uses a personalized PageRank algorithm; in both cases, the aim is to evaluate the effect of ligand–receptor co-expression on downstream signalling genes in the receiver cell and thus obtain a continuous score for ranking ligands and receptors by this effect. Most recently, a method called ‘SpaOTsc’108 formulates intercellular communication as an optimal transport problem109 using RNA-seq and spatial information. All these tools use not only the expression levels of ligands and receptors to compute interaction scores but also the expression levels of downstream signalling targets, which is intended to be a strength of these techniques. However, they do not account for the rules of gene regulation, so they are blind to signalling crosstalk, in which the signals triggered by one receptor interfere with those triggered by another. For example, an intracellular pathway may be highly scored because its downstream genes are expressed, even though its activity is strongly diminished by another activated pathway that inactivates it through post-translational modifications rather than through expression control, as may happen in some cytokine signalling pathways110. Such cases can lead to false positives or false negatives not seen with other strategies.
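To make the personalized PageRank idea concrete, the sketch below runs a generic power-iteration personalized PageRank on a small, simplified ligand → receptor → signalling → target graph. It is not NicheNet's implementation: the network, the unweighted edges, the damping value `alpha = 0.85` and the handling of dangling nodes (their mass returns to the seeds) are all assumptions chosen for illustration.

```python
def personalized_pagerank(edges, seeds, alpha=0.85, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration on a directed graph.

    edges: (source, target) pairs; seeds: nodes the random walk restarts from.
    alpha: probability of following an edge (1 - alpha = restart probability).
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(restart)
    for _ in range(max_iter):
        new = {n: (1 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            if out[u]:
                share = alpha * p[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:
                # dangling node (no outgoing edges): send its mass back to the seeds
                for n in nodes:
                    new[n] += alpha * p[u] * restart[n]
        if sum(abs(new[n] - p[n]) for n in nodes) < tol:
            return new
        p = new
    return p

# Simplified, hypothetical ligand -> receptor -> signalling -> target network.
edges = [
    ("TGFB1", "TGFBR1"), ("TGFBR1", "SMAD3"),
    ("SMAD3", "SERPINE1"), ("SMAD3", "COL1A1"),
    ("WNT3A", "FZD1"), ("FZD1", "CTNNB1"), ("CTNNB1", "AXIN2"),
]
scores = personalized_pagerank(edges, seeds={"TGFB1"})
for gene in ("SERPINE1", "COL1A1", "AXIN2"):
    print(gene, round(scores[gene], 3))
```

Targets reachable from the seed ligand receive positive scores, whereas the Wnt branch scores zero. Note that such a walk follows only the edges it is given, so an inhibitory post-translational interaction absent from the graph cannot lower a target's score, which is the crosstalk blind spot described above.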

Expression permutation-based tools are the most widely used among those listed in this Review. These methods compute a communication score for each ligand–receptor pair and evaluate its significance through cluster label permutation, non-parametric tests against a null model or empirical methods. To increase confidence, CellPhoneDB30,83, CellChat79 and ICELLNET84 address a limitation common to most CCC methods: ignoring multisubunit protein complexes. These tools use lists of multimeric proteins to assess whether all subunits are simultaneously expressed, thereby identifying likely functional ligand–receptor interactions. CellChat additionally incorporates positive and negative effectors into its Hill function-inspired framework. Other tools in this category offer further important features: Giotto111 includes spatial expression information, accounting for the potential of cells to interact given their physical proximity, whereas ICELLNET is the only tool that returns a global CCC score aggregated from all ligand–receptor interactions using percentile normalization and the expression product scoring function. SingleCellSignalR112 uses a regularized expression product to compute ligand–receptor interaction scores and is the only tool reviewed that provides explicit, empirically derived cut-off values for this score to achieve appropriate false discovery rates.
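Two of the ideas above, requiring every subunit of a multimeric protein to be expressed and regularizing the expression product, can be sketched together. The sketch assumes that a complex's expression is the minimum over its subunits (a convention reported for CellPhoneDB; other tools aggregate differently) and uses a bounded score of the form sqrt(l·r) / (mu + sqrt(l·r)), modelled on the form described for SingleCellSignalR; the value of `mu` here is hypothetical, and the package documentation should be checked for its exact definition.

```python
import math

def complex_expression(cluster_expr, subunits):
    """Expression of a possibly multimeric protein: the minimum over subunits,
    so the complex scores zero unless every subunit is expressed."""
    return min(cluster_expr.get(gene, 0.0) for gene in subunits)

def regularized_lr_score(l, r, mu):
    """Regularized expression product in [0, 1): sqrt(l*r) / (mu + sqrt(l*r)).
    mu > 0 is a global expression scale (an assumption for this sketch)."""
    prod = math.sqrt(l * r)
    return prod / (mu + prod)

sender = {"TGFB1": 4.0}
receiver = {"TGFBR1": 1.0, "TGFBR2": 2.5}
l = complex_expression(sender, ["TGFB1"])
r = complex_expression(receiver, ["TGFBR1", "TGFBR2"])  # heteromeric receptor
print(regularized_lr_score(l, r, mu=2.0))  # 0.5

# If one receptor subunit is missing, the interaction scores zero.
r_missing = complex_expression({"TGFBR1": 1.0}, ["TGFBR1", "TGFBR2"])
print(regularized_lr_score(l, r_missing, mu=2.0))  # 0.0
```

Bounding the score below 1 keeps highly expressed pairs from dominating, which is what makes a single cut-off value meaningful across data sets.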

Array-based tools are the most mathematically sophisticated group of tools. Although network computations can be formulated as matrix operations, scTensor113 explicitly models ligand–receptor interactions using a tensor (a multidimensional array). A third-order tensor is generated from the data: the first two dimensions represent ligand expression and receptor expression, respectively, by each cell type or cluster within the single-cell RNA-seq data set, and the third dimension represents all ligand–receptor interactions. Non-negative Tucker decomposition114 is then performed on this tensor, resulting in three matrices with coefficients representing the relationships between interacting cells and their respective ligands and receptors. Thus, this tool captures communication pathways in a context where all pairs of cells are considered simultaneously, extracts the relationships between different CCIs and produces lower error rates than are obtained with independent hypothesis tests. However, interpreting the scores from a tensor decomposition may not be as straightforward as interpreting the scores from other groups of tools.
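The tensor construction step can be sketched in plain Python. This follows the layout described in the text (sender cell type × receiver cell type × ligand–receptor pair), with entries assumed to be the product of ligand and receptor expression; scTensor's exact construction may differ, and the non-negative Tucker decomposition itself is left to a dedicated tensor-decomposition library and is not shown.

```python
def build_lr_tensor(lig_expr, rec_expr):
    """T[i][j][k] = lig_expr[i][k] * rec_expr[j][k]: ligand of pair k in
    sender cell type i times receptor of pair k in receiver cell type j."""
    n_types = len(lig_expr)
    n_pairs = len(lig_expr[0])
    return [[[lig_expr[i][k] * rec_expr[j][k] for k in range(n_pairs)]
             for j in range(n_types)]
            for i in range(n_types)]

def collapse_pairs(tensor):
    """Sum over the LR-pair mode, giving a sender x receiver matrix."""
    return [[sum(fiber) for fiber in row] for row in tensor]

# Rows: cell types A and B; columns: LR pairs 1-3 (hypothetical values).
lig = [[2.0, 0.0, 1.0],
       [0.5, 3.0, 0.0]]
rec = [[1.0, 1.0, 0.0],
       [0.0, 2.0, 4.0]]
T = build_lr_tensor(lig, rec)
print(T[0][1])            # [0.0, 0.0, 4.0]
print(collapse_pairs(T))  # [[2.0, 4.0], [3.5, 6.0]]
```

Collapsing the pair mode recovers a conventional cell-pair score matrix; a Tucker decomposition instead keeps all three modes and factorizes them jointly, which is how the method relates cell pairs to groups of ligand–receptor pairs.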

The tools described also include powerful visualization features that facilitate the interpretation of results. Several of the more common visualization methods are outlined in Fig. 4; they either plot ligand–receptor co-expression patterns and communication scores directly (Fig. 4a–c) or provide higher-level intuition concerning overall CCI levels and the directionality of these effects between cell types (Fig. 4d–f). Thus, several tools not only quantify CCIs and CCC but also facilitate their analysis and interpretation.

Fig. 4: Common visualization techniques for cell–cell interactions and communication.

a | A Sankey diagram for connecting key ligands from a sender cell to cognate receptors in the receiver cell. Node colour (ligand or receptor) indicates the expression level. b | Heatmap to represent the communication scores for each ligand–receptor interaction in each cell pair. c | Dot plot to show the communication score (colour of dots) and at the same time its significance (size), often obtained from a statistical model or permutation analysis. d | Circos plot or chord diagram to show key communication pathways used by different cell types to communicate. The links start from a ligand (red) and end in a receptor (blue), which are grouped for each cell type (coloured outer arcs). e | Bipartite network where nodes can be either cells or ligands. Edges can be directed only from a cell to a ligand it produces or from a ligand to a cell that expresses its cognate receptor. f | Cell–cell interaction network to represent the potential of cells to interact. Nodes correspond to cells and edges correspond to their interactions. These are directed from a sender cell to a receiver cell, and their thicknesses are proportional to the respective global cell–cell communication scores (for example, number of active ligand–receptor pairs).

Assessing predicted CCC mechanisms

CCC inference should be considered a data-driven process for generating hypotheses, which can lead to different results depending on the strategy adopted (Fig. 3). Thus, validating inferred mechanisms is crucial to confirm associations with phenotypes and behaviours of cell communities. In this section, we discuss current approaches used for this purpose, emphasizing both computational methods to minimize false discoveries and experiments used to validate results, and illustrate hallmark studies that successfully implemented this important step.

Computational minimization of false discoveries

Robust inference is essential for minimizing false positives and false negatives, and it reduces the number of validations to perform, which is especially useful when experiments are expensive. From the total pool of inferred ligand–receptor interactions between cells, statistical modelling can assess whether each interaction is consistent with a null hypothesis and help discard artefactual or non-specific CCC inferences.

Permutation-based analyses can help discard results arising from random noise by prioritizing cell type-enriched ligand–receptor pairs. The cluster labels, representing cell types, are randomly shuffled across single cells, the mean gene expression within each permuted cluster is computed and the communication scores are recalculated. Repeating this many times for each ligand–receptor pair in a given pair of clusters builds a null distribution, from which a P value for the measured communication score can be computed. A full list of tools that use this method is included in the discussion of permutation-based tools (see the section entitled A growing toolbox to facilitate CCC analysis and Table 2). As a representative example, the CellPhoneDB tool was applied to single-cell RNA-seq data from human first-trimester placentae to understand the regulation of the immune response and how it prevents harmful maternal responses30.
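The label-permutation procedure above can be sketched as follows. The communication score here is assumed to be the product of cluster means (one of the core scoring functions); individual tools permute their own scores, and the add-one correction in the P value is a common convention rather than something prescribed by the text. The data are hypothetical.

```python
import random

def mean_expr(values, labels, cluster):
    """Mean expression of one gene over the cells assigned to `cluster`."""
    vals = [v for v, lab in zip(values, labels) if lab == cluster]
    return sum(vals) / len(vals) if vals else 0.0

def lr_score(lig, rec, labels, sender, receiver):
    """Expression-product score: mean ligand in sender x mean receptor in receiver."""
    return mean_expr(lig, labels, sender) * mean_expr(rec, labels, receiver)

def permutation_pvalue(lig, rec, labels, sender, receiver, n_perm=1000, seed=0):
    """Shuffle cluster labels across cells to build a null distribution."""
    rng = random.Random(seed)
    observed = lr_score(lig, rec, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lr_score(lig, rec, shuffled, sender, receiver) >= observed:
            hits += 1
    # add-one correction avoids reporting P = 0 from a finite number of permutations
    return observed, (hits + 1) / (n_perm + 1)

# 20 hypothetical cells: ligand high in cluster A, receptor high in cluster B.
labels = ["A"] * 10 + ["B"] * 10
lig = [5.0] * 10 + [0.0] * 10
rec = [0.0] * 10 + [5.0] * 10

print(permutation_pvalue(lig, rec, labels, "A", "B"))  # small P value: enriched
print(permutation_pvalue(lig, rec, labels, "B", "A"))  # (0.0, 1.0)
```

Because shuffling preserves cluster sizes, the null distribution reflects only the loss of cell type specificity, which is what the test is meant to detect.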

Subsampling analysis has been applied to evaluate the robustness of the CCC results. Random subsamples of the original data set are used to rerun the CCC analysis. Using the subsampled results, one can compute true-positive and false-positive rates with respect to the original data set results79 and quantify how variable the inferences are, given subtle changes in the composition of cell clusters. CellChat is one tool that has applied this method to compare its performance with that of other tools and measure robustness when identifying the role of a specific population of mouse myeloid cells, termed ‘MYL-A’, in wound healing through transforming growth factor-β signalling79.
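A subsampling robustness check can be sketched as below. The inference step is a deliberately simple stand-in (a pair is active when cluster-mean ligand and receptor expression both exceed a hypothetical threshold), not CellChat's model; the subsampling fraction, number of rounds and data are likewise assumptions. True- and false-positive rates are computed against the full-data result, taking all sender–receiver–pair combinations as the candidate universe.

```python
import random
from itertools import product
from statistics import mean

def cluster_means(cells):
    """Mean expression of each gene per cluster; cells are (cluster, {gene: value})."""
    sums, counts = {}, {}
    for cluster, expr in cells:
        counts[cluster] = counts.get(cluster, 0) + 1
        genes = sums.setdefault(cluster, {})
        for gene, value in expr.items():
            genes[gene] = genes.get(gene, 0.0) + value
    return {c: {g: s / counts[c] for g, s in genes.items()} for c, genes in sums.items()}

def infer_active(cells, lr_pairs, threshold=1.0):
    """Toy inference: (sender, receiver, ligand, receptor) tuples whose
    cluster-mean ligand and receptor expression both exceed `threshold`."""
    means = cluster_means(cells)
    return {
        (s, r, lig, rec)
        for s, r in product(means, means)
        for lig, rec in lr_pairs
        if means[s].get(lig, 0.0) > threshold and means[r].get(rec, 0.0) > threshold
    }

def subsample_rates(cells, lr_pairs, fraction=0.8, n_rounds=50, seed=0):
    """Mean TPR and FPR of subsample inferences versus the full-data reference."""
    rng = random.Random(seed)
    reference = infer_active(cells, lr_pairs)
    clusters = {c for c, _ in cells}
    universe = {(s, r, lig, rec) for s, r in product(clusters, clusters)
                for lig, rec in lr_pairs}
    negatives = universe - reference
    tprs, fprs = [], []
    for _ in range(n_rounds):
        sub = rng.sample(cells, max(1, int(fraction * len(cells))))
        found = infer_active(sub, lr_pairs)
        tprs.append(len(found & reference) / len(reference) if reference else 1.0)
        fprs.append(len(found & negatives) / len(negatives) if negatives else 0.0)
    return mean(tprs), mean(fprs)

# Hypothetical cells: ligand L1 high in A; receptor R1 close to threshold in B.
cells = ([("A", {"L1": 3.0, "R1": 0.0})] * 10
         + [("B", {"L1": 0.0, "R1": 2.0})] * 6
         + [("B", {"L1": 0.0, "R1": 0.0})] * 4)
lr_pairs = [("L1", "R1")]
print(infer_active(cells, lr_pairs))    # {('A', 'B', 'L1', 'R1')}
print(subsample_rates(cells, lr_pairs))
```

Because the receptor's cluster mean sits near the threshold, some subsamples lose the interaction, and the mean TPR falls below 1; that drop is the kind of sensitivity to cluster composition this analysis is designed to expose.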

Enrichment analyses have also helped reduce false discoveries. A recent study applied this strategy to study the role of structural cells (that is, fibroblasts, endothelial cells and epithelial cells) in priming immune responses and how they interact with immune cells58. It identified ligand–receptor pairs that were enriched between interacting cell types using a Fisher’s exact test over all possible pairs of differentially expressed genes. Supported by other experiments, these interactions revealed the crucial role of structural cells in priming immune response in an organ-specific fashion58.
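The enrichment test described above can be sketched with a one-sided Fisher's exact test, computed here directly from the hypergeometric distribution using only the standard library. The 2×2 table construction (ligand–receptor pairs versus all gene pairs, split by whether both genes are differentially expressed in the respective cell types) is an assumed reading of the approach; the cited study's exact table may differ, and the gene sets are hypothetical.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the 2x2 table
    [[a, b], [c, d]]: P(X >= a) under the hypergeometric null with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

def lr_enrichment(de_sender, de_receiver, genes, lr_pairs):
    """Are LR pairs over-represented among (sender DE gene, receiver DE gene)
    combinations, relative to all gene-gene combinations?"""
    n = len(genes) ** 2
    in_universe = [(l, r) for l, r in lr_pairs if l in genes and r in genes]
    row1 = len(in_universe)                    # LR pairs among all gene pairs
    col1 = len(de_sender) * len(de_receiver)   # gene pairs formed from DE genes
    a = sum(1 for l, r in in_universe if l in de_sender and r in de_receiver)
    return fisher_greater(a, row1 - a, col1 - a, n - row1 - col1 + a)

genes = {"L1", "L2", "R1", "R2"} | {f"G{i}" for i in range(6)}  # 10 measured genes
lr_pairs = [("L1", "R1"), ("L2", "R2")]
p = lr_enrichment({"L1", "L2"}, {"R1", "R2"}, genes, lr_pairs)
print(p)  # about 0.0012
```

If a statistics library is available, a library implementation of Fisher's exact test with a one-sided alternative should give the same tail probability.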

Data generated with other technologies, such as proteomics, have informed the design of CCC methods and helped assess their performance. Specifically, they have helped tune the parameters of methods before making predictions on the gene expression data set of interest, addressing issues originating from imperfect mRNA–protein correlations. For example, Browaeys et al.54 used several gene and protein expression data sets to optimize NicheNet. Similarly, another study112 used gene and protein expression data sets to evaluate false-positive rates and tune the parameters of CCC measurement methods. Thus, integrating other types of data is also important to benchmark and optimize tools.

As discussed here, computational strategies can help reduce false discoveries and facilitate the selection of important results for further research or experimental validation. Although we highlighted cases for which these methods helped drive new hypotheses, many other strategies can be used, such as Wilcoxon’s test60, random generation of data sets68, cross-validation105 and Welch’s test48.

Experimental validation

Many studies use experimental methods to validate CCC mechanisms inferred computationally (Table 1; Supplementary Table 1). These cover three levels of experimental tests: (1) confirmation of the expression of proteins thought to be involved in CCC (for example, through proteomics, enzyme-linked immunosorbent assay, western blot or immunohistochemistry); (2) visualization of interactions between gene products expressed in neighbouring cells (for example, through microscopy coupled with immunostaining, single-molecule fluorescence in situ hybridization or measurement of co-occurrence through flow cytometry); and (3) assessment of the functional role of CCC mediators by performing in vivo or in vitro experiments using activators or inhibitors of the interacting molecules or genetic manipulation of cells to observe effects consistent with the predicted communication.

Confirming the presence of proteins provides only tangential information rather than validation, but it can test whether the corresponding signals could contribute to CCC events. Immunohistochemistry or western blots have measured the cell type-specific presence of members of the signalling complexes involved. A recent analysis of human liver development used immunohistochemistry to verify VCAM1 production in macrophages and ITGA4 production in erythroid cells of the human fetal liver to suggest a possible interaction between these two cell types and its relevance to lineage differentiation23. However, these approaches only support, and do not validate, a predicted interaction.

Immunohistochemistry and other tagging techniques, such as single-molecule fluorescence in situ hybridization, are frequently used to gain further information about cellular localization for predicted CCIs. This set-up verifies the spatial colocalization of CCC mediators and therefore supports the potential occurrence of a CCC event. For example, in addition to immunostaining to confirm protein expression, statistically significant co-occurrence of IL-33 and its receptor was validated on adjacent AT-2 cells and basophils, respectively41, whereas another study used imaging flow cytometry to show interaction between macrophages and liver progenitor cells23. In addition, immunofluorescence was used to colocalize the expression of four ligand–receptor interactions (APOE–LRP1, APOE–LDLR, VTN–KDR and LAMA4–ITGB1) to adjacent cells in mouse brain on embryonic day 14.5 (ref.22), supporting the relevance of these interactions to the developing brain. Thus, these approaches confirm spatial colocalization of predicted interactions.

Functional assessment is the most informative validation technique. This approach evaluates whether the phenotypes observed in the interacting cells are the result of specific CCC events. For example, the interaction between VEGFA and KDR (also known as VEGFR2) in liver bud development was confirmed by dosing KDR inhibitors in vitro into micro-liver buds, significantly reducing hepatoblast abundance and impairing differentiation24. Elsewhere, 33 candidate ligands affecting haematopoietic differentiation were dosed in vitro in haematopoietic stem cells, and 27 significantly affected the differentiation rate and trajectory of these cells15. Similarly, organ structural cells thought to contribute to the immune response were exposed to candidate cytokines, which induced expression changes similar to those seen upon lymphocytic choriomeningitis virus infection, as determined with a mouse model of lymphocytic choriomeningitis virus infection as a reference58.

Challenges and future directions

As the methods measuring CCIs improve and the associated results are experimentally validated, new research opportunities will arise that may improve the reliability of inferred CCIs. Furthermore, novel insights will emerge through the study of interactions between cells from different species and the engineering of phenotypes by manipulating communication events.

Multi-omics integration

Although mRNA and protein levels are qualitatively correlated, transcriptomics may not represent a fully accurate view of intercellular communication, as transcript and protein abundances can be uncoupled by post-transcriptional and post-translational processes82,115,116. For example, the inferred presence of ligand–receptor pairs from transcriptomics may not coincide with their actual presence in proteomic data7,82. Borrowing information from other omics technologies can improve confidence in results16,117, especially for emerging technologies such as Nativeomics118, which detects intact ligand–receptor assemblies using mass spectrometry, and INs-seq119, which couples single-cell RNA-seq with intracellular protein measurements to simultaneously profile transcription factors, signalling activity and metabolism. Moreover, emerging technologies such as single-cell proteomics11 will become important inputs for CCC methods and complement single-cell RNA-seq to improve CCI predictions.

Omics integration can extend beyond gene expression. Mammalian cells are covered by a thick glycocalyx, and most hormones and receptors are glycoproteins. Hence, glycomic data can add contextual information to CCI analyses. Glycosylation has an impact on protein interactions, especially in ligand–receptor binding as many ligands are glycosylated120, and glycosylation can change receptor affinity121,122,123. This phenomenon may rewire CCIs and affect, for example, development or T cell activation124. Moreover, glycans are involved in interspecies crosstalk, contributing to the specificity of communication125,126,127. Therefore, integration of additional omics technologies, such as proteomics and glycomics, will help identify additional CCIs that are missed with use of RNA-based methods.

Adding a spatial dimension

Ligand mRNA expression and ligand abundance are not the only factors required for communication: the ligand must also be localized in the correct cellular compartment, which RNA-based methods are blind to, and interacting cells are usually close to each other, which routine single-cell RNA-seq experiments cannot measure. To improve the reliability of computed CCIs, it is crucial to account for the spatial positions of mediators and interacting cells. Single-cell analyses have considered cell location in mouse bone marrow53,128, demonstrating that cell proximity is key to the study of intercellular communication. The study of spatial proximity in interacting cells is an emerging approach129. One technology for profiling these physical interactions, PIC-seq, uses cell sorting to acquire and transcriptionally profile physically interacting cells (PICs) through massively parallel single-cell RNA-seq130. The study authors presented an algorithm for deconvolving the data to capture signals from intercellular physical interactions. One advantage of this method is that it captures more detail about the crosstalk between PICs than standard single-cell RNA-seq does. Although this technology is promising, it is currently limited to studying PICs from only two populations of cells. Nevertheless, similarly to other PIC-based methods128,131,132, PIC-seq can be readily integrated into the pipelines described here, enhancing CCI analyses with the intrinsic spatial information that PICs encode. As more approaches emerge for spatial transcriptomics99,100,101,102,103,104 and proteomics133,134, future studies and algorithms should include this information. Accounting for the physical distance between cells will lead to new scoring functions that may better capture the potential of cells to communicate and interact53,98,108,111,135. For example, ligand-specific diffusion constants could be incorporated to reflect how effectively gene products can mediate long-distance communication136,137.
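One way a distance-aware scoring function could use a diffusion constant is sketched below. It assumes a one-dimensional steady-state model in which a ligand with diffusion coefficient D is removed at a first-order rate k, so its concentration decays as exp(−d/λ) with characteristic length λ = sqrt(D/k); real three-dimensional profiles are more complex. The parameter values, cell positions and expression values are hypothetical.

```python
import math

def diffusion_length(D, k):
    """Decay length sqrt(D / k) of a ligand with diffusion coefficient D
    (um^2/s) removed at first-order rate k (1/s); result in um."""
    return math.sqrt(D / k)

def spatial_lr_scores(lig, rec, positions, lam):
    """Score every ordered sender-receiver pair (i != j) as
    lig[i] * rec[j] * exp(-distance(i, j) / lam)."""
    n = len(positions)
    return {
        (i, j): lig[i] * rec[j] * math.exp(-math.dist(positions[i], positions[j]) / lam)
        for i in range(n) for j in range(n) if i != j
    }

lam = diffusion_length(D=100.0, k=0.01)  # hypothetical values, about 100 um
positions = [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
lig = [1.0, 0.0, 0.0]  # only cell 0 expresses the ligand
rec = [0.0, 1.0, 1.0]  # cells 1 and 2 express the receptor
scores = spatial_lr_scores(lig, rec, positions, lam)
print(round(scores[(0, 1)], 3), round(scores[(0, 2)], 3))  # 0.905 0.368
```

A ligand with a larger D/k ratio has a longer decay length and therefore down-weights distant receivers less, which is how ligand-specific constants could distinguish short-range from long-range communication.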

Adding a temporal dimension

Time is another important factor for studying CCC. It can help elucidate how communication evolves and detect important changes in dynamic processes, such as cellular differentiation and the immune response. One can use samples taken at different time points or infer pseudotimes from the RNA-seq data set, as is done when studying cellular differentiation through lineage tracing138,139. CCC analyses at each (pseudo)time point can then identify active ligand–receptor pairs and test the overall interaction potential between cells25. However, time-dependent behaviours might be uncoupled from mRNA levels, hindering the detection of changes through transcriptomics-based CCC predictions. mRNA expression can be temporarily disconnected from the activity of the encoded products owing to long half-lives or storage of the products140. It can also take longer for a ligand or a receptor to reach a functional abundance than for the cognate mRNA to be produced82. For example, endothelial cells store cell adhesion molecules, such as P-selectin, in granules, which are quickly mobilized to the cell surface to start the recruitment of immune cells, instead of expressing those proteins de novo141. Thus, time-dependent analyses can improve CCC discoveries, but other data types should be integrated to distinguish whether phenotypes stem from CCC or from other dynamics of gene expression.
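Once active ligand–receptor pairs have been inferred at each (pseudo)time point, changes between consecutive points can be summarized as pairs gained and lost. The sketch below assumes the per-time-point sets are already available; the time points and pairs are hypothetical. As the paragraph cautions, a pair such as SELP–SELPLG may appear "gained" at the mRNA level later than the stored protein actually acts.

```python
def lr_dynamics(active_by_time):
    """For consecutive (pseudo)time points, report LR pairs gained and lost."""
    times = sorted(active_by_time)
    return [
        (t0, t1,
         active_by_time[t1] - active_by_time[t0],   # gained
         active_by_time[t0] - active_by_time[t1])   # lost
        for t0, t1 in zip(times, times[1:])
    ]

# Hypothetical active pairs at 0, 6 and 24 h.
active_by_time = {
    0: {("CXCL12", "CXCR4")},
    6: {("CXCL12", "CXCR4"), ("SELP", "SELPLG")},
    24: {("SELP", "SELPLG")},
}
for t0, t1, gained, lost in lr_dynamics(active_by_time):
    print(t0, "->", t1, "gained:", sorted(gained), "lost:", sorted(lost))
```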

Shedding light on interspecies interactions

Comprehensive lists of ligand–receptor pairs or interacting surface-secreted proteins are crucial for algorithms that study intraspecies CCIs. However, the study of interspecies interactions (for example, between humans and pathogens142,143,144,145) requires lists of molecular interactions between species. An opportunity awaits to define interspecies PPI lists with omics methods. Considering the avalanche of data generated in the microbiome field146,147,148,149 and recent approaches for modelling microbial communities150, lists of interacting proteins or cross-species ligand–receptor pairs that enable CCI analyses would likely yield novel discoveries. For example, a study mapped the interaction between inclusion membrane proteins secreted by Chlamydia trachomatis and cognate human proteins145, providing insights into the host machinery this pathogen uses to establish the intracellular niche needed for infection. Thus, even small-scale lists of interacting partners between the host and a single microorganism, instead of the whole microbiota, will open great opportunities to use CCC methods to understand infections and diseases on the basis of host–pathogen interactions151,152. Although this endeavour will require validation of putative interactions, as done for human–virus protein interactions153,154,155,156, results from such studies will have a considerable impact on biomedicine and microbiome fields.

Predicting and manipulating phenotypes

Models for predicting phenotypes have previously been trained on the active communication pathways underlying intercellular interactions71. As more models are developed and their predictive power increases, they will enable the identification of drug targets and the manipulation of phenotypes through cell engineering. In particular, removing or adding communication pathways with genome editing and cell engineering technologies157 will modify cellular phenotypes to control how pairs of cells interact. This approach will have a great impact on many fields, such as developmental biology158, wherein CCIs are fundamental for cell differentiation into specific lineages. Biomedicine and biotherapeutics will benefit further from controlling CCIs, particularly in modifying disease courses, as is currently done with immune checkpoint inhibitors159,160. As a proof of concept, a tool that induces gene activation when specific cells interact or are directly in contact was recently built by combining a synthetic Notch receptor and the CRISPR–Cas9 system161. This biological device enabled customization of cell behaviours as observed through the expression of reporter genes. As reporters may be replaced by activators or inhibitors of other communication pathways, this tool holds great potential for therapeutic uses. Thus, manipulation of interaction pathways to control CCIs is feasible, with predictive models helping to decide which modifications to perform.

Conclusions

Substantial advances are now emerging to infer CCIs and CCC from gene expression. The diverse strategies applied have elucidated fundamental roles of cells within their communities and how they shape cellular functions, with great potential for future applications, especially in biomedicine and biotherapeutics. Each approach for inferring CCIs and CCC has its own assumptions and limitations; when using such strategies, it is important to be aware of these strengths and weaknesses and to choose appropriate parameters for analyses. Methodological and technological challenges remain, but many opportunities exist to increase our understanding of intercellular interactions.