New Results

Gene regulation inference from single-cell RNA-seq data with linear differential equations and velocity inference

Pierre-Cyril Aubin-Frankowski, Jean-Philippe Vert
doi: https://doi.org/10.1101/464479
Pierre-Cyril Aubin-Frankowski
1MINES ParisTech, PSL Research University, CBIO - Centre for Computational Biology, F-75006 Paris, France
Jean-Philippe Vert
2Google Brain, F-75009 Paris, France
1MINES ParisTech, PSL Research University, CBIO - Centre for Computational Biology, F-75006 Paris, France

Abstract

Single-cell RNA sequencing (scRNA-seq) offers new possibilities to infer gene regulatory networks (GRN) for biological processes involving a notion of time, such as cell differentiation or cell cycles. It also raises many challenges due to the destructive measurements inherent to the technology. In this work we propose a new method named GRISLI for de novo GRN inference from scRNA-seq data. GRISLI infers a velocity vector field in the space of scRNA-seq data from the profiles of individual cells, and models the dynamics of cell trajectories with a linear ordinary differential equation to reconstruct the underlying GRN with a sparse regression procedure. We show on real data that GRISLI outperforms a recently proposed state-of-the-art method for GRN reconstruction from scRNA-seq data.

1 Introduction

Single-cell RNA sequencing (scRNA-seq) makes it possible to observe genome-wide cellular activity at single-cell resolution (Kolodziejczyk et al., 2015), generating extraordinary expectations among biologists and bringing forth new computational and mathematical challenges. By allowing us to study cell-to-cell variability, scRNA-seq has quickly become a technique of choice to systematically identify cell types in complex samples (Trapnell, 2015; Zeisel et al., 2015; Tasic et al., 2016) and understand dynamic biological processes such as embryo development (Deng et al., 2014), cell differentiation (Lönnberg et al., 2017) and cancer (Patel et al., 2014).

A fascinating perspective offered by scRNA-seq studies is to understand how genes interact and regulate each other. In particular, by observing how gene expression varies among similar cells subject to stochastic fluctuations or involved in a dynamical process such as differentiation or cell cycle, one may be able to capture statistical or dynamical dependencies between genes, which may in turn allow us to reverse-engineer a gene regulatory network (GRN) describing which transcription factors (TF) regulate which genes. While numerous algorithms have been proposed to infer GRN from bulk transcriptomic profiles (e.g., Marbach et al., 2012, and references therein), scRNA-seq data raise new opportunities and challenges. On the one hand, the number of cells in scRNA-seq studies is often several-fold larger than the number of samples in bulk transcriptomic studies, offering increased statistical power to capture regulatory interactions and making it possible to capture subtle changes in dynamical processes. On the other hand, scRNA-seq data are subject to various sources of variability (Kharchenko et al., 2014; Risso et al., 2018), and the precise type or state of each cell in a population must usually itself be inferred from the data. In particular, in the case of dynamical processes such as differentiation or cell cycles, several methods have been proposed to automatically infer a pseudo-time associated with each individual cell, as reviewed by Cannoodt et al. (2016).

As in bulk transcriptomics studies, putative functional interactions between genes can be detected by simple correlation analysis (Moignard et al., 2013; Stegle et al., 2015; Bacher and Kendziorski, 2016), or through more advanced strategies to capture statistical dependency between genes tailored to scRNA-seq data (Chan et al., 2017; Filippi and Holmes, 2017). Aibar et al. (2017) refined the detection of gene modules by combining sequence information. However, such statistical associations do not necessarily capture regulatory relationships, which typically require perturbations or temporal experiments to be detected. As scRNA-seq studies can provide an ordering of cells involved in a dynamical process, through experimental time and/or inferred pseudo-time, they offer a unique opportunity to infer regulatory relationships by taking into account the (pseudo-)time information to compare gene expression profiles. For example, Herbach et al. (2017) propose a realistic, albeit complex, stochastic dynamical system to model scRNA-seq data, which is only tested on simulated data for networks of two genes due to its computational complexity. Moignard et al. (2015) present a formalism to infer a Boolean network from single-cell qRT-PCR data, but it requires discretizing gene expression values to an on/off status in each cell. Ocone et al. (2015) propose to infer a GRN by estimating an ordinary differential equation (ODE) from pseudo-time-ordered scRNA-seq data; however, due to the computational complexity of the model selection procedure, the final GRN is limited to a refinement of a coarse GRN inferred with GENIE3 (Huynh-Thu et al., 2010), a method for bulk gene expression. A similar, linear ODE-based formalism was proposed by Matsumoto et al. (2017), who designed a more efficient procedure named SCODE to directly infer the GRN de novo from scRNA-seq data. SCODE assumes that all cells are on the same trajectory, and estimates the parameters of the ODE by integrating it and optimizing the fit between the integrated model and each individual cell’s transcriptome. However, the resulting optimization problem is computationally intractable, and is solved only approximately by restricting the class of GRN models.

In this work, we follow the same linear ODE-based formalism as SCODE for GRN inference from scRNA-seq data, and propose a new approach, which we name GRISLI, to estimate the parameters of the model. GRISLI first estimates the velocity of each cell, i.e., how each gene’s expression is increasing or decreasing in the dynamical process for each cell, and then estimates the structure of the GRN by solving a sparse regression problem to relate the gene expression of a cell to its velocity profile. We solve the sparse regression problem with a variant of stability selection (Meinshausen and Bühlmann, 2010) proposed in TIGRESS (Haury et al., 2012), a method for GRN inference from bulk transcriptomics where no velocity is involved since samples are assumed to be near steady state. In spite of a similar ODE formalism, GRISLI differs from SCODE in several aspects: (i) while SCODE assumes that all cells are on the same trajectory, in GRISLI we consider bundles of trajectories derived from a large number of initial conditions and do not integrate the ODE; (ii) while SCODE integrates the ODE, leading to a computationally intractable optimization problem to infer the parameters, we solve a simple, convex regression problem that allows us to make no restrictive assumption on the GRN structure and leads to a fast algorithm. These benefits come at the cost of estimating the velocity of each cell, for which we propose a novel procedure based on weighted averages of finite differences with other cells at nearby positions in space-time.

We empirically assess the performance of GRISLI on human and murine scRNA-seq data and show that it outperforms TIGRESS, highlighting the benefits of the ODE-based framework for scRNA-seq data, as well as outperforming SCODE, confirming the relevance of our new estimation procedure.

2 Methods

2.1 Setting and notations

We consider the problem of inferring a GRN from a set of C single-cell transcriptomic profiles $x_1, \dots, x_C \in \mathbb{R}^G$, where $x_i \in \mathbb{R}^G$ represents the expression, for the i-th cell, of G genes. We furthermore assume that the cells are involved in a dynamical process, such as differentiation or cell cycle, and that for each cell $i \in [1, C]$ we have an estimate of a time label $t_i \in \mathbb{R}$ that describes where the cell is in the process. The time label $t_i$ is assigned to the i-th cell based either on the real experimental time, or on a calculated pseudo-time. Hence we assume given a collection of time-labeled vectors $\{(x_i, t_i) \in \mathbb{R}^G \times \mathbb{R} : i = 1, \dots, C\}$.

We model the dynamical process of the cell expression $x(t) \in \mathbb{R}^G$ as a linear ordinary differential equation (ODE) of the form

$$\frac{dx}{dt}(t) = A\,x(t), \tag{1}$$

where $A \in \mathbb{R}^{G \times G}$ characterizes how each gene’s expression level influences the expression dynamics of other genes. Since each gene is presumably regulated by only a few TFs, we assume that A is sparse; a non-zero entry $A_{ij} \neq 0$ means that the expression of gene j influences that of gene i, i.e., that gene j regulates gene i.
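
To make the reading of A concrete, here is a minimal sketch of the linear model dx/dt = Ax on a made-up 3-gene network. The matrix values, the explicit Euler integrator and the helper names (`matvec`, `euler_trajectory`, `edges`) are illustrative assumptions and are not taken from the GRISLI code.

```python
# Toy illustration of the linear model dx/dt = A x (all values are made up).

def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-lists matrix A."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def euler_trajectory(A, x0, dt, n_steps):
    """Integrate dx/dt = A x with explicit Euler steps (a crude integrator)."""
    xs = [list(x0)]
    for _ in range(n_steps):
        v = matvec(A, xs[-1])  # velocity at the current state
        xs.append([xi + dt * vi for xi, vi in zip(xs[-1], v)])
    return xs

def edges(A, tol=0.0):
    """List (regulator j, target i) pairs for which |A[i][j]| > tol."""
    return [(j, i) for i, row in enumerate(A)
            for j, a in enumerate(row) if abs(a) > tol]

# Hypothetical network: gene 0 activates gene 1, gene 1 represses gene 2.
A = [[0.0, 0.0, 0.0],
     [0.8, 0.0, 0.0],
     [0.0, -0.5, 0.0]]

trajectory = euler_trajectory(A, x0=[1.0, 0.0, 1.0], dt=0.1, n_steps=10)
network = edges(A)  # the GRN is read off the support of A
```

In this toy setting the network is known and the trajectory is simulated; the inference problem goes the other way, from observed cells to the support of A.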

Inferring the GRN thus amounts to estimating which entries in A are non-zero. To do so, we propose a two-step approach called GRISLI: first we estimate the velocity of each cell $v_i = dx_i/dt$ with an estimator $\hat{v}_i$, and second we infer non-zero elements of A by estimating the support of the regression model (1) from the sample $\{(x_i, \hat{v}_i)\}_{i=1,\dots,C}$ with a stability selection procedure. We detail each step in turn below.

2.2 Velocity inference

Given the set of time-labeled vectors $\{(x_i, t_i) \in \mathbb{R}^G \times \mathbb{R} : i = 1, \dots, C\}$, we estimate the velocity $\hat{v}_i$ of each cell $i \in [1, C]$ as follows. We first observe that from any other cell $(x_j, t_j)$, with $t_j \neq t_i$, we may form the following velocity estimate based on finite difference:

$$\hat{v}_{i,j} = \frac{x_j - x_i}{t_j - t_i}.$$
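
As a small illustration, the pairwise estimate can be sketched as follows, assuming the finite difference takes the usual form (x_j - x_i) / (t_j - t_i). The function name `finite_difference_velocity` and the toy cells are hypothetical.

```python
def finite_difference_velocity(x_i, t_i, x_j, t_j):
    """Velocity estimate at cell i obtained from one other cell j.

    x_i, x_j are expression vectors (lists of floats), t_i, t_j their time labels.
    """
    if t_j == t_i:
        raise ValueError("cells with identical time labels give no estimate")
    return [(b - a) / (t_j - t_i) for a, b in zip(x_i, x_j)]

# Two toy cells with two genes each: gene 0 increases, gene 1 decreases.
v_ij = finite_difference_velocity([1.0, 2.0], 0.0, [2.0, 1.0], 0.5)
```

A single pair gives a noisy estimate, which is why many such estimates are then combined with weights.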

This estimate is informative only when (i) $t_j$ is not too far from $t_i$, so that a finite difference is a good approximation of the derivative, and (ii) the trajectories of cells j and i are close to each other, in the sense that if we were able to observe cell j at time $t_i$ then it should be close to $x_i$. Based on these considerations, we form the velocity estimate $\hat{v}_i$ as a weighted average of the $\hat{v}_{i,j}$, with weights defined by a spatio-temporal kernel $K(x, t, x', t')$ that quantifies how useful we believe $(x', t')$ is to estimate the velocity at $(x, t)$. However, as the points living in the past or the future of a given point $(x_i, t_i)$ act differently on the velocity, we separate their contributions into two weighted averages. [equation image not recovered]

As for the spatio-temporal kernel K, we arbitrarily take the following: [equation image not recovered] where $\sigma_x$ and $\sigma_t$ are set to the square root of the 10th percentile of the distribution of distances in space (x) and in time (t), respectively.
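
The weighting scheme can be sketched as below. Since the exact kernel and the exact way the two one-sided averages are combined are not recoverable from this text, the sketch makes two loud assumptions: the kernel is a plain product of Gaussians in space and time, and the final estimate is the mean of the available one-sided averages. All function names and the default bandwidths are hypothetical.

```python
import math

def gaussian_kernel(x, t, x2, t2, sigma_x, sigma_t):
    """Assumed spatio-temporal weight; a stand-in, not the paper's kernel."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, x2))
    return (math.exp(-d2 / (2 * sigma_x ** 2))
            * math.exp(-(t - t2) ** 2 / (2 * sigma_t ** 2)))

def one_sided_average(i, cells, times, side, sigma_x, sigma_t):
    """Kernel-weighted mean of finite differences over the cells in the
    future (side=+1) or in the past (side=-1) of cell i; None if empty."""
    n_genes = len(cells[i])
    num, den = [0.0] * n_genes, 0.0
    for x_j, t_j in zip(cells, times):
        dt = t_j - times[i]
        if dt * side <= 0:
            continue  # skip cell i itself and cells on the other side
        w = gaussian_kernel(cells[i], times[i], x_j, t_j, sigma_x, sigma_t)
        for g in range(n_genes):
            num[g] += w * (x_j[g] - cells[i][g]) / dt
        den += w
    return [v / den for v in num] if den > 0 else None

def estimate_velocity(i, cells, times, sigma_x=1.0, sigma_t=1.0):
    """Assumed combination: mean of the past and future weighted averages."""
    sides = [one_sided_average(i, cells, times, s, sigma_x, sigma_t)
             for s in (+1, -1)]
    sides = [s for s in sides if s is not None]
    if not sides:
        raise ValueError("no usable neighbour for this cell")
    return [sum(vals) / len(sides) for vals in zip(*sides)]

# Toy cells moving on a straight line x(t) = (t, 2t): true velocity is (1, 2).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
cells = [[t, 2.0 * t] for t in times]
v_mid = estimate_velocity(2, cells, times)
v_start = estimate_velocity(0, cells, times)  # only future cells contribute
```

On real data the bandwidths would be set from the distribution of distances, as described above, rather than left at these placeholder defaults.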

2.3 GRN inference

Once we form the estimate $\hat{v}_i$ for the velocity $v_i = dx_i/dt$ of each cell $i = 1, \dots, C$, we estimate the GRN by considering (1) as a sparse regression problem of the form $\hat{v} = Ax$, with observations $(x_i, \hat{v}_i)_{i=1,\dots,C}$. We estimate the non-zero entries of A using a stability selection procedure for sparse regression (Meinshausen and Bühlmann, 2010; Haury et al., 2012). More precisely, for each candidate regulator $j \in [1, G]$ and target gene $i \in [1, G]$, we compute a score $s(i, j) \in (0, 1)$ which increases when we believe that $A_{ij} \neq 0$, i.e., that j regulates i. The score $s(i, j)$ itself depends on three parameters R, L ∈ ℕ and α ∈ [0, 1], and is computed through the procedure described by Haury et al. (2012), which we now summarize. Let us denote by $X := (x_1, \dots, x_C) \in \mathbb{R}^{G \times C}$ the expression matrix and by $\hat{V} := (\hat{v}_1, \dots, \hat{v}_C) \in \mathbb{R}^{G \times C}$ the matrix of estimated velocities. We repeat R times a procedure where we create a new expression matrix $\tilde{X}$ and a new velocity matrix $\tilde{V}$ obtained from $X$ and $\hat{V}$ by (i) randomly subsampling ⌊C/2⌋ columns (i.e., cells) simultaneously from $X$ and $\hat{V}$, and (ii) multiplying each row i of X by a different random number $\beta_i$ uniformly sampled between α and 1. We then estimate for every pair $(\tilde{X}, \tilde{V})$ a sparse matrix A by solving a lasso regression problem:

$$\min_{A \in \mathbb{R}^{G \times G}} \; \|\tilde{V} - A\tilde{X}\|_F^2 + \lambda \|A\|_1 \tag{2}$$

over a grid of regularization parameters λ ensuring that we have solutions having from 0 to at least L nonzero entries in each row of A. For each pair of genes (i, j) and each integer $l \in [1, L]$ we measure, among the R repeats, the frequency $F(i, j, l)$ at which $A_{ij}$ is nonzero for the solution of (2) when λ is set such that l entries are nonzero in the i-th row of A, i.e., when the j-th TF is among the top l TFs in the regularization path to explain the expression of the i-th gene. We then consider the area score proposed by Haury et al. (2012):

$$s_{\text{area}}(i, j) = \frac{1}{L} \sum_{l=1}^{L} F(i, j, l).$$
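
The resampling procedure can be sketched for a single target gene (one row of A) as follows. This is a rough illustration under several assumptions, not the TIGRESS or GRISLI implementation: the lasso is solved by a basic coordinate descent, the regularization path is approximated by a short geometric grid of λ values, "top l" is taken as the first l regulators to become non-zero along that grid, and the area score is assumed to be the mean of the frequencies over l = 1..L. All function names and the synthetic data are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=30):
    """Coordinate descent for min_w (1/2n)||y - Xw||^2 + lam * ||w||_1.
    X is a list of n cells (rows), each a list of p regulator values."""
    n, p = len(X), len(X[0])
    w, resid = [0.0] * p, list(y)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[k][j] * (resid[k] + w[j] * X[k][j]) for k in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - w[j]
            if delta != 0.0:
                for k in range(n):
                    resid[k] -= delta * X[k][j]
                w[j] = new
    return w

def entry_order(X, y, n_lambdas=8, ratio=0.6):
    """Order in which regulators become non-zero as lambda decreases on a grid."""
    n, p = len(X), len(X[0])
    lam = max(abs(sum(X[k][j] * y[k] for k in range(n))) / n for j in range(p))
    order = []
    for _ in range(n_lambdas):
        lam *= ratio
        w = lasso_cd(X, y, lam)
        new = [j for j in range(p) if w[j] != 0.0 and j not in order]
        order += sorted(new, key=lambda j: -abs(w[j]))
    return order

def area_scores(X, v, R=20, L=2, alpha=0.3, seed=0):
    """Assumed area score of each regulator for one target gene."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [[0] * L for _ in range(p)]
    for _ in range(R):
        idx = rng.sample(range(n), n // 2)                  # (i) subsample cells
        beta = [rng.uniform(alpha, 1.0) for _ in range(p)]  # (ii) rescale genes
        Xs = [[X[k][j] * beta[j] for j in range(p)] for k in idx]
        vs = [v[k] for k in idx]
        for rank, j in enumerate(entry_order(Xs, vs)[:L]):
            for l in range(rank, L):  # j is among the top l + 1 regulators
                counts[j][l] += 1
    return [sum(row) / (R * L) for row in counts]

# Synthetic check: the target's velocity is driven by regulator 0 only.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(120)]
v = [2.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]
scores = area_scores(X, v)
```

Running the same routine once per target gene, with that gene's estimated velocities as `v`, would fill the score matrix row by row. A production version would use a proper path algorithm such as LARS rather than a fixed grid.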

Alternatively, we can consider the original stability selection score proposed by Meinshausen and Bühlmann (2010):

$$s_{\text{original}}(i, j) = F(i, j, L).$$

Haury et al. (2012) discuss the differences between the two scores and suggest preferring the area score, which is therefore our choice.
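
The difference between the two scores can be seen on a toy example. The sketch assumes the area score is the mean of F(i, j, l) over l = 1..L and the original score is F(i, j, L); the frequency lists below are made up.

```python
def area_score(freqs):
    """Mean of F(i, j, l) over l = 1..L (assumed form of the area score)."""
    return sum(freqs) / len(freqs)

def original_score(freqs):
    """F(i, j, L): selection frequency at the largest support size L."""
    return freqs[-1]

# freqs[l - 1] = F(i, j, l) for L = 4, for two hypothetical regulators.
early = [0.9, 0.9, 0.9, 0.9]  # usually the first regulator to be selected
late = [0.0, 0.0, 0.0, 0.9]   # only selected once four regulators are allowed

same_original = original_score(early) == original_score(late)
area_gap = area_score(early) - area_score(late)
```

Both regulators receive the same original score, whereas the area score ranks the regulator that enters the path early above the one that enters late.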

The choice of the three parameters R, L and α is a difficult question. While R should typically be as large as possible to reduce random fluctuations of the algorithm, α should typically be chosen in the range [0.2, 0.8] according to Haury et al. (2012) and L should be tested on a large grid of values. In the experiments below we provide results for different values of these parameters to demonstrate the potential of the method. In other applications where some interactions are known, we suggest as well to test predictions over a grid of values, and to pick the model that best matches known interactions.
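
When some interactions are known, the suggested grid search can be sketched generically as follows. The callable `score_fn` is a hypothetical placeholder for "run the method with these parameters and return its agreement with the known interactions" (for instance an AUC); the toy scoring function below only stands in for it and does not reproduce any result of the paper.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Return (best_params, best_value) maximizing score_fn over the grid.

    grid maps each parameter name to the list of values to try.
    """
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        value = score_fn(**params)
        if best is None or value > best[1]:
            best = (params, value)
    return best

def toy_score(L, alpha):
    """Made-up stand-in for the agreement with known interactions."""
    return -((L - 70) ** 2) - (alpha - 0.3) ** 2

best_params, best_value = grid_search(
    toy_score, {"L": [1, 35, 70], "alpha": [0.2, 0.3, 0.8]})
```

Selecting parameters against known interactions and then reporting performance on those same interactions gives an optimistic estimate, so held-out interactions are preferable when enough are available.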

2.4 Data

In order to test GRISLI, we use the two datasets provided online by Matsumoto et al. (2017). These are scRNA-seq datasets where the TF expression is taken as the log-transform of the transcripts per million (TPM) or the log-transform of the fragments per kilobase of transcript per million mapped reads (FPKM). The TF data come from the RIKEN mouse TFdb for mouse (Kanamori et al., 2004) and animalTFDB for human (Hu et al., 2018), which we downloaded from the Transcription Factor Regulatory Network database (http://www.regulatorynetworks.org). In each case we kept only the 100 TFs with the highest variance in our datasets to infer the network, as they are the most likely to have an influence over differentiation.
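
The variance filter can be sketched as below, assuming the expression values are already log-transformed. The function name, the use of the population variance and the toy TF names are illustrative assumptions.

```python
from statistics import pvariance

def top_variance_genes(expr, k):
    """Return the names of the k genes with the highest variance across cells.

    expr maps a gene name to its list of expression values, one per cell.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:k]

# Toy log-expression values for three hypothetical TFs across four cells.
expr = {
    "TF_a": [0.0, 0.0, 0.0, 0.0],  # constant, variance 0
    "TF_b": [0.0, 4.0, 0.0, 4.0],  # variance 4
    "TF_c": [1.0, 2.0, 1.0, 2.0],  # variance 0.25
}
selected = top_variance_genes(expr, 2)
```

With k = 100 this would keep the 100 most variable TFs, as done for both datasets.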

The first dataset, published by Treutlein et al. (2016), (named Data2 in SCODE) comes from a direct reprogramming of murine embryonic fibroblast cells to myocytes at days 0, 2, 5 and 22. This dataset contained 373 cells.

The second dataset (named Data3 in SCODE) comes from Chu et al. (2016) and measures the differentiation of human ES cells to definitive endoderm cells, taken at 0, 12, 24, 36, 72 and 96 h. This dataset contained 758 cells.

2.5 Performance evaluation

We evaluate the performance of GRISLI by its area under the receiver operating characteristic curve (AUC), calculated by comparing the predictions of GRISLI (a score for each pair of TFs) to the gold standard regulatory networks. For the sake of comparison, we follow the choices made by Matsumoto et al. (2017): we compute the AUC ignoring self-loops (the diagonal elements) and the TFs that do not have an edge in the true network. Among the 100 highest-variance TFs, we therefore discard those for which no edge exists in the regulatory network taken from the literature, which leaves a smaller number of TFs on which the ROC can be computed: 40 TFs for the murine dataset and 49 TFs for the human dataset.
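
A minimal sketch of this evaluation is given below. It computes the AUC through its rank interpretation (the probability that a true edge receives a higher score than a non-edge, with ties counted as one half) and skips the diagonal. It assumes the TFs without any gold-standard edge have already been removed; the function name and the toy matrices are hypothetical.

```python
def auroc_off_diagonal(scores, truth):
    """AUC of predicted edge scores against a gold standard, ignoring self-loops.

    scores[i][j]: predicted score that gene j regulates gene i.
    truth[i][j]: 1 if that edge is in the gold standard, else 0.
    """
    pos, neg = [], []
    n_genes = len(scores)
    for i in range(n_genes):
        for j in range(n_genes):
            if i == j:
                continue  # self-loops are not evaluated
            (pos if truth[i][j] else neg).append(scores[i][j])
    if not pos or not neg:
        raise ValueError("need at least one edge and one non-edge")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

truth = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
good = [[9.0, 0.9, 0.2],
        [0.1, 9.0, 0.8],
        [0.3, 0.4, 9.0]]   # diagonal values are ignored
flat = [[0.5] * 3 for _ in range(3)]

auc_good = auroc_off_diagonal(good, truth)
auc_flat = auroc_off_diagonal(flat, truth)
```

The pairwise comparison is quadratic in the number of gene pairs, which is acceptable for a few dozen TFs; a rank-based formula would scale better.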

3 Results

3.1 GRISLI

We propose a new method for Gene Regulatory network Inference from scRNA-seq data with LInear differential equations (GRISLI). The input to GRISLI is a set of time-stamped scRNA-seq data $(x_i, t_i)_{i=1,\dots,C}$, where C is the number of cells, $x_i$ is the vector of gene expression for the i-th cell and $t_i$ is the time associated with the i-th cell; this time can be based either on the real experimental time, or on a calculated pseudo-time. GRISLI combines the dynamical model of SCODE (Matsumoto et al., 2017) with the statistical procedure for network estimation of TIGRESS (Haury et al., 2012). More precisely, like SCODE we model the dynamics of gene expression as a linear differential equation dx/dt = Ax, where A is sparse and encodes the GRN in its non-zero entries. By integrating this equation, and assuming all cells have the same (unknown) initial state $x_0$, Matsumoto et al. (2017) propose to estimate A by solving:

$$\min_{A} \sum_{i=1}^{C} \left\| x_i - e^{t_i A} x_0 \right\|^2, \tag{3}$$

which is however a computationally intractable non-convex optimization problem in A; to overcome the difficulty, Matsumoto et al. (2017) restrict themselves to low-rank diagonalizable matrices of the form $A = W B W^{+}$, where B is diagonal of small rank, and use further assumptions and heuristics to obtain a tractable algorithm, SCODE, that successively optimizes W and B.
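
The contrast between fitting integrated trajectories and regressing velocities can be made explicit with the short derivation below. It assumes a least-squares fit in both cases; the residual names $r_i^{\text{traj}}$ and $r_i^{\text{vel}}$ are introduced here only for illustration.

```latex
% Solution of dx/dt = A x started from x(0) = x_0
x(t) = e^{tA} x_0, \qquad e^{tA} = \sum_{k \ge 0} \frac{t^k A^k}{k!}

% Trajectory-fitting residual: depends on A through all powers A^k,
% so its squared norm is in general non-convex in A
r_i^{\text{traj}}(A) = x_i - e^{t_i A} x_0

% Velocity-regression residual: affine in A,
% so its squared norm is convex in A
r_i^{\text{vel}}(A) = \hat{v}_i - A x_i
```

The price of the second formulation is that the velocities $\hat{v}_i$ are not observed and must be estimated first.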

GRISLI is based on the same dynamical model as SCODE, but exploits it differently. Instead of integrating the dynamical model dx/dt = Ax as in (3), we see it as a regression problem of the form v = Ax, where v = dx/dt is the velocity of each cell, and take a two-step approach to estimate A: (i) first estimate the instantaneous velocity $\hat{v}_i$ of each cell $i = 1, \dots, C$, and (ii) then estimate the non-zero entries of A using a stability selection procedure akin to the one used in TIGRESS to identify non-zero coefficients in the regression problem $\hat{v} = Ax$ from samples $(x_i, \hat{v}_i)_{i=1,\dots,C}$. The technical details of GRISLI are presented in the Methods section.

While GRISLI involves a step of velocity inference absent from SCODE, the benefits of the GRISLI model over the SCODE model include the facts that (i) we do not need to assume that all cells lie on the same trajectory, and (ii) we make no restrictive assumption on A, such as being of low rank and diagonalizable, and still derive a computationally efficient convex problem to estimate A.

3.2 Performance on GRN inference

To assess the predictive capacity and the speed of GRISLI, we test it on two benchmark datasets analyzed by Matsumoto et al. (2017): (i) a murine dataset of 373 cells corresponding to direct reprogramming of murine embryonic fibroblast cells to myocytes at days 0, 2, 5 and 22 (Treutlein et al., 2016), and (ii) a human dataset of 758 cells corresponding to differentiation of human ES cells to definitive endoderm cells, taken at 0, 12, 24, 36, 72 and 96 h (Chu et al., 2016). We follow the experimental protocol of Matsumoto et al. (2017) to assess the effectiveness of GRISLI in predicting known regulations, and compare it to SCODE and TIGRESS. Since all methods have a stochastic component, we run them 30 times on each dataset and summarize their performance by the distribution of AUC scores (see Methods), the AUC ranging from 0.5 for a random prediction to 1 for a perfect recovery of known regulations.

Figure 1 summarizes the performance of the three methods on both datasets. Since each method depends on several parameters, we tested different parameter sets, as detailed in the next section, and report here the best performance of each method to assess how well each can perform with well-chosen parameters. The parameters used for SCODE were the ones provided for their respective datasets in Matsumoto et al. (2017). The parameters used for GRISLI result from an extensive search, illustrated in Figure 2. The parameters used for TIGRESS were obtained similarly; in practice they coincide with the parameters of GRISLI.

Figure 1:

Performance of the different methods (as distribution of AUC over 30 repeats) on the murine (left) and human (right) benchmarks. The SCODE score was obtained by taking the average of 50 replicates with the rank D equal to 4 and 100 trials. GRISLI was run with L = 70, R = 1500 and α = 0.3 for the murine benchmark, and with L = 1, R = 3000 and α = 0.3 for the human benchmark. TIGRESS was run with the same parameters as GRISLI.

Figure 2:

(a): Performance of GRISLI (AUC) on the murine dataset with R = 1500 and varying α and L. (b): Performance of GRISLI (AUC) on the murine dataset with α = 0.4, L = 70 and varying R (repeated 20 times). (c): Same as (a) but for the human data and R = 3000. (d): Same as (b) but for the human data with α = 0.3 and L = 1.

We first notice that, in both cases, TIGRESS has the poorest performance, which highlights the limitations of GRN techniques developed for bulk RNA-seq in the context of scRNA-seq data. This is consistent with the findings of Matsumoto et al. (2017), who noticed for example that GENIE3, another state-of-the-art method for GRN inference from bulk RNA-seq data, performs poorly on these scRNA-seq data. Since both TIGRESS and GENIE3 are based on a steady-state assumption of expression data, this suggests that the explicit dynamical model of SCODE or GRISLI is beneficial for scRNA-seq data.

Second, we see that GRISLI outperforms SCODE on both datasets: on the murine benchmark, GRISLI has a mean AUC of 0.571 vs 0.528 for SCODE, while on the human benchmark GRISLI reaches an AUC of 0.573 vs 0.550 for SCODE. Unlike Matsumoto et al. (2017), we report not a single AUC value but a boxplot that shows the inherent stochasticity of the methods. As both SCODE and GRISLI share the same underlying dynamical model, this highlights the benefits of the GRISLI approach to estimate the parameters of the system. We furthermore notice that the variability in the performance across runs is smaller for GRISLI than for SCODE, while GRISLI ran faster (respectively 20s and 100s on the human and murine datasets, on a 4-core 3.5GHz Intel Core i5 with 16GB of 667MHz DDR3 RAM) than SCODE (500s on both datasets).

For the murine dataset, we used the Monocle pseudo-time as a time label. For the human dataset, however, we used the real experimental time values, as the AUC obtained with pseudo-time did not increase with R. It is possible that when the measurements are close enough and evenly spaced in time (as for the human data), real time is sufficient, while for more widely spaced measurements (as for the murine data) the pseudo-time may be of some interest.

3.3 Sensitivity to the parameters

Here we investigate in more detail the influence of the parameters L, α and R on the performance of GRISLI. While R should typically be chosen as large as possible to ensure that the empirical average converges to the expectation in the procedure, the optimal choice of L and α is harder to predict.

We therefore systematically assess the performance of GRISLI, in terms of AUC, over a large grid of values for L, α and R. Figure 2 summarizes the results for both the murine and the human datasets. As expected, increasing R is always beneficial. For example, Figures 2b and 2d show for a particular choice of L and α that, after increasing swiftly, the AUC reaches a plateau as R increases. As R has a significant effect on runtime, we take an intermediate value of R = 1500 for the murine dataset and R = 3000 for the human dataset in the subsequent experiments.

The influence of L and α is, as expected, more complex and depends on the dataset. On the murine dataset, the AUC is fairly stable and optimal for any L ∈ [62, 78] with little influence of α (Figure 2a), while on the human dataset only the value L = 1 seems adequate, with α having a bigger effect, as the scores sharply diminish for α larger than 0.6 (Figure 2c). Nonetheless the overall shape is consistent with the analysis of Haury et al. (2012): there is a compensating effect, as a smaller α increases the diversity between the batches. Conversely, reducing L limits the number of selected edges of the regulation network, making the predictions more similar.

The optimal values for L are strikingly different between the two datasets. While taking α equal to 0.3 and R larger than 1000 seems adequate in both cases, determining heuristics to choose L is still an open problem. In practice, we suggest testing different values of L and selecting the one that best recapitulates known interactions, if any.

4 Discussion and conclusion

Based on the (pseudo-)time information of scRNA-seq data, we propose GRISLI, a new method to infer GRN without any other information than the scRNA-seq data themselves. GRISLI is based on the same linear ODE formalism as SCODE, where the GRN is defined as the support of the matrix that relates TF expression to velocities of other genes; however GRISLI differs from SCODE in several assumptions. While SCODE assumes that all cells are on the same trajectory, which makes it possible to model each cell by integrating the ODE from a unique initial condition, GRISLI considers bundles of trajectories where each cell may follow its own trajectory, governed by a single ODE common to all cells. Given the inherent stochasticity in gene expression, and the well-known bifurcations possible when similar cells differentiate into different subtypes (Paul et al., 2015), we believe that allowing cells to evolve on different trajectories is an important property. This flexibility prevents us from integrating the ODE as in SCODE, and forces us instead to estimate the local velocity of each cell. We propose a simple estimator based on a weighted average of finite differences between pairs of cells, and believe that much work remains to be done for velocity inference from scRNA-seq data. Interestingly, La Manno et al. (2018) recently proposed a completely different approach for velocity inference, by comparing the quantity of spliced vs unspliced mRNA; it would be interesting to assess how this estimator correlates with ours, and potentially to use it in GRISLI for GRN inference. Another difference between GRISLI and SCODE concerns the assumptions on the structure of the GRN. For computational reasons, SCODE constrains the GRN matrix to be low-rank, while GRISLI makes no such assumption. Furthermore, if we wish to constrain the structure of the GRN in GRISLI, this can easily be achieved by adding structured sparsity constraints to the sparse regression problem (Bach et al., 2012).

In terms of performance, we observed that GRISLI outperforms both SCODE and TIGRESS on both human and murine scRNA-seq data. The limited performance of TIGRESS highlights the fact that methods developed for bulk RNA-seq data, based on the assumption that samples are near a steady-state condition, are not optimal for single-cell data. This was already observed by Matsumoto et al. (2017) with other state-of-the-art GRN inference methods for bulk RNA-seq, and confirms the relevance of interpreting the GRN as the support of the matrix that relates expression to velocity in the linear ODE framework. The fact that GRISLI outperforms SCODE, on the other hand, confirms the relevance of our assumptions and estimation procedure. However, we should keep in mind that the performance in absolute value remains modest, with a maximum AUC of 0.58. This is roughly similar to the best performances reached on simpler organisms from bulk transcriptomic data (Marbach et al., 2012), and highlights again the difficulty of de novo GRN inference. In addition, we note that GRISLI has two main parameters to tune (L and α), which have a significant impact on the performance. If some interactions are known, we suggest tuning them over a grid of candidate values by maximizing the fit between known and predicted interactions. Finding heuristics to automatically choose L and α when no known interaction is available is an interesting direction for future work.

Funding

None declared.

Conflict of Interest

None declared.

Availability of data and materials

The code and data are available at https://github.com/PCAubin/GRISLI

Acknowledgments

The authors deeply thank Héctor Climente-González and Samyadeep Basu for enlightening discussions.

Footnotes

  • pierre-cyril.aubin{at}mines-paristech.fr, jpvert{at}google.com

References

  1. Aibar, S. et al. (2017). SCENIC: single-cell regulatory network inference and clustering. Nat. Methods, 14, 1083–1086.
  2. Bach, F. et al. (2012). Structured sparsity through convex optimization. Stat. Sci., 27(4), 450–468.
  3. Bacher, R. and Kendziorski, C. (2016). Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol., 17(63).
  4. Cannoodt, R. et al. (2016). Computational methods for trajectory inference from single-cell transcriptomics. Eur. J. Immunol., 46, 2496–2506.
  5. Chan, T. E. et al. (2017). Gene regulatory network inference from single-cell data using multivariate information measures. Cell Systems, 5, 251–267.e3.
  6. Chu, L.-F. et al. (2016). Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol., 17, 173.
  7. Deng, Q. et al. (2014). Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science, 343(6167), 193–6.
  8. Filippi, S. and Holmes, C. C. (2017). A Bayesian nonparametric approach to testing for dependence between random variables. Bayesian Anal., 12(4), 919–938.
  9. Haury, A.-C. et al. (2012). TIGRESS: Trustful Inference of Gene REgulation using Stability Selection. BMC Syst. Biol., 6, 145.
  10. Herbach, U. et al. (2017). Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Syst. Biol., 11, 105.
  11. Hu, H. et al. (2018). AnimalTFDB 3.0: a comprehensive resource for annotation and prediction of animal transcription factors. Nucleic Acids Res.
  12. Huynh-Thu, V. A. et al. (2010). Inferring regulatory networks from expression data using tree-based methods. PLoS One, 5(9), e12776.
  13. Kanamori, M. et al. (2004). A genome-wide and nonredundant mouse transcription factor database. Biochem. Biophys. Res. Commun., 322, 787–793.
  14. Kharchenko, P. V. et al. (2014). Bayesian approach to single-cell differential expression analysis. Nat. Methods, 11(7), 740–742.
  15. Kolodziejczyk, A. A. et al. (2015). The technology and biology of single-cell RNA sequencing. Molecular Cell, 58(4), 610–620.
  16. La Manno, G. et al. (2018). RNA velocity of single cells. Nature, 560, 494–498.
  17. Lönnberg, T. et al. (2017). Single-cell RNA-seq and computational analysis using temporal mixture modelling resolves Th1/Tfh fate bifurcation in malaria. Sci. Immunol., 2.
  18. Marbach, D. et al. (2012). Wisdom of crowds for robust gene network inference. Nat. Methods, 9(8), 796–804.
  19. Matsumoto, H. et al. (2017). SCODE: an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation. Bioinformatics, 33, 2314–2321.
  20. Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B, 72(4), 417–473.
  21. Moignard, V. et al. (2013). Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nature Cell Biology, 15, 363–372.
  22. Moignard, V. et al. (2015). Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nat. Biotechnol., 33, 269–276.
  23. Ocone, A. et al. (2015). Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics, 31, i89–i96.
  24. Patel, A. P. et al. (2014). Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science, 344(6190), 1396–1401.
  25. Paul, F. et al. (2015). Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell, 163, 1663–1677.
  26. Risso, D. et al. (2018). A general and flexible method for signal extraction from single-cell RNA-seq data. Nature Comm., 9(1), 284.
  27. Stegle, O. et al. (2015). Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet., 16(3), 133–145.
  28. Tasic, B. et al. (2016). Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat. Neurosci., 19(2), 335–346.
  29. Trapnell, C. (2015). Defining cell types and states with single-cell genomics. Genome Res., 25(10), 1491–1498.
  30. Treutlein, B. et al. (2016). Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq. Nature, 534, 391–395.
  31. Zeisel, A. et al. (2015). Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science, 347(6226), 1138–42.
Posted November 07, 2018.