Abstract
An R package for specifying and estimating linear latent variable models is presented. The philosophy of the implementation is to separate the model specification from the actual data, which leads to a dynamic and easy way of modeling complex hierarchical structures. Several advanced features are implemented, including robust standard errors for clustered correlated data, multigroup analyses, non-linear parameter constraints, inference with incomplete data, maximum likelihood estimation with censored and binary observations, and instrumental variable estimators. In addition, an extensive simulation interface covering a broad range of non-linear generalized structural equation models is described. The model and software are demonstrated on measurements of the serotonin transporter in the human brain.
Notes
For the lava.tobit package the weight argument is already reserved, so the weight2 argument should be used instead. Furthermore, the estimator should not be changed from the default.
For more information on mathematical annotation in R we refer to the plotmath help-page.
References
Andersen EB (1971) The asymptotic distribution of conditional likelihood ratio tests. J Am Stat Assoc 66(335):630–633
Angrist J (2001) Estimation of limited dependent variable models with dummy endogenous regressors: simple strategies for empirical practice. J Bus Econ Stat 19:2–16
Bates D, Maechler M (2009) lme4: linear mixed-effects models using S4 classes. http://CRAN.R-project.org/package=lme4, R package version 0.999375-31
Boker S, Neale M, Maes H, Wilde M, Spiegel M, Brick T, Spies J, Estabrook R, Kenny S, Bates T, Mehta P, Fox J (2011) OpenMx: an open source extended structural equation modeling framework. Psychometrika 76:306–317
Bollen K (1996) An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika 61(1):109–121
Bollen KA (1989) Structural equations with latent variables. Applied probability and statistics, Wiley series in probability and mathematical statistics. Wiley, New York
Bollen KA (2001) Two-stage least squares and latent variable models: simultaneous estimation and robustness to misspecification. In: Cudeck R, Sörbom D, Du Toit SHC (eds) Structural equation modeling, present and future: a festschrift in honor of Karl Jöreskog, Scientific Software International, Lincolnwood
Bollen KA, Kirby JB, Curran PJ, Paxton PM, Chen F (2007) Latent variable models under misspecification: two-stage least squares (2SLS) and maximum likelihood (ML) estimators. Soc Methods Res 36(1):48–86. doi:10.1177/0049124107301947
Budtz-Jørgensen E, Keiding N, Grandjean P, Weihe P, White RF (2003) Statistical methods for the evaluation of health effects of prenatal mercury exposure. Environmetrics 14:105–120
Caffo B, Griswold M (2006) A user-friendly introduction to link-probit-normal models. Am Stat 60(2):139–145
Csardi G, Nepusz T (2006) The igraph software package for complex network research, InterJ, Complex Syst 1695. http://igraph.sf.net
Ditlevsen S, Christensen U, Lynch J, Damsgaard MT, Keiding N (2005) The mediation proportion: a structural equation approach for estimating the proportion of exposure effect on outcome explained by an intermediate variable. Epidemiology 16(1):114–120. doi:10.1097/01.ede.0000147107.76079.07
Erritzoe D, Holst KK, Frokjaer VG, Licht CL, Kalbitzer J, Nielsen FA, Svarer C, Madsen J, Knudsen GM (2010) A nonlinear relationship between cerebral serotonin transporter and 5-HT2A receptor binding: an in vivo molecular imaging study in humans. J Neurosci 30(9):3391–3397. doi:10.1523/JNEUROSCI.2852-09.2010. http://www.jneurosci.org/cgi/reprint/30/9/3391.pdf
Fox J (2006) Teacher’s corner: structural equation modeling with the sem package in R. Struct Equ Model Multidiscip J 13(3):465–486. doi:10.1207/s15328007sem1303_7
Fox J (2009) Sem: structural equation models. http://CRAN.R-project.org/package=sem, R package version 0.9-16
Gansner ER, North SC (1999) An open graph visualization system and its applications to software engineering. Softw Pract Exper 30:1203–1233
Gentleman RC, Carey VJ, Bates DM et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80. http://genomebiology.com/2004/5/10/R80
Gentry J, Long L, Gentleman R, Falcon S, Hahne F, Sarkar D (2009) Rgraphviz: provides plotting capabilities for R graph objects. R package version 1.20.3
Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, Hothorn T (2009) Mvtnorm: multivariate normal and t distributions. http://CRAN.R-project.org/package=mvtnorm, R package version 0.9-4
Gilbert P (2009) NumDeriv: accurate numerical derivatives. http://www.bank-banque-canada.ca/pgilbert, R package version 2006.4-1
Greene WH (2002) Econometric analysis, 5th edn. Prentice Hall, Englewood Cliffs
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65–70. doi:10.2307/4615733
Holst KK (2011) Lava.tobit: latent variable models with censored and binary outcomes. http://lava.r-forge.r-project.org, R package version 0.4-3
Holst KK (2012) Gof: model-diagnostics based on cumulative residuals. http://CRAN.R-project.org/package=gof, R package version 0.8-1
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47(260):663–685
Hotelling H (1953) New light on the correlation coefficient and its transforms. J R Stat Soc Ser B 15:193–225 (discussion, 225–232)
Jöreskog K (1970) A general method for analysis of covariance structures. Biometrika 57:239–251
Kalbitzer J, Erritzoe D, Holst KK, Nielsen F, Marner L, Lehel S, Arentzen T, Jernigan TL, Knudsen GM (2010) Seasonal changes in brain serotonin transporter binding in short serotonin transporter linked polymorphic region-allele carriers but not in long-allele homozygotes. Biol Psychiatry 67:1033–1039. doi:10.1016/j.biopsych.2009.11.027
Kenward MG, Molenberghs G (1998) Likelihood based frequentist inference when data are missing at random. Stat Sci 13(3):236–247. doi:10.1214/ss/1028905886
Laird NM, Ware JH (1982) Random-effects models for longitudinal data. Biometrics 38:963–974
Lehmann EL, Romano JP (2005) Testing statistical hypotheses. Springer texts in statistics. Springer, New York
Liang KY, Zeger S (1986) Longitudinal data analysis using generalized linear models. Biometrika 73(1):13–22
Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley series in probability and statistics, Wiley, Hoboken
Magnus JR, Neudecker H (1988) Matrix differential calculus with applications in statistics and econometrics. Wiley series in probability and mathematical statistics: Applied probability and statistics. Wiley, Chichester
McArdle JJ, McDonald RP (1984) Some algebraic properties of the reticular action model for moment structures. Br J Math Stat Psychol 37(2):234–251
Muthén LK, Muthén BO (2007) Mplus user’s guide (version 5), 5th edn. Muthén & Muthén, Los Angeles
Paik M (1988) Repeated measurement analysis for nonnormal data in small samples. Commun Stat Simul Comput 17:1155–1171
Pinheiro JC, Bates DM (2000) Mixed-effects models in S and S-PLUS. Springer, Berlin
Pinheiro JC, Chao EC (2006) Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. J Comput Graph Stat 15(1):58–81
R Development Core Team (2010) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org, ISBN 3-900051-07-0
REvolution Computing (2009) Foreach: foreach looping construct for R. http://CRAN.R-project.org/package=foreach, R package version 1.3.0
Rabe-Hesketh S, Skrondal A, Pickles A (2004) Generalized multilevel structural equation modeling. Psychometrika 69:167–190. doi:10.1007/BF02295939
Raftery A (1993) Bayesian model selection in structural equation models. In: Bollen K, Long J (eds) Testing structural equation models. Sage, Newbury Park, pp 163–180
Rotnitzky A, Robins JM (1995) Semiparametric regression estimation in the presence of dependent censoring. Biometrika 82(4):805–820
Sanchez BN, Budtz-Jørgensen E, Ryan LM, Hu H (2005) Structural equation models: a review with applications to environmental epidemiology. J Am Stat Assoc 100:1443–1455
Sharpsteen C, Bracken C (2010) TikzDevice: a device for R graphics output in PGF/TikZ format. http://R-Forge.R-project.org/projects/tikzdevice, R package version 0.5.2/r34
Steiger JH (2001) Driving fast in reverse. J Am Stat Assoc 96(453):331–338. doi:10.1198/016214501750332893
Therneau T, original R port by Thomas Lumley (2009) Survival: survival analysis, including penalised likelihood. http://CRAN.R-project.org/package=survival, R package version 2.35-8
White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50(1):1–26
Williams RL (2000) A note on robust variance estimation for cluster-correlated data. Biometrics 56(2):645–646. doi:10.1111/j.0006-341X.2000.00645.x
Yan J, Fine J (2004) Estimating equations for association structures. Stat Med 23:859–874. doi:10.1002/sim.1650
Acknowledgments
We thank the referees for helpful comments. This work was supported by The Danish Agency for Science, Technology and Innovation.
Appendices
Appendix A: Some zero-one matrices
In this section we define a few matrix operators that are used to express various conditional moments. Let \(\varvec{B}\in \mathbb R ^{n\times m}\) be a matrix, and define the index sets \(\varvec{x} = \{x_{1},\ldots ,x_{k}\}\subseteq \{1,\ldots ,n\}\) and \(\varvec{y} = \{y_{1},\ldots ,y_{l}\}\subseteq \{1,\ldots ,m\}\). We define \(\varvec{J}_{n,\varvec{x}} = \varvec{J}_{\varvec{x}} \in \mathbb R ^{(n-k)\times n}\) as the \(n\times n\) identity matrix with rows \({\varvec{x}}\) removed. For example, with \(n=3\) and \(\varvec{x}=\{2\}\),
\[ \varvec{J}_{3,\{2\}} = \begin{pmatrix} 1 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 1 \end{pmatrix}. \]
To remove rows \(\varvec{x}\) from \(\varvec{B}\) we simply multiply from the left with \(\varvec{J}_{n,\varvec{x}}\). If we in addition want to remove columns \(\varvec{y}\), we multiply from the right with the transpose of \(\varvec{J}_{m,\varvec{y}}\):
\[ \varvec{J}_{n,\varvec{x}}\,\varvec{B}\,\varvec{J}_{m,\varvec{y}}^{\prime } \in \mathbb R ^{(n-k)\times (m-l)}. \]
We will use the notation \(\varvec{J}\) to denote the matrix that removes all latent variables (\(\varvec{\eta }\)) from the vector of all variables, \(\varvec{U}\), as defined in (8). We denote by \(\varvec{J}_{\varvec{Y}}\) the matrix that only keeps the endogenous variables (\(\varvec{Y}\)).
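We also need an operator that cancels out rows or columns of a matrix or vector. Define the square matrix \(\varvec{p}_{n,\varvec{x}}\in \mathbb R ^{n\times n}\) as the identity matrix with the diagonal elements at positions \(\varvec{x}\) set to zero:
\[ (\varvec{p}_{n,\varvec{x}})_{ij} = \begin{cases} 1, &amp; i=j \text{ and } i\notin \varvec{x}, \\ 0, &amp; \text{otherwise}, \end{cases} \]
e.g.
\[ \varvec{p}_{3,\{2\}} = \begin{pmatrix} 1 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 1 \end{pmatrix}. \]
To cancel out rows \(\varvec{x}\) and columns \(\varvec{y}\) of the matrix \(\varvec{B}\in \mathbb R ^{n\times m}\) we calculate
\[ \varvec{p}_{n,\varvec{x}}\,\varvec{B}\,\varvec{p}_{m,\varvec{y}}. \]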
We will use \(\varvec{p}_{\varvec{X}}\) to denote the matrix that cancels out the rows corresponding to the indices of the exogenous variables (\(\varvec{X}\)), and \(\varvec{p}_{\complement \varvec{X}}\) to denote the matrix that cancels out all rows except those corresponding to \(\varvec{X}\).
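The two zero-one operators of this appendix can be illustrated with a minimal sketch in plain Python. The function names (`removal_matrix`, `cancel_matrix`) and the example matrix are hypothetical and chosen only for illustration; they are not part of the lava package.

```python
def removal_matrix(n, x):
    """J_{n,x}: the n x n identity with the rows listed in x (1-based) removed."""
    return [[1 if j == i else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1) if i not in x]

def cancel_matrix(n, x):
    """p_{n,x}: the n x n identity with diagonal entries at positions x (1-based) set to 0."""
    return [[1 if (i == j and i not in x) else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]

def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

B = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]        # n = 3 rows, m = 4 columns
x, y = {2}, {1, 4}           # rows and columns to drop or cancel

# J_{n,x} B J_{m,y}': rows x and columns y are removed, so the result is smaller.
removed = matmul(matmul(removal_matrix(3, x), B), transpose(removal_matrix(4, y)))

# p_{n,x} B p_{m,y}: rows x and columns y are zeroed, so the shape is unchanged.
cancelled = matmul(matmul(cancel_matrix(3, x), B), cancel_matrix(4, y))

print(removed)
print(cancelled)
```

The removal operator changes the dimension of the result, whereas the cancelling operator keeps the dimension and only zeroes the selected rows and columns.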
Appendix B: The score function and information
In this section we calculate the analytical derivatives of the log-likelihood. To obtain these results we first introduce notation for some common matrix operations. Let \(\varvec{A}\in \mathbb R ^{m\times n}\); then we define the column-stacking operation:
\[ \,\text{ vec}\,(\varvec{A}) = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \]
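where \(a_i\) denotes the \(i\)th column of \(\varvec{A}\). The unique commutation matrix \(\varvec{K}^{(m,n)}\in \mathbb R ^{mn\times mn}\) is defined by
\[ \varvec{K}^{(m,n)}\,\text{ vec}\,(\varvec{A}) = \,\text{ vec}\,(\varvec{A}^{\prime }). \]
Letting \(\varvec{H}^{(i,j)}\) be the \(m\times n\)-matrix with one at position \((i,j)\) and zero elsewhere, then
\[ \varvec{K}^{(m,n)} = \sum _{i=1}^{m}\sum _{j=1}^{n}\left( \varvec{H}^{(i,j)}\otimes \varvec{H}^{(i,j)\prime }\right) , \]
e.g.
\[ \varvec{K}^{(2,2)} = \begin{pmatrix} 1 &amp; 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 1 &amp; 0 \\ 0 &amp; 1 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0 &amp; 1 \end{pmatrix}. \]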
It should be noted that the product with a commutation matrix can be implemented very efficiently, without relying on a direct implementation of the above mathematical definition.
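Let \(\varvec{A}\in \mathbb R ^{m\times n}\) and \(\varvec{B}\in \mathbb R ^{p\times q}\); then the Kronecker product is the \(mp\times nq\)-matrix:
\[ \varvec{A}\otimes \varvec{B} = \begin{pmatrix} a_{11}\varvec{B} &amp; \cdots &amp; a_{1n}\varvec{B} \\ \vdots &amp; \ddots &amp; \vdots \\ a_{m1}\varvec{B} &amp; \cdots &amp; a_{mn}\varvec{B} \end{pmatrix}. \]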
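The operators introduced above (column stacking, the commutation matrix and the Kronecker product) can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the package: all function names are hypothetical, and the sum-of-Kronecker-products construction is the standard one from Magnus and Neudecker (1988). The function `commute` shows one way in which the product with a commutation matrix can be replaced by a pure re-indexing of the vector.

```python
def vec(a):
    """Stack the columns of a (list of rows) into one flat list."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]

def commutation_matrix(m, n):
    """Dense mn x mn commutation matrix, built as the sum of H (x) H' over all (i, j)."""
    size = m * n
    k = [[0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            # H^(i,j): m x n matrix with a single one at position (i, j)
            h = [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(m)]
            term = kron(h, transpose(h))
            k = [[k[r][c] + term[r][c] for c in range(size)] for r in range(size)]
    return k

def commute(v, m, n):
    """Apply the commutation matrix to v = vec(A), A being m x n, by re-indexing only."""
    # Entry (i, j) of A sits at position j*m + i in vec(A)
    # and at position i*n + j in vec(A').
    out = [None] * (m * n)
    for i in range(m):
        for j in range(n):
            out[i * n + j] = v[j * m + i]
    return out

A = [[1, 2, 3],
     [4, 5, 6]]              # m = 2, n = 3
K = commutation_matrix(2, 3)
v = vec(A)
dense = [sum(K[r][c] * v[c] for c in range(6)) for r in range(6)]

print(dense)                 # product with the dense matrix
print(commute(v, 2, 3))      # same result without forming the matrix
print(vec(transpose(A)))     # vec(A'), the defining property
```

The dense construction needs storage of order \((mn)^2\), whereas the re-indexing touches each element of the vector once.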
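We will calculate the derivatives of (20) by means of matrix differential calculus. The Jacobian matrix of a matrix-function \(F:\mathbb R ^n\rightarrow \mathbb R ^{m\times p}\) is the \(mp\times n\) matrix defined by
\[ DF(\varvec{\theta }) = \frac{\partial \,\text{ vec}\,F(\varvec{\theta })}{\partial \varvec{\theta }^{\prime }}. \]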
Letting \(\,\text{ d}\!\,\) denote the differential operator (see Magnus and Neudecker 1988), the first identification rule states that \(\,\text{ d}\!\,\,\text{ vec}\,F(\varvec{\theta }) = A(\varvec{\theta })\,\text{ d}\!\,\varvec{\theta } \Rightarrow DF(\varvec{\theta }) = A(\varvec{\theta })\).
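As a small worked illustration of the first identification rule, consider \(F(\varvec{\theta }) = \varvec{\theta }\varvec{\theta }^{\prime }\) for \(\varvec{\theta }\in \mathbb R ^n\). This example is chosen here purely for exposition and is not one of the derivations that follow; it uses the product rule together with the standard identity \(\,\text{ vec}\,(ABC) = (C^{\prime }\otimes A)\,\text{ vec}\,(B)\).

```latex
\begin{align*}
\mathrm{d}F(\theta)
  &= (\mathrm{d}\theta)\,\theta' + \theta\,(\mathrm{d}\theta)', \\
\mathrm{d}\,\mathrm{vec}\,F(\theta)
  &= (\theta \otimes I_n)\,\mathrm{d}\theta + (I_n \otimes \theta)\,\mathrm{d}\theta, \\
DF(\theta)
  &= \theta \otimes I_n + I_n \otimes \theta \;\in\; \mathbb{R}^{n^2 \times n}.
\end{align*}
```

For \(n=1\) this reduces to the familiar derivative \(2\theta \) of \(\theta ^2\).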
1.1 Score function
Using the identities \(\,\text{ d}\!\,\log \left|\varvec{X}\right| = \,\text{ tr}\,(\varvec{X}^{-1}\,\text{ d}\!\,\varvec{X})\) and \(\,\text{ d}\!\,\varvec{X}^{-1} = -\varvec{X}^{-1}(\,\text{ d}\!\,\varvec{X})\varvec{X}^{-1}\), and applying the product rule we get
where
and
Taking \(\,\text{ vec}\,\)’s it follows that
hence by the first identification rule
and similarly
and finally (exploiting the symmetry of \({\varvec{\varOmega }}_{\varvec{\theta }}\) and commutative property under the trace operator) we obtain the gradient
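The two differential identities that start this derivation, \(\,\text{ d}\!\,\log \left|\varvec{X}\right| = \,\text{ tr}\,(\varvec{X}^{-1}\,\text{ d}\!\,\varvec{X})\) and \(\,\text{ d}\!\,\varvec{X}^{-1} = -\varvec{X}^{-1}(\,\text{ d}\!\,\varvec{X})\varvec{X}^{-1}\), can be checked numerically. The following is a minimal sketch in plain Python, assuming an arbitrary symmetric \(2\times 2\) matrix and perturbation chosen only for illustration; the helper names are hypothetical and unrelated to the lava package.

```python
import math

def det2(x):
    """Determinant of a 2 x 2 matrix."""
    return x[0][0] * x[1][1] - x[0][1] * x[1][0]

def inv2(x):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    d = det2(x)
    return [[x[1][1] / d, -x[0][1] / d],
            [-x[1][0] / d, x[0][0] / d]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

X = [[2.0, 0.5], [0.5, 1.0]]        # illustrative positive definite matrix
dX = [[0.3, -0.1], [-0.1, 0.2]]     # illustrative direction of perturbation
eps = 1e-6

# X + eps * dX, used for one-sided finite differences
X_eps = [[X[i][j] + eps * dX[i][j] for j in range(2)] for i in range(2)]

# d log|X| = tr(X^{-1} dX)
lhs_logdet = (math.log(det2(X_eps)) - math.log(det2(X))) / eps
XinvdX = matmul(inv2(X), dX)
rhs_logdet = XinvdX[0][0] + XinvdX[1][1]

# d X^{-1} = -X^{-1} (dX) X^{-1}
lhs_inv = [[(inv2(X_eps)[i][j] - inv2(X)[i][j]) / eps for j in range(2)]
           for i in range(2)]
rhs_inv = [[-v for v in row] for row in matmul(XinvdX, inv2(X))]

print(lhs_logdet, rhs_logdet)
print(lhs_inv)
print(rhs_inv)
```

The finite-difference and analytical values agree up to an error of the order of the step size `eps`.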
Next we examine the model including a mean structure (20). With respect to the first differential we observe that
Hence
Further by the chain-rule
and
Taking \(\,\text{ vec}\,\) (\({\varvec{G}}_{\varvec{\theta }}{\varvec{v}}_{\varvec{\theta }} = \varvec{1}{\varvec{G}}_{\varvec{\theta }}{\varvec{v}}_{\varvec{\theta }}\)):
We have calculated the full score, but in some situations it is useful to evaluate the score for a single observation. The contribution of a single observation to the log-likelihood is
or as in (20), where we simply exchange \({\varvec{T}}_{\varvec{\theta }}\) with \(\varvec{T}_{\varvec{z}_i,\varvec{\theta }} = (\varvec{z}_i-{\varvec{\xi }}_{\varvec{\theta }})(\varvec{z}_i-{\varvec{\xi }}_{\varvec{\theta }})^{\prime }\); hence the score is as in (99), where (100) is calculated with \(\varvec{z}_i\) instead of \(\widehat{\varvec{\mu }}\). Alternatively, letting \(\varvec{z}_i-{\varvec{\xi }}_{\varvec{\theta }} = {\varvec{u}}_{\varvec{\theta }} = {\varvec{u}}_{\varvec{\theta }}(i)\):
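where we used that for a constant symmetric matrix \(\varvec{A}\) the differential of a quadratic form is
\[ \,\text{ d}\!\,(\varvec{u}^{\prime }\varvec{A}\varvec{u}) = 2\varvec{u}^{\prime }\varvec{A}\,\text{ d}\!\,\varvec{u}. \]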
Hence the contribution to the score function of the \(i\)th observation is
where the score-function evaluated in \(\varvec{\theta }\) is \(\mathcal S (\varvec{\theta }) = \sum _{i=1}^n \mathcal S _i(\varvec{\theta })\).
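The additive structure \(\mathcal S (\varvec{\theta }) = \sum _{i=1}^n \mathcal S _i(\varvec{\theta })\) can be illustrated on a deliberately simple toy model. The sketch below assumes independent univariate normal observations with parameters \((\mu ,\sigma ^2)\), which is far simpler than the latent variable model of the paper; the data values and function names are hypothetical. It sums the analytical per-observation scores and compares the result with a central finite-difference gradient of the full log-likelihood.

```python
import math

def loglik_i(z, mu, s2):
    """Log-likelihood contribution of one observation under N(mu, s2)."""
    return -0.5 * (math.log(2 * math.pi * s2) + (z - mu) ** 2 / s2)

def score_i(z, mu, s2):
    """Analytical score contribution (d/dmu, d/ds2) of one observation."""
    u = z - mu
    return (u / s2, -0.5 / s2 + 0.5 * u * u / s2 ** 2)

z = [1.2, -0.4, 0.7, 2.1, 0.3]      # illustrative data
mu, s2 = 0.5, 1.5                   # point at which the score is evaluated

# Total score as the sum of the individual contributions
scores = [score_i(zi, mu, s2) for zi in z]
total_score = tuple(sum(s[k] for s in scores) for k in range(2))

def loglik(mu, s2):
    """Full log-likelihood: sum of the individual contributions."""
    return sum(loglik_i(zi, mu, s2) for zi in z)

# Central finite differences of the full log-likelihood
h = 1e-6
numeric_score = ((loglik(mu + h, s2) - loglik(mu - h, s2)) / (2 * h),
                 (loglik(mu, s2 + h) - loglik(mu, s2 - h)) / (2 * h))

print(total_score)
print(numeric_score)
```

Keeping the individual contributions, rather than only their sum, is what makes quantities such as cluster-robust standard errors computable, since these are built from per-observation (or per-cluster) scores.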
1.2 The information matrix
The second order partial derivative is given by
Taking the negative expectation with respect to the true parameter \(\varvec{\theta }_0\) we obtain the expected information (Magnus and Neudecker 1988), which eliminates all terms involving second-order derivatives
We will further derive the observed information in the case where the second derivatives of the matrix functions \({\varvec{A}}_{\varvec{\theta }}, {\varvec{P}}_{\varvec{\theta }}\) and \({\varvec{v}}_{\varvec{\theta }}\) vanish. Now
Hence
Next we will find the derivative of (96). We let \(m\) denote the number of variables, \(p\) the number of parameters, and \(k\) the number of observed variables (e.g. \({\varvec{G}}_{\varvec{\theta }}\in \mathbb R ^{k\times m}\), and the number of columns in the derivatives is \(p\)). We have \({\varvec{G}}_{\varvec{\theta }}{\varvec{P}}_{\varvec{\theta }}\in \mathbb R ^{k\times m}\), and using rules for evaluating the differential of a Kronecker product (see Magnus and Neudecker 1988, p. 184) we obtain
And
Hence from (112) and (113) and using rules for applying the \(\,\text{ vec}\,\) operator on products of matrices we obtain
with the expressions for the derivatives and second derivatives of \({\varvec{G}}_{\varvec{\theta }}\) given in (95) and (111). Further
and
and
By using the identity \(\,\text{ vec}\,(ABC) = (C^{\prime }\otimes A)\,\text{ vec}\,(B)\) several times we obtain
and the second order derivative of the log-likelihood (107) now follows from applying the product rule with (114), (115), (116) and (119).
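The identity \(\,\text{ vec}\,(ABC) = (C^{\prime }\otimes A)\,\text{ vec}\,(B)\), which is applied repeatedly above, can be verified on a small example. The following plain-Python sketch uses arbitrary integer matrices chosen only for illustration, with hypothetical helper names; integer entries make the comparison exact.

```python
def vec(a):
    """Stack the columns of a (list of rows) into one flat list."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]

A = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
B = [[1, 0, 2], [-1, 3, 1]]         # 2 x 3
C = [[2, 1], [0, 1], [4, -2]]       # 3 x 2

# Left-hand side: vec(ABC), a vector of length 3 * 2 = 6
lhs = vec(matmul(matmul(A, B), C))

# Right-hand side: (C' (x) A) vec(B), with C' (x) A of dimension 6 x 6
kron_ca = kron(transpose(C), A)
vb = vec(B)
rhs = [sum(row[k] * vb[k] for k in range(len(vb))) for row in kron_ca]

print(lhs)
print(rhs)
```

In derivations such as the one above, the identity is what moves the differential of the middle factor to the far right, so that the first identification rule can be applied.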
Holst, K.K., Budtz-Jørgensen, E. Linear latent variable models: the lava-package. Comput Stat 28, 1385–1452 (2013). https://doi.org/10.1007/s00180-012-0344-y