lab:projects:17robust_sem:manuscript [Zhiyong Johnny Zhang]

%%All the programs are in d:/missingdata-robustSEM/
\documentclass[12pt,epsfig,titlepage]{article}
\usepackage{listings}
\textheight=9.20in
\textwidth=6.5in
\topmargin=-.68in 
%\topmargin=-.88in 
\oddsidemargin=-.0in
%\oddsidemargin=-.09in
\setlength{\footskip}{.6truein}
\newcommand{\vech}{{\rm vech}}
\newcommand{\ve}{{\rm vec}}
\newcommand{\E}{{ E}}
\newcommand{\Var}{{\rm Var}}
\newcommand{\Cov}{{\rm Cov}}
\newcommand{\Acov}{{\rm Acov}}
\newcommand{\diag}{{\rm diag}}
\newcommand{\bone}{{\bf 1}}
\newcommand{\RMSEA}{{\rm RMSEA}}

\newcommand{\SE}{{\rm SE}}

\newcommand{\tr}{{\rm tr}}
\newcommand{\s}{\rm s}
%\newcommand{\min}{\rm min}
\newcommand{\limn}{{\lim_{n\rightarrow\infty}}}

\newcommand{\tho}{\theta_0}
\newcommand{\htn}{\hat\theta}
\newcommand{\fot}{^{\frac{1}{2}}}
\newcommand{\si}{{\sum_{i=1}^n}}
\newcommand{\sii}{\frac{1}{n}\si}
\newcommand{\cas}{\stackrel {a.s.}{\longrightarrow}}
\newcommand{\cip}{\stackrel {P}{\longrightarrow}}
\newcommand{\cid}{\stackrel {\cal L}{\to}}

\newcommand{\bDelta}{{\bf \Delta}}
\newcommand{\bGamma}{{\bf \Gamma}}
\newcommand{\bLambda}{{\bf \Lambda}}
\newcommand{\bOmega}{{\bf \Omega}}
\newcommand{\bPhi}{{\bf \Phi}}
\newcommand{\bPi}{{\bf \Pi}}
\newcommand{\bPsi}{{\bf \Psi}}
\newcommand{\bSigma}{{\bf \Sigma}}
\newcommand{\bUpsilon}{{\bf \Upsilon}}

\newcommand{\bXi}{{\bf \Xi}}
\newcommand{\balpha}{\mbox{\boldmath $\alpha$}}

\newcommand{\bbeta}{\mbox{\boldmath $\beta$}}
\newcommand{\bhbn}{\hat{\mbox{\boldmath $\beta$}}}
\newcommand{\bbo}{\mbox{\boldmath $\beta$}_0}
\newcommand{\bdelta}{\mbox{\boldmath $\delta$}}
\newcommand{\bepsilon}{\mbox{\boldmath $\epsilon$}}
\newcommand{\betab}{\mbox{\boldmath $\eta$}}

\newcommand{\bgamma}{\mbox{\boldmath $\gamma$}}
\newcommand{\biota}{\mbox{\boldmath $\iota$}}

\newcommand{\blambda}{\mbox{\boldmath $\lambda$}}
\newcommand{\bmu}{\mbox{\boldmath $\mu$}}
\newcommand{\bnu}{\mbox{\boldmath $\nu$}}
\newcommand{\bomega}{\mbox{\boldmath  $\omega$}}
\newcommand{\bphi}{\mbox{\boldmath $\phi$}}
\newcommand{\bpsi}{\mbox{\boldmath $\psi$}}
\newcommand{\bsigma}{\mbox{\boldmath $\sigma$}}
\newcommand{\btau}{\mbox{\boldmath $\tau$}}
\newcommand{\bTheta}{\mbox{\boldmath $\Theta$}}
\newcommand{\btheta}{\mbox{\boldmath $\theta$}}
\newcommand{\bhan}{\hat{\mbox{\boldmath $\alpha$}}}

\newcommand{\bhgn}{\hat{\mbox{\boldmath $\gamma$}}}

\newcommand{\btan}{\tilde{\mbox{\boldmath $\alpha$}}}

\newcommand{\btgn}{\tilde{\mbox{\boldmath $\gamma$}}}

\newcommand{\btho}{{\mbox{\boldmath $\btheta$}}_0}
%\newcommand{\bhan}{\hat{\mbox{\boldmath $\alpha$}}}

\newcommand{\bhtn}{\hat{\mbox{\boldmath $\theta$}}}
\newcommand{\bttn}{\tilde{\mbox{\boldmath $\theta$}}}
\newcommand{\bupsilon}{\mbox{\boldmath $\upsilon$}}

\newcommand{\bvarepsilon}{\mbox{\boldmath $\varepsilon$}}
\newcommand{\bvarphi}{\mbox{\boldmath $\varphi$}}
\newcommand{\bvartheta}{\mbox{\boldmath $\vartheta$}}

\newcommand{\bxi}{\mbox{\boldmath  $\xi$}}
\newcommand{\bzeta}{\mbox{\boldmath  $\zeta$}}

\newcommand{\bA}{{\bf A}}
\newcommand{\ba}{{\bf a}}
\newcommand{\bB}{{\bf B}}
\newcommand{\bb}{{\bf b}}
\newcommand{\bC}{{\bf C}}
\newcommand{\bc}{{\bf c}}

\newcommand{\bD}{{\bf D}}
\newcommand{\bE}{{\bf E}}
\newcommand{\be}{{\bf e}}
\newcommand{\bF}{{\bf F}}
\newcommand{\bff}{{\bf f}}
\newcommand{\bG}{{\bf G}}
\newcommand{\bg}{{\bf g}}
\newcommand{\bH}{{\bf H}}
\newcommand{\bh}{{\bf h}}
\newcommand{\bI}{{\bf I}}
\newcommand{\cI}{{\cal I}}

\newcommand{\bK}{{\bf K}}
\newcommand{\bL}{{\bf L}}
\newcommand{\bl}{{\bf l}}
\newcommand{\bM}{{\bf M}}
\newcommand{\bm}{{\bf m}}

\newcommand{\bP}{{\bf P}}
\newcommand{\bQ}{{\bf Q}}
\newcommand{\bR}{{\bf R}}
\newcommand{\br}{{\bf r}}
\newcommand{\bS}{{\bf S}}
\newcommand{\bs}{{\bf s}}
\newcommand{\bsn}{{\bf s}}
\newcommand{\bt}{{\bf t}}
\newcommand{\bu}{{\bf u}}
\newcommand{\bU}{{\bf U}}
\newcommand{\bV}{{\bf V}}
\newcommand{\bv}{{\bf v}}
\newcommand{\ev}{{\textbf{\emph{v}}}}

\newcommand{\bW}{{\bf W}}
\newcommand{\bX}{{\bf X}}
\newcommand{\bx}{{\bf x}}
\newcommand{\cX}{{\cal X}}
\newcommand{\cY}{{{\cal X}^*}}
\newcommand{\by}{{\bf y}}
\newcommand{\bz}{{\bf z}}
\newcommand{\bzero}{{\bf 0}}
\newcommand{\stheta}{{\scriptsize\btheta}}
\newcommand{\slambda}{{\scriptsize\blambda}}
\newcommand{\spsi}{{\scriptsize\bpsi}}


\begin{document}
\oddsidemargin=-.00in
\setlength{\footskip}{.6truein}

\baselineskip=18pt 

\begin{titlepage}
%\vspace*{3.5cm}
\vspace*{1.5cm}
\begin{center}
Robust Structural Equation Modeling with Missing Data and Auxiliary Variables\\
\vspace{12mm}
Ke-Hai Yuan and Zhiyong Zhang\\
University of Notre Dame\\
\vspace*{7mm}
December 24, 2010\\
\end{center}
\vspace*{12.0cm}
\flushbottom
$^*$The research was supported Grants DA00017 and DA01070 from the National Institute on Drug Abuse. Correspondence concerning this article: Ke-Hai Yuan, Department of Psychology, University of Notre Dame, Notre Dame, IN 46556, USA (kyuan@nd.edu).

\end{titlepage}
%\flushleft
\baselineskip=22.00pt 
\pagestyle{empty}
\begin{center}
Abstract
\end{center}

This paper develops robust M-procedures for structural equation modeling with missing data and auxiliary variables. M-estimates of saturated mean vector and covariance matrix of all variables are obtained first. Those corresponding to the interesting variables are next fitted to the structural model by the normal-distribution-based discrepancy function. Many weight functions can be used at the first stage for robustness. Details are given to Huber-type weight functions. In particular, an expectation robust (ER) algorithm is introduced for estimating the saturated model. A consistent covariance matrix for the robust estimates is obtained using estimating equations. These are used to obtain standard errors for structural parameter estimates as well as rescaled, adjusted or corrected statistics in the second stage. For applications, SAS IML and R programs are given to perform the first-stage analysis and EQS codes are provided to perform the second-stage analysis. An example with open/closed book examination data illustrates the proper use of the provided programs. 

\vskip1cm

\noindent

\vskip5mm
\noindent
Keywords: Auxiliary variables, estimating equation, missing at random, sandwich-type covariance matrix.

\newpage
\baselineskip=22.55pt 
\newpage 
\pagestyle{plain} 
\setcounter{page}{1}

\begin{center}
1. Introduction
\end{center}

Being capable of modeling latent variables and measurement errors simultaneously, structural equation modeling (SEM) has become one of the most popular statistical methods in social science research, where missing data are common, especially those that are collected longitudinally. Many procedures have been developed for modeling missing data in SEM, most of those are  normal-distribution-based maximum likelihood (NML). Because data in social sciences are typically nonnormally distributed (Micceri, 1989), there are also a few developments accounting for the effect of nonnormality on test statistics and standard errors associated with NML (Arminger \& Sobel, 1990; Yuan \& Bentler, 2000). But NML estimates (NMLE) can be very inefficient or even biased when data have heavy tails or are contaminated. Yuan and Bentler (2001) and Yuan, Marshall and Bentler (2002) provided the outline of modeling missing data in SEM and factor analysis using maximum likelihood (ML) based on the multivariate $t$-distribution. But they did not provide the details of implementing the procedures. Lee and Xia (2006) developed a missing data procedure for nonlinear SEM in which latent variables as well as measurement and prediction errors are symmetrically distributed. Model estimation and inference are through Monte Carlo and BIC.  Because model structure and estimation method both affect the value of BIC, Lee and Xia (2006, pp.\ 581--582) noted that the method should be used with caution. In this paper we develop a two-stage procedure to allow for auxiliary variables, where the saturated mean vector and covariance matrix are obtained by robust M-estimators. The robust estimates of means and covariances are then fitted by the structural model in the second stage. Model evaluation is done by using well-established statistics or fit indices for complete data. Actually, any model estimation and evaluation methods available for SEM with complete data can be applied to the second stage analysis. 

Missing data can occur for many reasons. The process by which data become incomplete was called missing data mechanism by Rubin (1976).  Missing completely at random (MCAR) is a process in which missingness of data is independent of both the observed and the missing values; missing at random (MAR) is a process in which missingness is independent of the missing values given the observed data. When missingness depends on the missing values themselves given the observed data, the process is missing not at random (MNAR). When all missing values are MCAR, most ad hoc procedures still generate consistent parameter estimates. When ignoring the process for data that are  MAR or MCAR, MLEs or the normal-distribution-based pseudo MLEs are still consistent. When missing values are MNAR, one has to correctly model the missing data mechanism in order to get consistent parameter estimates in general. However, MAR or MNAR mechanism depends on whether variables accounting for missingness are observed and included in the estimation. Auxiliary variables are those that are not directly involved in the structural model but have the potential to account for missingness in the interesting variables. The developed procedure aims for SEM with missing data that are MAR after including potential auxiliary variables. Actually, auxiliary variables can be easily included in the first stage estimation. Parallel to the procedures in Yuan and Lu (2008) and Savalei and Bentler (2009), only estimates of means and covariances corresponding to the interesting variables are selected and fitted by the structural model in the second-stage analysis. 

Ignoring MNAR mechanism will results in biased parameter estimates in general. Outliers or data contamination can also bring biases to parameter estimates. Actually, data contamination bring biases to parameter estimates even without missing values. Comparing the two kinds of biases, those caused by data contamination can be a lot greater than those by MNAR data. Notice that the consistency of MLEs with MAR data is a large sample property. When the population distribution has heavy tails, at smaller sample sizes the NMLEs can contain substantial biases when the proportion of MAR data is not trivial (Yuan, Wallentin \& Bentler, 2010).  In addition to biases, efficiency is also a key factor in any statistical estimation. The efficiency of NMLEs goes to zero as the kurtosis of the population increases. Compared to NML, a robust procedure typically has better control in (1) biases due to outliers or data contamination; (2) biases caused by the interaction of heavy-tailed distribution and missing data; (3) efficiency due to heavy-tailed distribution. 
Actually, there exist tuning parameters in all the weight functions for M-estimation. If one choose the tuning parameter properly the resulting estimates will be more close to the true MLEs than the pseudo NMLEs. 

Since practical data sets often exhibit heterogeneous marginal skewnesses and kurtoses, our purpose is to develop procedures that are robust to distribution violations. As we shall see, the robustness of the developed procedure is through assigning smaller weights to cases that lie far from the center of the majority of the data. Many weight functions can be used for such a purpose. We choose Huber-type weight function because it tends to yield more efficient parameter estimates than other weight functions with real data (Yuan, Bentler \& Chan, 2004). The tuning parameter in the Huber-type weight function is explicitly related to the percentage of data contamination or outliers one would like to control. Another consideration behind choosing Huber-type weight function is that it does not assign zero weights to cases. When many cases get zero weights, it is very likely that the resulting asymptotic covariance matrix of the robust estimates is close to singular. Then rescaled, adjusted or corrected residual-based statistics cannot be trusted or even obtained in the second stage SEM analysis (Bentler \& Yuan, 1999). This is important because the asymptotic covariance matrix may have a huge size and it is already a challenge for it to be positive definite even without any missing data.

Because no SEM software contains robust estimation with missing data, we will provide SAS IML and R programs to facilitate the applications of the developed procedure. In particular, for any missing data with or without auxiliary variables, by modifying a few lines of the codes in the SAS or R programs, they will generate the necessary elements for mean and covariance structure analysis in the second stage. With these elements, SEM with missing data is essentially the same as SEM with complete data for any SEM programs that allow users to input the first four moments for analysis.

In section 2 of the paper, we describe an expectation robust algorithm for  Huber-type M-estimators. Asymptotic covariance matrix of the robust estimators are also provided in section 2. Section 3 describes the second-stage analysis by fitting the normal-distribution-based discrepancy function and the associated adjusted and rescaled statistic. Section 4 introduces SAS IML and R programs for implementing the first stage estimation and illustrates their use with EQS code by the test score data of Mardia, Kent and Bibby (1979). Discussion and conclusion are provided in section 5.

\begin{center}
2. M-estimates of the Saturated Mean Vector and Covariance Matrix
\end{center}
 
Algorithms for robust estimates of means and covariances with missing values are developed in Little (1988) and Liu (2004) using a multivariate $t$-distribution. Cheng and Victoria-Feser (2002) provide an algorithm for obtaining minimum covariance determinant (MCD) estimates.
Yuan (2010) extended M-estimators of means and covariances to samples containing missing values using estimating equations, which can be solved by an expectation robust (ER) algorithm. This section will describe the ER algorithm. We will also provide the asymptotic covariance matrix for the M-estimates  with Huber-type weights. In particular, we assume some auxiliary variables are available.  

Let $\by$ represent a population of $p$ random variables with $E(\by)=\bmu$ and $\Cov(\by)=\bSigma$. We are interested in modeling $\bmu$ and $\bSigma$ by a structural model. A sample $\by_i$, $i=1$, 2, $\ldots$, $n$, from $\by$ with missing values is available. There also exist a vector $\bu$ of $q-p$ auxiliary variables with the associated sample realization $\bu_i$, $i=1$, 2, $\ldots$, $n$. Let $\bx=(\by',\bu')'$ with $E(\bx)=\bnu$ and $\Cov(\bx)=\bV$. Due to missing values, the vector $\bx_i=(\by_i',\bu_i')'$ only contains $q_i$ marginal observations of $\bx$. Also, we do not know the distribution of $\bx$ and the observations in $\bx_i$ may contain outliers. In such a case, robust estimates of $\bnu$ and $\bV$ are preferred than the NMLEs. Let $\bnu_i$ and $\bV_i$ be the mean vector and covariance matrix corresponding to the observations in $\bx_i$,
$$
d_i^2=d^2(\bx_i,\bnu_i,\bV_i)=(\bx_i-\bnu_i)'\bV_i^{-1}(\bx_i-\bnu_i),
$$
and $w_{i1}(d)$, $w_{i2}(d)$ and $w_{i3}(d)$ be nonincreasing scalar functions.
The estimating equations defining robust M-estimators are give by
$$
\sum_{i=1}^nw_{i1}(d_i)\frac{\partial\bnu_i'}{\partial\bnu}\bV_i^{-1}(\bx_i-\bnu_i)=\bzero
\eqno(1)
$$
and
$$
\sum_{i=1}^n\frac{\partial\ve'(\bV_i)}{\partial\ev}\bW_i\ve[w_{i2}(d_i)(\bx_i-\bnu_i)(\bx_i-\bnu_i)'-w_{i3}(d_i)\bV_i]=\bzero,
\eqno(2)
$$
where $\bW_i=0.5(\bV_i^{-1}\otimes\bV_i^{-1})$. Notice that the subscript $i$ in $w_{i1}$, $w_{i2}$ and $w_{i3}$ is to adjust for varying number of observations in $\bx_i$. When $w_{i1}(d_i)=w_{i2}(d_i)=w_{i3}(d_i)=1$, equations (1) and (2) define the NMLEs of $\bnu$ and $\bV$ with missing data. When  $w_{i1}(d_i)=w_{i2}(d_i)=(m+q_i)/(m+d_i^2)$ and $w_{i3}(d)=1$, equations (1) and (2) define the MLEs of $\bnu$ and $\bV$ based on the multivariate $t$-distribution with $m$ degrees of freedom. When the three weight functions are chosen according to Lopuha\"a (1989) or Rocke (1996), equations (1) and (2) define S-estimators of $\bnu$ and $\bV$ for samples with missing data. Let $0<\varphi<1$ and $\rho_i$ be the $(1-\varphi)$th quantile corresponding to $\chi_{q_i}$, the chi-distribution with $q_i$ degrees of freedom. Huber-type weight functions with missing data are given by
$$
w_{i1}(d_i)=\left\{ 
\begin{array}{ll}
1,   &{\rm if}\;d_i\leq \rho_i\\ 
\rho_i/d_i, &{\rm if}\;d_i>\rho_i,
\end{array}
\right. 
\eqno(3)
$$
$w_{i2}(d_i)=[w_{i1}(d_i)]^2/\kappa_i$ and $w_{i3}(d_i)=1$, where $\kappa_i$ is a constant defined by $E[\chi_{p_i}^2w_{i1}^2(\chi_{q_i}^2)/\kappa_i]=q_i$ that aims to yield consistent estimate of $\bV$ when $\bx\sim N(\bnu,\bV)$. 


Equations (1) and (2) can be easily solved by an ER algorithm. Let $\bx_{ic}=(\bx_i',\bx_{im}')'$ denote the  complete data, where $\bx_{im}$ is the vector containing the $q-q_i$ missing values. Of course, the positions of missing values are not always at the end in practice. We can perform  permutations on each missing pattern so that all the algebraic operations in this paper still hold. Let $\bnu^{(j)}$ and $\bV^{(j)}$ be the current values of $\bnu$ and $\bV$, $\bnu_i^{(j)}$ and $\bV_i^{(j)}$ be the means and covariances corresponding to the observed $\bx_i$. When $p_i<p$, we have
$$
\bnu^{(j)}=\left(\begin{array}{c}
\bnu_i^{(j)}\\
\bnu_{im}^{(j)}
\end{array}
\right)
\;\;\;{\rm and}\;\;\;
\bV^{(j)}
=\left(\begin{array}{cc}
\bV_i^{(j)}&\bV_{iom}^{(j)}\\
\bV_{imo}^{(j)}&\bV_{imm}^{(j)}
\end{array}
\right),
$$
where $\bnu_{im}^{(j)}$ corresponds to the means of $\bx_{im}$; $\bV_{imm}^{(j)}$ and $\bV_{imo}^{(j)}$ correspond to the covariances of $\bx_{im}$ with itself and $\bx_{i}$, respectively. Let 
$d_{i}^{(j)}=d(\bx_i,\bnu_i^{(j)},\bV_i^{(j)})$.
The E-step of the ER algorithm obtains 
the weights $w_{i1}^{(j)}=w_{i1}(d_i^{(j)})$, $w_{i2}^{(j)}=w_{i2}(d_i^{(j)})$, $w_{i3}^{(j)}=w_{i3}(d_i^{(j)})$, the conditional means
$$
\hat\bx_{ic}^{(j)}=E_j(\bx_{ic}|\bx_i)=\left(
\begin{array}{c}
\bx_i\\
\hat\bx_{im}^{(j)}
\end{array}
\right), 
\eqno(4)
$$
and the conditional covariance matrix
$$
\bC_i^{(j)}=\Cov_j(\bx_{ic}|\bx_{i})
=\left(
\begin{array}{cc}
\bzero&\bzero\\
\bzero&\bC_{imm}^{(j)}
\end{array}
\right),
\eqno(5)
$$
where
$$
\hat\bx_{im}^{(j)}=\bnu_{im}^{(j)}+\bV_{imo}^{(j)}(\bV_{i}^{(j)})^{-1}(\bx_i-\bnu_{i}^{(j)})
\;\;{\rm and}\;\;
\bC_{imm}^{(j)}=\bV_{imm}^{(j)}-\bV_{imo}^{(j)}(\bV_{i}^{(j)})^{-1}\bV_{iom}^{(j)}.
$$
The robust step is given by
$$
\bnu^{(j+1)}=\frac{\sum_{i=1}^nw_{i1}^{(j)}\hat\bx_{ic}^{(j)}}{\sum_{i=1}^nw_{i1}^{(j)}}, 
\eqno(6)
$$
$$
\bV^{(j+1)}=\frac{1}{n}\sum_{i=1}^n[w_{i2}^{(j)}(\hat\bx_{ic}^{(j)}-\bnu^{(j+1)})(\hat\bx_{ic}^{(j)}-\bnu^{(j+1)})'+\bC_i^{(j)}].
\eqno(7)
$$
Repeated the steps in (4) to (7) until convergence yields a solution to (1) and (2). For the Huber-type weight function in (3), the algorithm is implemented in the SAS and R programs to be introduced in section 4.



Let $\hat{\bnu}$ and $\hat\bV$ be the solution to (1) and (2). They play the role of sample means and covariance matrix in the 2nd stage analysis when estimating the structural parameters. We still need a consistent estimator of the covariance matrix of $\hat{\bnu}$ and $\hat\bV$ to get consistent SEs of the structural parameter estimates and reliable statistics for overall model evaluation. We will obtain such a covariance matrix using estimating equations.
Let $\ev=\vech(\bV)$ be the vector contains the elements in the low-triangular part of $\bV$, $\balpha=(\bnu',\ev')'$,
$$
\bg_{i1}(\balpha)=
w_{i1}(d_i)\frac{\partial\bnu_i'}{\partial\bnu}\bV_i^{-1}(\bx_i-\bnu_i),
$$
$$
\bg_{i2}(\balpha)=
\frac{\partial\ve'(\bV_i)}{\partial\ev}\bW_i\ve[w_{i2}(d_i)(\bx_i-\bnu_i)(\bx_i-\bnu_i)'-w_{i3}(d_i)\bV_i]
$$
and 
$\bg_i(\balpha)=(\bg_{i1}'(\balpha),\bg_{i2}'(\balpha))'$.
Under standard regularity conditions, the estimators $\hat{\bnu}$ and $\hat\bV$ satisfy
$$
\sqrt{n}(\bhan-\balpha)\cid N(\bzero,\bUpsilon), \eqno(8a)
$$
where $\bUpsilon$ can be consistently estimated by
$$
\hat\bUpsilon=[\frac{1}{n}\sum_{i=1}^n\frac{\partial\bg_i(\bhan)}{\partial\bhan'}]^{-1}
[\frac{1}{n}\sum_{i=1}^n\bg_i(\bhan)\bg_i'(\bhan)]
[\frac{1}{n}\sum_{i=1}^n\frac{\partial\bg_i'(\bhan)}{\partial\bhan}]^{-1}. \eqno(8b)
$$
The formulas for evaluating $\partial\bg_i(\balpha)/\balpha'$ with Huber-type weight are given in appendix A, and coded in the SAS IML and R programs. 


\begin{center}
3. Estimation and Inference with the Structural Model
\end{center}

The development in the previous section allows us to obtain $\hat{\bnu}$, $\hat\bV$ and $\hat\bUpsilon$. Our interest is in modeling the means and covariance matrix of $\by$.  Let $\hat{\bmu}$, $\hat\bSigma$ and $\hat\bGamma$ be the parts of $\hat{\bnu}$, $\hat\bV$, and $\hat\bUpsilon$ corresponding to the variables in $\by$, respectively; and $\bbeta=(\bmu',\vech'(\bSigma))'$. It follows from (8) that
$$
\sqrt{n}(\bhbn-\bbeta)\cid N(\bzero,\bGamma), \eqno(9)
$$
where $\bGamma$ is consistently estimated by $\hat\bGamma$.
With (9), robust SEM with missing data is the same as for SEM with complete data from an unknown population distribution. In particular, we can fit $\hat{\bmu}$ and $\hat\bSigma$ by any structural model and use $\hat\bGamma$ to obtain consistent standard errors (SEs) and related statistics or fit indices for overall model evaluation. Let $\bmu(\btheta)$ and $\bSigma(\btheta)$ be the structural model. We choose estimating $\btheta$ by minimizing
$$
F_{ML}(\btheta)=[\hat{\bmu}-\bmu(\btheta)]'\bSigma^{-1}(\btheta)[\hat{\bmu}-\bmu(\btheta)]+\tr[\hat\bSigma\bSigma^{-1}(\btheta)]-\log|\hat\bSigma\bSigma^{-1}(\btheta)|-p 
\eqno(10)
$$
for the following reasons: (a) minimizing $F_{ML}(\btheta)$ for parameter estimates is the default procedure in essentially all SEM programs; (b) the result in (9) and a consistent $\hat\bGamma$ allow us to obtain test statistics that have been shown to work well under varied conditions. These statistics are available in EQS and other programs. We will briefly describe them below.

Let $\bhtn$ be the parameter estimates of minimizing (10), $\dot{\bbeta}=\partial\bbeta(\btheta)/\partial\btheta'$, and 
$$
\bW_{\beta}=
\left(
\begin{array}{cc}
\bSigma^{-1}&\bzero\\
\bzero&0.5\bD_p'(\bSigma^{-1}\otimes\bSigma^{-1})\bD_p
\end{array}
\right).
$$
It follows from (9) that
$$
\sqrt{n}(\bhtn-\btheta)\cid N(\bzero,\bOmega), \eqno(11)
$$
where 
$$
\bOmega=(\dot{\bbeta}'\bW_{\beta}\dot{\bbeta})^{-1}
(\dot{\bbeta}'\bW_{\beta}\bGamma\bW_{\beta}\dot{\bbeta})
(\dot{\bbeta}'\bW_{\beta}\dot{\bbeta})^{-1}
$$
is consistently estimated when replacing $\btheta$ by $\bhtn$ and $\bGamma$ by $\hat\bGamma$. 


Let $T_{ML}=nF_{ML}(\bhtn)$ and $k$ be the number of free parameters in $\btheta$. Although referring $T_{ML}$ to the nominal chi-square distribution will most likely yield better inference than the same procedure following NML, we do not recommend such a practice. Better theoretically justified ones are the rescaled and adjusted statistics derived from $T_{ML}$, and the corrected residual-based asymptotically distribution free (ADF) statistic and a related $F$-statistic. Let $p^*=p(p+1)/2$, then $df=p^*+p-k$ is the nominal degrees of freedom. Let
$$
\bU=\bW_{\beta}-\dot{\bbeta}(\dot{\bbeta}'\bW_{\beta}\dot{\bbeta})^{-1}\dot{\bbeta}'\bW_{\beta}
$$
and $\hat{m}=df/\tr(\hat{\bU}\hat\bGamma)$. The rescaled statistic is given by
$$
T_{RML}=\hat{m}T_{ML},
$$
which asymptotically follows a distribution with mean equal to $df$. Let
$$
\hat{m}_1=\tr(\hat{\bU}\hat{\bGamma})/\tr[(\hat{\bU}\hat{\bGamma})^2]/,\;\;\;
\hat{m}_2=[\tr(\hat{\bU}\hat{\bGamma})]^2/\tr[(\hat{\bU}\hat{\bGamma})^2]. 
$$
The adjust statistic 
$$
T_{AML}=\hat{m}_1T_{ML}
$$
asymptotically follows a distribution with mean and variance equal to that of $\chi_{m_2}^2$, where $m_2=[\tr(\bU\bGamma)]^2/\tr[(\bU\bGamma)^2]$. In practice, we refer
$T_{RML}$ to $\chi_{df}^2$ or $T_{AML}$ to $\chi_{m_2}^2$ for model inference. However, the distribution of neither $T_{RML}$ nor $T_{AML}$ is known even asymptotically. 

Let
$$
\bQ=\bGamma^{-1}-\bGamma^{-1}\dot{\bbeta}(\dot{\bbeta}'\bGamma^{-1}\dot{\bbeta})^{-1}
\dot{\bbeta}'\bGamma^{-1}
$$
and $\br=\hat{\bbeta}-\bbeta(\bhtn)$. Then $T_{RADF}=n\br'\hat\bQ\br$ is just the residual-based ADF statistic (Browne, 1984) applied to the setting of robust procedures with missing data, and $T_{RADF}$ asymptotically follows $\chi_{df}^2$. In particular, a corrected version of it,
$$
T_{CRADF}=T_{RADF}/(1+\br'\hat\bQ\br)
$$ 
has bee shown to work well with smaller sample sizes when referred to $\chi_{df}^2$. Referring the $F$-statistic
$$
T_{RF}=(n-df)T_{RADF}/[(n-1)df]
$$
to an $F$-distribution with $df$ and $n-df$ degrees of freedoms has also been shown to work well at smaller sample sizes (Bentler \& Yuan, 1999). Both $T_{CRADF}\sim \chi_{df}^2$ and $T_{RF}\sim F_{df,(n-df)}$ are  asymptotically exact. Since the development of these statistics are essentially the same as in Yuan and Bentler (2000) and they are available in EQS (Bentler, 2008), we will not provide the details here. 

\begin{center}
4. An R package for Robust Estimation and Structural Models
\end{center}

We will introduce an package that generates the $\hat{\bnu}$ and $\hat\bV$ using the ER algorithm in (4) to (7) with the Huber-type weight in (3). The sandwich-type covariance matrix $\hat\bUpsilon$ in (9) will also be generated by the package. We will also provide EQS codes for SEM at the second stage analysis and demonstrate how to call EQS within R for SEM parameter estimation. The use of the package will be introduced through a real data set, and missing values are created so that they are MAR when an auxiliary variable is included.

Table 1.2.1 of Mardia et al.\ (1979) contains test scores of $N=88$ students on 
five subjects. The five subjects are: Mechanics, Vectors, Algebra, Analysis,
and Statistics. The first two subjects were tested with closed-book exams
and the last three were tested with open-book exams. Let $\by$ be the vector of Mechanics, Vectors, Analyis, and Statistics. Yuan and Lu (2008) found that the sample means and covariances of these four variables are well explained by the two-factor model 
$$
\by=\bLambda\bff+\be, \eqno(12)
$$
where 
$$
\bLambda=\left(
\begin{array}{cccc}
1.0 & \lambda_{21}& 0   &0\\
0   & 0           & 1.0 &\lambda_{42}
\end{array}
\right)',
$$
$\btau=E(\bff)=(\tau_1,\tau_2)'$,
$\bPhi=(\phi_{jk})=\Cov(\bff)$ is a covariance matrix, and $\bPsi=\diag(\psi_{11},\psi_{22},\psi_{33},\psi_{44})$. Clearly, the mean and covariance structures of $\by$ are
$$
\bm(\btheta)=\bLambda\btau\;\;\;{\rm and}\;\;\;
\bC(\btheta)=\bLambda\bPhi\bLambda'+\bPsi. \eqno(13)
$$
There are 11 parameters in the model with 
$$\btheta=(\tau_1,\tau_2, \lambda_{21}, \lambda_{42}, \phi_{11}, \phi_{21}, \phi_{22}, \psi_{11}, \psi_{22}, \psi_{33}, \psi_{44})'.
$$ 
The normal distribution-based LR statistic is $T_{ML}=3.259$, with an associated $p$-value=0.353 when referred to $\chi_3^2$.

We use the variable Algebra\footnote{The variable Algebra is chosen because $T_{ML}$ is highly significant when it is included in model (12).} to create missing data schemes, thus $x_3$ is an auxiliary variable. When $x_2={\rm Vectors}$ and $x_5={\rm Statistics}$, corresponding to the smallest 31 scores of $x_3={\rm Algebra}$, are removed and the variable Algebra is excluded from the analysis, the missing data mechanism is MNAR.  The missing data mechanism is MAR when the five variables are considered simultaneously. The created data set is at
%\footnote{The address is temporally anonymous for the review process.}
www.nd.edu/$^\sim$kyuan/missingdata-robustSEM/Mardiamv25.dat, with $-99$ for missing values.
%www.anonymous.edu/Mardiamv25.dat, with $-99$ for missing values.

To use the R package for the first time, it can be installed by issuing the following command
\begin{verbatim}
install.packages('rsem', repos='http://rpackages.psychstat.org')
\end{verbatim}
In order to perform SEM analysis using EQS within R, the package REQS (Mair, Wu, \& Bentler, 2010) is also required. The following command can be used to install REQS
\begin{verbatim}
install.packages('REQS', repos='http://cran.r-project.org')
\end{verbatim}

The R codes in Lines 1 to 5 in Appendix B illustrate a typical routine for using our R package for robust analysis. Specifically, \texttt{library(rsem)} loads the R package. The codes \texttt{setwd("c:/rsem")} set the working directory to the folder where the data file and the EQS model file reside. Line 3 uses the R function \texttt{read.table} to read raw data in the file \texttt{mardianMV25.dat} into R. The argument \texttt{header=T} tells R that variable names are given in the data file and the argument \texttt{na.string="-99"} indicates that \texttt{-99} represents a missing datum in the data file. Line 4 converts the data into the matrix format. Line 5 uses the function \texttt{rsem\_main} from the package \texttt{rsem} to conduct the robust analysis. The first augment \texttt{dset} specifies the name of the data. The second augment \texttt{c(1,2,4,5)} is a vector to select the variables 1, 2, 4, and 5 for the structural model used in EQS. The third augment \texttt{"mcov.eqs"} gives the name of input file for EQS. The contents of \texttt{mcov.eqs} are given in Appendix C.

The immediate output from running the R codes above is given in Lines 9 to 49 of Appendix B. The basic information about the data set including the sample size and the number of variables is given on Lines 9 and 10. Line 13 lists the names of variables selected for the structural model in EQS. Lines 15 to 18 provide information on missing data patterns.
Line 15 tells the number of total observed patterns in the original sample, 2 in this example. 
Each row from Line 17 to 18 contains $p+2$ numbers regarding the missing data information for a particular pattern. The first number is the observed cases in the pattern, the second is the number of observed variables in the pattern, call it $p_i$. The next $p_i$ numbers are the set of indices for the observed variables. The last $p-p_i$ numbers are the set of indices for the missing variables.

Line 21 gives the estimated mean vector $\hat{\bmu}$ and Lines 25 to 28 gives the estimated covariance matrix $\hat{\mathbf{\Sigma}}$ using the expectation robust algorithm for selected variables listed on Line 13. The estimated mean vector and covariance matrix can then be used to perform the second-stage analysis by SEM software such as EQS. In this example, EQS is used and the EQS input file is given in Appendix C.

Lines 35 to 40 output four statistics that we recommended regarding the overall model fit, $T_{RML}$, $T_{AML}$, $T_{CRADF}$, and $T_{RF}$. The $T_{RML}$ given on Line 32 is the Satorra-Bentler scaled chi-square test statistic, the $T_{AML}$ on Line 33 is the mean- and variance-adjusted chi-square test statistic, the $T_{CRADF}$ on Line 34 is the Yuan-Bentler residual-based test statistic, and $T_{RF}$ is the Yuan-Bentler residual-based F-statistic. For each statistic, the associated $p$-value is also given.\footnote{EQS  
approximates the $\hat{m}_2$ use the nearest integer and obtains the p-value for $T_{AML}$ using the approximated degrees of freedom.}

The last part of the output provides information about parameter estimates for the structural model. On each line of the output, the first term \texttt{(A,B)} denotes a path from \texttt{B} to \texttt{A}. For example, on Line 48, \texttt{(V2,F1)} represents the factor loading from \texttt{F1} to \texttt{V2}. The next three terms are the parameter estimate, its associated consistent standard error estimate, and  the associated $z$-score.

The R codes on Line 5 conducts the basic robust analysis. By supplying different augments for the function \texttt{rsem\_main}, one can control the robust analysis. The full specification of this function is given below:
\begin{verbatim}
rsem_main(dset, V_forana, EQSmodel, moment = TRUE, varphi = 0.1, 
          max_it = 200, eqsdata = "data.txt", eqsweight = "weight.txt", 
          EQSpgm = "C:/Progra~1/EQS61/WINEQS")
\end{verbatim}
The first augment \texttt{dset} specifies the data to be used and this augment is required. The second augment \texttt{V\_forana} supplies the indices of variables that are used in the structural model. If not provided, all variables in the data set are used in the structural model. The third augment \texttt{EQSmodel} provides the name of the EQS input file. If omitted, only the robust mean vector and covariance matrix are estimated and no structural model will be analyzed.\footnote{This is useful when SEM software other than EQS is used.} The fourth augment is \texttt{moment} and its default value \texttt{TRUE} indicates that mean and covariance analysis will be conducted. Alternatively, if \texttt{moment=FALSE}, only covariance analysis will be conducted. The fifth augment \texttt{varphi=0.1} specifies the Huber-type wieght function according to (3) that gives the approximate proportion of cases to be down-weighted if data are from a normal population. The default value is 10\%. If \texttt{varphi=0}, no data will be down-weighted and the analysis reduces to an NML analysis. The sixth augment \texttt{max\_it} defines the maximum number of iterations for the ER algorithm. The default is 200 and if the number is exceeded, the user will be prompted to supply a bigger number. The seventh augment \texttt{eqsdata} specifies the file name to save the estimated mean vector and covariance matrix from the ER algorithm and should be the same as the file name for the argument \texttt{data} in the EQS input file (e.g., Line 7 in Appendix C). The eighth augment \texttt{eqsweight} specifies the file name to save the sandwich-type covariance matrix from the ER algorithm and should be the same as the file name for the argument \texttt{weight} in the EQS input file (e.g., Line 5 in Appendix C). The last augment tells the path to the EQS program and in general, it is not necessary to change.

After running the function \texttt{rsem\_main}, in addition to the immediate output discussed earlier, results for the analysis are also saved into an object, for example, \texttt{ex1} on Line 5 in Appendix. For example, \texttt{ex1\$misinfo} includes the missing data pattern information and sorted data according to missing data patterns. \texttt{ex1\$sem} provides estimated mean vector, covariance matrix, and sandwich-type covariance matrix $\hat{\mathbf{\Gamma}}$. To view other contents of the results, use the function \texttt{names(ex1)}.


Under MH(.10) in Table 1(a) are the $\bhtn$, the consistent SEs and the associated $z$-scores following M-estimator with Huber-type weight at $\varphi=.10$.\footnote{The R codes for the analysis are \texttt{rsem\_main(dset, c(1,2,4,5), "mcov.eqs")}.} Parallel results using NML are also reported in Table 1 for comparison purpose, where the SEs and $z$-scores are also based on a sandwich-type covariance matrix (Yuan \& Bentler, 2000).\footnote{The R codes for the analysis are \texttt{rsem\_main(dset, c(1,2,4,5), "mcov.eqs", varphi=0)}.} The four fit statistics from MH(.10) and the four parallel ones following NML are also reported in Table 1(b). Below each statistic is the associated $p$-value\footnote{EQS  
approximates the $\hat{m}_2$ use the nearest integer and obtains the p-value for $T_{AML}$ using the approximated degrees of freedom.}.
Like the parameter estimates, the statistics under MH(.10) and NML in Table 1(b) are very comparable, all suggest that the model in (12) fits the data well. Most of the parameter estimates and SEs under MH(.10) are comparable to those under NML because the sample is very close to normally distributed. Actually, the standardized Mardia's (1970) multivariate kurtosis 
is .057, not statistically significant at all.

\begin{table}[bt]
\begin{center}
Table 1.\\
(a) Parameter estimates $\bhtn$, their SEs and $z$-scores\\
\begin{tabular}{c|rrrrrrr}
\hline
&\multicolumn{3}{c}{MH(.10)}&& \multicolumn{3}{c}{NML}\\
\cline{2-4}\cline{6-8}
$\theta$ & $\hat\theta$  & $\SE$  & $z$    && $\hat\theta$ & $\SE$ &$z$ \\
\hline      
   $\tau_1$   &  39.447 & 1.698 & 23.228 &&  39.187 &  1.748 & 22.416  \\
   $\tau_2$   &  47.192 & 1.524 & 30.969 &&  46.660 &  1.585 & 29.435  \\
$\lambda_{21}$&   1.289 & 0.046 & 28.145 &&   1.290 &  0.046 & 28.251  \\
$\lambda_{42}$&   0.876 & 0.024 & 35.850 &&   0.882 &  0.026 & 33.681  \\
$\phi_{11}$   &  87.660 &34.171 &  2.565 && 102.040 & 33.993 &  3.002  \\
$\phi_{21}$   &  78.812 &23.546 &  3.347 &&  81.852 & 23.668 &  3.458  \\
$\phi_{22}$   & 192.695 &53.888 &  3.576 && 200.402 & 54.635 &  3.668  \\
$\psi_{11}$   & 180.696 &30.851 &  5.857 && 182.097 & 29.874 &  6.095  \\
$\psi_{22}$   &  41.575 &27.150 &  1.531 &&  30.703 & 25.392 &  1.209  \\
$\psi_{33}$   &  10.545 &55.172 &  0.191 &&  19.499 & 51.742 &  0.377  \\
$\psi_{44}$   & 203.808 &41.220 &  4.944 && 199.950 & 38.821 &  5.151  \\
\hline
\end{tabular}
\end{center}
\begin{center}
(b) Statistics for overall model evaluation\\
\begin{tabular}{c|ccccccccc}
\hline
&\multicolumn{4}{c}{MH(.10)}&& \multicolumn{4}{c}{NML}\\
\cline{2-5}\cline{7-10}
   &$T_{RML}$ & $T_{AML}$ &$T_{CRADF}$ & $T_{RF}$ &&$T_{RML}$& $T_{AML}$ & $T_{CRADF}$ & $T_{RF}$\\
\hline
$T$& 1.376  &1.220  & 1.427 & 0.472 && 1.361 &1.284 &1.285 &0.425\\
$p$& 0.711  &0.748  & 0.699 & 0.702 && 0.715 &0.733 &0.733 &0.736\\ 
\hline
\end{tabular}
\end{center}
\end{table}


The results in Table 1 suggest that M-estimator with Huber-type weight generate very close results when data are close to normally distributed. However,  practical data often do not follow a normal distribution as close as the open-closed-book data. Actually, we found that, among all samples that have been used in the SEM literature and are available to the public, the open-closed-book sample is most close to normally distributed. In the created missing data set, only three variables are observed on the last five cases. Multiplying each of these 15 numbers by 5 created a contaminated data set, which is also available at the same website for the R package. Applying the same procedures for Table 1 to this new data set 
generates the results in Table 2. Now MH(.10) and NML have much more difference. In particular, SEs under NML in Table 2(a) are uniformly greater than those under MH(.10). There is also a Heywood case under NML. The statistics in Table 2(b) under NML still endorse the model because they all account for nonnormality through including 4th-order moments, but they are not as supportive as those under MH(.10).

%\begin{table}[bt]
\begin{center}
Table 2.\\
(a) Parameter estimates $\bhtn$, their SEs and $z$-scores when the three observed variables in the last five cases are contaminated\\
\begin{tabular}{c|rrrrrrr}
\hline
&\multicolumn{3}{c}{MH(.10)}&& \multicolumn{3}{c}{NML}\\
\cline{2-4}\cline{6-8}
$\theta$ & $\hat\theta$  & $\SE$  & $z$    && $\hat\theta$ & $\SE$ &$z$ \\
\hline      
   $\tau_1$   &    40.813 &  1.625 & 25.122 &&   39.436 &   2.155 & 18.298\\
   $\tau_2$   &    49.397 &  1.547 & 31.925 &&   52.918 &   2.900 & 18.251\\
$\lambda_{21}$&     1.291 &  0.046 & 28.326 &&    1.399 &   0.087 & 16.072\\
$\lambda_{42}$&     0.899 &  0.028 & 32.088 &&    1.019 &   0.073 & 14.005\\
$\phi_{11}$   &    70.171 & 26.699 &  2.628 &&  127.887 &  42.616 &  3.001\\
$\phi_{21}$   &    68.254 & 20.463 &  3.336 &&  226.830 & 104.116 &  2.179\\
$\phi_{22}$   &   202.579 & 63.329 &  3.199 &&  691.490 & 309.217 &  2.236\\
$\psi_{11}$   &   168.629 & 31.932 &  5.281 &&  250.928 &  66.602 &  3.768\\
$\psi_{22}$   &    52.116 & 27.826 &  1.873 &&   99.705 & 102.299 &  0.975\\
$\psi_{33}$   &     6.732 & 61.601 &  0.109 &&  -62.183 & 122.096 & -0.509\\
$\psi_{44}$   &   216.024 & 50.702 &  4.261 &&  546.936 & 319.965 &  1.709\\
\hline
\end{tabular}
\end{center}
\begin{center}
(b) Statistics for overall model evaluation\\
\begin{tabular}{c|ccccccccc}
\hline
&\multicolumn{4}{c}{MH(.10)}&& \multicolumn{3}{c}{NML}\\
\cline{2-5}\cline{7-10}
   &$T_{RML}$ & $T_{AML}$ &$T_{CRADF}$ & $T_{RF}$ &&$T_{RML}$ &$T_{AML}$ & $T_{CRADF}$ & $T_{RF}$\\
\hline
$T$& 0.803  & 0.788 & 0.725 & 0.238 && 5.126 &2.516 &2.093 &0.699 \\
$p$& 0.849  & 0.852 & 0.867 & 0.870 && 0.163 &0.113 &0.553 & 0.555\\ 
\hline
\end{tabular}
\end{center}
%\end{table} 

%\newpage
\begin{center}
5. Discussion and Conclusion
\end{center}

In social and behavioral sciences, data sets from normal distributions are  rare (Micceri, 1989), and they typically have heavy tails with an unknown population distribution.  When a sample contains missing values, most researchers choose NML for analysis. However, biases in NMLEs can be greater than the parameters themselfs even when all missing values are MAR. The aim of the paper is to develop a two-stage robust procedure for SEM with missing data and an unknown population distribution. According to Rubin (1976), missing values by MAR mechanism can be ignored if the estimation is proceeded by ML. The first stage allows including auxiliary variables so that it is more realistic to assume that the missing data mechanism is MAR. The tuning parameter $\varphi$ in (3) allows us to choose different weighting schemes so that the resulting estimating equations in (1) and (2) are very close to those obtained when setting the score functions corresponding to the true likelihood to zero. Thus, the robust estimates are more close to the true MLEs than the pseudo NMLEs. Regardless of the weight chosen, the asymptotic covariance matrix of the saturated model can be consistently estimated. At the end of the first stage, the problem of mean and covariance structure analysis with missing data becomes the same as that for complete data. 

Yuan and Bentler (2010) showed that NMLEs are still consistent even when the population distribution is unknown. Consistency does not tell anything about efficiency or finite sample biases. Empirical results in Yuan et al.\ (2010) indicate that, with MAR mechanism and nonnormal population, biases in NMLEs at small samples can be as bigger as the parameter estimates themselves. The consistency results are obtained under the assumption that there is no data contamination. Even without any missing data, data contamination can be disastrous to NMLEs. The developed procedure control the effect of both heavy tails and data contamination by assigning cases with proper weights.

In this paper we have mainly worked with Huber-type M-estimators because they are very close to NMLEs for normally distributed population, and have been shown to work well in practical data analysis (Yuan, Chan \& Bentler, 2000;  Zhong, 2008; Zu \& Yuan, 2010).  However, the Huber-type M-estimator cannot handle the situation when the proportion of extreme values is greater than $1/(p+1)$, as measured by a property called the breakdown point.  When the percentage of extreme observations is above $1/(p+1)$, one may need to choose the S- or the MCD-estimators (Rocke, 1996; Cheng \& Victoria-Feser, 2002). Both can have a breakdown point of approximately $1/2$, while the S-estimator is usually more efficient. Since S-estimators also satisfy equations (1) and (2), the ER algorithm and the results in (8) also apply to S-estimators. In additional to breakdown point, the starting values for the ER algorithm may affect the robustness of the converged estimators. We set the starting value of $\bnu$ and $\bV$ in the ER algorithm at respectively $\bzero$ and $\bI$ in the SAS code, which are obviously not affected by contaminated cases. If there is a suspect that a large proportion of the data is contaminated, one may use a MCD or the minimum volume ellipsoid (MVE) estimators as the starting values. One may also fit the structural models to these estimators instead of just using them as starting values. However, these estimators tend to be less efficient than an M- or S-estimator. Actually, many observations get weights of zero in an estimator with a high breakdown point. When the sample size is not large enough, a high breakdown estimator may end up with a singular $\hat\bGamma$, which does not permit us to get valid statistics at the second stage analysis.

Although $T_{RML}$, $T_{AML}$, $T_{CRADF}$ and $T_{RF}$ are statistically justified and they perform well with complete data, and there is no reason for them to behave differently with missing data, further study about their finite sample behavior is still valuable, especially when the percentage of missing is large. For statistics that do not perform well with complete data, it is hard to imagine that they will perform well with missing data. Any such study should focus on statistics that perform well with complete data  (Bentler \& Yuan, 1999).


%\newpage

\begin{center}
Appendix A
\end{center}

This appendix provides the development and formulas for evaluating $\hat\bUpsilon$'s in (8b) with Huber-type weight. The formulas are programmed in the SAS and R programs introduced in section 4.

With the Huber-type weight, $w_{i3}(d_i)=1$. Then the estimating equations in (1) and (2) are derived from
$$
g_{i1}(\balpha)=w_{i1}(d\bnu_i)'\bV_i^{-1}(\bx_i-\bnu_i), \eqno(A1)
$$
and
$$
g_{i2}(\balpha)=\frac{1}{2}\tr\left\{\bV_i^{-1}(d\bV_i)\bV_i^{-1}[w_{i2}(\bx_i-\bnu_i)(\bx_i-\bnu_i)'-\bV_i]\right\}, \eqno(A2)
$$
where $d$ is for differentials.
It follows from (A1) and (A2) that
$$
dg_{i1}(\balpha)=
-w_{i1}(d\bnu_i)'\bV_i^{-1}(d\bnu_i)
-w_{i1}(d\bnu_i)'\bV_i^{-1}(d\bV_i)\bV_i^{-1}(\bx_i-\bnu_i)
+(dw_{i1})(d\bnu_i)'\bV_i^{-1}(\bx_i-\bnu_i)
\eqno(A3)
$$
and
$$
\begin{array}{ll}
dg_{i2}(\balpha)&=-\tr\left\{\bV_i^{-1}(d\bV_i)\bV_i^{-1}(d\bV_i)\bV_i^{-1}[w_{i2}(\bx_i-\bnu_i)(\bx_i-\bnu_i)']\right\}
\\[5mm]
&\displaystyle
+\frac{1}{2}
\tr\left\{\bV_i^{-1}(d\bV_i)\bV_i^{-1}(d\bV_i)\right\}
\\[5mm]
&\displaystyle
-\frac{1}{2}
\tr\left\{\bV_i^{-1}(d\bV_i)\bV_i^{-1}w_{i2}[(d\bnu_i)(\bx_i-\bnu_i)'+(\bx_i-\bnu_i)(d\bnu_i)']\right\}
\\[5mm]
&\displaystyle
+\frac{1}{2}
\tr\left\{\bV_i^{-1}(d\bV_i)\bV_i^{-1}[(dw_{i2})(\bx_i-\bnu_i)(\bx_i-\bnu_i)']\right\}.
\end{array}
\eqno(A4)
$$

Notice that both $w_{i1}$ and $w_{i2}$ are function of $d_i=[(\bx_i-\bnu_i)'\bV_i^{-1}(\bx_i-\bnu_i)]^{1/2}$, we have
$$
dw_{i1}(d_i)=\left\{ 
\begin{array}{ll}
0,   &{\rm if}\;d_i\leq \rho_i\\[5mm]\displaystyle
\frac{\rho_i}{d_i^3}[(d\bnu_i)'\bV_i^{-1}(\bx_i-\bnu_i)+.5(\bx_i-\bnu_i)'\bV_i^{-1}(d\bV_i)\bV_i^{-1}(\bx_i-\bnu_i)], &{\rm if}\;d_i>\rho_i
\end{array}
\right. 
\eqno(A5)
$$
and
$$
dw_{i2}(d_i)=\left\{ 
\begin{array}{ll}
0,   &{\rm if}\;d_i\leq \rho_i\\[5mm]
\displaystyle
\frac{\rho_i^2}{\kappa_i d_i^4}[2(d\bnu_i)'\bV_i^{-1}(\bx_i-\bnu_i)+(\bx_i-\bnu_i)'\bV_i^{-1}(d\bV_i)\bV_i^{-1}(\bx_i-\bnu_i)], &{\rm if}\;d_i>\rho_i
\end{array}
\right. 
\eqno(A6)
$$
Thus, when $d_i>\rho_i$, 
we have
$$
\begin{array}{ll}
dg_{i1}(\balpha)&=
-w_{i1}(d\bnu_i)'\bV_i^{-1}(d\bnu_i)
-w_{i1}(d\bnu_i)'\bV_i^{-1}(d\bV_i)\bV_i^{-1}(\bx_i-\bnu_i)
\\[5mm]&\displaystyle
+\frac{\rho}{d^3}
(d\bnu_i)'\bV_i^{-1}(\bx_i-\bnu_i)(\bx_i-\bnu_i)'\bV_i^{-1}(d\bnu_i)
\\[5mm]&\displaystyle
+\frac{\rho}{2d^3}
(\bx_i-\bnu_i)\bV_i^{-1}(d\bV_i)\bV_i^{-1}(\bx_i-\bnu_i)(\bx_i-\bnu_i)'\bV_i^{-1}(d\bnu_i)
\end{array}
\eqno(A7)
$$
and
$$
\begin{array}{ll}
dg_{i2}(\balpha)&=-\tr\left\{\bV_i^{-1}(d\bV_i)\bV_i^{-1}(d\bV_i)\bV_i^{-1}[w_{i2}(\bx_i-\bnu_i)(\bx_i-\bnu_i)']\right\}
\\[5mm]
&\displaystyle
+\frac{1}{2}
\tr\left\{\bV_i^{-1}(d\bV_i)\bV_i^{-1}(d\bV_i)\right\}
\\[5mm]
&\displaystyle
-\frac{1}{2}
\tr\left\{\bV_i^{-1}(d\bV_i)\bV_i^{-1}w_{i2}[(d\bnu_i)(\bx_i-\bnu_i)'+(\bx_i-\bnu_i)(d\bnu_i)']\right\}
\\[5mm]
&\displaystyle
+\frac{\rho^2}{\kappa_i d_i^4}
\tr\left\{\bV_i^{-1}(\bx_i-\bnu_i)(\bx_i-\bnu_i)'\bV_i^{-1}(d\bV_i)\bV_i^{-1}(\bx_i-\bnu_i)(d\bnu_i)'\right\}
\\[5mm]
&\displaystyle
+\frac{\rho^2}{2\kappa_i d_i^4}
\tr\left\{\bV_i^{-1}(\bx_i-\bnu_i)(\bx_i-\bnu_i)'\bV_i^{-1}(d\bV_i)\bV_i^{-1}
(\bx_i-\bnu_i)(\bx_i-\bnu_i)'\bV_i^{-1}(d\bV_i)\right\}.
\end{array}
\eqno(A8)
$$
Notice that, for matrices $\bA$, $\bB$, $\bC$ and $\bD$ of proper orders, there exists
$$
\tr(\bA\bB\bC\bD)=\ve'(\bD)(\bA\otimes \bC')\ve(\bB')=\ve'(\bD')(\bC'\otimes \bA)\ve(\bB). \eqno(A9)
$$ 
Let 
$$
\bb_i=\bV_i^{-1}(\bx_i-\bnu_i),
\;\;\;
\;\;\;
\bH_i=\bV_i^{-1}(\bx_i-\bnu_i)(\bx_i-\bnu_i)'\bV_i^{-1},
$$
$$
\bE_i=\frac{\partial\bnu_i}{\partial\bnu'}
\;\;\;
{\rm and}
\;\;\;
\bF_i=\frac{\partial\ve(\bV_i)}{\partial\ev'}.
$$
Using (A9), it follows from (A3) to (A8) that, when $d_i\leq\rho_i$, 
$$
\frac{\partial\bg_{i1}(\balpha)}{\partial\bnu'}=-\bE_i'\bV_i^{-1}\bE_i,
\;\;\;\;
\frac{\partial\bg_{i1}(\balpha)}{\partial\ev'}=-\bE_i'(\bV_i^{-1}\otimes\bb_i')\bF_i,
$$
$$
\frac{\partial\bg_{i2}(\balpha)}{\partial\bnu'}=-\frac{1}{\kappa_i}\bF_i'(\bb_i\otimes\bV_i^{-1})\bE_i,
\;\;\;\;
\frac{\partial\bg_{i2}(\balpha)}{\partial\ev'}=-\bF_i'[\frac{1}{\kappa_i}(\bH_i\otimes\bV_i^{-1})-\bW_i]\bF_i;
$$
and when $d_i>\rho_i$, 
$$
\frac{\partial\bg_{i1}(\balpha)}{\partial\bnu'}
=-\bE_i'(w_{i1}\bV_i^{-1}-\frac{\rho_i}{d_i^3}\bH_i)\bE_i
$$
$$
\frac{\partial\bg_{i1}(\balpha)}{\partial\ev'}=
-\bE_i'[w_{i1}(\bV_i^{-1}\otimes\bb_i')-\frac{\rho_i}{2d_i^3}
(\bH_i\otimes\bb_i')]\bF_i,
$$
$$
\frac{\partial\bg_{i2}(\balpha)}{\partial\bnu'}=
-\bF_i'[w_{i2}(\bb_i\otimes\bV_i^{-1})-\frac{\rho_i^2}{\kappa_i d_i^4}
(\bH_i\otimes\bb_i)]\bE_i,
$$
$$
\frac{\partial\bg_{i2}(\balpha)}{\partial\ev'}=
-\bF_i'[w_{i2}(\bH_i\otimes\bV_i^{-1})
-\bW_i-\frac{\rho_i^2}{2\kappa_i d_i^4}
(\bH_i\otimes\bH_i)]\bF_i.
$$

\newpage
\baselineskip=15.00pt 
\begin{center}
 Appendix B
\end{center}
\lstset{numbers=right}
\begin{lstlisting}[basicstyle={\ttfamily}]
library(rsem)
setwd("c:/rsem")
dset<-read.table("mardiaMV25.dat", header=T, na.string="-99")
dset<-data.matrix(dset)
ex1<-rsem_main(dset, c(1,2,4,5), "mcov.eqs") 

## Sample output from the above analysis.

Sample size n = 88 
Total number of variables p = 5 

The following variables are selected for SEM models 
Mechanics Vectors Analysis Statistics 

There are  2  missing data patterns. They are 
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   57    5    1    2    3    4    5
[2,]   31    3    1    3    4    2    5

Estimated means: 
[1] 39.18057 50.90668 47.20384 41.05387

Estimated covariance matrix: 
     Mechanics  Vectors Analysis Statistics
[1,]  289.3151 124.7135 105.4223   102.6819
[2,]  124.7135 182.4506  95.7065   108.5030
[3,]  105.4223  95.7065 202.1062   180.7338
[4,]  102.6819 108.5030 180.7338   373.3670

Fit statistics:
            T       p
RML   1.37630 0.71109
AML   1.21980 0.74826
CRADF 1.42660 0.69930
RF    0.47245 0.70228

Parameter estimates:
            Parameter         RSE          z
(E1,E1)   180.6958200 30.85086800  5.8570741
(E2,E2)    41.5747190 27.15030900  1.5312798
(E3,E3)    10.5451330 55.17224100  0.1911311
(E4,E4)   203.8083700 41.22027600  4.9443718
(D1,D1)    87.6602940 34.17143500  2.5653091
(D1,D2)    78.8122260 23.54614800  3.3471388
(D2,D2)   192.6949000 53.88817100  3.5758293
(F1,V999)  39.4471310  1.69824790 23.2281347
(F2,V999)  47.1918250  1.52386050 30.9685992
(V2,F1)     1.2892982  0.04580923 28.1449413
(V4,F2)     0.8755543  0.02442287 35.8497667
\end{lstlisting}

\newpage
\begin{center}
Appendix C
\end{center}

\begin{lstlisting}[basicstyle={\ttfamily}]
/TITLE
EQS 6.1: Mean and covariance structure analysis. 
The file name is mcov.eqs.
/SPECIFICATION 
weight='weight.txt'; 
cases=88; variables=4; matrix=covariance; 
analysis=moment; methods=ML, robust; data='data.txt';
/LABELS
V1=Mechanics; V2=Vectors; V3=Analysis; V4=Statistics;
/EQUATIONS
V1=  F1+E1;
V2= *F1+E2;
V3=  F2+E3;
V4= *F2+E4;
F1= *V999+D1;
F2= *V999+D2;
/VARIANCES
E1-E4= *;
D1=*; 
D2=*;
/COVARIANCES
D1,D2= *;
/TECHNICAL
conv=0.0001;
itera=100;
/Means
/OUTPUT
CODEBOOK;
DATA='mcov.ETS';
PARAMETER ESTIMATES;
STANDARD ERRORS;
LISTING;
/END
\end{lstlisting}

\newpage

\begin{center}
Appendix D
\end{center}

\begin{verbatim}
/TITLE
EQS 6.1: Covariance structure analysis. The file name is cov.eqs
/SPECIFICATION 
weight='weight.txt'; 
cases=88; variables=4; matrix=covariance; 
analysis=covariance; methods=ML, robust; data='data.txt';
/LABELS
V1=Mechanics; V2=Vectors; V3=Analysis; V4=Statistics;
/EQUATIONS
V1=  F1+E1;
V2= *F1+E2;
V3=  F2+E3;
V4= *F2+E4;
/VARIANCES
E1-E4= *;
F1=*; 
F2=*;
/COVARIANCES
F1,F2= *;
/TECHNICAL
conv=0.0001;
itera=100;
/OUTPUT
CODEBOOK;
DATA='cov.ETS';
PARAMETER ESTIMATES;
STANDARD ERRORS;
LISTING;
/END
\end{verbatim}



\newpage

\baselineskip=18pt
\def\bibb{\hangindent=.5 true cm\hangafter=1\noindent}
\def\bib{\vskip3pt\hangindent=.5 true cm\hangafter=1\noindent}
\begin{center}
References
\end{center}

\bib 
Arminger, G., \& Sobel, M. E. (1990). Pseudo-maximum likelihood
estimation of mean and covariance structures with missing data.
{\em Journal of the American Statistical Association, 85}, 
195--203.

\bib
Bentler, P. M. (2008). {\em EQS 6 structural equations program
manual}. Encino, CA: Multivariate Software.

\bib
Bentler, P. M., \& Yuan, K.-H. (1999).
Structural equation modeling with small samples: Test statistics.  {\em Multivariate Behavioral Research}, {\em 34}, 181--197.

\bib
Browne, M. W. (1984). Asymptotic distribution-free methods for
the analysis of covariance structures.
{\em British Journal of Mathematical and Statistical Psychology, 37}, 62--83.

\bib
Cheng, T.-C., \& Victoria-Feser, M.-P. (2002). High-breakdown estimation of multivariate mean and covariance with missing observations. {\em British Journal of Mathematical and Statistical Psychology, 55}, 317-ñ335.

\bib
Lee, S.Y., \& Xia, Y.M. (2006). Maximum likelihood methods in treating outliers and symmetrically heavy-tailed distributions for nonlinear structural equation models with missing data. {\em Psychometrika, 71}, 565--585.

\bib 
Little, R. J. A. (1988). Robust estimation of the mean and covariance
matrix from data with missing values.
{\em Applied Statistics}, {\em 37}, 23--38.

\bib
Liu, C. (1997). ML estimation of the multivariate $t$ distribution and the EM algorithm. {\em Journal of Multivariate Analysis, 63}, 296--312.

\bib
Lopuha\"{a}, H. P. (1989). On the relation between S-estimators and
M-estimators of multivariate location and covariances. {\em Annals of
Statistics, 17}, 1662--83.

\bib
Mair, P., Wu, E., \& Bentler, P. M. (2010). EQS Goes R: Simulations for SEM Using the Package REQS.
 {\em Structural Equation Modeling: A Multidisciplinary Journal, 17}, 333--349.

\bib
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis
with applications. {\em Biometrika, 57}, 519--530.

\bib 
Mardia, K. V., Kent, J. T., \& Bibby, J. M. (1979).
Multivariate analysis. New York: Academic Press.

\bib
Micceri, T. (1989).
The unicorn, the normal curve, and other improbable creatures. {\em 
Psychological Bulletin, 105}, 156--166.

\bib
Rocke, D. M. (1996). Robustness properties of S-estimators of multivariate location and shape in
high dimension. {\em Annals of Statistics, 24}, 1327--1345.

\bib
Rubin, D. B. (1976). Inference and missing data (with discussions). {\em  Biometrika, 63}, 581--592.

\bib
Savalei, V., \& Bentler, P. M. (2009). A two-stage ML approach to missing data: theory and application to auxiliary variables. {\em Structural Equation Modeling, 16}, 477--497. 

\bib 
Yuan, K.-H. (2010).
Expectation-robust algorithm and estimating equation for means and covariances with missing data. Under review.

\bib
Yuan, K.-H., \& Bentler, P. M. (2000).
 Three likelihood-based methods for mean and covariance
structure analysis with nonnormal missing data.
{\em Sociological Methodology, 30}, 167--202.

\bib
Yuan, K.-H., \& Bentler, P. M. (2001). 
A unified approach to multigroup structural equation modeling
with nonstandard samples. In
G. A. Marcoulides \& R. E. Schumacker (Eds.),
{\em Advanced structural equation modeling: New developments and techniques} (pp.\ 35--56). Mahwah, NJ: Lawrence Erlbaum Associates.

\bib
Yuan, K.-H., \& Bentler, P. M. (2010).
Consistency of normal distribution based pseudo maximum likelihood estimates when data are missing at random. {\em American Statistician, 64}, 263--267.

\bib
Yuan, K.-H.,  Bentler, P. M., \& Chan, W. (2004).
Structural equation modeling with heavy tailed distributions. {\em Psychometrika, 69}, 421--436.

\bib
Yuan, K.-H., Chan, W., \& Bentler, P. M. (2000). 
Robust  transformation with applications to structural equation 
modeling. {\em British Journal of Mathematical and Statistical Psychology}, {\em 53}, 31--50.

\bib
Yuan, K.-H., \& Lu, L. (2008).
SEM with missing data and unknown population using two-stage ML:
Theory and its application. {\em Multivariate Behavioral Research, 62}, 621--652. 

\bib
Yuan, K.-H., Marshall, L. L., \& Bentler, P. M. (2002).
A unified approach to exploratory factor analysis with 
missing data, nonnormal data,
and in the presence of outliers. {\em Psychometrika, 67}, 95--122.

\bib
Yuan, K.-H., Wallentin, F., \& Bentler, P. M. (2009).
ML versus MI for missing data with violation of distribution conditions.

\bib
Zhong, X. (2008).
{\em Effects of outlying observations on standard errors in factor analysis}.
Master degree thesis. University of Notre Dame.

\bib
Zu, J., \& Yuan, K.-H. (2010).
Local influence and robust procedures for mediation analysis. {\em Multivariate Behavioral Research, 45}, 1--44.

\end{document}