lab:people:zwang:2011_tasks [Zhiyong Johnny Zhang]

Week of August 30

20 hours on mediation project

<todo #>Submit simulation for MAR condition. Oddly, some jobs requesting 16 cores were not allowed to run, so I changed the parameter to 12 and resubmitted.</todo>

<todo #>Provide R code as an alternative to the SAS code Ke-Hai gave in class.</todo>

<todo #>Do the reading for Jeanne's class (still ongoing). It is so demanding that I suspect this course is more of a burden than a help for me.</todo>

<todo #>One paragraph of writing each day</todo>

<todo #>If I have time, write out the simulation design</todo>

Week of Sep 5

<todo>Write an introduction section for the robust mediation analysis with missing data.</todo> (I spent ?? hours writing this and ?? hours reading MacKinnon's 2001 paper)

<todo #>What is mediation</todo> A behavioral psychologist may be interested in whether expectations start a self-fulfilling prophecy that affects behavior, or in whether physical abuse in early childhood leads to deviant processing of social information, which in turn leads to aggressive behavior. Mediation analysis comes in when exploring how such a potential causal relation between an independent variable and a dependent variable operates. A mediating variable transmits the effect of the independent variable, as a stimulus, to the dependent variable, as a response (Hebb 1966, MacKinnon 2007).

<todo #>Why mediation is needed</todo> Mediation models are widely applied, not only in behavioral psychology but across many research areas in psychology and other sciences. Part of the related work concerns theory development, while the rest opens the possibility of application to practical prevention and treatment interventions (Khoo 2001).

<todo #>Current development on robust mediation</todo> A typical mediation model is shown in Figure 1, where M is a possible mediator between the independent variable X and the dependent variable Y. In this path diagram, the key path is the indirect one, X→M→Y. If M does mediate the effect of X on Y, then the product of the path weights a and b should be significantly different from zero. Here a and b are regression coefficients from the two regression equations M = a*X + e_M and Y = c'*X + b*M + e_Y, where e_M and e_Y are error terms. From the regression perspective, it is easy to see that outliers may change the slope of the regression line dramatically. In other words, outliers in research data may make the hypothesis test less powerful or even invalid. This issue stems from the violation of assumptions on the error component in multiple regression, such as homoscedasticity. Maximum likelihood estimation is a common tool for obtaining the path weights in the mediation model. However, this approach requires a distributional assumption on the data, namely normality. If the data contain outliers, we cannot obtain an efficient estimator from the mediation model. Thus a robust method that down-weights the outliers can be employed to alleviate the non-normality problem.
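
The two regressions and the down-weighting idea described above can be sketched as follows. This is only a minimal illustration, not the robust procedure used in this project: the function names (`wls`, `huber_weights`, `huber_fit`, `mediation`) are hypothetical, and Huber-type iteratively reweighted least squares with tuning constant 1.345 is assumed here as one common way to down-weight large residuals.

```python
import numpy as np

def wls(design, response, weights=None):
    """Least-squares coefficients; weights=None gives ordinary least squares."""
    if weights is None:
        weights = np.ones(len(response))
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], response * root_w, rcond=None)
    return coef

def huber_weights(residuals, k=1.345):
    """Huber weights: 1 for small standardized residuals, k/|r| for large ones."""
    centered = residuals - np.median(residuals)
    scale = np.median(np.abs(centered)) / 0.6745  # MAD-based scale estimate
    r = np.abs(residuals) / max(scale, 1e-12)
    return np.minimum(1.0, k / np.maximum(r, 1e-12))

def huber_fit(design, response, n_iter=20):
    """Iteratively reweighted least squares with Huber weights (assumed stand-in)."""
    coef = wls(design, response)
    for _ in range(n_iter):
        coef = wls(design, response, huber_weights(response - design @ coef))
    return coef

def mediation(x, m, y, fit=wls):
    """Estimate a, b, c' and the indirect effect a*b from the two regressions."""
    ones = np.ones_like(x)
    a = fit(np.column_stack([ones, x]), m)[1]               # M = a*X + e_M
    _, c_prime, b = fit(np.column_stack([ones, x, m]), y)   # Y = c'*X + b*M + e_Y
    return {"a": a, "b": b, "c_prime": c_prime, "ab": a * b}

# Usage on simulated data with a = b = .39 and c' = 0 (values assumed for the demo).
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
m = 0.39 * x + rng.normal(size=2000)
y = 0.39 * m + rng.normal(size=2000)
ols_est = mediation(x, m, y)
robust_est = mediation(x, m, y, fit=huber_fit)
```

Passing the fitting routine as an argument keeps the mediation step identical for both estimators, so any difference in `ab` comes from the weighting alone. Note that Huber weights act on residuals only, so they do not protect against outliers in X (leverage points).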

<todo #>Current development on missing data mediation analysis</todo> Besides outliers, missing data also harm parameter estimation in mediation analysis. Although the missing entries can never be recovered, several techniques have been developed to address this issue. Before going into these techniques in more detail, we first briefly introduce the missingness mechanisms. There are three types, regardless of the specific reason the data were lost: Missing Completely At Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR). Under MCAR, missingness is unrelated to any variable; a possible case is a researcher randomly dropping part of the data, or participants accidentally overlooking a question. MAR and MNAR are more common in real life. Suppose some participants fill out a questionnaire. If whether a question is skipped depends only on information that is observed (for example, answers to other questions), the missingness is MAR; if it is skipped deliberately because of what the answer itself would be, the case is MNAR. The first two mechanisms are both ignorable, because, given the observed data, whether an item is missing is not related to the unobserved values. There are sophisticated methods for handling such data; full information maximum likelihood (FIML) and multiple imputation are two efficient approaches among them. With MNAR data, the situation is more complicated. Some studies assume that auxiliary variables are needed to carry out the mediation analysis, while others hold that all the information about the missingness is contained in the existing data.
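
To make the three mechanisms concrete, the sketch below deletes values of Y under each of them. It is an illustration under assumed choices only: the function `make_missing` and its rank-based missingness probabilities are hypothetical and are not the generation scheme used in this project's simulation.

```python
import numpy as np

def make_missing(x, y, rate, mechanism, rng):
    """Return a copy of y with roughly `rate` of its entries set to NaN.

    MCAR: every entry has the same chance of being missing.
    MAR:  the chance grows with the rank of x, which stays fully observed.
    MNAR: the chance grows with the rank of y itself, the value being deleted.
    `rate` should not exceed 0.5 so that all probabilities stay at or below 1.
    """
    n = len(y)
    if mechanism == "MCAR":
        prob = np.full(n, rate)
    elif mechanism in ("MAR", "MNAR"):
        driver = x if mechanism == "MAR" else y
        ranks = np.argsort(np.argsort(driver))   # ranks 0 .. n-1
        prob = 2.0 * rate * ranks / (n - 1)      # mean is `rate`, maximum is 2*rate
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    y_obs = np.asarray(y, dtype=float).copy()
    y_obs[rng.random(n) < prob] = np.nan
    return y_obs

# Usage: 20% MAR missingness in y, driven by the observed x.
rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = 0.5 * x + rng.normal(size=5000)
y_mar = make_missing(x, y, rate=0.2, mechanism="MAR", rng=rng)
```

Under the MAR setting the cases with missing Y have larger X on average, which is exactly the kind of dependence that FIML and multiple imputation can exploit because X is still observed.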

<todo #>What is the purpose of the current study</todo> In our study, we try to give an integrated method for data that have both non-normality and missingness problems. To keep the model simple and clear, we consider only MCAR and MAR as the missingness mechanisms.

<todo #>If have time, write out the simulation design</todo> The model is the same as the one shown in Fig. 1. In our simulation, the direct effect c' is set to 0, which means there is no direct effect of X on Y. The path coefficients a and b are set equal to each other, at either 0 or .39, to represent no mediation effect and a medium effect, respectively. All variances and residual variances equal 1. The sample sizes considered are 100, 300 and 500. The proportions of missingness are 10%, 20% and 30%, and the rates of outliers are 6% and 10%.
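
The design above can be written down as a grid of conditions plus a data generator. The sketch below makes several assumptions that the notes do not state: the four factors are fully crossed, all variables are normally distributed, and the names `design_cells` and `generate_clean_data` are hypothetical. Missingness and outlier contamination are left out because the notes do not specify how they are generated.

```python
import itertools
import numpy as np

EFFECTS = (0.0, 0.39)              # common value of a and b
SAMPLE_SIZES = (100, 300, 500)
MISSING_RATES = (0.10, 0.20, 0.30)
OUTLIER_RATES = (0.06, 0.10)

def design_cells():
    """Fully cross the four factors listed in the design (crossing is assumed)."""
    return [
        {"effect": e, "n": n, "missing": p, "outliers": q}
        for e, n, p, q in itertools.product(
            EFFECTS, SAMPLE_SIZES, MISSING_RATES, OUTLIER_RATES)
    ]

def generate_clean_data(n, effect, rng, c_prime=0.0):
    """Draw (x, m, y) with a = b = effect, Var(X) = 1 and unit residual variances."""
    x = rng.normal(0.0, 1.0, size=n)
    m = effect * x + rng.normal(0.0, 1.0, size=n)
    y = c_prime * x + effect * m + rng.normal(0.0, 1.0, size=n)
    return x, m, y

# Usage: list the conditions and draw one data set for the first cell.
cells = design_cells()
x, m, y = generate_clean_data(cells[0]["n"], cells[0]["effect"],
                              np.random.default_rng(2011))
```

Under the full-crossing assumption this gives 2 × 3 × 3 × 2 = 36 cells; crossing them with the two missingness mechanisms (MCAR and MAR) would double that number.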

<todo #>Fix the problem of jobs waiting in the queue forever</todo>

Week of 9.23

<todo #>verify simulation results for MAR conditions: modify the code and analyze the new results, e.g., how the parameters change (6 hr)</todo>

<todo #>literature review from “Computational Statistics & Data Analysis” and “Behavior Research Methods”. Send Johnny the titles and page numbers of the articles to read before reading them in depth. Write a double-spaced one-page summary for each. (8 hr reading and 6 hr writing)</todo>

<todo #>Read paper “Predicting unobserved links in incompletely observed networks”: This is an interesting paper on discovering hidden cliques among nations. The main data structure is an undirected graph G=(V,E). The probability of a potential edge is hypothesized to be proportional to the dot product of the vertices' latent feature vectors, with some covariates appended. The likelihood is maximized through an iterative procedure similar to EM. However, I think the authors have not carefully considered how to improve this model-training procedure. The reassignment of a vertex to a group could be handled more rigorously. The time-series nature of the data could also be explored by treating the data as a Markov chain. The results are reported mainly as the estimated retrieval rate (edges found/edges checked) and its corresponding CI. Another contribution of the paper is the use of covariates to improve edge finding, which proves effective for relatively sparse social networks.</todo>
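
The edge-scoring idea summarized in the note above can be sketched as follows. This is a hypothetical illustration of the summary only, not the paper's algorithm: the function names, the toy latent vectors, and the choice to rank unobserved pairs by the raw dot product are all assumptions, and the EM-like fitting of the latent vectors is not shown.

```python
import numpy as np

def edge_scores(latent, covariates=None):
    """Score every vertex pair by the dot product of its (augmented) feature vectors."""
    feats = latent if covariates is None else np.hstack([latent, covariates])
    scores = feats @ feats.T
    np.fill_diagonal(scores, 0.0)  # no self-loops
    return scores

def rank_candidate_edges(scores, observed_adj, top_k):
    """Return the top_k unobserved pairs (i < j), highest score first."""
    iu, ju = np.triu_indices(scores.shape[0], k=1)
    unobserved = observed_adj[iu, ju] == 0
    iu, ju = iu[unobserved], ju[unobserved]
    order = np.argsort(-scores[iu, ju], kind="stable")[:top_k]
    return [(int(iu[k]), int(ju[k])) for k in order]

def retrieval_rate(candidates, true_adj):
    """Edges found divided by edges checked."""
    if not candidates:
        return 0.0
    found = sum(int(true_adj[i, j] != 0) for i, j in candidates)
    return found / len(candidates)

# Usage on a toy graph with two tight pairs of vertices (made-up latent vectors).
latent = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
observed = np.zeros((4, 4), dtype=int)
truth = observed.copy()
truth[0, 1] = truth[1, 0] = 1
truth[2, 3] = truth[3, 2] = 1
top2 = rank_candidate_edges(edge_scores(latent), observed, top_k=2)
rate = retrieval_rate(top2, truth)
```

Appending covariates simply widens each feature vector, so pairs that agree on the covariates receive a higher score than the latent positions alone would give them.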

<todo #>Read paper “A stochastic lie detector”. I chose this one because (A) lying is a sensitive topic worth investigating and (B) the randomized response technique (RRT) seems to be gaining popularity in psychology these days. The issue addressed in this paper is that some participants respond dishonestly to questionnaires. Two earlier methods deal with this issue in order to obtain a better estimate of the true prevalence rate (e.g., of domestic violence or drug use). The new method proposed by the authors takes the proportion of dishonest responses into account. An ML estimator is given, and its effectiveness is demonstrated in a comparison with traditional direct questioning and the earlier RRT that does not model dishonest responding. I do not insert any formulas here because they are quite straightforward. One limitation the authors admit is that the way the accuracy of the estimate is tested is not good enough, so it is unclear whether the new estimator is inflated. The good thing is that we at least get some sense of the proportion of dishonest responses, which was unavailable in any previous work.</todo>

Week of October 10

<todo>find a clear definition of a social network (postponed until the talk with Alison)</todo>

<todo>survey the existing applications of PageRank (postponed until the talk with Alison)</todo>

<todo>investigate whether PageRank can be applied in psychology (postponed until the talk with Alison)</todo>

<todo>SEM might also be combined with PageRank (postponed until the talk with Alison)</todo>

<todo #>find some references regarding social networks in sociology and business</todo>

<todo #>resubmit the job on CRC using the setting 'smp 8'. Before that, I tried MPI, but it failed.</todo>

<todo>find out if there's any doable IRT project</todo>
