Estimate univariate and multivariate sample skewness and kurtosis in popular statistical software

This tutorial explains four different methods for calculating univariate and multivariate skewness and kurtosis in your data: an online calculator that allows the use of Excel, SPSS, and SAS data sets directly as well as text and csv files, an SPSS program, a SAS program, and instructions for STATA.

Online calculator

This section introduces the online tool for calculating univariate and multivariate skewness and kurtosis, available at http://webpower.psychstat.org/models/kurtosis/

The interface of the tool looks like

Data file

The data file can be chosen by clicking the Choose File button (it might appear differently for different browsers). We DO NOT save your data file and it is deleted immediately after calculation.

Type of data

The following types of data are allowed:

SPSS data file with the extension name .sav
SAS data file with the extension name .sas7bdat
Excel data file with the extension name .xls or .xlsx
CSV file (comma separated value data file) with extension name .csv
TXT file (text file) with extension name .txt

Select variables

A subset of variables can be used. To use the whole data set, leave this field blank. To select a subset of variables, provide the column numbers separated by commas (,). A range of consecutive columns can be written with a hyphen. For example

1, 2-5, 7-9, 11

will select variables 1, 2, 3, 4, 5, 7, 8, 9, and 11.
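
To make the selection syntax concrete, here is a minimal R sketch of how such a specification could be expanded into column numbers. The function name parse_columns is hypothetical and this is not the calculator's actual code; it only illustrates the rule described above.

```r
# Hypothetical helper: expand a specification such as "1, 2-5, 7-9, 11"
# into a vector of column numbers.
parse_columns <- function(spec) {
  # Split on commas and drop surrounding white space
  parts <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  idx <- lapply(parts, function(p) {
    # "2-5" becomes c(2, 5); a single number stays as it is
    bounds <- as.integer(trimws(strsplit(p, "-", fixed = TRUE)[[1]]))
    if (length(bounds) == 1) bounds else seq(bounds[1], bounds[2])
  })
  unique(unlist(idx))
}

cols <- parse_columns("1, 2-5, 7-9, 11")
print(cols)
```

In your own R session, the resulting vector could be used to subset a data frame, e.g. dat[, cols], where dat stands for your data.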

Missing data

The values used to code missing data can be provided. If multiple values are used to denote missing data, separate them with commas (,). For example,

-999, -888, NA

will treat all three values above as missing data.
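
If you prepare the data yourself in R, the same recoding can be sketched as follows. The data frame dat and the helper recode_missing are made up for illustration; the calculator's own implementation may differ.

```r
# Made-up data set with two numeric columns and one character column
dat <- data.frame(
  v1 = c(1.2, -999, 3.4, 2.2),
  v2 = c(-888, 5, 6, 7),
  v3 = c("a", "NA", "c", "-999"),
  stringsAsFactors = FALSE
)

# Codes that denote missing data, written as text so that they can be
# compared with both numeric and character columns
missing_codes <- c("-999", "-888", "NA")

# Hypothetical helper: set every value that matches a code to NA
recode_missing <- function(data, codes) {
  data[] <- lapply(data, function(col) {
    col[as.character(col) %in% codes] <- NA
    col
  })
  data
}

clean <- recode_missing(dat, missing_codes)
print(clean)
print(colSums(is.na(clean)))  # number of missing values per column
```

When reading a CSV file directly, read.csv(file, na.strings = c("-999", "-888", "NA")) does the recoding at import time.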

R

R functions for the calculation

library(MASS)
skewkurt<-function(x, na.rm=TRUE){
    if (na.rm) x <- x[!is.na(x)]
    n <- length(x)
    skew1 <- (sum((x - mean(x))^3)/n)/(sum((x - mean(x))^2)/n)^(3/2)
    skew2 <- n/((n-1)*(n-2))*sum((x - mean(x))^3)/(sd(x))^3
    skew.se <- sqrt(6*n*(n-1) / ((n-2)*(n+1)*(n+3)))

    temp<- sum((x - mean(x))^4)/(sum((x - mean(x))^2)^2)
    kurt1 <- n * temp -3     
    kurt2 <- n*(n+1)*(n-1)/((n-2)*(n-3))*temp - 3*(n-1)^2/((n-2)*(n-3))
    kurt.se <- sqrt(4*(n^2-1)*skew.se^2 / ((n-3)*(n+5)))
    list(skew1=skew1, skew2=skew2, skew.se=skew.se, kurt1=kurt1, kurt2=kurt2, kurt.se=kurt.se)
}
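
The function above returns two versions of each statistic: skew1 and kurt1 are the plain moment-based estimates, while skew2 and kurt2 are the sample-size-adjusted estimates. As a small worked check on made-up numbers, the sketch below recomputes them and confirms that the adjusted values are simple rescalings of the moment-based ones, which is the form used by the uni helper inside mardia further down.

```r
x <- c(2, 4, 4, 5, 7, 9, 15)  # made-up data
n <- length(x)
d <- x - mean(x)

# Central moments with divisor n
m2 <- sum(d^2) / n
m3 <- sum(d^3) / n
m4 <- sum(d^4) / n

# Moment-based estimates (skew1 and kurt1 in skewkurt)
g1 <- m3 / m2^1.5
g2 <- m4 / m2^2 - 3

# Adjusted estimates written as in skewkurt (skew2 and kurt2)
G1 <- n / ((n - 1) * (n - 2)) * sum(d^3) / sd(x)^3
temp <- sum(d^4) / sum(d^2)^2
G2 <- n * (n + 1) * (n - 1) / ((n - 2) * (n - 3)) * temp -
  3 * (n - 1)^2 / ((n - 2) * (n - 3))

# The same adjusted estimates written as rescaled moment-based estimates
G1_alt <- sqrt(n * (n - 1)) / (n - 2) * g1
G2_alt <- (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(all.equal(G1, G1_alt))
print(all.equal(G2, G2_alt))
```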


multi.skewkurt<-function (x, na.rm = TRUE){
    x <- as.matrix(x)
    if (na.rm) 
        x <- na.omit(x)
    n <- dim(x)[1]
    p <- dim(x)[2]
    x <- scale(x, scale = FALSE)
    S <- cov(x)*(n-1)/n
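    #S.inv <- solve(S)
    # ginv() from MASS also works when S is singular
    S.inv <- ginv(S)
    D <- x %*% S.inv %*% t(x)
    b1p <- sum(D^3)/n^2
    # tr() is not a base R function; sum(diag()) gives the trace
    b2p <- sum(diag(D^2))/n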
    chi.df <- p * (p + 1) * (p + 2)/6
    k <- (p + 1) * (n + 1) * (n + 3)/(n * ((n + 1) * (p + 1) - 
        6))
    small.skew <- n * k * b1p/6
    M.skew <- n * b1p/6
    M.kurt <- (b2p - p * (p + 2)) * sqrt(n/(8 * p * (p + 2)))
    p.skew <- 1 - pchisq(M.skew, chi.df)
    p.small <- 1 - pchisq(small.skew, chi.df)
    p.kurt <- 2 * (1 - pnorm(abs(M.kurt)))

    results <- list(n.obs = n, n.var = p, b1p = b1p, b2p = b2p, 
        skew = M.skew, small.skew = small.skew, p.skew = p.skew, 
        p.small = p.small, kurtosis = M.kurt, p.kurt = p.kurt)
    return(results)
}

mardia<-function(x, na.rm=TRUE){
	if (na.rm) 
	x <- na.omit(x)
	n <- dim(x)[1]
	p <- dim(x)[2]
	
	uni <- function(x){
		n <- length(x)
		xbar <- mean(x)
		m2 <- sum((x-xbar)^2)/n
		m3 <- sum((x-xbar)^3)/n
		m4 <- sum((x-xbar)^4)/n

		skewness <- sqrt(n*(n-1))/(n-2)*m3/m2^1.5
		kurtosis <- (n-1)/((n-2)*(n-3))*((n+1)*(m4/m2^2-3)+6)
		
		skew.se <- sqrt(6*n*(n-1) / ((n-2)*(n+1)*(n+3)))
		kurt.se <- sqrt(4*(n^2-1)*skew.se^2 / ((n-3)*(n+5)))
		c(skewness, skew.se=skew.se, kurtosis, kurt.se)
	}
	
	univariate <- apply(x, 2, uni)
	rownames(univariate) <- c('Skewness', 'SE_skew', 'Kurtosis', 'SE_kurt')
	
	x <- scale(x, scale = FALSE)
	S <- cov(x)*(n-1)/n
	S.inv <- ginv(S)
	D <- x %*% S.inv %*% t(x)
	b1p <- sum(D^3)/n^2
	b2p <- sum(diag(D^2))/n
	chi.df <- p * (p + 1) * (p + 2)/6
	k <- (p + 1) * (n + 1) * (n + 3)/(n * ((n + 1) * (p + 1) - 6))
	small.skew <- n * k * b1p/6
	M.skew <- n * b1p/6
	M.kurt <- (b2p - p * (p + 2)) * sqrt(n/(8 * p * (p + 2)))
	p.skew <- 1 - pchisq(M.skew, chi.df)
	p.small <- 1 - pchisq(small.skew, chi.df)
	p.kurt <- 2 * (1 - pnorm(abs(M.kurt)))
	
	multivariate <- rbind(c(b1p, M.skew, p.skew), c(b2p, M.kurt, p.kurt))
	rownames(multivariate) <- c('Skewness', 'Kurtosis')
	colnames(multivariate) <- c('b', 'z', 'p-value')

	results <- list(n.obs = n, n.var = p, univariate = univariate, multivariate=multivariate)
	
	cat('Sample size: ', n, "\n")
	cat('Number of variables: ', p, "\n\n")
	
	cat("Univariate skewness and kurtosis\n")
	print(t(univariate))
	
	cat("\nMardia's multivariate skewness and kurtosis\n")
	print(multivariate)
	cat("\n")
	
	invisible(results)
}
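
As a sanity check on the multivariate formulas, note that with a single variable Mardia's skewness b1p reduces to the squared univariate moment skewness and b2p reduces to the (non-excess) univariate moment kurtosis. The sketch below verifies this on simulated data. It is self-contained and uses solve() from base R in place of ginv(), which is an assumption that the covariance matrix is invertible.

```r
set.seed(1)
x <- matrix(rexp(50), ncol = 1)  # simulated right-skewed data, one variable
n <- nrow(x)

# Center the data and use the covariance matrix with divisor n,
# as in the functions above
xc <- scale(x, scale = FALSE)
S <- crossprod(xc) / n
D <- xc %*% solve(S) %*% t(xc)

# Mardia's multivariate skewness and kurtosis
b1p <- sum(D^3) / n^2
b2p <- sum(diag(D^2)) / n

# Univariate moment-based counterparts
m2 <- mean(xc^2)
m3 <- mean(xc^3)
m4 <- mean(xc^4)

print(all.equal(b1p, (m3 / m2^1.5)^2))  # squared skewness
print(all.equal(b2p, m4 / m2^2))        # kurtosis without subtracting 3
```

With the functions above loaded, a call such as mardia(dat), where dat is a numeric data frame or matrix, prints both the univariate and the multivariate results.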

SPSS

This method uses an SPSS macro developed by Dr. Lawrence T. DeCarlo. We have edited the macro so that it reports the skewness and kurtosis only.

First, download the macro (right click here to download) to your computer under a folder such as c:\Users\johnny\.

Second, open a syntax (script) editor within SPSS.

Third, in the script editor, type the following

INCLUDE file='C:\Users\johnny\mardia.sps'.
mardia vars=V2 V3 /.
execute.

Note that you will need to change the path to the folder where you saved the SPSS macro file. Also, change the variables after vars= from V2 V3 to the variables for which you want skewness and kurtosis.

Finally, the output is shown like

SAS

A SAS macro can be used to calculate multivariate skewness and kurtosis. We have edited this macro to get the skewness and kurtosis only.

First, download the macro (right click here to download) to your computer, e.g., to C:\Users\johnny\.

Second, in the SAS editor, type

%inc "C:\Users\johnny\mardia.sas";
%mardia(data=testdataset, var=V2 V3)

The first line gives the location of the SAS macro file. In the second line, specify the data set and the variables to use.

An example is shown below

data cork;
           input n e s w @@;
           datalines;
         72 66 76 77   91 79 100 75
         60 53 66 63   56 68 47 50
         56 57 64 58   79 65 70 61
         41 29 36 38   81 80 68 58
         32 32 35 36   78 55 67 60
         30 35 34 26   46 38 37 38
         39 39 31 27   39 35 34 37
         42 43 31 25   32 30 30 32
         37 40 31 25   60 50 67 54
         33 29 27 36   35 37 48 39
         32 30 34 28   39 36 39 31
         63 45 74 63   50 34 37 40
         54 46 60 52   43 37 39 50
         47 51 52 43   48 54 57 43
         ;

         %inc "C:\Users\johnny\mardia.sas";
         %mardia(data=cork, var=n e s w)

The sample output looks like this

STATA

Univariate skewness and kurtosis can be calculated in STATA along with other descriptive statistics by adding detail as an option to the summarize command:

summarize var1 var2 var3 var4, detail

Just change var1, var2, etc. to the variables of interest in your data set. Skewness and kurtosis will appear in the lower right corner of each variable's output.

Multivariate skewness and kurtosis can then be calculated using the following syntax:

mvtest normality var1 var2 var3 var4, stats(skewness kurtosis)

Again, specify the appropriate variable names. The sample output will appear as

FAQ

Can you calculate the statistics for me?

Yes! If you send us the data, we can do the calculation ourselves and will delete the data from our computer once it is done.

Which variables should I use?

We are interested in the skewness and kurtosis of all the continuous variables (including ordinal ones such as Likert-scale variables) that you used in the data analysis of your paper. Therefore, do not include categorical, binomial, or proportional data. This also means leaving out ID, condition number, and similar variables. If you used composite scores such as mean or sum scores, please calculate skewness and kurtosis for them instead of the item scores.

Should I include reverse codings?

If items have been reverse-coded, please include only one version of each item, either the original or the reverse-coded one, not both.

What if I didn't use any continuous variables in my paper?

If there are no suitable variables from your paper then please ignore our email. Nevertheless, we appreciate your interest in our study!

What if my paper has more than one study?

Please calculate skewness and kurtosis separately for each study. This is especially important for multivariate skewness and kurtosis as we only want those measures of a group of variables used together in the same analysis.

What if my data is in a format not listed here?

Convert your data to csv format and use the online calculator. Please let us know if you run into trouble.

What criteria did you use for selecting papers?

We are interested in the skewness/kurtosis of all papers published in Psychological Science and the American Educational Research Journal in the past 5 years.

2014/04/03 11:43 · johnny

Useful statistical concepts

Cramér–Rao bound

2014/03/13 15:08 · johnny

Best (my) way to use CRC clusters

Software needed

PuTTY https://www.putty.org/
Notepad++ https://notepad-plus-plus.org/downloads/
FileZilla https://filezilla-project.org/download.php?type=client

If you are using a Mac, use the following equivalents:

Terminal (already installed) replaces PuTTY.
TextWrangler replaces Notepad++.
FileZilla also works on a Mac.

Upload files using FileZilla

See the screen below:

Host: crcfe01.crc.nd.edu
Username: your netid
Password: your password
Port: 22
Click Quickconnect
On the remote site (right panel), browse to the Private folder. Typically, you create your folders and files under the Private folder.
On the Local site (left panel), choose the files/folders to upload and drag them to the remote site folder.

Use PuTTY

See the screen below:

Steps

Host name: crcfe01.crc.nd.edu
Port: 22
Connection type: ssh
You can save this information by clicking the “Save” button.
Click the Open button.
In the new window
- log in as: your netid
- password: your password (note that you will not see anything while typing; just finish typing your password and press Enter).
- You should then be logged in.

Most useful commands

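pwd : print the current directory
ls : list files and folders
- ls -alh : list all files, including hidden ones, with details and human-readable sizes
cd dir : change to the directory dir
- cd .. : go up one directory level
- cd : go to your home directory (where you start after logging in)
mkdir dir : create a new directory called dir
rm file1 : delete the file file1
rm -r dir : delete the directory dir and everything in it
nano : a text editor
chmod g+s dirname : set the setgid bit on dirname so that files created in it belong to the directory's group
setfacl -d -m g::rwx mathfeud : set the default ACL of the directory mathfeud so that its group gets read, write, and execute permission on new files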

Use Notepad++

Use the NppFTP plugin of Notepad++ to edit files directly on the cluster.
Open the NppFTP window from the menu: Plugins – NppFTP – Show NppFTP Window (see the screen below)

On the ftp window, click the settings button and then Profile settings (see the screen below)

Then in the new window
- Click “Add new” on the bottom left corner
- Give a name, e.g., CRC
- Then input the information required
  - Hostname: crcfe01.crc.nd.edu
  - Connection type: SFTP
  - Port: 22
  - Username: your netid
  - Password: leave empty and check “Ask for password”
  - Initial remote directory: the output of the pwd command in PuTTY.
  - Close the window.

Connect to ftp. Click the connection button and choose CRC as shown below.
- Input your password when asked.
- Your files will be listed.
- To edit a file, click on it and it will open in Notepad++.
- You can also right click to create/delete a directory, create/delete a file.
- After editing, save the file directly using Notepad++.

Submit a job (SGE)

For more information, go to http://wiki.crc.nd.edu/wiki/index.php/Submitting_Batch/SGE_jobs

Log in using PuTTY.
Create a new folder named test within PuTTY using the command
```
mkdir test
```
or do that within Notepad++ using the FTP plugin.
Go to the new folder test (cd test).
Create a test R file with the following content. You can use Notepad++. If using PuTTY, you can open the file in the nano editor with
```
nano test.R
```
The content of the R file is
```
rnorm(100)
```
- When using nano, Ctrl+O saves the file and Ctrl+X exits the editor.
Create another file (a submission file) called test.sh. In the file, include the following information (change the email address to yours)
```
#!/bin/csh
#$ -M xxxx@nd.edu
#$ -m abe
#$ -r y

module load R

R CMD BATCH test.R
```
To submit the job, type the following command in PuTTY (an SSH client such as PuTTY is required for this step)
```
qsub test.sh
```
Your job should be submitted now.
To check the status of your job, use
```
qstat -u netid
```
Your job should have already finished by now, so you won't see anything for this simple example.
You can list all the files using the command ls. You will see a file called test.Rout, which contains the R output. Any R errors can also be found in it.
There will also be a file called test.sh.oxxxx (xxxx is a number, the job ID). If there is no error, its size is typically 0.
If there is an error in your job, you will see a file like test.sh.exxxx.
To use SAS, change module load R to module load sas. All available modules can be listed using module avail.
If the test job works fine, you can start your regular jobs!

Submit an array job (SGE)

Sometimes it is convenient to split a big job into smaller ones. Preferably, each small job runs within its own folder. For example, one might have SAS jobs in the following folders

sasjob1
sasjob2
sasjob3
sasjob4

Within each folder, there is a SAS file called runsas.sas. To submit such jobs, use a batch file with the contents below

#!/bin/csh
#$ -M zzhang4@nd.edu
#$ -m abe
#$ -r y
#$ -t 1-4

module load sas
cd sasjob$SGE_TASK_ID

sas runsas.sas

If one prefers to put all small jobs in one folder like

sasjob1.sas
sasjob2.sas
sasjob3.sas
sasjob4.sas

Then the following script file can be used

#!/bin/csh
#$ -M zzhang4@nd.edu
#$ -m abe
#$ -r y
#$ -t 1-4

module load sas

sas sasjob$SGE_TASK_ID.sas
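
The same pattern could be used for R jobs. The sketch below is a hypothetical R script for one task of an array job (the file name runtask.R, the seed scheme, and the simulation are all made up for illustration). It reads the task number from the SGE_TASK_ID environment variable and writes its result to a task-specific file, so that the tasks do not overwrite each other.

```r
# Hypothetical R script for one task of an SGE array job (e.g. runtask.R).
# SGE sets SGE_TASK_ID for every task; fall back to 1 when the script is
# run outside an array job.
task_id <- suppressWarnings(as.integer(Sys.getenv("SGE_TASK_ID", unset = "1")))
if (is.na(task_id)) task_id <- 1L

# Give every task its own seed (a simple scheme, chosen for illustration)
set.seed(1000 + task_id)

# Placeholder simulation: means of 200 samples of size 30
sim_means <- replicate(200, mean(rnorm(30)))

# Save the result in a file named after the task
out_file <- sprintf("result_%d.rds", task_id)
saveRDS(list(task = task_id, means = sim_means), out_file)
cat("Task", task_id, "wrote", out_file, "\n")
```

In the batch file, module load sas would become module load R and the last line would become R CMD BATCH runtask.R. Because every task would otherwise write to the same runtask.Rout, it may help to give R CMD BATCH a task-specific output file name as its second argument.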

Debug

To reset the CRC environment, use

/opt/crc/usr/local/bin/crc_setup

2014/02/25 12:05 · johnny

Barplot with error bars, annotated by Kruskal-Wallis or ANOVA p-value

Use the WGCNA package.

To install, use the command

install.packages('WGCNA')

This package requires the impute package from Bioconductor. In current versions of R, it can be installed using

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("impute")

An example:

The code used

verboseBarplot(height,group,numberStandardErrors=2,AnovaTest=TRUE, main="2 SE, Anova", ylim=c(0, 2.5))
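
The call above assumes that height (a numeric vector) and group (a grouping factor) already exist. To show what such inputs might look like and what the plot summarizes, here is a base R sketch on simulated data. It does not call verboseBarplot; it computes the group means, error bars of two standard errors, and the ANOVA and Kruskal-Wallis p-values directly.

```r
set.seed(42)
# Simulated data: three groups of 20 observations each
group <- factor(rep(c("A", "B", "C"), each = 20))
height <- rnorm(60, mean = c(1.0, 1.5, 2.0)[as.integer(group)], sd = 0.5)

# Group means and standard errors
means <- tapply(height, group, mean)
ses <- tapply(height, group, function(v) sd(v) / sqrt(length(v)))

# p-values from a one-way ANOVA and from the Kruskal-Wallis test
p_anova <- summary(aov(height ~ group))[[1]][["Pr(>F)"]][1]
p_kw <- kruskal.test(height ~ group)$p.value

# Bar plot with error bars of 2 standard errors, annotated with the ANOVA p-value
mids <- barplot(means, ylim = c(0, 2.5),
                main = sprintf("2 SE, Anova p = %.2g", p_anova))
arrows(mids, means - 2 * ses, mids, means + 2 * ses,
       angle = 90, code = 3, length = 0.05)
```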

2013/09/05 09:59 · johnny

Install an external program within R

This is an example to download and install an external program / software within R.

dotRcheck <- function(){
  # Offer to download and install Graphviz (Windows only)
  if (.Platform$OS.type=="windows"){
      ANSWER <- readline("Do you want to install Graphviz for path diagram generation? \nType y or n\n")
      if (substr(ANSWER,1,1)=="y"){
        # Save the installer in the current working directory
        tmpdir <- getwd()
        destfile<-paste(tmpdir, "graphviz-2.30.1.msi", sep="/")
        # mode="wb" keeps the binary installer intact on Windows
        download.file("http://www.graphviz.org/pub/graphviz/stable/windows/graphviz-2.30.1.msi", destfile=destfile, mode="wb")
        # Run the Windows installer (/a = administrative install; /i would do a standard install)
        shell(paste("msiexec /a ",destfile))
      }
  }
}

2013/03/01 09:22 · johnny


Note. Everything on this blog only reflects my personal view which may or may not be true and is not related to any organization or institute.
