Wednesday, July 17, 2013

'parCorPerm' - Permutational (non-parametric) significance test of multivariate pairwise partial correlations.



'parCorPerm'

Permutational significance test of multivariate pairwise partial correlations. 

The input data matrix is a set of variables for which partial correlations of each variable with a response variable, given all the others, is calculated.

You can do this for 'standard' data using R package 'ppcor', which is used here to calculate the correlation coefficients and must be loaded. The permutation test is a useful alternative for calculating statistical significance, for example in the case of variables that are distances where standard assumptions of the independence of data points is violated.


I adapted this function from a related one for permutational significance testing of 2 variable correlations 'corPerm' by Pierre Legendre, see http://adn.biol.umontreal.ca/~numericalecology/Rcode/.

The function tests partial correlations with one response variable of interest (a historical artefact of my need for this function) and it assumes (as written here) the response variable is represented by the final (right hand) column of the input data matrix.

Reference: Guerin, G.R., Biffin, E. Jardine, D.I., Cross, H.B. & Lowe, A.J. (2014) A spatially predictive baseline for monitoring multivariate species occurrences and phylogenetic shifts in Mediterranean southern Australia. Journal of Vegetation Science 25, 338–348

require(ppcor)

#set up functions
sample_func <- function(x)
{
sample(x, length(x))
}


PFUNCTION <- function(x, y, z)
{
if(x > y) z <- z+1
return(z)
}


parCorPerm <- function(x,nperm) #where x is a data.frame with variables in columns and the response of interest in the final (RH) column; and nperm is the desired number of permutations
{
r.ref <- spcor(x) #calculate reference partial correlation on raw data
temp.ref <- as.data.frame(r.ref$estimate) #extract the coefficients

nGT <- as.data.frame(as.vector(rep(1, ncol(x)))) # a vector of 1s same length as number of variables
colnames(nGT) <- "value"

response <- as.data.frame(x[,ncol(x)]) #isolate the response column which must be the last one
number <- as.numeric(ncol(x)-1) #calcuate number of columns without the response
x <- as.data.frame(x[,1:number]) #extract just the variables, exclude the response


for(i in 1:nperm)
{
y.perm <- as.data.frame(apply(x, 2, sample_func)) #permute variables
test_data <- cbind(y.perm, response) #join permuted with unpermuted response
r.perm <- spcor(test_data) #calculate correlation coefficient with permuted variables
temp.perm <- as.data.frame(r.perm$estimate) #extract coefficients
nGT$value <- mapply(PFUNCTION, temp.perm[, ncol(temp.perm)], temp.ref[, ncol(temp.ref)], nGT$value)

}
P <- nGT$value/(nperm+1)
cat('\nPearson partial correlation of response variable to each explanatory variable (i.e. given the others)','\n')
cat('Prob (',nperm,'permutations) =',P,'\n','\n')
return(list(Variables=c(colnames(x), "Response"), Correlation=temp.ref[, ncol(temp.ref)], No.perm=nperm, P.perm=P ))

}