Friday, February 21, 2014

Automated way to apply a function across a set of variables without losing their names


There are plenty of options in R for repeating the same function across multiple inputs. A common case is repeating a function across the columns of a data frame without doing it by hand. You can use 'apply' or its variants. But 'apply' converts the data frame to a matrix and passes each column to your function as a plain vector, so the column name is not available inside the function and does not appear on plot titles, for example. This is annoying when you have many variables, or want to produce labels automatically.
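A minimal sketch of what the function sees inside 'apply'. The data frame 'toy' and its columns are made up for illustration and are not part of the example below.

```r
# Hypothetical toy data, only for illustration.
toy <- data.frame(height = c(1.5, 1.6, 1.7), weight = c(60, 65, 70))

# Ask, from inside apply(), whether each column can see any column names.
seen <- apply(toy, 2, function(x) is.null(colnames(x)))

# The function only receives a bare vector, so colnames(x) is NULL inside it,
# even though the returned vector itself is labelled with the column names.
print(seen)
```

So the names survive on the result that 'apply' returns, but not inside the function, which is where a plot title would need them.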

Below is a simple example of an alternative that keeps the column names in the output.

1) Box plots of a series of variables against the same factor (the function assumes the grouping factor is the first column):


df <- data.frame(groups=rep(c("a", "b"), times=10), bananas=rnorm(20, mean=10, sd=3), soup=c(1:20), words=runif(20, min=50, max=80))

df

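plots <- function(frame) {
  # Column 1 holds the grouping factor; every other column is plotted against it.
  for (i in seq_len(ncol(frame))[-1]) {
    # names(frame)[i] supplies the column name as the plot title.
    boxplot(frame[[i]]~frame[[1]], col=c("brown", "orange"), range=0, main=names(frame)[i])
  }
}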

plots(df)



Example of one of the plots produced with this function

#

Compare to the same thing using apply:

plots2 <- function(x) {
boxplot(x~df$groups, col=c("brown", "orange"), range=0, main=colnames(x))
}

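apply(df[,2:4], 2, plots2)

[You get the plots without titles: 'apply' hands each column to plots2 as a plain vector, so colnames(x) is NULL. Writing df[2:4, drop=FALSE] without the comma additionally gives the warning "In `[.data.frame`(df, 2:4, drop = FALSE) : drop argument will be ignored", because 'drop' has no effect when a data frame is indexed with a single argument.]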
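One way to keep the titles without writing the loop by hand is to iterate over the column names rather than over the columns. The sketch below is only one possible approach; the helper 'plot_by_name' and the data frame 'd' are hypothetical names introduced for illustration, and the grouping factor is again assumed to be the first column.

```r
# Illustrative data with the same layout as above:
# grouping column first, numeric variables after it.
set.seed(1)
d <- data.frame(groups = rep(c("a", "b"), times = 10),
                bananas = rnorm(20, mean = 10, sd = 3),
                soup = 1:20)

# Hypothetical helper: it receives a column *name*, so the name is
# available for the title and the axis label.
plot_by_name <- function(v, frame) {
  boxplot(frame[[v]] ~ frame[[1]], col = c("brown", "orange"), range = 0,
          main = v, xlab = names(frame)[1], ylab = v)
  v  # return the title that was used
}

# Loop over every column name except the first (the grouping factor).
titles <- vapply(names(d)[-1], plot_by_name, character(1), frame = d)
```

Because the function is given the name and looks the column up itself, nothing has to be recovered from a bare vector.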



#

2) On the other hand, when applying the same model to different variables, 'apply' works fine, because it labels each element of the returned list with the corresponding column name:

models <- function(x, family, data) {
  summary(glm(x~groups, family=family, data=data))
}


apply(df[,2:4, drop=FALSE], 2, models, family=gaussian, data=df)


Note also that 'apply' works with a function that takes multiple arguments (inputs) if you specify them after the function name, as above. The additional arguments are passed unchanged on every call, so they must be the same for every column.
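A small sketch of both points, the forwarded arguments and the named result. The helper 'group_effect' and the data frame 'd' are hypothetical, chosen so that the output is short enough to inspect.

```r
# Illustrative data: grouping column first, numeric variables after it.
set.seed(1)
d <- data.frame(groups = rep(c("a", "b"), times = 10),
                bananas = rnorm(20, mean = 10, sd = 3),
                soup = 1:20)

# Hypothetical helper: fit one model and return the coefficient for group "b".
group_effect <- function(x, family, data) {
  coef(glm(x ~ groups, family = family, data = data))[["groupsb"]]
}

# 'family' and 'data' go through apply()'s '...' and are the same for every column.
effects <- apply(d[, 2:3], 2, group_effect, family = gaussian, data = d)

# The result is labelled with the column names.
print(names(effects))
```

If an argument needs to differ between columns, 'apply' is not a good fit, and something like 'mapply' would be the usual alternative.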