R tools and scripts for vegetation science and ecology: 2015

Friday, July 10, 2015

Function for calculating habitat fragmentation indices/stats for particular coordinates

Here is a function that calculates 'fragstats' for user-defined circular buffers around sites defined by XY coordinates. It is essentially a wrapper for the 'ClassStat' function (from package SDMTools) with the purpose of a) implementing calculation for a circular buffer zone defined by a radius in distance units rather than a rectangular area such as defined by a raster, and b) allowing batch processing with coordinates from multiple sites.

The function may be useful if you wish to analyse habitat fragmentation indices at 1 or more scales/radii around a set of point locations, rather than across an entire landscape.

The input data are site coordinates and a RasterLayer object that contains habitat/vegetation presence categories defined by integers (absences may be NAs).

buffer.frags.R [https://github.com/GregGuerin/biomap/raw/master/buffer.frags.R]

An example:

source("buffer.frags.R")

eg.rast <- raster()
extent(eg.rast) <- c(0,10, 0, 10)
res(eg.rast) <- c(0.5,0.5)
eg.rast[] <- NA
eg.rast[sample(c(1:ncell(eg.rast)), 100)] <- 1
plot(eg.rast)
eg.coords <- data.frame(Longitude=c(3, 7), Latitude=c(4, 8))
row.names(eg.coords) <- c("SiteA", "SiteB")
eg.coords

points(eg.coords, pch=20, cex=2)
frags <- buffer.frags(eg.coords, radius=150000, vegetation.base.raster=eg.rast)
#
frags <- do.call(rbind, frags) #TO CONDENSE INTO SINGLE DATA.FRAME
frags #NOTE:class =100 in 'frags' is the default for NAs in the input raster (i.e. no habitat/vegetation).

Monday, May 18, 2015

Phylogenetic diversity calculator for community matrix

Here is a function that calculates phylogenetic diversity (i.e. Faith's) for a set of community samples. It takes a community matrix (could be species inventories from field plots or map grid cells) and an associated tree with branch lengths.

phylogenetic.diversity.sites.R [https://raw.githubusercontent.com/GregGuerin/biomap/master/phylogenetic.diversity.sites.R]

Although coded differently, this is more or less equivalent to the 'pd' function in the picante package. I have pulled it out as a stand-alone function for a community matrix from the unweighted (pd) case of this phylogenetic endemism function, which has point record inputs and gridded map outputs. It returns a list with a vector of phylogenetic diversity scores (site names as labels). The example:


library(vegan)
data(mite)
library(ape)
mite.tree <- rtree(n=ncol(mite), tip.label=colnames(mite))



source(“phylogenetic.diversity.sites.R”)
mite.PD <- phylogenetic.diversity.sites(mite, phylo.tree=mite.tree)

mite.PD

Thursday, April 30, 2015

Function for maximising rank correlation between environmental and compositional differences

As a follow-on from the post on scaling environmental variables before calculating distances, there is this function ('bioenv') in the vegan package which can be used to find the set of variables for which (scaled) environmental distances are the most rank-correlated with compositional dissimilarity. This could be useful for some analyses- i.e. if you want to know what environmental changes are the most important for turnover in the system.

Friday, April 10, 2015

Scaling environmental variables

Multivariate analysis in community ecology often involves some comparison of samples (e.g. field plots) based on some measure of species composition, function or habitat structure as well as environmental variables (like rainfall, soil nutrient levels or depth etc).

So a basic data processing step is to find pairwise distances between sample plots based on the environmental variables, for example, as input for a partial mantel test. For simple environmental variables, generally Euclidean distance is appropriate and may be calculated on all variables at once or a subset. To be tidy with this, it is useful to standardise the variables first, to avoid those on larger scales or with larger variances having disproportionate influence.

Here is an example for nicely behaved normally distributed dummy environmental variables. Scaling these variables (that may be on different scales or measured in different units etc) makes them more comparable and gives them even weight in the distances. The scatterplot at the bottom shows that distances calculated on the unscaled versus scaled variables are not equivalent.

> variables <- data.frame(var1 = rnorm(100, mean=500, sd=10), var2 = rnorm(100, 13, 5), var3 = rnorm(100, 30, 5))

> variables

        var1      var2     var3

1   511.0022  6.714630 35.99468

2   497.8158 17.111932 23.84066

3   486.5962  8.608682 35.21800

4   496.9038 10.772592 27.12847

5   503.3185  8.448805 28.20107

6   504.2738 10.866616 31.44947

7   493.8976 18.157606 26.38141

8   492.1818 19.294316 26.23843

9   489.9431  7.349317 29.21912

10  478.5376 15.526236 21.32091

. . .

> par(mfrow=c(2,3)); hist(variables$var1); hist(variables$var2); hist(variables$var3)

> distances_raw <- dist(variables)

> variables <- as.data.frame(scale(variables))

> variables

             var1         var2          var3

1    0.9818746685 -1.288906408  1.3751342411

2   -0.2532038824  0.865837810 -1.2146544102

3   -1.3040573556 -0.896381771  1.2096385673

4   -0.3386237022 -0.447931619 -0.5140854569

5    0.2621974712 -0.929514889 -0.2855340385

6    0.3516740650 -0.428445964  0.4066368357

7   -0.6201900599  1.082543939 -0.6732696538

8   -0.7808971996  1.318116576 -0.7037355547

9   -0.9905750496 -1.157373534 -0.0686077652

10  -2.0588471840  0.537216980 -1.7515649547

. . .

> hist(variables$var1); hist(variables$var2); hist(variables$var3)

> distances_scaled <- dist(variables)

Second row shows the same variables re-scaled

> plot(distances_raw ~ distances_scaled, cex=0.7, col="red", pch=20, main="Euclidean distances from raw versus scaled variables")

Tuesday, February 10, 2015

Biodiversity mapping functions for R (5): presence/absence of species across a gridded map

Map grid cell presence/absence

I am sequentially posting some self-contained functions for mapping biodiversity metrics in R.

This function is currently contained within some of the endemism functions I have posted, but I have pulled it out so that it can be used in isolation:

map.pa.matrix.R [see full code at https://raw.githubusercontent.com/GregGuerin/biomap/master/map.pa.matrix.R]

Description --
Given georeferenced incidence data for species, generates a binary presence/absence matrix associated with grid cells of a raster.

Details --
This function generates a binary species presence/absence matrix associated with a raster layer based on georeferenced incidence data. This is a data processing step for mapping various biodiversity metrics onto raster layers. The outputs can be used as inputs into these functions, or if desired they can be used like site-based data (at the resolution of the raster) for various analysis such as ordination, or incidence/frequency data for particular species can be extracted.

Usage --
An example:

require(vegan)
data(mite)
data(mite.xy)
map.pa.matrix(mite, records="site", site.coords=mite.xy)

Biodiversity mapping functions for R (4): Phylogenetic Endemism (test against expectation)

Phylogenetic endemism – non-parametric tests

I am sequentially posting some self-contained functions for mapping biodiversity metrics in R, this one is:

pe.null.test.R [see full code at https://raw.githubusercontent.com/GregGuerin/biomap/master/pe.null.test.R]

Description --
Taking the outputs from the 'phylogenetic.endemism' function, tests whether observed phylogenetic diversity/endemism is higher than expected, using non-parametric methods

Details --
With the outputs from the 'phylogenetic.endemism' function, performs the following tests:

1) non-parametric significance test as to whether observed phylogenetic diversity/endemism is higher or lower than expected, given species richness (and observed species frequencies)

2) identifies and maps outliers (i.e. in terms of map grid cells that have higher or lower PD/PE) based on quantiles. As categorical: whether score lies more than 1.5 (or other user-defined amount) times outside the interquartile range; as continuous: the factor of the interquartile by which observed values differ from the median / 50% quantile). Returns vectors of values plus raster maps.

Usage --
An example:

mite.PE <- phylogenetic.endemism(mite, records="site", site.coords=mite.xy, sep.comm.spp="none", phylo.tree=mite.tree, sep.phylo.spp="none", weight.type="geo")

pe.mite.test <- pe.null.test(mite.PE)

And an example of the raster outputs from phylogenetic.endemism.R and pe.null.test.R:

Biodiversity mapping functions for R (3): Phylogenetic Endemism (raw)

Phylogenetic Endemism

I am sequentially posting some self-contained functions for mapping biodiversity metrics in R, this one is:

phylogenetic.endemism.R [full code at https://raw.githubusercontent.com/GregGuerin/biomap/master/phylogenetic.endemism.R]

Description --
Calculates phylogenetic endemism (phylogenetic diversity inversely weighted by the spatial range of particular branch lengths) or alternatively (unweighted) phylogenetic diversity across gridded maps using individual or site-based point records.

Details --
This implementation of phylogenetic endemism allows alternative calculation of weights for branch length ranges. Weights can be calculated based on the frequency of occurrence in grid cells, or alternatively by the georeferenced span of the range. Unweighted phylogenetic diversity can also be selected.

Usage --
An example:

require(vegan)
data(mite)
data(mite.xy)
require(ape)
mite.tree <- rtree(n=ncol(mite), tip.label=colnames(mite)) #for this example, generate a phylogenetic tree of the species in the mite dataset with random relationships and branch lengths
####Usage of the function:
mite.PE <- phylogenetic.endemism(mite, records="site", site.coords=mite.xy, sep.comm.spp="none", phylo.tree=mite.tree, sep.phylo.spp="none", weight.type="geo")

Biodiversity mapping functions for R (2): Weighted Endemism: test against expectation

Weighted Endemism non-parametric tests

I am sequentially posting some self-contained functions for mapping biodiversity metrics in R, this one is:

endemism.null.test.R
[full code at https://raw.githubusercontent.com/GregGuerin/biomap/master/endemism.null.test.R]:

Update: there is now an associated article: Guerin, G.R., Ruokolainen, L. & Lowe, A.J. (2015) A georeferenced implementation of weighted endemism. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12361

Description --
Taking the outputs from the 'weighted.endemism' function (see previous post), tests whether observed endemism is higher than expected, using non-parametric methods

Details --
With the outputs from the 'weighted.endemism' function, performs the following tests:

1) non-parametric significance test as to whether observed endemism is higher or lower than expected, given species richness (and observed species frequencies)

2) identifies and maps outliers (i.e. in terms of map grid cells that have higher or lower endemism) based on quantiles. As categorical: whether endemism score lies more than 1.5 (or other user-defined amount) times outside the interquartile range; as continuous: the factor of the interquartile by which observed values differ from the median / 50% quantile). Returns vectors of values plus raster maps.

Raw weighted endemism scores are biased both by the completeness of species sampling and species richness itself. Correcting by dividing by the observed number of species ('corrected weighted endemism' of Crisp et al. 2001) is a proposed correction, but the relationship between endemism scores and species richness is not linear under a null model (random species draws), as increasingly infrequent species are drawn as richness increases, thereby increasing CWE. While correcting endemism scores in a more sophisticated way is possible, this function does not correct the scores per se, but compares them to a null distribution. This is achieved by making replicate random draws from the species pool based on the observed species richness (i.e. same number of species) and the actual species frequencies (more frequent species more likely to be drawn). The distribution of the resulting set of null endemism scores is compared to observed endemism and subsequently grid cells can be mapped as higher or lower than expected (based on significance testing and comparison to null quantiles).

Usage --
An example:

endemism_mydata <- weighted.endemism(mite, site.coords=mite.xy, records="site")
endemism.test.example <- endemism.null.test(endemism_mydata)

And an example of an output from a regional flora dataset from South Australia (non-parametric statistical significance that endemism is higher (or lower) than expected):

Biodiversity mapping functions for R (1): Weighted Endemism (raw)

Weighted Endemism

I am sequentially posting some self-contained functions for mapping biodiversity metrics in R.

When I began a project producing regional biodiversity maps from a large inventory dataset, I wanted to stay in R for its seamless data processing and downstream modelling and analysis capabilities. The ‘Biodiverse’ software package [see https://code.google.com/p/biodiverse/] already has a large range of biodiversity mapping applications, but as far as I know there were previously no self-contained applications in R for what I needed to do.

Long story short, to do the analysis I wanted in R and with some alternative implementations, I had to code it from scratch (building on functionality such as in package ‘raster’). From this I have developed some self-contained and open-source functions for general use.

Additional options can be incorporated, such as moving window analysis, down-scaling etc, using functions in ‘raster’. I am also working towards integration with other site-based biodiversity estimation functions in R (i.e. to start with incidence data and output rasters and associated data).

The first function here is:

weighted.endemism.R [full code at https://raw.githubusercontent.com/GregGuerin/biomap/master/weighted.endemism.R]:

Update: there is now an associated article: Guerin, G.R., Ruokolainen, L. & Lowe, A.J. (2015) A georeferenced implementation of weighted endemism. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12361

Description --Calculates weighted endemism (species richness inversely weighted by species ranges) across gridded maps using single or site-based point records.

Details --This implementation of weighted endemism allows alternative calculation of weights for species ranges as well as the option of user-supplied weights. Weights can be calculated based on the frequency of occurrence in grid cells, or alternatively by the geographical size of the species range, calculated in one (span) or two (area) dimensions.

Usage -- An example:

require(vegan)
data(mite)
data(mite.xy)
endemism_mydata <- weighted.endemism(mite, site.coords = mite.xy, records="site")

R tools and scripts for vegetation science and ecology

Friday, July 10, 2015

Function for calculating habitat fragmentation indices/stats for particular coordinates

Function for calculating habitat fragmentation indices/stats for particular coordinates

Monday, May 18, 2015

Phylogenetic diversity calculator for community matrix

Phylogenetic diversity calculator for community matrix

Thursday, April 30, 2015

Function for maximising rank correlation between environmental and compositional differences

Friday, April 10, 2015

Scaling environmental variables

Tuesday, February 10, 2015

Biodiversity mapping functions for R (5): presence/absence of species across a gridded map

Map grid cell presence/absence

Biodiversity mapping functions for R (4): Phylogenetic Endemism (test against expectation)

Phylogenetic endemism – non-parametric tests

Biodiversity mapping functions for R (3): Phylogenetic Endemism (raw)

Phylogenetic Endemism

Biodiversity mapping functions for R (2): Weighted Endemism: test against expectation

Weighted Endemism non-parametric tests

Biodiversity mapping functions for R (1): Weighted Endemism (raw)

Weighted Endemism

Popular Posts

Search This Blog