Friday, July 10, 2015

Function for calculating habitat fragmentation indices/stats for particular coordinates

Function for calculating habitat fragmentation indices/stats for particular coordinates

Here is a function that calculates 'fragstats' for user-defined circular buffers around sites defined by XY coordinates. It is essentially a wrapper for the 'ClassStat' function (from package SDMTools) with the purpose of a) implementing calculation for a circular buffer zone defined by a radius in distance units rather than a rectangular area such as defined by a raster, and b) allowing batch processing with coordinates from multiple sites.

The function may be useful if you wish to analyse habitat fragmentation indices at 1 or more scales/radii around a set of point locations, rather than across an entire landscape.

The input data are site coordinates and a RasterLayer object that contains habitat/vegetation presence categories defined by integers (absences may be NAs).

buffer.frags.R [https://github.com/GregGuerin/biomap/raw/master/buffer.frags.R]

An example:


source("buffer.frags.R")
eg.rast <- raster()
extent(eg.rast) <- c(0,10, 0, 10)
res(eg.rast) <- c(0.5,0.5)
eg.rast[] <- NA
eg.rast[sample(c(1:ncell(eg.rast)), 100)] <- 1
plot(eg.rast)
eg.coords <- data.frame(Longitude=c(3, 7), Latitude=c(4, 8))
row.names(eg.coords) <- c("SiteA", "SiteB")
eg.coords
points(eg.coords, pch=20, cex=2)
frags <- buffer.frags(eg.coords, radius=150000, vegetation.base.raster=eg.rast)
#
frags <- do.call(rbind, frags) #TO CONDENSE INTO SINGLE DATA.FRAME
frags #NOTE:class =100 in 'frags' is the default for NAs in the input raster (i.e. no habitat/vegetation).

Monday, May 18, 2015

Phylogenetic diversity calculator for community matrix

Phylogenetic diversity calculator for community matrix

Here is a function that calculates phylogenetic diversity (i.e. Faith's) for a set of community samples. It takes a community matrix (could be species inventories from field plots or map grid cells) and an associated tree with branch lengths.

phylogenetic.diversity.sites.R [https://raw.githubusercontent.com/GregGuerin/biomap/master/phylogenetic.diversity.sites.R]

Although coded differently, this is more or less equivalent to the 'pd' function in the picante package. I have pulled it out as a stand-alone function for a community matrix from the unweighted (pd) case of this phylogenetic endemism function, which has point record inputs and gridded map outputs. It returns a list with a vector of phylogenetic diversity scores (site names as labels). The example:



library(vegan)
data(mite)
library(ape)
mite.tree <- rtree(n=ncol(mite), tip.label=colnames(mite))


source(“phylogenetic.diversity.sites.R”)
mite.PD <- phylogenetic.diversity.sites(mite, phylo.tree=mite.tree)
mite.PD




Thursday, April 30, 2015

Function for maximising rank correlation between environmental and compositional differences

As a follow-on from the post on scaling environmental variables before calculating distances, there is this function ('bioenv') in the vegan package which can be used to find the set of variables for which (scaled) environmental distances are the most rank-correlated with compositional dissimilarity. This could be useful for some analyses- i.e. if you want to know what environmental changes are the most important for turnover in the system.

Friday, April 10, 2015

Scaling environmental variables

Multivariate analysis in community ecology often involves some comparison of samples (e.g. field plots) based on some measure of species composition, function or habitat structure as well as environmental variables (like rainfall, soil nutrient levels or depth etc).

So a basic data processing step is to find pairwise distances between sample plots based on the environmental variables, for example, as input for a partial mantel test. For simple environmental variables, generally Euclidean distance is appropriate and may be calculated on all variables at once or a subset. To be tidy with this, it is useful to standardise the variables first, to avoid those on larger scales or with larger variances having disproportionate influence.

Here is an example for nicely behaved normally distributed dummy environmental variables. Scaling these variables (that may be on different scales or measured in different units etc) makes them more comparable and gives them even weight in the distances. The scatterplot at the bottom shows that distances calculated on the unscaled versus scaled variables are not equivalent.



> variables <- data.frame(var1 = rnorm(100, mean=500, sd=10), var2 = rnorm(100, 13, 5), var3 = rnorm(100, 30, 5))
> variables
        var1      var2     var3
1   511.0022  6.714630 35.99468
2   497.8158 17.111932 23.84066
3   486.5962  8.608682 35.21800
4   496.9038 10.772592 27.12847
5   503.3185  8.448805 28.20107
6   504.2738 10.866616 31.44947
7   493.8976 18.157606 26.38141
8   492.1818 19.294316 26.23843
9   489.9431  7.349317 29.21912
10  478.5376 15.526236 21.32091

. . .

> par(mfrow=c(2,3)); hist(variables$var1); hist(variables$var2); hist(variables$var3)
> distances_raw <- dist(variables)
> variables <- as.data.frame(scale(variables))
> variables
             var1         var2          var3
1    0.9818746685 -1.288906408  1.3751342411
2   -0.2532038824  0.865837810 -1.2146544102
3   -1.3040573556 -0.896381771  1.2096385673
4   -0.3386237022 -0.447931619 -0.5140854569
5    0.2621974712 -0.929514889 -0.2855340385
6    0.3516740650 -0.428445964  0.4066368357
7   -0.6201900599  1.082543939 -0.6732696538
8   -0.7808971996  1.318116576 -0.7037355547
9   -0.9905750496 -1.157373534 -0.0686077652
10  -2.0588471840  0.537216980 -1.7515649547

. . .

> hist(variables$var1); hist(variables$var2); hist(variables$var3)
> distances_scaled <- dist(variables)

Second row shows the same variables re-scaled



> plot(distances_raw ~ distances_scaled, cex=0.7, col="red", pch=20, main="Euclidean distances from raw versus scaled variables")


Tuesday, February 10, 2015

Biodiversity mapping functions for R (5): presence/absence of species across a gridded map

Map grid cell presence/absence

I am sequentially posting some self-contained functions for mapping biodiversity metrics in R. 

This function is currently contained within some of the endemism functions I have posted, but I have pulled it out so that it can be used in isolation:

map.pa.matrix.R [see full code at https://raw.githubusercontent.com/GregGuerin/biomap/master/map.pa.matrix.R]


Description --
Given georeferenced incidence data for species, generates a binary presence/absence matrix associated with grid cells of a raster.


Details --
This function generates a binary species presence/absence matrix associated with a raster layer based on georeferenced incidence data. This is a data processing step for mapping various biodiversity metrics onto raster layers. The outputs can be used as inputs into these functions, or if desired they can be used like site-based data (at the resolution of the raster) for various analysis such as ordination, or incidence/frequency data for particular species can be extracted.


Usage --
An example:


require(vegan)
data(mite)
data(mite.xy)
map.pa.matrix(mite, records="site", site.coords=mite.xy)


Biodiversity mapping functions for R (4): Phylogenetic Endemism (test against expectation)

Phylogenetic endemism – non-parametric tests

I am sequentially posting some self-contained functions for mapping biodiversity metrics in R, this one is:

pe.null.test.R [see full code at https://raw.githubusercontent.com/GregGuerin/biomap/master/pe.null.test.R]


Description --
Taking the outputs from the 'phylogenetic.endemism' function, tests whether observed phylogenetic diversity/endemism is higher than expected, using non-parametric methods


Details --
With the outputs from the 'phylogenetic.endemism' function, performs the following tests:


1) non-parametric significance test as to whether observed phylogenetic diversity/endemism is higher or lower than expected, given species richness (and observed species frequencies)


2) identifies and maps outliers (i.e. in terms of map grid cells that have higher or lower PD/PE) based on quantiles. As categorical: whether score lies more than 1.5 (or other user-defined amount) times outside the interquartile range; as continuous: the factor of the interquartile by which observed values differ from the median / 50% quantile). Returns vectors of values plus raster maps.


Usage --
An example:


mite.PE <- phylogenetic.endemism(mite, records="site", site.coords=mite.xy, sep.comm.spp="none", phylo.tree=mite.tree, sep.phylo.spp="none", weight.type="geo")


pe.mite.test <- pe.null.test(mite.PE)



And an example of the raster outputs from phylogenetic.endemism.R and pe.null.test.R:






Biodiversity mapping functions for R (3): Phylogenetic Endemism (raw)

Phylogenetic Endemism

I am sequentially posting some self-contained functions for mapping biodiversity metrics in R, this one is:

phylogenetic.endemism.R [full code at https://raw.githubusercontent.com/GregGuerin/biomap/master/phylogenetic.endemism.R]


Description --
Calculates phylogenetic endemism (phylogenetic diversity inversely weighted by the spatial range of particular branch lengths) or alternatively (unweighted) phylogenetic diversity across gridded maps using individual or site-based point records.


Details --
This implementation of phylogenetic endemism allows alternative calculation of weights for branch length ranges. Weights can be calculated based on the frequency of occurrence in grid cells, or alternatively by the georeferenced span of the range. Unweighted phylogenetic diversity can also be selected.


Usage --
An example:


require(vegan)
data(mite)
data(mite.xy)
require(ape)
mite.tree <- rtree(n=ncol(mite), tip.label=colnames(mite)) #for this example, generate a phylogenetic tree of the species in the mite dataset with random relationships and branch lengths

####Usage of the function:
mite.PE <- phylogenetic.endemism(mite, records="site", site.coords=mite.xy, sep.comm.spp="none", phylo.tree=mite.tree, sep.phylo.spp="none", weight.type="geo")