Mapping soil functional properties using multilevel models

>*Tor-Gunnar Vågen (ICRAF), Leigh Winowiecki (CIAT), Lulseged T. Desta (CIAT) and Jerome E. Tondoh (CIAT)*

In this document we present examples of R functions and scripts that may be used to generate local maps of soil functional properties based on for example remote sensing data, climate surfaces and/or digital elevation models and their derivatives. The data used come from the Africa Soil Information Service (*AfSIS*) project, which consists of a set of sites stratified across sub-Saharan Africa based on Koeppen-Geiger climate zones[1]. The methodology used was developed for land health assessment as is referred to as the Land Degradation Surveillance Framework (*LDSF*)[2,3]. A copy of the field guide can be accessed through [this](http://dvn.iq.harvard.edu/dvn/dv/icrafgsl/faces/study/StudyPage.xhtml?globalId=hdl:1902.1/19793&tab=files&studyListingIndex=6_cdd00cd01a2a42d2dc22ca256ad2 "field guide") link (requires registration).

We demonstrate a prediction model for soil organic carbon (SOC), using linear mixed effects (lme) or multilevel modeling approaches. Such models are well suited for developing local maps of soil properties. We use surface reflectance calibrated Landsat ETM+ reflectance from the GLS2010 library as predictors. The example data are from sites in Tanzania, Mozambique and Mali, respectively, but have been made anonymous in order to make this tutorial publicly available.

The code presented here is intended as a starting point. Other approaches, such as *gradient boosting models (gbm)* and *bayesian additive regression trees (BART)* may be interesting to explore as well. There is no such thing as "the right model" as models are essentially wrong and the selection of model depends on the application as well. You are encouraged to use this tutorial as a starting point and we hope it is of help in developing similar models and maps using your own data.

Required packages:
- *nlme* (comes with your R installation)
- *lattice*
- *car*
- *nnet*
- *sp*
- *rgdal*
- *raster*
- *MASS*

### Example data
The example dataset used in this exercise can be downloaded through the Dataverse site [here](http://dvn.iq.harvard.edu/dvn/dv/icrafgsl/faces/study/StudyPage.xhtml?globalId=hdl:1902.1/19793&tab=files&studyListingIndex=6_cdd00cd01a2a42d2dc22ca256ad2"data") as well.

**Data citation:
Vågen, Tor-G; Winowiecki, Leigh A; Tondoh, Jerome E; Desta, Lulseged T, 2013-01-01, "Africa Soil Information Service (AfSIS) - Soil Health Mapping", http://hdl.handle.net/1902.1/19793 v2012.**

setwd("~/Desktop/afsisData"); #Change this to where you saved the data
dat <- read.table("afsisSoil.csv", sep=",", header=T);

To see the content and the structure of the data, you may issue the following commands:

head(dat); str(dat);

The variables are:
- Country, Site, Cluster, Plot are sampling plot locations (see the LDSF field guide[4])
- DepthTop and DepthBottom soil depth in cm (all these data are topsoils)
- elev: Elevation from SRTM (m)
- map: Mean annual precipitation (mm)
- mfi: Modified fournier index (rainfall agressiveness)
- slope: Slope based on SRTM (degrees)
- flowaccum: Flow accumulation from SRTM
- SRFIBL ... SRFIMC: Calibrated surface reflectance for Landsat bands 1...5,7
- PBI and PVI: Prependicular background and vegetation indices, respectively.

Let's take a look at the SOC data by Site...
library(lattice);
bwplot(Site ~ SOC, dat, box.ratio=0.05, fill="brown")
Note that the SOC values are concentrations multiplied by 100 as we will be producing the final map in 16-bit format.

## Model development (lme)
Before proceeding it is often a good idea to look at the distribution of the variable we are modeling to determine if any transformations are required.
library(lattice);
densityplot(~SOC, dat);

library(nlme);
soc.lme1 <- lme(log(SOC) ~ SRFIGL+SRFIRL+SRFINA+SRFIMB+SRFIMC+PBI+PVI+map, random=~1|Site/Cluster, dat);
summary(soc.lme1);

The model summary above indicates that we may want to simplify the model. Mean annual rainfall does not contribute much to the model when we include Landsat reflectance. After a few iterations with adding/removing variables from the model and testing each model we end up with:

library(nlme);
soc.lme2 <- lme(log(SOC) ~ SRFIGL+SRFINA+SRFIMB+SRFIMC+PVI, random=~1|Site/Cluster, dat);
summary(soc.lme2);

The residuals from *soc.lme2* look reasonable as well:
plot(soc.lme2);

Measured versus predicted SOC (back-transforming the predicted values):

library(car);
scatterplot(exp(predict(soc.lme2)) ~ SOC, xlab="Measured SOC", ylab="Predicted SOC",xlim=c(0,8000), ylim=c(0,8000), smooth=F, dat)

The above model has an adjusted R-squared of 0.85. As shown in the summary variance is highest between sites, but also relatively high within many of the sites (between sampling clusters).

## Mappping SOC
For this exercise, we show an example of how to use the model developed earlier (*soc.lme2*) to predict SOC from satellite imagery. The images used are from the GLS2005 archive from sites in Mali, Tanzania and Mozambique. We have clipped the individual sites to reduce the size of the dataset. The images are in the original zip archive downloaded earlier.

Note that the bands must be named consistent with the covariates in the model and that we create a "dummy" band for each site which contains the site ID so that the model uses site-specific coefficients in the predictions of SOC.

library(raster);
setwd("~/Desktop/afsisData"); #Change this to where you saved the data
rast1 <- stack(raster("1/SRFIGL.tif"),raster("1/SRFINA.tif"),raster("1/SRFIMB.tif"),raster("1/SRFIMC.tif"),raster("1/PVI.tif"));
site <- raster(rast1,1);
names(site) <- 'Site';
site[site>0] <- 7;
rast1 <- stack(rast1,site);

rast2<- stack(raster("2/SRFIGL.tif"),raster("2/SRFINA.tif"),raster("2/SRFIMB.tif"),raster("2/SRFIMC.tif"),raster("2/PVI.tif"));
site <- raster(rast2,1);
names(site) <- 'Site';
site[site>0] <- 39;
rast2 <- stack(rast2,site);

rast3<- stack(raster("3/SRFIGL.tif"),raster("3/SRFINA.tif"),raster("3/SRFIMB.tif"),raster("3/SRFIMC.tif"),raster("3/PVI.tif"));
site <- raster(rast3,1);
names(site) <- 'Site';
site[site>0] <- 1;
rast3 <- stack(rast3,site);

Let's apply *soc.lme2* to the imagery and display the results.

mapSOC1 <- exp(predict(rast1, soc.lme2, level=1));
mapSOC2 <- exp(predict(rast2, soc.lme2, level=1));
mapSOC3 <- exp(predict(rast3, soc.lme2, level=1));

col=colorRampPalette(c("yellow","white","wheat","black"))(255); #You may want to play with these!
zlim=c(0,3500) #Or an appropriate stretch for the site you are modeling

plot(mapSOC1, col=col, main="Map of SOC (g/kg) for Site 1 (Mali)", xlab="Longitude", ylab="Latitude", zlim=zlim);
plot(mapSOC2, col=col, main="Map of SOC (g/kg) for Site 2 (Mozambique)", xlab="Longitude", ylab="Latitude", zlim=zlim);
plot(mapSOC3, col=col, main="Map of SOC (g/kg) for Site 3 (Tanzania)", xlab="Longitude", ylab="Latitude", zlim=zlim);

Note that the model we use above is the "1-level" model, which means that we are using the "site-specific" model coefficients from *soc.lme2*. Given the size of the training dataset this model should give us reasonable predictions, but it is useful to compare the distribution of predicted SOC from the map surface and the measured values used to develop the model.

mapSOC.val1 <- getValues(mapSOC1);
mapSOC.val2 <- getValues(mapSOC2);
mapSOC.val3 <- getValues(mapSOC3);

plot(density(na.omit(mapSOC.val1)), col="orange", lwd=2, ylim=c(0,0.008), xlim=c(0,3000), main="");
lines(density(na.omit(mapSOC.val2)), col="red", lwd=2, main="");
lines(density(na.omit(mapSOC.val3)), col="black", lwd=2, main="");
legend("topright", bty="n", c("Site 1","Site 2","Site 3"), lty=c(1,2,2), col=c("orange","red","black"), lwd=c(2,2,2));

\Compare the above density plots to the boxplots for the measured values by site. The corresponding sites in the boxplot are:
* Site 1: 7
* Site 2: 39
* Site 3: 1

## References
[1] Kottek, M., Grieser, J., Beck, C., Rudolf, B., & Rubel, F. (2006). World Map of the Köppen-Geiger climate classification updated. Meteorologische Zeitschrift, 15(3), 259–263. doi:10.1127/0941-2948/2006/0130

[2] Vågen, T.-G., Davey, F., & Shepherd, K. D. (2012). Land health surveillance: Mapping soil carbon in Kenyan rangelands. In P. K. R. Nair & D. Garrity (Eds.), Agroforestry - The Future of Global Land Use. Springer.

[3] Vågen, T.-G., Shepherd, K. D., & Walsh, M. G. (2006). Sensing landscape level change in soil fertility following deforestation and conversion in the highlands of Madagascar using Vis-NIR spectroscopy. Geoderma, 133(3-4), 281–294. doi:10.1016/j.geoderma.2005.07.014

[4] Vågen, T.-G., Winowiecki, L. A., Tamene Desta, L., & Tondoh, J. E. (2010). The Land Degradation Surveillance Framework (LDSF) - Field Guide. 14 p. Nairobi, Kenya: World Agroforestry Centre.

You are using an outdated browser that is not supported by GeoNode.