Generalized Additive Models with Binary Data

GAMs are particularly valuable with binary response variables (for background, see p. 593). To illustrate the use of gam for modelling binary response data, we return to the example analysed by logistic regression on p. 595. We want to understand how the isolation of an island and its area influence the probability that the island is occupied by our study species.

island<-read.table("c:\\temp\\isolation.txt",header=TRUE)
attach(island)
names(island)

[1] "incidence" "area" "isolation"

In the logistic regression, isolation had a highly significant negative effect on the probability that an island will be occupied by our species (p = 0.004), and area (island size) had a significant positive effect on the likelihood of occupancy (p = 0.019). But we have no a priori reason to believe that the logit of the probability should be linearly related to either of the explanatory variables. We can try using a GAM to fit smoothed functions to the incidence data:

library(mgcv)
model3<-gam(incidence~s(area)+s(isolation),family=binomial)
summary(model3)

Family: binomial
Link function: logit
Formula:
incidence ~ s(area) + s(isolation)

Parametric coefficients:
             Estimate  Std. Error    z value     Pr(>|z|)
(Intercept)    1.6371      0.8545      1.916       0.0554    .

Approximate significance of smooth terms:
                 edf   Est.rank  Chi.sq    p-value
s(area)        2.429          5   6.335    0.27494
s(isolation)   1.000          1   7.532    0.00606     **

R-sq.(adj) = 0.63    Deviance explained = 63.1%
UBRE score = -0.32096    Scale est. = 1    n = 50

This indicates ...
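
As a rough illustration of how a binomial GAM of this form can be set beside an ordinary logistic regression, the sketch below fits both models to simulated data. It assumes the mgcv package is installed. The data frame sim, the coefficients used to generate it, and the object names glm_fit and gam_fit are all hypothetical stand-ins and are not the values in isolation.txt, so the numbers it prints will not match the output above.

```r
library(mgcv)  # provides gam() and s()

set.seed(1)
n <- 200

# Simulated stand-in for the island data (assumed values, not isolation.txt)
sim <- data.frame(
  area      = runif(n, 0, 10),
  isolation = runif(n, 2, 10)
)

# Assumed "true" model: logit(p) is linear in both explanatory variables
eta <- 2 + 0.5 * sim$area - 0.6 * sim$isolation
sim$incidence <- rbinom(n, 1, plogis(eta))

# Logistic regression: logit(p) forced to be linear in area and isolation
glm_fit <- glm(incidence ~ area + isolation, family = binomial, data = sim)

# GAM: logit(p) is a sum of smooth functions of area and isolation
gam_fit <- gam(incidence ~ s(area) + s(isolation), family = binomial, data = sim)

# Table of smooth terms; an edf close to 1 suggests an almost linear effect
s_tab <- summary(gam_fit)$s.table
print(s_tab)

# Compare the two fits on the AIC scale
print(AIC(glm_fit, gam_fit))
```

Passing data = sim to both calls avoids the need for attach. An effective degrees of freedom (edf) value near 1 for a smooth term means that the fitted smooth is close to a straight line on the logit scale, so a linear term may describe that variable adequately.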
