Generalized additive models (GAMs) estimate nonlinear relationships between a response and several predictors simultaneously. A [[link function]] can also be employed, which is why the additive model is "generalized". GAMs strike a balance between the interpretable (yet biased) linear model and "black box" machine learning algorithms.
The general form of a GAM is
$g(E(Y_i)) = \beta_0 + s_1 (x_{i,1}) + \dots + s_p (x_{i,p})$
where $s_p(x_p)$ is a smooth (non-parametric) function of the $p^{th}$ predictor and $g(u)$ is the optional link function.
However, not all predictors need enter the model non-parametrically. The interpretation of any predictors that enter parametrically (i.e., linearly) is the same as the standard [[base/R/linear regression|linear regression]] model.
When the effective degrees of freedom (edf) of a smooth is close to $1$, the predictor should likely enter the model linearly. If a plot of the smooth (controlling for the other variables) shows that a straight line can be drawn through the confidence bands, this is further evidence that the variable may enter linearly. Use both the edf and a visualization to make the determination.
The effective degrees of freedom equals the [[trace]] of the [[hat matrix]] and is reported in the [[R]] output. However, many statisticians argue that edf is a flawed metaphor and should be used with caution.
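As a sketch of this diagnostic, here is a hypothetical simulated example (variable names and the simulation itself are invented for illustration) where `x1` has a truly linear effect and `x2` a nonlinear one; the reported edf should reflect that difference:

```R
library(mgcv)

# Simulated data: x1 enters linearly, x2 through a sine curve
set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 * x1 + sin(2 * pi * x2) + rnorm(n, sd = 0.3)

mod <- gam(y ~ s(x1) + s(x2))
# edf per smooth: a value near 1 for s(x1) suggests refitting x1 as a
# plain linear term; a larger edf for s(x2) supports keeping the smooth
summary(mod)$edf
```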
GAMs are not ideal for models with strong interaction terms, as those effects cannot be separated out into an additive form. You can include an interaction term, but things get complicated even in low dimensions, and the benefits of a GAM over other approaches (especially interpretability) fade.
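If an interaction is needed anyway, `mgcv` provides tensor product smooths. A minimal sketch, assuming the same placeholder `y`, `x1`, `x2`, and `data` names used in the fitting code below:

```R
library(mgcv)

# te() fits a joint smooth surface over x1 and x2 (main effects + interaction);
# ti() isolates the interaction component after separate main-effect smooths
modInt <- gam(y ~ ti(x1) + ti(x2) + ti(x1, x2), data = data)
```

Interpreting the resulting surface is harder than interpreting one-dimensional smooths, which is part of the interpretability cost noted above.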
```R
library(mgcv)
# Fit a GAM with a smooth on x1; x2 and x3 enter parametrically
modGAM <- gam(y ~ s(x1) + x2 + x3, data = data, family = ...)
# Predict on new data
predictions <- predict(modGAM, newdata = test)
```
The `family` can be `gaussian`, `poisson`, or `binomial`, depending on the data (continuous normal data, count data, or success/failure data, respectively).
The summary output shows the results of an approximate F-test for each smooth, where the null hypothesis is that the smooth function is zero. A low p-value can be used to reject the null and indicates the variable (as smoothed) is significant.
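For a fitted model such as the `modGAM` above, these tests can be read from the summary object directly (component names are from `mgcv`'s `summary.gam`):

```R
# Smooth terms: edf, reference df, test statistic, and p-value per smooth
summary(modGAM)$s.table
# Parametric terms (here x2, x3): estimates, standard errors, and t-tests
summary(modGAM)$p.table
```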
The deviance explained is reported as a percentage analogous to $R^2$, equal to $1 - RD/ND$ where $RD$ is the residual deviance and $ND$ the null deviance. The adjusted $R^2$ is also reported and can be used to compare performance against other models.
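Both quantities are available from the summary object, and the deviance explained can be verified by hand (a sketch, assuming the `modGAM` fit above):

```R
# Deviance explained as reported by summary()
summary(modGAM)$dev.expl
# Manual computation of 1 - RD/ND from the fitted object
1 - deviance(modGAM) / modGAM$null.deviance
# Adjusted R-squared, for comparison against other models
summary(modGAM)$r.sq
```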
To plot the marginal relationship of each smoothed predictor, use `plot(modGAM)` with the `mgcv` package.
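A few `plot.gam` options make these plots easier to read (a sketch using the `modGAM` fit above):

```R
# pages = 1 puts all smooths on one page; shade = TRUE draws shaded
# confidence bands; seWithMean includes intercept uncertainty in the bands
plot(modGAM, pages = 1, shade = TRUE, seWithMean = TRUE)
```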
> [!Tip]- Additional Resources
> - [GAM: The Predictive Modeling Silver Bullet | Kim Larsen (Stitch Fix)](https://multithreaded.stitchfix.com/blog/2015/07/30/gam/)