section5.1solutions.ipynb

   1 {
   2  "cells": [
   3   {
   4    "cell_type": "markdown",
   5    "metadata": {},
   6    "source": [
   7     "# Section 5.1 solutions"
   8    ]
   9   },
  10   {
  11    "cell_type": "markdown",
  12    "metadata": {
  13     "heading_collapsed": true
  14    },
  15    "source": [
  16     "### Imports and definitions"
  17    ]
  18   },
  19   {
  20    "cell_type": "code",
  21    "execution_count": null,
  22    "metadata": {
  23     "hidden": true,
  24     "init_cell": true
  25    },
  26    "outputs": [],
  27    "source": [
  28     "library(tidyverse)\n",
  29     "# library(cowplot)\n",
  30     "library(repr)\n",
  31     "library(ggfortify)\n",
  32     "\n",
  33     "# Change plot size to 6 x 4 (width x height, in inches)\n",
  34     "options(repr.plot.width=6, repr.plot.height=4)"
  35    ]
  36   },
  37   {
  38    "cell_type": "code",
  39    "execution_count": null,
  40    "metadata": {
  41     "hidden": true,
  42     "init_cell": true
  43    },
  44    "outputs": [],
  45    "source": [
  46     "source('plot_extensions.R')"
  47    ]
  48   },
  49   {
  50    "cell_type": "markdown",
  51    "metadata": {},
  52    "source": [
  53     "## Exercise 5.2"
  54    ]
  55   },
  56   {
  57    "cell_type": "markdown",
  58    "metadata": {},
  59    "source": [
  60     "### Load the `cemheat` dataset."
  61    ]
  62   },
  63   {
  64    "cell_type": "code",
  65    "execution_count": null,
  66    "metadata": {},
  67    "outputs": [],
  68    "source": [
  69     "cemheat <- read.csv('cemheat.csv')\n",
  70     "head(cemheat)"
  71    ]
  72   },
  73   {
  74    "cell_type": "markdown",
  75    "metadata": {},
  76    "source": [
  77     "### Make scatterplots of heat against each of TA and TS in turn, and comment on what you see."
  78    ]
  79   },
  80   {
  81    "cell_type": "code",
  82    "execution_count": null,
  83    "metadata": {},
  84    "outputs": [],
  85    "source": [
  86     "taheat <- ggplot(cemheat, aes(x=TA, y=heat)) + geom_point()\n",
  87     "tsheat <- ggplot(cemheat, aes(x=TS, y=heat)) + geom_point()\n",
  88     "\n",
  89     "multiplot(taheat, tsheat, cols=2)"
  90    ]
  91   },
  92   {
  93    "cell_type": "markdown",
  94    "metadata": {},
  95    "source": [
  96     "TODO: comment on the direction and strength of the relationship shown in each scatterplot."
  97    ]
  98   },
  99   {
 100    "cell_type": "markdown",
 101    "metadata": {},
 102    "source": [
 103     "### Use GenStat to fit each individual regression equation (of heat on TA and of heat on TS) in turn, and then to fit the regression equation with two explanatory variables. Does the latter regression equation give you a better model than either of the individual ones?"
 104    ]
 105   },
 106   {
 107    "cell_type": "code",
 108    "execution_count": null,
 109    "metadata": {},
 110    "outputs": [],
 111    "source": [
 112     "fit.ta <- lm(heat ~ TA, data = cemheat)\n",
 113     "summary(fit.ta)\n",
 114     "anova(fit.ta)"
 115    ]
 116   },
 117   {
 118    "cell_type": "code",
 119    "execution_count": null,
 120    "metadata": {},
 121    "outputs": [],
 122    "source": [
 123     "ggplotRegression(fit.ta)"
 124    ]
 125   },
 126   {
 127    "cell_type": "code",
 128    "execution_count": null,
 129    "metadata": {},
 130    "outputs": [],
 131    "source": [
 132     "fit.ts <- lm(heat ~ TS, data = cemheat)\n",
 133     "summary(fit.ts)\n",
 134     "anova(fit.ts)"
 135    ]
 136   },
 137   {
 138    "cell_type": "code",
 139    "execution_count": null,
 140    "metadata": {},
 141    "outputs": [],
 142    "source": [
 143     "ggplotRegression(fit.ts)"
 144    ]
 145   },
 146   {
 147    "cell_type": "code",
 148    "execution_count": null,
 149    "metadata": {},
 150    "outputs": [],
 151    "source": [
 152     "fit.tats <- lm(heat ~ TA + TS, data = cemheat)\n",
 153     "summary(fit.tats)\n",
 154     "anova(fit.tats)"
 155    ]
 156   },
 157   {
 158    "cell_type": "code",
 159    "execution_count": null,
 160    "metadata": {},
 161    "outputs": [],
 162    "source": [
 163     "ggplotRegression(fit.tats)"
 164    ]
 165   },
 166   {
 167    "cell_type": "markdown",
 168    "metadata": {},
 169    "source": [
 170     "Now combine the results into one dataframe for easy comparison."
 171    ]
 172   },
 173   {
 174    "cell_type": "code",
 175    "execution_count": null,
 176    "metadata": {},
 177    "outputs": [],
 178    "source": [
 179     "fits <- list(fit.ta, fit.ts, fit.tats)\n",
 180     "data.frame(\n",
 181     "    \"Vars\" = sapply(fits, function(x) toString(attr(summary(x)$terms, \"variables\")[-(1:2)]) ),\n",
 182     "    \"Adj R^2\" = sapply(fits, function(x) summary(x)$adj.r.squared)\n",
 183     ")"
 184    ]
 185   },
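       {
        "cell_type": "markdown",
        "metadata": {},
        "source": [
         "For reference, a sketch of the standard definition behind the `adj.r.squared` value extracted above, assuming a model with an intercept. The symbols $n$ (number of observations) and $p$ (number of explanatory variables) are notation introduced for this note, not taken from the exercise:\n",
         "\n",
         "$$R^2_{\\text{adj}} = 1 - (1 - R^2) \\frac{n - 1}{n - p - 1}$$\n",
         "\n",
         "Unlike $R^2$, this quantity can decrease when an added variable explains little extra variation, which makes it the more suitable of the two for comparing models with different numbers of explanatory variables."
        ]
       },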
 186   {
 187    "cell_type": "markdown",
 188    "metadata": {},
 189    "source": [
 190     "### According to the regression equation with two explanatory variables fitted in part (b), what is the predicted value of heat when TA = 15 and TS = 55?"
 191    ]
 192   },
 193   {
 194    "cell_type": "code",
 195    "execution_count": null,
 196    "metadata": {},
 197    "outputs": [],
 198    "source": [
 199     "predict(fit.tats, data.frame(\"TA\" = 15, \"TS\" = 55))"
 200    ]
 201   },
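       {
        "cell_type": "markdown",
        "metadata": {},
        "source": [
         "As a sketch of what `predict()` computes here, write $\\hat{\\alpha}$, $\\hat{\\beta}_{\\text{TA}}$ and $\\hat{\\beta}_{\\text{TS}}$ (notation assumed for this note) for the three values returned by `coef(fit.tats)`. The predicted value is then the fitted equation evaluated at TA = 15 and TS = 55:\n",
         "\n",
         "$$\\widehat{\\text{heat}} = \\hat{\\alpha} + 15 \\, \\hat{\\beta}_{\\text{TA}} + 55 \\, \\hat{\\beta}_{\\text{TS}}$$"
        ]
       },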
 202   {
 203    "cell_type": "markdown",
 204    "metadata": {},
 205    "source": [
 206     "### By looking at the (default) composite residual plots, comment on the appropriateness of the fitted regression model."
 207    ]
 208   },
 209   {
 210    "cell_type": "code",
 211    "execution_count": null,
 212    "metadata": {},
 213    "outputs": [],
 214    "source": [
 215     "autoplot(fit.tats)"
 216    ]
 217   },
 218   {
 219    "cell_type": "markdown",
 220    "metadata": {},
 221    "source": [
 222     "## Exercise 5.3: Fitting a quadratic regression model"
 223    ]
 224   },
 225   {
 226    "cell_type": "code",
 227    "execution_count": null,
 228    "metadata": {},
 229    "outputs": [],
 230    "source": [
 231     "anaerobic <- read.csv('anaerob.csv')\n",
 232     "ggplot(anaerobic, aes(x=oxygen, y=ventil)) + geom_point()"
 233    ]
 234   },
 235   {
 236    "cell_type": "markdown",
 237    "metadata": {},
 238    "source": [
 239     "### Using GenStat, perform the regression of expired ventilation (`ventil`) on oxygen uptake (`oxygen`). Are you at all surprised by how good this regression model seems?"
 240    ]
 241   },
 242   {
 243    "cell_type": "code",
 244    "execution_count": null,
 245    "metadata": {},
 246    "outputs": [],
 247    "source": [
 248     "fit.o <- lm(ventil ~ oxygen, data = anaerobic)\n",
 249     "summary(fit.o)\n",
 250     "anova(fit.o)"
 251    ]
 252   },
 253   {
 254    "cell_type": "code",
 255    "execution_count": null,
 256    "metadata": {},
 257    "outputs": [],
 258    "source": [
 259     "ggplotRegression(fit.o)"
 260    ]
 261   },
 262   {
 263    "cell_type": "code",
 264    "execution_count": null,
 265    "metadata": {},
 266    "outputs": [],
 267    "source": [
 268     "autoplot(fit.o)"
 269    ]
 270   },
 271   {
 272    "cell_type": "markdown",
 273    "metadata": {},
 274    "source": [
 275     "### Now form a new variable `oxy2`, say, by squaring oxygen.\n",
 276     "(Create a new column in the `anaerobic` dataframe which is `anaerobic$oxygen ^ 2`.) Perform the regression of `ventil` on `oxygen` and `oxy2`. Comment on the fit of this model according to the printed output (and with recourse to Figure 3.2 in Example 3.1)."
 277    ]
 278   },
 279   {
 280    "cell_type": "code",
 281    "execution_count": null,
 282    "metadata": {},
 283    "outputs": [],
 284    "source": [
 285     "anaerobic$oxy2 <- anaerobic$oxygen^2\n",
 286     "head(anaerobic)"
 287    ]
 288   },
 289   {
 290    "cell_type": "code",
 291    "execution_count": null,
 292    "metadata": {},
 293    "outputs": [],
 294    "source": [
 295     "fit.o2 <- lm(ventil ~ oxygen + oxy2, data = anaerobic)\n",
 296     "summary(fit.o2)\n",
 297     "anova(fit.o2)"
 298    ]
 299   },
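       {
        "cell_type": "markdown",
        "metadata": {},
        "source": [
         "The model fitted above can be written as follows, with $\\alpha$, $\\beta_1$, $\\beta_2$ and the random error $\\epsilon$ as notation assumed for this note:\n",
         "\n",
         "$$\\text{ventil} = \\alpha + \\beta_1 \\, \\text{oxygen} + \\beta_2 \\, \\text{oxygen}^2 + \\epsilon$$\n",
         "\n",
         "Although quadratic in `oxygen`, the model is still linear in its parameters, so `lm()` fits it by ordinary least squares in the same way as the two-variable model in Exercise 5.2."
        ]
       },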
 300   {
 301    "cell_type": "markdown",
 302    "metadata": {},
 303    "source": [
 304     "### Make the usual residual plots and comment on the fit of the model again."
 305    ]
 306   },
 307   {
 308    "cell_type": "code",
 309    "execution_count": null,
 310    "metadata": {},
 311    "outputs": [],
 312    "source": [
 313     "ggplotRegression(fit.o2)"
 314    ]
 315   },
 316   {
 317    "cell_type": "code",
 318    "execution_count": null,
 319    "metadata": {},
 320    "outputs": [],
 321    "source": [
 322     "autoplot(fit.o2)"
 323    ]
 324   },
 325   {
 326    "cell_type": "code",
 327    "execution_count": null,
 328    "metadata": {},
 329    "outputs": [],
 330    "source": []
 331   }
 332  ],
 333  "metadata": {
 334   "kernelspec": {
 335    "display_name": "R",
 336    "language": "R",
 337    "name": "ir"
 338   },
 339   "language_info": {
 340    "codemirror_mode": "r",
 341    "file_extension": ".r",
 342    "mimetype": "text/x-r-source",
 343    "name": "R",
 344    "pygments_lexer": "r",
 345    "version": "3.4.2"
 346   },
 347   "toc": {
 348    "nav_menu": {},
 349    "number_sections": true,
 350    "sideBar": true,
 351    "skip_h1_title": false,
 352    "title_cell": "Table of Contents",
 353    "title_sidebar": "Contents",
 354    "toc_cell": false,
 355    "toc_position": {
 356     "height": "calc(100% - 180px)",
 357     "left": "10px",
 358     "top": "150px",
 359     "width": "342px"
 360    },
 361    "toc_section_display": true,
 362    "toc_window_display": true
 363   }
 364  },
 365  "nbformat": 4,
 366  "nbformat_minor": 2
 367 }