Exercise: Junior School Project
================
Nora Wickelmaier
2025-06-20

Load the Junior School Project data, collected from primary (U.S. term:
elementary) schools in inner London, into R. You might need to install
the faraway package first with `install.packages("faraway")`.

The data frame contains the following variables:

| Variable  | Description                                                                                                                                                          |
|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `school`  | 50 schools, coded 1–50                                                                                                                                               |
| `class`   | a factor with levels `1`, `2`, `3`, and `4`                                                                                                                          |
| `gender`  | a factor with levels `boy` and `girl`                                                                                                                                |
| `social`  | social class of the father: I = 1; II = 2; III nonmanual = 3; III manual = 4; IV = 5; V = 6; long-term unemployed = 7; not currently employed = 8; father absent = 9 |
| `raven`   | Raven test score                                                                                                                                                     |
| `id`      | student id, coded 1–1402                                                                                                                                             |
| `english` | score on English                                                                                                                                                     |
| `math`    | score on Maths                                                                                                                                                       |
| `year`    | year of school                                                                                                                                                       |

We want to investigate how math achievement is influenced by the Raven
score and the social class of the father. If you need a refresher on
Raven’s Progressive Matrices, check here:
<https://en.wikipedia.org/wiki/Raven%27s_Progressive_Matrices>.
Basically, it is an intelligence test.

For simplicity, we take a subset of the data (`year == 0`) so that each
student provides only one data point:

``` r
data("jsp", package = "faraway")
dat <- jsp |> subset(year == 0)
```
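A quick sanity check of the subset, sketched in base R: it counts the rows and distinct schools in `dat` and looks for duplicated student ids. The names `n_students` and `n_schools` are only illustrative.

```r
# Load the data and keep only year 0, as above
data("jsp", package = "faraway")
dat <- jsp |> subset(year == 0)

# Number of rows (students) and number of distinct schools in the subset
n_students <- nrow(dat)
n_schools  <- length(unique(dat$school))
c(students = n_students, schools = n_schools)

# Index of the first duplicated id; 0 means every student has exactly one row
anyDuplicated(dat$id)
```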

<img src="jsp_files/figure-gfm/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

1.  Create a new variable `craven` where `raven` is centered at the
    grand mean over all students.

2.  Create another variable `gcraven` where `craven` is centered within
    each school (group-mean centering). First create a variable `mraven`
    containing the school means of `craven` (the centered school
    averages), so that you can calculate

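``` r
# Deviation of each student's centered raven score from the school mean
dat$gcraven <- dat$craven - dat$mraven

# Check your results: school means of craven (these are the mraven values)
aggregate(craven ~ school, dat, mean) |> head()
```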

    ##   school     craven
    ## 1      1 -2.6886533
    ## 2      2  0.1250722
    ## 3      3  2.0584055
    ## 4      4  0.9167389
    ## 5      5  2.3512627
    ## 6      6  0.4584055

``` r
aggregate(gcraven ~ school, dat, mean) |> head()
```

    ##   school       gcraven
    ## 1      1  1.253746e-15
    ## 2      2 -1.184075e-15
    ## 3      3 -1.421172e-15
    ## 4      4  1.184283e-15
    ## 5      5  5.075722e-16
    ## 6      6  0.000000e+00
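One possible way to create `craven` and `mraven` before computing `gcraven`, sketched in base R. `ave()` returns, for every row, the mean of `craven` within that row's school, so `mraven` has one value per student and can be subtracted directly. This is not necessarily how the reference solution does it.

```r
data("jsp", package = "faraway")
dat <- jsp |> subset(year == 0)

# Grand-mean centering: subtract the mean over all students
dat$craven <- dat$raven - mean(dat$raven)

# School means of craven, repeated for every student of that school
dat$mraven <- ave(dat$craven, dat$school)

# Group-mean centering: deviation of each student from the school mean
dat$gcraven <- dat$craven - dat$mraven

# craven has mean 0 over all students; gcraven has mean 0 within each school
mean(dat$craven)
range(tapply(dat$gcraven, dat$school, mean), na.rm = TRUE)
```

Because `craven` is already centered, the student-weighted mean of `mraven` is also 0, matching the description of `mraven` in the level table of task 4.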

3.  Create a plot with `lattice::xyplot()` with `gcraven` on the
    $x$-axis, `math` on the $y$-axis, and one panel for each school.
    Use `type = c("p", "g", "r")`. You can also use `ggplot2` if you
    prefer. Based on this plot, what would you conclude about the need
    for school-specific slopes?

<img src="jsp_files/figure-gfm/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />
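A sketch of the plot requested in task 3 with lattice, recomputing the centered variables so the block runs on its own. In `type`, `"p"` draws the points, `"g"` adds a reference grid, and `"r"` adds a least-squares regression line fitted separately in each panel; differing slopes across panels hint at school-specific slopes.

```r
library(lattice)

data("jsp", package = "faraway")
dat <- jsp |> subset(year == 0)
dat$craven  <- dat$raven - mean(dat$raven)
dat$mraven  <- ave(dat$craven, dat$school)
dat$gcraven <- dat$craven - dat$mraven

# One panel per school: points, grid, and a per-panel regression line
p <- xyplot(math ~ gcraven | school, data = dat,
            type = c("p", "g", "r"),
            xlab = "gcraven", ylab = "math")
print(p)
```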

4.  We will consider the following levels of the data:

    - Level 1: students
    - Level 2: schools

    And the variables associated with the levels:

    | Level | Variable  | Description                                  |
    |-------|-----------|----------------------------------------------|
    | 2     | `school`  | 50 schools, coded 1–50                       |
    | 2     | `mraven`  | mean raven score of school (overall mean 0)  |
    | 1     | `social`  | class of the father (categorical)            |
    | 1     | `gcraven` | centered test score (mean for each school 0) |
    | 1     | `math`    | score on Maths                               |

    Fit the following model, containing school-specific intercepts and
    slopes, with `lme4::lmer()`:

    $$
    \begin{align*}
    \text{(Level 1)} \quad y_{ij} &= b_{0i} + b_{1i}\,gcraven_{ij} + b_{2i}\,social_{ij} + b_{3i}\,(gcraven_{ij}\times social_{ij}) + \varepsilon_{ij}\\
    \text{(Level 2)} \quad b_{0i} &= \beta_0 + \beta_4\,mraven_i + \upsilon_{0i} \\
                     \quad b_{1i} &= \beta_1 + \beta_5\,mraven_i + \upsilon_{1i}\\
                     \quad b_{2i} &= \beta_2\\
                     \quad b_{3i} &= \beta_3\\
    \text{(2) in (1)} \quad y_{ij} &= \beta_{0} + \beta_{1}\,gcraven_{ij} + \beta_{2}\,social_{ij} + \beta_{3}(gcraven_{ij}\times social_{ij})\\
                                &~~~ + \beta_{4}\,mraven_i + \beta_{5}\,(gcraven_{ij} \times mraven_{i})\\
                                &~~~ + \upsilon_{0i} + \upsilon_{1i}\,gcraven_{ij} + \varepsilon_{ij}
    \end{align*}
    $$ with
    $\boldsymbol\upsilon \sim N(\boldsymbol 0, \boldsymbol{\Sigma}_\upsilon)$
    i.i.d., $\varepsilon_{ij} \sim N(0, \sigma^2)$ i.i.d.

5.  Interpret the parameters of the model:

    - How much does the math score increase if the raven score of a
      student increases by one point, for the reference social class of
      the father?
    - How much does the math score increase when the mean raven score of
      a school increases by one point, for the reference social class of
      the father?
    - What is your conclusion about the interactions in the model? Are
      they needed?
    - Does the inclusion of `social` improve the model fit? How can we
      test this?
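A sketch of one way to approach tasks 4 and 5, assuming `social` is treated as a factor whose first level (presumably class I) is the reference. The reduced models `m_noint` and `m_nosoc` are hypothetical choices for illustration; likelihood ratio tests via `anova()` are one option among several (e.g., comparing AIC values).

```r
library(lme4)

data("jsp", package = "faraway")
dat <- jsp |> subset(year == 0)
dat$social  <- factor(dat$social)  # factor(); also drops unused levels
dat$craven  <- dat$raven - mean(dat$raven)
dat$mraven  <- ave(dat$craven, dat$school)
dat$gcraven <- dat$craven - dat$mraven

# Full model: fixed effects beta_0 ... beta_5 plus correlated random
# intercepts and gcraven slopes per school (upsilon_0i, upsilon_1i)
m_full <- lmer(math ~ gcraven * social + mraven + gcraven:mraven +
                 (gcraven | school), data = dat)
summary(m_full)

# beta_1: effect of gcraven for the reference social class (at mraven = 0)
# beta_4: effect of the school mean mraven (at gcraven = 0)
fixef(m_full)[c("gcraven", "mraven")]

# Hypothetical reduced model without the two gcraven interactions
m_noint <- lmer(math ~ gcraven + social + mraven + (gcraven | school),
                data = dat)

# Hypothetical reduced model without social (and its interaction)
m_nosoc <- lmer(math ~ gcraven + mraven + gcraven:mraven +
                  (gcraven | school), data = dat)

# Likelihood ratio tests of the nested models
lrt_int <- anova(m_noint, m_full)
lrt_soc <- anova(m_nosoc, m_full)
lrt_int
lrt_soc
```

Because the compared models differ in their fixed effects, lme4's `anova()` refits them with maximum likelihood instead of REML before computing the test (its default `refit = TRUE`).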