-
-
Notifications
You must be signed in to change notification settings - Fork 57
/
Copy pathmultilevel.Rmd
135 lines (102 loc) · 3.43 KB
/
multilevel.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
---
title: "Multilevel Correlations"
output:
rmarkdown::html_vignette:
toc: true
fig_width: 10.08
fig_height: 6
tags: [r, correlation, types]
vignette: >
%\VignetteIndexEntry{Multilevel Correlations}
\usepackage[utf8]{inputenc}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
bibliography: bibliography.bib
---
---
This vignette can be cited as:
```{r cite}
citation("correlation")
```
---
```{r, include=FALSE}
library(knitr)
options(
knitr.kable.NA = "",
digits = 2,
out.width = "100%",
message = FALSE,
warning = FALSE,
dpi = 450
)
if (!requireNamespace("ggplot2", quietly = TRUE) ||
!requireNamespace("BayesFactor", quietly = TRUE) ||
!requireNamespace("lme4", quietly = TRUE)) {
knitr::opts_chunk$set(eval = FALSE)
}
```
## Data
Imagine we have an experiment in which **10 individuals** completed a task with
**100 trials**. For each of the 1000 trials (10 * 100) in total,
we measured two things, **V1** and **V2**, and we are interested in
**investigating the link between these two variables**.
We will generate data using the
[`simulate_simpson()`](https://easystats.github.io/bayestestR/reference/simulate_simpson.html)
function from this package and look at its summary:
```{r}
library(correlation)
data <- simulate_simpson(n = 100, groups = 10)
summary(data)
```
Now let's visualize the two variables:
```{r}
library(ggplot2)
ggplot(data, aes(x = V1, y = V2)) +
geom_point() +
geom_smooth(colour = "black", method = "lm", se = FALSE) +
theme_classic()
```
That seems pretty straightforward! It seems like there is a **negative
correlation** between V1 and V2. Let's test this.
## Simple correlation
```{r}
correlation(data)
```
Indeed, there is a **strong, negative and significant correlation** between V1
and V2.
Great, can we go ahead and **publish these results in _PNAS_**?
## The Simpson's Paradox
Not so fast! Ever heard of the [**Simpson's Paradox**](https://en.wikipedia.org/wiki/Simpson%27s_paradox)?
Let's colour our datapoints by group (by individuals):
```{r}
library(ggplot2)
ggplot(data, aes(x = V1, y = V2)) +
geom_point(aes(colour = Group)) +
geom_smooth(aes(colour = Group), method = "lm", se = FALSE) +
geom_smooth(colour = "black", method = "lm", se = FALSE) +
theme_classic()
```
Mmh, interesting. It seems like, for each subject, the relationship is
different. The (global) negative trend seems to be an artifact of **differences between the groups** and could be spurious!
**Multilevel *(as in multi-group)* ** correlations allow us to account for
**differences between groups**. It is based on a partialization of the group,
entered as a random effect in a mixed linear regression.
You can compute them with the
[**correlations**](https://github.com/easystats/correlation) package by setting
the `multilevel` argument to `TRUE`.
```{r}
correlation(data, multilevel = TRUE)
```
For completeness, let's also see if its Bayesian cousin agrees with it:
```{r}
correlation(data, multilevel = TRUE, bayesian = TRUE)
```
**Dayum!**
We were too hasty in our conclusions! Taking the group into account
seems to be super important.
_Note_: In this simple case where only two variables are of interest, it would be
of course best to directly proceed using a mixed regression model instead of
correlations. That being said, the latter can be useful for exploratory
analysis, when multiple variables are of interest, or in combination with a
network or structural approach.