# Correlation structures

Fitting a model requires a correlation structure (a subtype of `CorrStructure`), which specifies how observations may be correlated with each other. It determines how the covariance matrix of the estimates is calculated. If none is specified, the default is `Heteroscedastic`, i.e. independent but not necessarily identically distributed observations.

All constructors accept the Boolean keyword `adj`, which defaults to `true`. If `true`, a finite-sample adjustment is applied to the covariance matrix. The adjustment factor is n / (n - 1), where n is the number of clusters for clustered data and the number of observations otherwise.
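
The following sketch illustrates the arithmetic of the adjustment by applying the factor n / (n - 1) to an already-computed covariance matrix. The helper `finite_sample_adjust` is hypothetical and not part of the package. It assumes `V` is the unadjusted covariance matrix and `n` is the number of clusters or observations, as described above.

```julia
# Hypothetical helper (not part of the package): scale an unadjusted
# covariance matrix V by the finite-sample factor n / (n - 1), where n is
# the number of clusters (clustered data) or observations (otherwise).
function finite_sample_adjust(V::AbstractMatrix, n::Integer)
    n > 1 || throw(ArgumentError("n must be at least 2"))
    return V .* (n / (n - 1))
end

V = [4.0 1.0; 1.0 2.0]
Vadj = finite_sample_adjust(V, 5)   # factor 5 / 4 = 1.25
```

With few clusters the factor is noticeably larger than one (1.25 for five clusters). With many clusters it approaches one.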

Four subtypes are currently available:

## `Homoscedastic`

```julia
Homoscedastic(method::String = "OIM")
```
Observations are independent and identically distributed. The optional argument `method` is only relevant for maximum-likelihood estimators. It controls the estimation of the covariance matrix: `"OIM"` uses the observed information matrix, whereas `"OPG"` uses the outer product of the gradient. Only linear and maximum-likelihood estimators support homoscedastic errors.
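
The following sketch makes the difference between `"OIM"` and `"OPG"` concrete. It computes both for an intercept-only Poisson model, evaluated at its maximum-likelihood estimate (exp(β) = mean of y). It is a standalone illustration with hypothetical function names (`vcov_oim`, `vcov_opg`), not the package's implementation.

```julia
using LinearAlgebra

# Hypothetical helper. OIM: inverse of the negative Hessian of the log-likelihood.
vcov_oim(H::AbstractMatrix) = inv(-H)

# Hypothetical helper. OPG: inverse of the sum of outer products of
# per-observation scores, where row i of S holds the score of observation i.
vcov_opg(S::AbstractMatrix) = inv(S' * S)

# Intercept-only Poisson model with log link: ℓᵢ(β) = yᵢβ - exp(β) - log(yᵢ!).
y = [1.0, 2.0, 3.0]
λ = sum(y) / length(y)             # MLE of exp(β)
S = reshape(y .- λ, :, 1)          # n×1 scores: ∂ℓᵢ/∂β = yᵢ - λ
H = fill(-length(y) * λ, 1, 1)     # 1×1 Hessian: ∂²ℓ/∂β² = -nλ

V_oim = vcov_oim(H)                # 1×1 matrix, about 1/6
V_opg = vcov_opg(S)                # 1×1 matrix, about 1/2
```

Under correct specification both estimates converge to the same limit (the information matrix equality), but they can differ considerably in small samples, as they do here.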

## `Heteroscedastic`

```julia
Heteroscedastic()
```
Observations are independent, but they may differ in distribution. This structure leads to sandwich covariance matrices (a.k.a. Huber-Eicker-White).
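
For least squares, the sandwich takes the familiar form (X'X)⁻¹ X' diag(e²) X (X'X)⁻¹. The following sketch computes it from a plain design matrix `X` and residual vector `e`, without the finite-sample adjustment. The name `sandwich_vcov` is hypothetical. For nonlinear estimators the "bread" would be built from the Hessian rather than X'X, and the package may organise the computation differently.

```julia
using LinearAlgebra

# Hypothetical helper: heteroscedasticity-robust (Huber-Eicker-White)
# covariance for OLS, without the finite-sample adjustment.
function sandwich_vcov(X::AbstractMatrix, e::AbstractVector)
    bread = inv(X' * X)
    meat = X' * Diagonal(e .^ 2) * X   # Σᵢ eᵢ² xᵢxᵢ'
    return bread * meat * bread
end

# Intercept-only example: the estimate is the sample mean.
X = ones(3, 1)
y = [1.0, 2.0, 3.0]
β = X \ y
e = y - X * β
V = sandwich_vcov(X, e)   # about Σeᵢ² / n² = 2 / 9
```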

## `Clustered`

```julia
Clustered(DF::DataFrame, cluster::Symbol)
```

Observations are independent across clusters, but they may be arbitrarily correlated within clusters. `cluster` specifies the column of `DF` to cluster on.
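
The following sketch shows what clustering does to the "meat" of the sandwich, using OLS as the example. The scores are summed within each cluster before outer products are taken. This allows arbitrary within-cluster correlation and drops all cross-cluster products. Here a plain vector of cluster labels stands in for the `DF` column, and `clustered_vcov` is a hypothetical name. The finite-sample adjustment is not applied.

```julia
using LinearAlgebra

# Hypothetical helper: cluster-robust covariance for OLS (no adjustment).
# `cluster` holds one cluster label per observation.
function clustered_vcov(X::AbstractMatrix, e::AbstractVector, cluster::AbstractVector)
    bread = inv(X' * X)
    k = size(X, 2)
    meat = zeros(k, k)
    for g in unique(cluster)
        idx = findall(==(g), cluster)
        s = X[idx, :]' * e[idx]      # summed score of cluster g (length k)
        meat .+= s * s'              # add its outer product
    end
    return bread * meat * bread
end

X = ones(4, 1)
y = [1.0, 2.0, 3.0, 4.0]
e = y .- sum(y) / length(y)          # residuals around the mean: [-1.5, -0.5, 0.5, 1.5]
cluster = [1, 1, 2, 2]
V = clustered_vcov(X, e, cluster)    # cluster sums -2 and 2, so V = (4 + 4) / 4² = 0.5
```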

## `CrossCorrelated`

This structure accommodates other correlation patterns. Its first argument, a `String`, selects the pattern.

### Two-way clustering

```julia
CrossCorrelated("Two-way clustering", DF::DataFrame, c₁::Symbol, c₂::Symbol)
```
If two observations share a cluster in either `c₁` or `c₂`, they may be arbitrarily correlated. Otherwise, they are independent.
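
One way to express "arbitrarily correlated if they share any cluster" is a 0/1 weight matrix W, with wᵢⱼ = 1 whenever observations i and j share a cluster in either dimension. W is then used in the meat Σᵢⱼ wᵢⱼ (xᵢeᵢ)(xⱼeⱼ)'. Because 1{share c₁ or c₂} = 1{share c₁} + 1{share c₂} - 1{share both}, this equals the sum of the two one-way clustered meats minus the meat clustered on their intersection. The following sketch does this for OLS, with plain vectors standing in for the `DF` columns. The names are hypothetical, and the package need not compute it this way.

```julia
using LinearAlgebra

# Hypothetical helper: OLS covariance with a general n×n weight matrix W,
# meat = Σᵢⱼ wᵢⱼ (xᵢeᵢ)(xⱼeⱼ)'.
function weighted_vcov(X::AbstractMatrix, e::AbstractVector, W::AbstractMatrix)
    bread = inv(X' * X)
    u = X .* e                  # row i is the score contribution xᵢ'eᵢ
    return bread * (u' * W * u) * bread
end

# Hypothetical helper: wᵢⱼ = 1 if i and j share a cluster in c1 or in c2.
twoway_weights(c1, c2) =
    [c1[i] == c1[j] || c2[i] == c2[j] ? 1.0 : 0.0 for i in eachindex(c1), j in eachindex(c1)]

X = ones(4, 1)
y = [1.0, 2.0, 3.0, 4.0]
e = y .- sum(y) / length(y)     # [-1.5, -0.5, 0.5, 1.5]
c1 = [1, 1, 2, 2]
c2 = [1, 2, 1, 2]
W = twoway_weights(c1, c2)      # only pairs (1, 4) and (2, 3) get weight 0
V = weighted_vcov(X, e, W)      # meat = 5, so V = 5 / 16
```

The same weighted form also covers one-way clustering and the kernel-weighted structures below. Unlike the one-way case, a two-way meat is not guaranteed to be positive semidefinite in small samples.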

### Correlation across time

```julia
CrossCorrelated("Time",
                DF::DataFrame,
                time::Symbol,
                bandwidth::Real,
                kernel::Function = parzen
                )
```

The maximum possible correlation between two observations declines with the time difference between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation (of type `Date`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
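
For reference, the four kernels can be written as functions of x = distance / bandwidth, each vanishing for |x| > 1. The following sketch reimplements them under hypothetical names (`k_bartlett` and so on, not the package's exported kernels). It then builds a time weight matrix with wᵢⱼ = k(|tᵢ - tⱼ| / bandwidth). Measuring time differences in days is an assumption; the package may use a different unit for `bandwidth`.

```julia
using Dates

# Hypothetical reimplementations of the kernels, as functions of
# x = distance / bandwidth (formulae as in Andrews, 1991).
k_bartlett(x::Real) = abs(x) <= 1 ? 1 - abs(x) : 0.0
k_truncated(x::Real) = abs(x) <= 1 ? 1.0 : 0.0
k_tukeyhanning(x::Real) = abs(x) <= 1 ? (1 + cos(π * x)) / 2 : 0.0
function k_parzen(x::Real)
    a = abs(x)
    if a <= 0.5
        return 1 - 6 * a^2 + 6 * a^3
    elseif a <= 1
        return 2 * (1 - a)^3
    else
        return 0.0
    end
end

# Hypothetical helper: wᵢⱼ = k(|tᵢ - tⱼ| / bandwidth). Differences are
# measured in days here, which is an assumption about the unit of `bandwidth`.
function time_weights(t::AbstractVector{Date}, bandwidth::Real, kernel::Function = k_parzen)
    return [kernel(abs(Dates.value(t[i] - t[j])) / bandwidth) for i in eachindex(t), j in eachindex(t)]
end

t = [Date(2020, 1, 1), Date(2020, 1, 3), Date(2020, 1, 9)]
W = time_weights(t, 4, k_bartlett)   # gaps of 2, 6 and 8 days
```

Plugged into the weighted meat Σᵢⱼ wᵢⱼ (xᵢeᵢ)(xⱼeⱼ)', such weights give positive weight to any two observations with nearby dates, whether or not they belong to the same unit. This is the key difference from Newey-West.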

!!! warning

    The resulting covariance matrices differ from those of the Newey-West estimator, which assumes independence across units (though observations for the same unit may correlate across time).

### Correlation across space

```julia
CrossCorrelated("Space",
                DF::DataFrame,
                latitude::Symbol,
                longitude::Symbol,
                bandwidth::Real,
                kernel::Function = parzen
                )
```

The maximum possible correlation between two observations declines with the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (of type `Float64`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
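
For spatial correlation, the distance between two observations has to be derived from their coordinates in radians. The following sketch uses the haversine great-circle formula and returns central angles (radians on the unit sphere). Whether the package measures distance this way, and in which unit `bandwidth` is expressed, are assumptions. `k_parzen` is a hypothetical reimplementation of the default Parzen kernel, and the other names are hypothetical as well.

```julia
# Hypothetical reimplementation of the Parzen kernel (Andrews, 1991).
function k_parzen(x::Real)
    a = abs(x)
    a <= 0.5 && return 1 - 6 * a^2 + 6 * a^3
    a <= 1 && return 2 * (1 - a)^3
    return 0.0
end

# Hypothetical helper: great-circle distance (central angle, in radians) via the
# haversine formula, with latitudes and longitudes in radians. Multiply by the
# Earth's radius (about 6371 km) to obtain kilometres.
function great_circle(lat1, lon1, lat2, lon2)
    h = sin((lat2 - lat1) / 2)^2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
    return 2 * asin(sqrt(min(1.0, h)))   # clamp guards against rounding above 1
end

# Hypothetical helper: wᵢⱼ = k(dᵢⱼ / bandwidth), bandwidth in the same unit as dᵢⱼ.
function space_weights(lat::AbstractVector, lon::AbstractVector, bandwidth::Real,
                       kernel::Function = k_parzen)
    n = length(lat)
    return [kernel(great_circle(lat[i], lon[i], lat[j], lon[j]) / bandwidth) for i in 1:n, j in 1:n]
end

lat = [0.0, 0.0]
lon = [0.0, π / 2]                     # a quarter of the way around the equator
W = space_weights(lat, lon, π)         # distance π/2, bandwidth π, so x = 0.5
```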

### Correlation across time and space

```julia
CrossCorrelated("Time and space",
                DF::DataFrame,
                time::Symbol,
                bandwidth_time::Real,
                latitude::Symbol,
                longitude::Symbol,
                bandwidth_space::Real,
                kernel::Function = parzen
                )
```

The maximum possible correlation between two observations declines with the time difference and the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidths and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation (of type `Date`). `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (of type `Float64`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
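
This section does not state how the two distances are combined. One natural choice, assumed here purely for illustration, is a product of kernel weights: wᵢⱼ = k(Δtᵢⱼ / bandwidth_time) · k(dᵢⱼ / bandwidth_space). Under this choice the weight vanishes once either distance exceeds its bandwidth. In the following sketch, time differences are in days and spatial distances are great-circle central angles in radians (both assumptions). All names are hypothetical. Bartlett is used for brevity, whereas the package defaults to Parzen.

```julia
using Dates

# Hypothetical Bartlett kernel: k(x) = 1 - |x| for |x| ≤ 1 (Andrews, 1991).
k_bartlett(x::Real) = abs(x) <= 1 ? 1 - abs(x) : 0.0

# Hypothetical helper: great-circle central angle (radians) between points in radians.
function great_circle(lat1, lon1, lat2, lon2)
    h = sin((lat2 - lat1) / 2)^2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
    return 2 * asin(sqrt(min(1.0, h)))
end

# ASSUMPTION: time and space enter through a product of kernel weights.
function time_space_weights(t::AbstractVector{Date}, bw_time::Real,
                            lat::AbstractVector, lon::AbstractVector, bw_space::Real,
                            kernel::Function = k_bartlett)
    n = length(t)
    W = zeros(n, n)
    for i in 1:n, j in 1:n
        Δt = abs(Dates.value(t[i] - t[j]))                  # days
        d = great_circle(lat[i], lon[i], lat[j], lon[j])     # radians
        W[i, j] = kernel(Δt / bw_time) * kernel(d / bw_space)
    end
    return W
end

t = [Date(2020, 1, 1), Date(2020, 1, 3)]
lat = [0.0, 0.0]
lon = [0.0, π / 2]
W = time_space_weights(t, 4, lat, lon, π)   # 0.5 (time) × 0.5 (space) = 0.25
```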