
Commit d13d20a

Merge pull request #13 from gragusa/improv

Merging this and prepare for release.

2 parents 4465a7e + 9049395

File tree: 3 files changed (+139, −85 lines)


README.md (+124, −81)
# CovarianceMatrices.jl

[![Build Status](https://travis-ci.org/gragusa/CovarianceMatrices.jl.svg?branch=master)](https://travis-ci.org/gragusa/CovarianceMatrices.jl)
[![CovarianceMatrices](http://pkg.julialang.org/badges/CovarianceMatrices_0.5.svg)](http://pkg.julialang.org/?pkg=CovarianceMatrices&ver=0.5)
[![Coverage Status](https://coveralls.io/repos/gragusa/CovarianceMatrices.jl/badge.svg?branch=master&service=github)](https://coveralls.io/github/gragusa/CovarianceMatrices.jl?branch=master)
[![codecov.io](http://codecov.io/github/gragusa/CovarianceMatrices.jl/coverage.svg?branch=master)](http://codecov.io/github/gragusa/CovarianceMatrices.jl?branch=master)

Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation for Julia.
## Installation

The package is registered on [METADATA](https://github.com/JuliaLang/METADATA.jl), so installing it amounts to issuing

```julia
Pkg.add("CovarianceMatrices")
```

## Introduction

This package provides types and methods to obtain consistent estimates of the long-run covariance matrix of a random process.

Three classes of estimators are considered:

1. **HAC** - heteroskedasticity and autocorrelation consistent (Andrews, 1991; Newey and West, 1994)
2. **HC** - heteroskedasticity consistent (White, 1982)
3. **CRVE** - cluster robust (Arellano, 1986)

The typical application of these estimators is inference about the parameters of a model. Accordingly, this package extends methods defined in [StatsBase.jl](http://github.com/JuliaStat/StatsBase.jl) and [GLM.jl](http://github.com/JuliaStat/GLM.jl) to make it easy to obtain inference that is robust to heteroskedasticity and/or autocorrelation, or to the presence of cluster components.

# Quick tour

## HAC (Heteroskedasticity and Autocorrelation Consistent)

Available kernel types are:

- `TruncatedKernel`
- `BartlettKernel`
- `ParzenKernel`
- `TukeyHanningKernel`
- `QuadraticSpectralKernel`

For example, `ParzenKernel(NeweyWest)` returns an instance of `ParzenKernel` parametrized by `NeweyWest`, the type that corresponds to the optimal bandwidth calculated following Newey and West (1994). Similarly, `ParzenKernel(Andrews)` corresponds to the optimal bandwidth obtained in Andrews (1991). If the bandwidth is known, it can be passed directly, e.g. `TruncatedKernel(2)`.

The examples below should clarify the API, which is in any case relatively simple to use.
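For reference, the weight functions these kernel types correspond to can be written down in a few lines. The sketch below is illustrative plain Julia (in the notation of Andrews, 1991, with `x = lag/bandwidth`), not the package's internal code:

```julia
# Kernel weight functions, x = j / bandwidth. Illustrative sketch only.
truncated(x)    = abs(x) <= 1 ? 1.0 : 0.0
bartlett(x)     = abs(x) <= 1 ? 1.0 - abs(x) : 0.0
parzen(x)       = abs(x) <= 0.5 ? 1 - 6x^2 + 6abs(x)^3 :
                  abs(x) <= 1.0 ? 2(1 - abs(x))^3 : 0.0
tukeyhanning(x) = abs(x) <= 1 ? (1 + cos(pi*x))/2 : 0.0
function quadraticspectral(x)
    x == 0 && return 1.0            # continuous limit at the origin
    z = 6pi*x/5
    3*(sin(z)/z - cos(z))/z^2
end
```

All weights equal 1 at `x = 0` and (except the quadratic spectral, which has unbounded support) vanish for `|x| > 1`.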
### Long run variance of the regression coefficient

In the regression context, the function `vcov` does all the work. Its signature is

```julia
vcov(::DataFrameRegressionModel, ::HAC; prewhite = true)
```

The example below describes a typical call on a regression model estimated on generated data (a regression with an autoregressive error component):

```julia
using CovarianceMatrices
using DataFrames
using GLM
srand(1)
n = 500
x = randn(n, 5)
u = Array{Float64}(2*n)
u[1] = rand()
for j in 2:2*n
    u[j] = 0.78*u[j-1] + randn()
end
u = u[n+1:2*n]
y = 0.1 + x*[0.2, 0.3, 0.0, 0.0, 0.5] + u

df = DataFrame()
df[:y] = y
for j in enumerate([:x1, :x2, :x3, :x4, :x5])
    df[j[2]] = x[:, j[1]]
end
```
Using the data in `df`, the coefficients of the regression can be estimated using `GLM`

```julia
lm1 = glm(y~x1+x2+x3+x4+x5, df, Normal(), IdentityLink())
```

To get a consistent estimate of the long run variance of the estimated coefficients using a Quadratic Spectral kernel with automatic bandwidth selection à la Andrews

```julia
vcov(lm1, QuadraticSpectralKernel(Andrews), prewhite = false)
```

If one wants to estimate the long run variance using the same kernel, but with a bandwidth selected à la Newey-West

```julia
vcov(lm1, QuadraticSpectralKernel(NeweyWest), prewhite = false)
```

The standard errors can be obtained by the `stderr` function

```julia
stderr(::DataFrameRegressionModel, ::HAC; prewhite = true)
```

It is sometimes useful to access the automatically selected bandwidth. This can be done using `optimalbw`

```julia
optimalbw(NeweyWest, QuadraticSpectralKernel, lm1; prewhite = false)
optimalbw(Andrews, QuadraticSpectralKernel, lm1; prewhite = false)
```
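The relation between the two functions is elementwise: the standard errors are the square roots of the diagonal of the covariance matrix (this is exactly how `stderr` is defined in `src/HC.jl` and `src/HAC.jl`). A minimal sketch with a hypothetical 2×2 coefficient covariance `V`:

```julia
# stderr is sqrt of the diagonal of vcov. V is made-up illustrative data.
V  = [0.04 0.01; 0.01 0.09]
se = [sqrt(V[i, i]) for i in 1:size(V, 1)]   # ≈ [0.2, 0.3]
```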
### Long run variance of the average of the process

It is sometimes of interest to estimate the long run variance of the average of the process. At the moment, this can be done by carrying out a regression on a constant (whose coefficient is the sample mean of the realization of the process) and using `vcov` or `stderr` to obtain a consistent variance estimate:

```julia
df[:u] = u
lm2 = glm(u~1, df, Normal(), IdentityLink())
vcov(lm2, ParzenKernel(NeweyWest), prewhite = false)
stderr(lm2, ParzenKernel(NeweyWest), prewhite = false)
```
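What this computes can be sketched by hand: the variance of the sample mean is the long-run variance of the demeaned process divided by `n`. The function below is an illustrative stand-in (not the package API), using Bartlett weights with a fixed bandwidth `bw >= 1`:

```julia
# Illustrative long-run variance of the sample mean, Bartlett weights.
# Not the package API; names and bandwidth choice are hypothetical.
function lrv_mean(u, bw)
    n  = length(u)
    ud = u .- sum(u)/n                          # demean the series
    g(j) = sum(ud[1+j:end] .* ud[1:n-j]) / n    # autocovariance at lag j
    g(0) + sum(2*(1 - j/(bw + 1))*g(j) for j in 1:bw)
end
# variance of the sample mean ≈ lrv_mean(u, bw) / length(u)
```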
## HC (Heteroskedasticity consistent)

As in the HAC case, `vcov` and `stderr` are the main functions. They now take as argument the type of robust variance being sought

```julia
vcov(::DataFrameRegressionModel, ::HC)
```
Here `HC` is an abstract type with the following concrete subtypes:

- `HC0`
- `HC1` (this is `HC0` with the degrees-of-freedom adjustment)
- `HC2`
- `HC3`
- `HC4`
- `HC4m`
- `HC5`
```julia
using CovarianceMatrices
using DataFrames
using GLM

# A Gamma example, from McCullagh & Nelder (1989, pp. 300-2)
# The weights are added just to test the interface and are not part
# of the original data
clotting = DataFrame(
    u    = log([5,10,15,20,30,40,60,80,100]),
    lot1 = [118,58,42,35,27,25,21,19,18],
    lot2 = [69,35,26,21,18,16,13,12,12],
    w    = 9*[1/8, 1/9, 1/25, 1/6, 1/14, 1/25, 1/15, 1/13, 0.3022039]
)
wOLS = fit(GeneralizedLinearModel, lot1~u, clotting, Normal(), wts = array(clotting[:w]))

vcov(wOLS, HC0)
vcov(wOLS, HC1)
vcov(wOLS, HC2)
vcov(wOLS, HC3)
vcov(wOLS, HC4)
vcov(wOLS, HC4m)
vcov(wOLS, HC5)
```
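To see what these estimators compute, the HC0 sandwich for OLS can be written out by hand; HC1 then rescales it by `n/(n - k)`. This is an illustrative sketch on made-up data, not the package's internals:

```julia
using LinearAlgebra

# HC0 sandwich for OLS by hand; data is made up for illustration.
X = [1.0 0.0; 1.0 1.0; 1.0 2.0; 1.0 3.0]
y = [1.0, 2.0, 2.0, 4.0]
b = X \ y                    # OLS coefficients
e = y - X*b                  # residuals
B = inv(X'X)                 # bread: (X'X)^{-1}
M = X' * (e.^2 .* X)         # meat: sum_i e_i^2 * x_i * x_i'
hc0 = B * M * B
hc1 = hc0 * (4/(4 - 2))      # n = 4 observations, k = 2 coefficients
```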
## CRHC (Cluster robust heteroskedasticity consistent)

The API of this class of variance estimators is subject to change, so please use it with care. In particular, since the `CRHC` type needs to know which variable is used for clustering, a good way to pass it must still be thought out. For the moment, the following approach works, as long as no missing values are present in the original dataframe.

```julia
using RDatasets
df = dataset("plm", "Grunfeld")
lm = glm(Inv~Value+Capital, df, Normal(), IdentityLink())
vcov(lm, CRHC1(convert(Array, df[:Firm])))
stderr(lm, CRHC1(convert(Array, df[:Firm])))
```
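The distinguishing feature of the cluster-robust "meat" is that score contributions are summed within clusters before taking outer products, `M = Σ_g s_g s_g'` with `s_g = X_g' e_g`. A self-contained sketch on made-up data (illustrative only, not the package's internals):

```julia
# Cluster-robust meat: aggregate scores within clusters, then outer products.
# X, e, cl are made-up illustrative data.
function clustermeat(X, e, cl)
    p = size(X, 2)
    M = zeros(p, p)
    for g in unique(cl)
        idx = cl .== g
        s = X[idx, :]' * e[idx]   # within-cluster score sum
        M += s * s'
    end
    M
end

X  = [1.0 0.0; 1.0 1.0; 1.0 2.0; 1.0 3.0]
e  = [0.5, -0.5, 1.0, -1.0]       # residuals
cl = [1, 1, 2, 2]                 # cluster id of each observation
M  = clustermeat(X, e, cl)
```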

src/HAC.jl (+4, −1)

```diff
@@ -174,7 +174,7 @@ function Γ(X::AbstractMatrix, j::Int64)
 end

 vcov(X::AbstractMatrix, k::VARHAC) = varhac(X, k.imax, k.ilag, k.imodel)
-
+
 function vcov(X::AbstractMatrix, k::HAC, bw, D, prewhite::Bool)
     n, p = size(X)
     Q = zeros(p, p)
@@ -248,6 +248,9 @@ function vcov{T<:OptimalBandwidth}(r::DataFrameRegressionModel, k::HAC{Optimal{T
 end

 vcov{T<:Fixed}(r::DataFrameRegressionModel, k::HAC{Optimal{T}}; args...) = vcov(r.model, k; args...)
+stderr(x::DataFrameRegressionModel, k::HAC; kwargs...) = sqrt.(diag(vcov(x, k; kwargs...)))
+
+

 vcov(r::DataFrameRegressionModel, k::VARHAC) = vcov(r.model, k)
```

src/HC.jl (+11, −3)

```diff
@@ -100,9 +100,17 @@ function meat(l::LinPredModel, k::HC)
     scale!(Base.LinAlg.At_mul_B(z, z.*u), 1/nobs(l))
 end

-vcov(x::DataFrameRegressionModel, k::RobustVariance) = vcov(x.model, k)
-stderr(x::DataFrameRegressionModel, k::RobustVariance) = sqrt(diag(vcov(x, k)))
-stderr(x::LinPredModel, k::RobustVariance) = sqrt(diag(vcov(x, k)))
+vcov(x::DataFrameRegressionModel, k::HC) = vcov(x.model, k)
+
+vcov{T<:RobustVariance}(x::DataFrameRegressionModel, k::Type{T}) = vcov(x.model, k())
+
+stderr{T<:HC}(x::DataFrameRegressionModel, k::Type{T}) = sqrt(diag(vcov(x, k())))
+
+stderr{T<:CRHC}(x::DataFrameRegressionModel, k::T) = sqrt(diag(vcov(x, k)))
+
+stderr(x::DataFrameRegressionModel, k::HC) = sqrt(diag(vcov(x, k)))
+
+stderr(x::LinPredModel, k::HC) = sqrt(diag(vcov(x, k)))

 ################################################################################
 ## Cluster
```
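The new methods accept either an instance (`HC1()`) or the bare type (`HC1`), instantiating the latter on the fly. The dispatch pattern can be sketched in current Julia syntax (the diff itself uses the older `f{T}(...)` form); the `My*` names below are hypothetical stand-ins, not the package's types:

```julia
# Dispatch on instance vs. bare type; My* names are illustrative stand-ins.
abstract type MyRobustVariance end
struct MyHC1 <: MyRobustVariance end

myvcov(k::MyRobustVariance) = "got an instance"
# Bare type: construct a default instance and re-dispatch.
myvcov(::Type{T}) where {T<:MyRobustVariance} = myvcov(T())
```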
