Commit 97da9ef (1 parent: 5010e6b). Showing 2 changed files with 160 additions and 0 deletions.
@@ -0,0 +1,67 @@
# Bootstrapping

This package does not provide support for bootstrap standard errors at the moment. Nonetheless, it is possible to bootstrap with the existing tools. This tutorial provides some sample code.

We first load some packages:
```julia
using StatsBase
using DataFrames
using CSV
using Microeconometrics
```

We then set up the problem:
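```julia
# `datadir` is assumed to hold the path of the folder that contains auto.csv
S = CSV.read(joinpath(datadir, "auto.csv")) ;
# response: gallons per mile per unit of weight, rescaled by 100000
S[:gpmw] = ((1.0 ./ S[:mpg]) ./ S[:weight]) * 100000 ;
# model: regress gpmw on the foreign indicator and an intercept
M = Dict(:response => "gpmw", :control => "foreign + 1") ;
D = Microdata(S, M) ;
```

Next, we obtain the coefficient estimates: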
```julia
E = fit(OLS, D, novar = true) ;
```

We can now set up the bootstrap:
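```julia
# set the seed (on Julia 0.7 and later: `Random.seed!(0101)`)
srand(0101)

reps = 1000 ;        # number of bootstrap replications
n    = nobs(E) ;     # sample size
wgts = fill(0, n) ;  # frequency weights of the current bootstrap sample
# one row of coefficient estimates per replication
# (on Julia 0.7 and later: `Array{Float64}(undef, reps, dof(E))`)
B    = Array{Float64}(reps, dof(E)) ;
```
The vector `wgts` will translate the draw of a bootstrap sample into an input for `Microdata`. The matrix `B` will contain the sample of coefficient estimates. Don't forget to set the seed for the sake of reproducibility!

The algorithm is: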
```julia
for b = 1:reps

    # draw n observations with replacement
    wgts .= 0
    draw = rand(1:n, n)

    # count how often each observation was drawn
    for d in draw
        wgts[d] += 1
    end

    # refit the model on the bootstrap sample and store the coefficients
    Db = Microdata(S, M, weights = fweights(wgts))
    Eb = fit(OLS, Db, novar = true)
    B[b, :] = coef(Eb)'
end
```
Note that we do not compute the covariance matrix at each step, which saves us some time.

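To see why frequency weights can stand in for an explicit resample, the following is a minimal, self-contained sketch that does not use the package at all. It assumes synthetic data and a hand-written weighted least-squares helper (`wls`, a hypothetical name introduced here), and it is written for Julia 1.x with the standard library only.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper: least squares with frequency weights w,
# where w[i] counts how often row i appears in the bootstrap sample.
wls(X, y, w) = (X' * (w .* X)) \ (X' * (w .* y))

Random.seed!(101)

# Synthetic data (an assumption, not the auto.csv data of the tutorial).
n = 200
X = hcat(Float64.(rand(n) .< 0.3), ones(n))   # binary regressor and intercept
y = X * [0.25, 1.6] .+ 0.2 .* randn(n)

reps = 500
wgts = zeros(Int, n)
B = Array{Float64}(undef, reps, size(X, 2))

for b in 1:reps
    fill!(wgts, 0)
    for d in rand(1:n, n)      # draw n observations with replacement
        wgts[d] += 1
    end
    B[b, :] = wls(X, y, wgts)  # coefficients only, no covariance matrix
end

V_boot = cov(B)                   # bootstrap covariance matrix
se_boot = sqrt.(diag(V_boot))     # bootstrap standard errors
```

Weighting each row by its draw count gives the same coefficients as stacking the drawn rows, which is why the loop never has to copy the data.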

We can finally see the results:
```julia
E.V = cov(B) ;   # use the bootstrap covariance matrix
coeftable(E)
```
The output is:
```julia
                    Estimate  St. Err.   t-stat.   p-value      C.I. (95%)
foreign: Foreign      0.2462    0.0682    3.6072    0.0003    0.1124    0.3799
(Intercept)            1.609    0.0237   67.9372    <1e-99    1.5626    1.6554
```

You can easily adapt this code to more complex problems (e.g., critical values) or parallelize it for additional speed!

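As one possible adaptation, the matrix of bootstrap estimates can also yield percentile confidence intervals. The sketch below is only an illustration: it replaces `B` with simulated draws (an assumption, so that the snippet runs on its own in Julia 1.x) and takes the 2.5% and 97.5% quantiles of each column.

```julia
using Random, Statistics

Random.seed!(101)

# Stand-in for the matrix of bootstrap estimates (replications × coefficients).
B = hcat(0.25 .+ 0.07 .* randn(1000), 1.6 .+ 0.02 .* randn(1000))

# Percentile interval: lower and upper quantile of each column of B.
probs = [0.025, 0.975]
ci = [quantile(B[:, j], p) for j in 1:size(B, 2), p in probs]
```

Each row of `ci` holds the lower and upper bound for one coefficient.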
@@ -0,0 +1,93 @@
# Correlation structures

Before fitting the model, you must specify the correlation between observations (a `CorrStructure`). It determines the calculation of the covariance matrix. The default is always `Heteroscedastic`, i.e. independent but not identically distributed observations.

All constructors accept the Boolean keyword `adj`, which defaults to `true`. If `true`, a finite-sample adjustment is applied to the covariance matrix. The adjustment factor is n / (n - 1), where n is the number of clusters for clustered data and the number of observations otherwise.

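As a small worked example of the adjustment factor, the sketch below computes n / (n - 1) both ways for a made-up vector of cluster labels. The function name `adjfactor` and the labels are hypothetical and not part of the package.

```julia
# Finite-sample adjustment factor n / (n - 1).
adjfactor(n::Integer) = n / (n - 1)

cluster_ids = [1, 1, 2, 2, 2, 3, 4, 4]    # hypothetical cluster labels

adj_obs = adjfactor(length(cluster_ids))               # 8 observations: 8 / 7 ≈ 1.143
adj_clusters = adjfactor(length(unique(cluster_ids)))  # 4 clusters: 4 / 3 ≈ 1.333
```

With few clusters the factor is noticeably larger than with the observation count, so the choice of n matters most in small samples.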

Four subtypes are currently available:

## `Homoscedastic`

```julia
Homoscedastic(method::String = "OIM")
```
Observations are independent and identically distributed. The optional argument `method` is only relevant for maximum-likelihood estimators. It controls the estimation of the covariance matrix: `"OIM"` uses the observed information matrix, whereas `"OPG"` uses the outer product of the gradient. Only linear and maximum-likelihood estimators support homoscedastic errors.

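To illustrate the difference between the two methods, here is a hedged sketch for a one-parameter exponential model, worked out by hand rather than with the package. The data are simulated and all variable names are assumptions made for this example.

```julia
using Random, Statistics

Random.seed!(101)

# Synthetic sample from an exponential distribution with rate 2.
x = randexp(500) ./ 2.0
n = length(x)

# Log-likelihood of one observation: log(rate) - rate * x.
rate_hat = 1 / mean(x)              # maximum-likelihood estimate

# OIM: minus the second derivative of the log-likelihood, n / rate^2.
oim = n / rate_hat^2

# OPG: sum of squared scores, where the score is 1 / rate - x.
score = 1 / rate_hat .- x
opg = sum(abs2, score)

V_oim = 1 / oim                     # variance estimate based on OIM
V_opg = 1 / opg                     # variance estimate based on OPG
```

Both estimates target the same asymptotic variance, so in a correctly specified model they should be close in large samples.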

## `Heteroscedastic`

```julia
Heteroscedastic()
```
Observations are independent, but they may differ in distribution. This structure leads to sandwich covariance matrices (a.k.a. Huber-Eicker-White).

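For intuition, a sandwich covariance matrix for ordinary least squares can be sketched by hand as follows. This is a textbook construction on simulated data, not the package's implementation, and it assumes Julia 1.x with the standard library only.

```julia
using LinearAlgebra, Random

Random.seed!(101)

# Synthetic data whose error variance depends on the regressor.
n = 300
x = randn(n)
X = hcat(x, ones(n))
y = X * [0.5, 1.0] .+ (1 .+ abs.(x)) .* randn(n)

β = X \ y                         # OLS coefficients
u = y .- X * β                    # residuals

bread = inv(X' * X)
meat = X' * (abs2.(u) .* X)       # sum of uᵢ² xᵢ xᵢ'
V_sandwich = bread * meat * bread

# Finite-sample adjustment n / (n - 1), as described for `adj = true`.
V_adjusted = (n / (n - 1)) .* V_sandwich
```

The "bread" comes from the regressors alone, while the "meat" lets each observation contribute its own squared residual.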

## `Clustered`

```julia
Clustered(DF::DataFrame, cluster::Symbol)
```

Observations are independent across clusters, but they may differ in their joint distribution within clusters. `cluster` specifies the column of the `DataFrame` to cluster on.

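A cluster-robust covariance matrix for ordinary least squares can be sketched by summing the scores within each cluster before taking outer products. The example below is a textbook construction on simulated data with hypothetical names, offered as an illustration rather than as the package's code.

```julia
using LinearAlgebra, Random

Random.seed!(101)

# Synthetic data: 40 clusters of 5 observations with a shared shock per cluster.
G, m = 40, 5
cluster = repeat(1:G, inner = m)
n = G * m
X = hcat(randn(n), ones(n))
y = X * [0.5, 1.0] .+ randn(G)[cluster] .+ randn(n)

β = X \ y
u = y .- X * β

# Sum xᵢuᵢ within each cluster, then accumulate the outer products.
k = size(X, 2)
meat = zeros(k, k)
for g in 1:G
    rows = findall(cluster .== g)
    s = X[rows, :]' * u[rows]
    meat .+= s * s'
end

bread = inv(X' * X)
# Adjustment factor based on the number of clusters, G / (G - 1).
V_cluster = (G / (G - 1)) .* (bread * meat * bread)
```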

## `CrossCorrelated`

This structure accommodates other correlation structures. The first argument determines the precise pattern.

### Two-way clustering

```julia
CrossCorrelated("Two-way clustering", DF::DataFrame, c₁::Symbol, c₂::Symbol)
```
If two observations share any cluster, they may be arbitrarily correlated. `c₁` and `c₂` specify the two columns of the `DataFrame` to cluster on.

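The pattern "share any cluster" can be made concrete with a small indicator matrix. The sketch below uses two made-up label vectors (hypothetical, e.g. firm and year) and marks every pair of observations that is allowed to correlate.

```julia
c1 = [1, 1, 2, 2, 3]    # hypothetical first clustering variable
c2 = [1, 2, 2, 3, 3]    # hypothetical second clustering variable

# shared[i, j] is true when observations i and j share a cluster
# in at least one of the two dimensions.
shared = (c1 .== c1') .| (c2 .== c2')
```

Here observations 1 and 2 share a cluster through `c1` and observations 2 and 3 through `c2`, whereas observations 1 and 3 share none and are treated as independent.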

### Correlation across time

```julia
CrossCorrelated("Time",
    DF::DataFrame,
    time::Symbol,
    bandwidth::Real,
    kernel::Function = parzen
)
```

The maximum possible correlation between two observations declines with the time difference between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation (of type `Date`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.

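For reference, the four kernels can be written out as below, following the standard formulae in Andrews (1991). The `_sketch` suffix marks these as illustrations rather than the package's own definitions. The example also assumes that the kernel is evaluated at the time difference in days divided by the bandwidth, which the text above does not state explicitly.

```julia
using Dates

# Kernel formulae from Andrews (1991); all are zero for |x| > 1.
truncated_sketch(x) = abs(x) <= 1 ? 1.0 : 0.0
bartlett_sketch(x) = abs(x) <= 1 ? 1.0 - abs(x) : 0.0
tukeyhanning_sketch(x) = abs(x) <= 1 ? (1.0 + cos(π * x)) / 2 : 0.0

function parzen_sketch(x)
    a = abs(x)
    a <= 0.5 && return 1.0 - 6.0 * a^2 + 6.0 * a^3
    a <= 1.0 && return 2.0 * (1.0 - a)^3
    return 0.0
end

# Assumption: weight = kernel(time difference in days / bandwidth).
dates = [Date(2018, 1, 1), Date(2018, 1, 3), Date(2018, 1, 11)]
bandwidth = 5
Δ = [abs(Dates.value(s - t)) for s in dates, t in dates]   # differences in days
W = parzen_sketch.(Δ ./ bandwidth)   # W[1, 2] ≈ 0.424, W[1, 3] == 0.0
```

Pairs of observations that are further apart than the bandwidth receive a weight of zero, so they do not contribute to the covariance estimate.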

!!! warning

    The resulting covariance matrices differ from the Newey-West estimator, which assumes independence across units (though observations for the same unit may correlate across time).

### Correlation across space

```julia
CrossCorrelated("Space",
    DF::DataFrame,
    latitude::Symbol,
    longitude::Symbol,
    bandwidth::Real,
    kernel::Function = parzen
)
```

The maximum possible correlation between two observations declines with the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (of type `Float64`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.

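Because the coordinates must be supplied in radians, data recorded in degrees need converting first. The sketch below shows the conversion with `deg2rad` and one common way to measure spatial distance, the haversine formula. The function `haversine_sketch`, the Earth radius of 6371 km and the choice of kilometres are assumptions for illustration; the text above does not say which distance measure or unit the package uses.

```julia
# Great-circle distance (haversine formula) between two points in radians.
# The default radius is an assumed mean Earth radius in kilometres.
function haversine_sketch(lat1, lon1, lat2, lon2; radius = 6371.0)
    h = sin((lat2 - lat1) / 2)^2 +
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
    return 2 * radius * asin(min(1.0, sqrt(h)))
end

# Coordinates in degrees (latitude, longitude), converted to radians.
paris = deg2rad.((48.8566, 2.3522))
london = deg2rad.((51.5074, -0.1278))

d = haversine_sketch(paris[1], paris[2], london[1], london[2])   # distance in km
```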

### Correlation across time and space

```julia
CrossCorrelated("Time and space",
    DF::DataFrame,
    time::Symbol,
    bandwidth_time::Real,
    latitude::Symbol,
    longitude::Symbol,
    bandwidth_space::Real,
    kernel::Function = parzen
)
```

The maximum possible correlation between two observations declines with the time difference and the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidths and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (`Float64`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.