Commit 97da9ef

Fixed documentation
1 parent 5010e6b commit 97da9ef

2 files changed

+160
-0
lines changed


docs/src/bootstrapping.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
1+
# Bootstrapping
2+
3+
This package does not provide support for bootstrap standard errors at the moment. Nonetheless, it is possible to bootstrap with the existing tools. This tutorial provides some sample code.
4+
5+
We first load some packages:
6+
```julia
7+
using StatsBase
8+
using DataFrames
9+
using CSV
10+
using Microeconometrics
11+
```
12+
13+
We then set up the problem:
14+
```julia
15+
S = CSV.read(joinpath(datadir, "auto.csv")) ;
16+
S[:gpmw] = ((1.0 ./ S[:mpg]) ./ S[:weight]) * 100000 ;
17+
M = Dict(:response => "gpmw", :control => "foreign + 1") ;
18+
D = Microdata(S, M) ;
19+
```
20+
21+
Next, we obtain the coefficient estimates:
22+
```julia
23+
E = fit(OLS, D, novar = true) ;
24+
```
25+
26+
We can now set up the bootstrap:
27+
```julia
28+
srand(0101)
29+
30+
reps = 1000 ;
31+
n = nobs(E) ;
32+
wgts = fill(0, n) ;
33+
B = Array{Float64}(reps, dof(E)) ;
34+
```
35+
The vector `wgts` will translate the draw of a bootstrap sample into an input for `Microdata`. The matrix `B` will contain the sample of coefficient estimates. Don't forget to set the seed for the sake of reproducibility!
36+
37+
The algorithm is:
38+
```julia
39+
for b = 1:reps
40+
41+
    wgts .= 0
42+
    draw = rand(1:n, n)
43+
44+
    for d in draw
45+
        wgts[d] += 1
46+
    end
47+
48+
    Db = Microdata(S, M, weights = fweights(wgts))
49+
    Eb = fit(OLS, Db, novar = true)
50+
    B[b, :] = coef(Eb)'
51+
end
52+
```
53+
Note that we do not compute the covariance matrix at each step, which saves us some time.
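The loop relies on the fact that, for OLS, refitting on a resampled dataset gives the same coefficients as refitting on the original rows with the resampling counts as frequency weights. The following is a minimal sketch of that equivalence in plain Julia 1.x (standard library only, hence `Random.seed!` rather than `srand`). The simulated data and all variable names are assumptions made for illustration; the sketch does not use `Microdata` or `fit`.

```julia
using Random, LinearAlgebra

Random.seed!(101)

# Simulated data (hypothetical; stands in for the dataset used above)
n = 50
X = hcat(randn(n), ones(n))                 # one regressor plus an intercept
y = X * [0.25, 1.6] .+ 0.1 .* randn(n)

# Draw a bootstrap sample and convert it into frequency weights
draw = rand(1:n, n)
wgts = zeros(Int, n)
for d in draw
    wgts[d] += 1
end

# OLS on the resampled rows
β_rows = X[draw, :] \ y[draw]

# Weighted OLS on the original rows, with the counts as frequency weights
W = Diagonal(wgts)
β_wgts = (X' * W * X) \ (X' * W * y)

println(isapprox(β_rows, β_wgts; rtol = 1e-8))
```

Passing weights instead of copying rows avoids rebuilding the dataset at each replication.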
54+
55+
We can finally see the results:
56+
```julia
57+
E.V = cov(B) ;
58+
coeftable(E)
59+
```
60+
The output is:
61+
```julia
62+
                    Estimate  St. Err.   t-stat.   p-value      C.I. (95%)
63+
foreign: Foreign      0.2462    0.0682    3.6072    0.0003    0.1124  0.3799
64+
(Intercept)            1.609    0.0237   67.9372    <1e-99    1.5626  1.6554
65+
```
66+
67+
You can easily adapt this code to more complex problems (e.g., critical values) or parallelize it for additional speed!
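One possible adaptation is to read percentile intervals directly off the bootstrap draws instead of relying on the normal approximation. The sketch below (plain Julia 1.x, standard library only) uses a simulated matrix as a hypothetical stand-in for `B`; the numbers are made up and are not the output of the tutorial.

```julia
using Random, Statistics

Random.seed!(101)

# Hypothetical stand-in for the matrix `B`: 1000 draws of two coefficients
B = hcat(0.25 .+ 0.07 .* randn(1000), 1.6 .+ 0.02 .* randn(1000))

# Bootstrap standard errors: square roots of the diagonal of cov(B)
se = vec(std(B, dims = 1))

# 95% percentile interval for each coefficient
ci = [quantile(B[:, j], [0.025, 0.975]) for j in 1:size(B, 2)]

println(se)
println(ci)
```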

docs/src/correlation_structures.md

Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,93 @@
1+
# Correlation structures
2+
3+
Before fitting the model, you must specify the correlation between observations (a `CorrStructure`). It determines the calculation of the covariance matrix. The default is always `Heteroscedastic`, i.e. independent but not identically distributed observations.
4+
5+
All constructors accept the Boolean keyword `adj`, which defaults to `true`. If `true`, a finite-sample adjustment is applied to the covariance matrix. The adjustment factor is n / (n - 1), where n is the number of clusters for clustered data and the number of observations otherwise.
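As a small illustration of the adjustment factor, the sketch below scales a made-up covariance matrix by n / (n - 1). The helper `adjustment` and the matrix `V` are hypothetical and not part of the package.

```julia
# Hypothetical helper, not part of the package API
adjustment(n::Integer) = n / (n - 1)

# Made-up unadjusted covariance matrix
V = [0.0040 0.0001; 0.0001 0.0005]

# With 50 independent observations, n = 50
V_obs = adjustment(50) * V

# With the same data grouped into 10 clusters, n = 10
V_clu = adjustment(10) * V

println(adjustment(50), " ", adjustment(10))
```

The factor matters most when the number of clusters is small.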
6+
7+
Four subtypes are currently available:
8+
9+
## `Homoscedastic`
10+
11+
```julia
12+
Homoscedastic(method::String = "OIM")
13+
```
14+
Observations are independent and identically distributed. The optional argument `method` is only relevant for maximum-likelihood estimators. It controls the estimation of the covariance matrix: `"OIM"` uses the observed information matrix, whereas `"OPG"` uses the outer product of the gradient. Only linear and maximum-likelihood estimators support homoscedastic errors.
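To make the difference between the two methods concrete, here is a sketch for a one-parameter exponential model written in plain Julia 1.x. It is a textbook illustration under assumed data, not the package's implementation; all names are hypothetical.

```julia
using Random, Statistics

Random.seed!(101)

# Simulated exponential data with rate λ = 2 (an assumption for illustration)
λ = 2.0
y = randexp(500) ./ λ
n = length(y)

# Maximum-likelihood estimate of the rate
λ_hat = 1 / mean(y)

# Log-likelihood per observation: log(λ) - λ * y
scores   = 1 / λ_hat .- y            # first derivative, per observation
hessians = fill(-1 / λ_hat^2, n)     # second derivative, per observation

# OIM: inverse of the negative summed Hessian
V_oim = 1 / (-sum(hessians))

# OPG: inverse of the summed squared scores
V_opg = 1 / sum(abs2, scores)

println((V_oim, V_opg))
```

The two estimates agree asymptotically when the model is correctly specified, but they generally differ in finite samples.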
15+
16+
## `Heteroscedastic`
17+
18+
```julia
19+
Heteroscedastic()
20+
```
21+
Observations are independent, but they may differ in distribution. This structure leads to sandwich covariance matrices (a.k.a. Huber-Eicker-White).
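For the linear model, the sandwich form can be sketched as follows in plain Julia 1.x. This is the textbook estimator with the n / (n - 1) adjustment described above, computed on simulated data; it is not taken from the package's source.

```julia
using Random, LinearAlgebra

Random.seed!(101)

# Simulated data whose error variance depends on the regressor
n = 200
X = hcat(randn(n), ones(n))
y = X * [0.5, 1.0] .+ abs.(X[:, 1]) .* randn(n)

# OLS coefficients and residuals
β = X \ y
e = y - X * β

# Sandwich: bread * meat * bread
bread = inv(X' * X)
meat  = X' * Diagonal(e .^ 2) * X
V = (n / (n - 1)) .* bread * meat * bread

println(sqrt.(diag(V)))   # heteroscedasticity-robust standard errors
```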
22+
23+
## `Clustered`
24+
25+
```julia
26+
Clustered(DF::DataFrame, cluster::Symbol)
27+
```
28+
29+
Observations are independent across clusters, but they may differ in their joint distribution within clusters. `cluster` specifies the column of the `DataFrame` to cluster on.
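The corresponding textbook estimator for the linear model sums the scores within each cluster before taking outer products. The sketch below uses simulated data and plain Julia 1.x; the variable names are hypothetical and the code does not call `Clustered`.

```julia
using Random, LinearAlgebra

Random.seed!(101)

# 20 clusters of 10 observations each, with a shock shared within clusters
G, m = 20, 10
n = G * m
cluster = repeat(1:G, inner = m)
X = hcat(randn(n), ones(n))
u = randn(G)[cluster]
y = X * [0.5, 1.0] .+ u .+ randn(n)

β = X \ y
e = y - X * β

bread = inv(X' * X)
meat = zeros(2, 2)
for g in unique(cluster)
    idx = findall(==(g), cluster)
    s = X[idx, :]' * e[idx]        # score summed within cluster g
    meat .+= s * s'
end

# Finite-sample adjustment based on the number of clusters
V = (G / (G - 1)) .* bread * meat * bread

println(sqrt.(diag(V)))
```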
30+
31+
## `CrossCorrelated`
32+
33+
This structure accommodates other correlation structures. The first argument determines the precise pattern.
34+
35+
### Two-way clustering
36+
37+
```julia
38+
CrossCorrelated("Two-way clustering", DF::DataFrame, c₁::Symbol, c₂::Symbol)
39+
```
40+
If two observations share any cluster, they may be arbitrarily correlated.
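One way to picture this rule is as an indicator matrix that marks every pair of observations sharing a cluster in either dimension. The sketch below builds that matrix for simulated data and plugs it into a sandwich formula for OLS (plain Julia 1.x). It is an illustration under assumed names, omits any finite-sample adjustment, and is not the package's code.

```julia
using Random, LinearAlgebra

Random.seed!(101)

n = 120
c1 = rand(1:10, n)     # first clustering dimension (hypothetical)
c2 = rand(1:6, n)      # second clustering dimension (hypothetical)
X = hcat(randn(n), ones(n))
y = X * [0.5, 1.0] .+ randn(n)

β = X \ y
e = y - X * β

# shared[i, j] is true when i and j share a cluster in either dimension
shared = (c1 .== c1') .| (c2 .== c2')

# Only products of residuals for "shared" pairs enter the middle term
bread = inv(X' * X)
meat  = X' * (shared .* (e * e')) * X
V = bread * meat * bread

println(count(shared) / n^2)   # fraction of pairs allowed to correlate
```

Unlike the one-way case, a matrix built this way is not guaranteed to be positive semidefinite in finite samples.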
41+
42+
### Correlation across time
43+
44+
```julia
45+
CrossCorrelated("Time",
46+
DF::DataFrame,
47+
time::Symbol,
48+
bandwidth::Real,
49+
kernel::Function = parzen
50+
)
51+
```
52+
53+
The maximum possible correlation between two observations declines with the time difference between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation (of type `Date`).
54+
55+
The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
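For reference, the four kernels can be written out as below, following the standard definitions in Andrews (1991). The `_sketch` names are hypothetical, chosen to avoid suggesting these are the package's own definitions. The last lines show one plausible way a kernel turns date gaps into upper bounds on correlation; scaling the gap in days by the bandwidth is an assumption.

```julia
using Dates

# Standard kernel definitions (zero outside [-1, 1])
truncated_sketch(x) = abs(x) <= 1 ? 1.0 : 0.0
bartlett_sketch(x)  = abs(x) <= 1 ? 1.0 - abs(x) : 0.0
function parzen_sketch(x)
    a = abs(x)
    a <= 0.5 && return 1.0 - 6 * a^2 + 6 * a^3
    a <= 1.0 && return 2 * (1 - a)^3
    return 0.0
end
tukeyhanning_sketch(x) = abs(x) <= 1 ? (1 + cos(π * x)) / 2 : 0.0

# Hypothetical observation dates and a bandwidth measured in days
dates = [Date(2018, 1, 1), Date(2018, 1, 3), Date(2018, 1, 10)]
bandwidth = 5

# Pairwise gaps in days, then kernel weights
gap = [abs(Dates.value(s - t)) for s in dates, t in dates]
K = parzen_sketch.(gap ./ bandwidth)

println(K)
```

Pairs further apart than the bandwidth receive a weight of zero, so they are treated as uncorrelated.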
56+
57+
!!! warning
58+
59+
    The resulting covariance matrices differ from the Newey-West estimator, which assumes independence across units (though observations for the same unit may correlate across time).
60+
61+
### Correlation across space
62+
63+
```julia
64+
CrossCorrelated("Space",
65+
DF::DataFrame,
66+
latitude::Symbol,
67+
longitude::Symbol,
68+
bandwidth::Real,
69+
kernel::Function = parzen
70+
)
71+
```
72+
73+
The maximum possible correlation between two observations declines with the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (of type `Float64`).
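A common way to obtain a spatial distance from coordinates in radians is the haversine formula, sketched below in plain Julia 1.x. Whether the package uses this formula, and in which unit it expects the bandwidth, are assumptions here; `haversine_sketch` and `EARTH_RADIUS_KM` are hypothetical names.

```julia
# Mean Earth radius in kilometres (the distance unit is an assumption)
const EARTH_RADIUS_KM = 6371.0

# Great-circle distance between two points given in radians
function haversine_sketch(lat1, lon1, lat2, lon2; radius = EARTH_RADIUS_KM)
    a = sin((lat2 - lat1) / 2)^2 +
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
    return 2 * radius * asin(min(1.0, sqrt(a)))
end

# Coordinates are usually recorded in degrees, so convert them first
paris  = deg2rad.((48.8566, 2.3522))
berlin = deg2rad.((52.5200, 13.4050))

d = haversine_sketch(paris..., berlin...)
println(d)
```

Dividing such a distance by the bandwidth and passing the ratio to a kernel would give a weight between 0 and 1 for each pair of observations.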
74+
75+
The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
76+
77+
### Correlation across time and space
78+
79+
```julia
80+
CrossCorrelated("Time and space",
81+
DF::DataFrame,
82+
time::Symbol,
83+
bandwidth_time::Real,
84+
latitude::Symbol,
85+
longitude::Symbol,
86+
bandwidth_space::Real,
87+
kernel::Function = parzen
88+
)
89+
```
90+
91+
The maximum possible correlation between two observations declines with the time difference and the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidths and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (`Float64`).
92+
93+
The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
