Commit

Fixed documentation
lbittarello authored Jul 16, 2018
1 parent 5010e6b commit 97da9ef
Showing 2 changed files with 160 additions and 0 deletions.
67 changes: 67 additions & 0 deletions docs/src/bootstrapping.md
@@ -0,0 +1,67 @@
# Bootstrapping

This package does not provide support for bootstrap standard errors at the moment. Nonetheless, it is possible to bootstrap with the existing tools. This tutorial provides some sample code.

We first load some packages:
```julia
using StatsBase
using DataFrames
using CSV
using Microeconometrics
```

We then set up the problem:
```julia
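# `datadir` is assumed to point to the folder that contains auto.csv
S = CSV.read(joinpath(datadir, "auto.csv")) ;
# gallons per mile per pound of weight, scaled by 100,000
S[:gpmw] = ((1.0 ./ S[:mpg]) ./ S[:weight]) * 100000 ;
# regress gpmw on the foreign dummy and an intercept
M = Dict(:response => "gpmw", :control => "foreign + 1") ;
D = Microdata(S, M) ;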
```

Next, we obtain the coefficient estimates:
```julia
E = fit(OLS, D, novar = true) ;
```

We can now set up the bootstrap:
```julia
srand(0101)

reps = 1000 ;
n = nobs(E) ;
wgts = fill(0, n) ;
B = Array{Float64}(reps, dof(E)) ;
```
The vector `wgts` translates each bootstrap draw into frequency weights for `Microdata`: its i-th entry counts how many times observation i was drawn. The matrix `B` will store the bootstrap sample of coefficient estimates, one row per replication. Don't forget to set the seed for the sake of reproducibility!

The algorithm is:
```julia
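for b = 1:reps

    # Draw n observations with replacement and count how often each was drawn
    wgts .= 0
    draw = rand(1:n, n)

    for d in draw
        wgts[d] += 1
    end

    # Re-estimate the model on the bootstrap sample via frequency weights
    Db = Microdata(S, M, weights = fweights(wgts))
    Eb = fit(OLS, Db, novar = true)
    B[b, :] = coef(Eb)'
end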
```
Setting `novar = true` skips the covariance matrix at each replication, which saves us some time.
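
Why do frequency weights stand in for a resampled dataset? The sketch below, which uses only the standard library of Julia 1.x and plain least squares rather than `Microeconometrics`, checks on simulated data that weighted OLS with the draw counts as weights reproduces OLS on the explicitly duplicated rows. The helper `ols` and the simulated data are hypothetical.

```julia
using LinearAlgebra, Random

# Hypothetical helper: (weighted) least squares via the normal equations
ols(X, y, w = ones(length(y))) = (X' * Diagonal(w) * X) \ (X' * Diagonal(w) * y)

Random.seed!(1)
n = 50
X = [ones(n) randn(n)]              # simulated intercept and regressor
y = X * [1.0, 2.0] .+ randn(n)      # simulated response

draw = rand(1:n, n)                 # one bootstrap draw of row indices
wgts = zeros(Int, n)
for d in draw
    wgts[d] += 1                    # count how often each row was drawn
end

b_resampled = ols(X[draw, :], y[draw])  # OLS on the duplicated rows
b_weighted  = ols(X, y, wgts)           # OLS with frequency weights
println(b_resampled ≈ b_weighted)
```

This equivalence is why the loop only needs to refill `wgts` and pass `fweights(wgts)` to `Microdata`, instead of copying rows of `S` at every replication.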

Finally, we overwrite the covariance matrix of `E` with the sample covariance of the bootstrap estimates and display the results:
```julia
E.V = cov(B) ;
coeftable(E)
```
The output is:
```
                  Estimate  St. Err.   t-stat.   p-value          C.I. (95%)
foreign: Foreign    0.2462    0.0682    3.6072    0.0003    0.1124    0.3799
(Intercept)          1.609    0.0237   67.9372    <1e-99    1.5626    1.6554
```

You can easily adapt this code to more complex problems (e.g., bootstrap critical values) or parallelize it for additional speed!
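
As one illustration of such an adaptation, a percentile interval takes the critical values directly from the bootstrap distribution: the lower and upper bounds are empirical quantiles of each column of `B`. The sketch below uses `quantile` and `cov` from the `Statistics` standard library of Julia 1.x; the helper name `percentile_ci` is hypothetical, and the simulated `B` is only a stand-in for the matrix of bootstrap estimates computed above.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper: percentile interval for each column (coefficient) of B
function percentile_ci(B::AbstractMatrix; level = 0.95)
    a = 1 - level
    lower = [quantile(B[:, j], a / 2) for j in 1:size(B, 2)]
    upper = [quantile(B[:, j], 1 - a / 2) for j in 1:size(B, 2)]
    return hcat(lower, upper)            # one row per coefficient: [lower upper]
end

# Stand-in for the reps × dof(E) matrix of bootstrap estimates
Random.seed!(0)
B = randn(1000, 2)

se = sqrt.(diag(cov(B)))                 # standard errors implied by E.V = cov(B)
ci = percentile_ci(B)
```

Unlike the default intervals in `coeftable`, these bounds need not be symmetric around the point estimate.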
93 changes: 93 additions & 0 deletions docs/src/correlation_structures.md
@@ -0,0 +1,93 @@
# Correlation structures

Before fitting the model, you specify the correlation between observations (a `CorrStructure`), which determines how the covariance matrix is calculated. The default is `Heteroscedastic`, i.e., independent but not identically distributed observations.

All constructors accept the Boolean keyword `adj`, which defaults to `true`. If `true`, a finite-sample adjustment is applied to the covariance matrix. The adjustment factor is n / (n - 1), where n is the number of clusters for clustered data and the number of observations otherwise.
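
For example, with 100 observations in 10 clusters, the clustered covariance matrix is multiplied by 10/9 ≈ 1.11, whereas with 100 independent observations the factor is only 100/99 ≈ 1.01. The sketch below spells out that rule; the helper name `adjustment` is hypothetical, since the package applies the factor internally when `adj = true`.

```julia
# Hypothetical helpers illustrating the adj = true rule: the covariance
# matrix is multiplied by n / (n - 1), where n counts clusters for
# clustered data and observations otherwise
adjustment(n::Integer) = n / (n - 1)
adjustment(clusters::AbstractVector) = adjustment(length(unique(clusters)))

V = [0.04 0.01; 0.01 0.09]                          # an unadjusted covariance matrix
V_obs = adjustment(100) .* V                        # 100 independent observations: × 100/99
V_cl  = adjustment(repeat(1:10, inner = 10)) .* V   # 100 observations in 10 clusters: × 10/9
```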

Four subtypes are currently available:

## `Homoscedastic`

```julia
Homoscedastic(method::String = "OIM")
```
Observations are independent and identically distributed. The optional argument `method` is only relevant for maximum-likelihood estimators. It controls the estimation of the covariance matrix: `"OIM"` uses the observed information matrix, whereas `"OPG"` uses the outer product of the gradient. Only linear and maximum-likelihood estimators support homoscedastic errors.
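
To see how `"OIM"` and `"OPG"` can differ, consider a hypothetical intercept-only Poisson model, whose maximum-likelihood estimate of the rate λ is the sample mean. The observed information is minus the second derivative of the log-likelihood, Σ yᵢ / λ², while the outer product of the gradient sums the squared scores (yᵢ / λ - 1)². The sketch below computes both variance estimates at the MLE in plain Julia; it illustrates the two formulas and is not the package's internal code.

```julia
using Random, Statistics

Random.seed!(42)
y = rand(0:6, 200)                   # hypothetical count data
lambda = mean(y)                     # MLE of the Poisson rate

# OIM: inverse of the observed information, -d²ℓ/dλ² = Σ yᵢ / λ²
var_oim = 1 / (sum(y) / lambda^2)    # equals λ / n

# OPG: inverse of the sum of squared scores, dℓᵢ/dλ = yᵢ / λ - 1
scores = y ./ lambda .- 1
var_opg = 1 / sum(scores .^ 2)       # equals λ² / Σ (yᵢ - λ)²

println((var_oim, var_opg))
```

The two estimates coincide only when the (uncorrected) sample variance equals the sample mean, as the Poisson model implies; these simulated counts are uniform on 0 to 6, so the two differ.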

## `Heteroscedastic`

```julia
Heteroscedastic()
```
Observations are independent, but they may differ in distribution. This structure leads to sandwich covariance matrices (a.k.a. Huber-Eicker-White).
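
For OLS, the sandwich has "bread" (X'X)⁻¹ and "meat" Σᵢ eᵢ² xᵢ xᵢ', where eᵢ is the residual. The sketch below computes it on simulated data whose error spread grows with the regressor and applies the n / (n - 1) factor described above; the helper name `sandwich_ols` is hypothetical, and the package's internals may differ.

```julia
using LinearAlgebra, Random

# Hypothetical helper: heteroscedasticity-robust (sandwich) covariance for OLS,
# scaled by n / (n - 1) when adj = true
function sandwich_ols(X::AbstractMatrix, y::AbstractVector; adj::Bool = true)
    n = size(X, 1)
    bread = inv(X' * X)
    e = y .- X * (bread * (X' * y))     # OLS residuals
    meat = X' * Diagonal(e .^ 2) * X    # Σᵢ eᵢ² xᵢ xᵢ'
    V = bread * meat * bread
    return adj ? V .* (n / (n - 1)) : V
end

Random.seed!(7)
n = 500
x = randn(n)
X = [ones(n) x]
y = 1 .+ 2 .* x .+ abs.(x) .* randn(n)  # error spread grows with |x|
V = sandwich_ols(X, y)
se = sqrt.(diag(V))
```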

## `Clustered`

```julia
Clustered(DF::DataFrame, cluster::Symbol)
```

Observations are independent across clusters, but they may differ in their joint distribution within clusters. `cluster` specifies the column of the `DataFrame` to cluster on.
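
The clustered variant sums the scores xᵢeᵢ within each cluster before forming outer products, so arbitrary correlation inside a cluster is allowed, and the adjustment uses the number of clusters G. The sketch below is a plain-Julia illustration for OLS; the vector `cluster` plays the role of the `cluster` column, and the helper name `clustered_ols` and the simulated data are hypothetical.

```julia
using LinearAlgebra, Random

# Hypothetical helper: cluster-robust covariance for OLS, scaled by G / (G - 1)
function clustered_ols(X::AbstractMatrix, y::AbstractVector, cluster::AbstractVector; adj::Bool = true)
    bread = inv(X' * X)
    e = y .- X * (bread * (X' * y))     # OLS residuals
    k = size(X, 2)
    meat = zeros(k, k)
    groups = unique(cluster)
    for g in groups
        idx = findall(==(g), cluster)
        s = X[idx, :]' * e[idx]         # cluster score: Σ over i in g of xᵢ eᵢ
        meat .+= s * s'
    end
    G = length(groups)
    V = bread * meat * bread
    return adj ? V .* (G / (G - 1)) : V
end

Random.seed!(3)
cluster = repeat(1:40, inner = 25)      # 40 hypothetical clusters of 25 observations
x = randn(1000)
u = randn(40)[cluster] .+ randn(1000)   # errors share a cluster-level shock
X = [ones(1000) x]
y = 1 .+ 2 .* x .+ u
V = clustered_ols(X, y, cluster)
```

With one observation per cluster and no adjustment, the meat reduces to the heteroscedastic one.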

## `CrossCorrelated`

This structure accommodates other correlation structures. The first argument determines the precise pattern.

### Two-way clustering

```julia
CrossCorrelated("Two-way clustering", DF::DataFrame, c₁::Symbol, c₂::Symbol)
```
If two observations share any cluster, they may be arbitrarily correlated.
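
One way to picture this rule is a 0/1 matrix W over pairs of observations, with W[i, j] = 1 when i and j share either cluster, and a meat Σᵢ Σⱼ W[i, j] sᵢ sⱼ' with scores sᵢ = xᵢeᵢ. Because the indicator for "shares c₁ or c₂" equals the indicator for c₁ plus the indicator for c₂ minus the indicator for both, this meat is the sum of the two one-way clustered meats minus the meat clustered on their intersection. The sketch below is a plain-Julia illustration for OLS without any finite-sample adjustment; `crosscorrelated_meat`, `firm` and `year` are hypothetical names, and the package may organize the computation differently.

```julia
using LinearAlgebra, Random

# Hypothetical helper: meat Σᵢ Σⱼ W[i, j] sᵢ sⱼ' with scores sᵢ = xᵢ eᵢ (rows of X .* e)
crosscorrelated_meat(X, e, W) = (X .* e)' * W * (X .* e)

Random.seed!(11)
n = 300
firm = rand(1:20, n)                    # hypothetical first clustering variable
year = rand(1:10, n)                    # hypothetical second clustering variable
# Two-way clustering: i and j may correlate if they share either cluster
W = [Float64(firm[i] == firm[j] || year[i] == year[j]) for i in 1:n, j in 1:n]

X = [ones(n) randn(n)]
y = X * [1.0, 0.5] .+ randn(n)
bread = inv(X' * X)
e = y .- X * (bread * (X' * y))         # OLS residuals
V = bread * crosscorrelated_meat(X, e, W) * bread
```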

### Correlation across time

```julia
CrossCorrelated("Time",
    DF::DataFrame,
    time::Symbol,
    bandwidth::Real,
    kernel::Function = parzen
)
```

The maximum possible correlation between two observations declines with the time difference between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation (of type `Date`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
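
For reference, the four kernels can be written down directly from Andrews (1991), with x the time difference divided by the bandwidth. The sketch below defines them under hypothetical names (`k_bartlett` and so on, to avoid clashing with the package's own functions) and builds the matrix of pairwise weights k(|tᵢ - tⱼ| / bandwidth) from a vector of dates. Measuring time differences in days is our assumption; the package's unit and scaling may differ.

```julia
using Dates

# Kernels as defined in Andrews (1991); x = time difference / bandwidth
k_truncated(x) = abs(x) <= 1 ? 1.0 : 0.0
k_bartlett(x)  = abs(x) <= 1 ? 1.0 - abs(x) : 0.0
function k_parzen(x)
    a = abs(x)
    a <= 0.5 && return 1.0 - 6a^2 + 6a^3
    a <= 1.0 && return 2.0 * (1.0 - a)^3
    return 0.0
end
k_tukeyhanning(x) = abs(x) <= 1 ? (1.0 + cos(pi * x)) / 2 : 0.0

# Hypothetical weight matrix: W[i, j] = k(|tᵢ - tⱼ| in days / bandwidth)
time_weights(dates::Vector{Date}, bandwidth::Real, k = k_parzen) =
    [k(abs(Dates.value(dates[i] - dates[j])) / bandwidth) for i in eachindex(dates), j in eachindex(dates)]

dates = Date(2018, 1, 1) .+ Day.([0, 1, 3, 10])
W = time_weights(dates, 5)
```

Entries fall from 1 for observations on the same date to 0 for observations more than one bandwidth apart.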

!!! warning

    The resulting covariance matrices differ from the Newey-West estimator, which assumes independence across units (though observations for the same unit may correlate across time).

### Correlation across space

```julia
CrossCorrelated("Space",
    DF::DataFrame,
    latitude::Symbol,
    longitude::Symbol,
    bandwidth::Real,
    kernel::Function = parzen
)
```

The maximum possible correlation between two observations declines with the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (of type `Float64`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
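
The spatial case replaces the time difference with a distance between coordinates given in radians. The sketch below assumes great-circle distance computed with the haversine formula on a sphere with the Earth's mean radius (6,371 km), so that the bandwidth is in kilometres, and uses the Bartlett kernel from Andrews (1991); the package may measure distance differently. The helper names `haversine`, `k_bartlett` and `space_weights` are hypothetical.

```julia
# Great-circle (haversine) distance in km between points given in radians
function haversine(lat1, lon1, lat2, lon2; radius = 6371.0)
    h = sin((lat2 - lat1) / 2)^2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
    return 2 * radius * asin(min(1.0, sqrt(h)))
end

k_bartlett(x) = abs(x) <= 1 ? 1.0 - abs(x) : 0.0   # Bartlett kernel

# Hypothetical weight matrix: W[i, j] = k(distance between i and j / bandwidth)
function space_weights(lat, lon, bandwidth; k = k_bartlett)
    n = length(lat)
    return [k(haversine(lat[i], lon[i], lat[j], lon[j]) / bandwidth) for i in 1:n, j in 1:n]
end

lat = deg2rad.([48.85, 48.86, 51.51])   # hypothetical points: two in Paris, one in London
lon = deg2rad.([2.35, 2.29, -0.13])
W = space_weights(lat, lon, 500.0)
```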

### Correlation across time and space

```julia
CrossCorrelated("Time and space",
    DF::DataFrame,
    time::Symbol,
    bandwidth_time::Real,
    latitude::Symbol,
    longitude::Symbol,
    bandwidth_space::Real,
    kernel::Function = parzen
)
```

The maximum possible correlation between two observations declines with the time difference and the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidths and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (`Float64`).

The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
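
One plausible way to let the bound decline with both time and distance is to multiply a time kernel by a space kernel, so that a pair receives weight only if it is close in both dimensions. We do not know whether the package combines the two this way; the product form, the day and kilometre units, the haversine distance and the helper names below are all assumptions made for illustration.

```julia
using Dates

k_bartlett(x) = abs(x) <= 1 ? 1.0 - abs(x) : 0.0   # Bartlett kernel (Andrews, 1991)

# Great-circle (haversine) distance in km between points given in radians
function haversine(lat1, lon1, lat2, lon2; radius = 6371.0)
    h = sin((lat2 - lat1) / 2)^2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
    return 2 * radius * asin(min(1.0, sqrt(h)))
end

# Hypothetical weight matrix; the product of kernels is an assumed rule
function time_space_weights(dates, bw_time, lat, lon, bw_space; k = k_bartlett)
    n = length(dates)
    W = zeros(n, n)
    for i in 1:n, j in 1:n
        dt = abs(Dates.value(dates[i] - dates[j]))        # days apart
        ds = haversine(lat[i], lon[i], lat[j], lon[j])    # km apart
        W[i, j] = k(dt / bw_time) * k(ds / bw_space)
    end
    return W
end

dates = [Date(2018, 1, 1), Date(2018, 1, 2), Date(2018, 3, 1)]
lat = deg2rad.([48.85, 51.51, 48.85])   # hypothetical points: Paris, London, Paris
lon = deg2rad.([2.35, -0.13, 2.35])
W = time_space_weights(dates, 30, lat, lon, 500.0)
```

Here the two Paris observations get zero weight despite sharing a location, because they are 59 days apart, beyond the 30-day bandwidth.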
