Fixed documentation

lbittarello · web-flow · commit 97da9eff1744 · 2018-07-16T00:42:17.000-05:00
diff --git a/docs/src/bootstrapping.md b/docs/src/bootstrapping.md
@@ -0,0 +1,67 @@
+# Bootstrapping
+
+This package does not provide support for bootstrap standard errors at the moment. Nonetheless, it is possible to bootstrap with the existing tools. This tutorial provides some sample code.
+
+We first load some packages:
+```julia
+using StatsBase
+using DataFrames
+using CSV
+using Microeconometrics
+```
+
+We then set up the problem:
+```julia
+S        = CSV.read(joinpath(datadir, "auto.csv")) ;
+S[:gpmw] = ((1.0 ./ S[:mpg]) ./ S[:weight]) * 100000 ;
+M        = Dict(:response => "gpmw", :control => "foreign + 1") ;
+D        = Microdata(S, M) ;
+```
+
+Next, we obtain the coefficient estimates:
+```julia
+E = fit(OLS, D, novar = true) ;
+```
+
+We can now set up the bootstrap:
+```julia
+srand(0101)
+
+reps = 1000 ;
+n    = nobs(E) ;
+wgts = fill(0, n) ;
+B    = Array{Float64}(reps, dof(E)) ;
+```
+The vector `wgts` will translate the draw of a bootstrap sample into an input for `Microdata`. The matrix `B` will contain the sample of coefficient estimates. Don't forget to set the seed for the sake of reproducibility!
+
+The algorithm is:
+```julia
+for b = 1:reps
+
+    wgts .= 0
+    draw  = rand(1:n, n)
+
+    for d in draw
+        wgts[d] += 1
+    end
+
+    Db      = Microdata(S, M, weights = fweights(wgts))
+    Eb      = fit(OLS, Db, novar = true)
+    B[b, :] = coef(Eb)'
+end
+```
+Note that we do not compute the covariance matrix at each step, which saves us some time.
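+
+To see how a draw translates into frequency weights, consider a toy sample of five observations. The indices below are made up for illustration and are not part of the tutorial data:
+```julia
+n    = 5
+draw = [2, 5, 2, 1, 2]   # hypothetical indices, drawn with replacement
+wgts = fill(0, n)
+
+# Count how many times each observation appears in the draw
+for d in draw
+    wgts[d] += 1
+end
+
+wgts        # [1, 3, 0, 0, 1]: observation 2 enters three times, 3 and 4 drop out
+sum(wgts)   # 5: the bootstrap sample has the same size as the original
+```
+Passing these counts as frequency weights should be equivalent to refitting on the resampled rows, without copying the data.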
+
+We can finally see the results:
+```julia
+E.V = cov(B) ;
+coeftable(E)
+```
+The output is:
+```julia
+                   Estimate  St. Err.   t-stat.   p-value      C.I. (95%)
+foreign: Foreign     0.2462    0.0682    3.6072    0.0003    0.1124  0.3799
+(Intercept)           1.609    0.0237   67.9372    <1e-99    1.5626  1.6554
+```
+
+You can easily adapt this code to more complex problems (e.g., critical values) or parallelize it for additional speed!
diff --git a/docs/src/correlation_structures.md b/docs/src/correlation_structures.md
@@ -0,0 +1,93 @@
+# Correlation structures
+
+Before fitting the model, you must specify the correlation between observations (a `CorrStructure`). It determines the calculation of the covariance matrix. The default is always `Heteroscedastic`, i.e., independent but not identically distributed observations.
+
+All constructors accept the Boolean keyword `adj`, which defaults to `true`. If `true`, a finite-sample adjustment is applied to the covariance matrix. The adjustment factor is n / (n - 1), where n is the number of clusters for clustered data and the number of observations otherwise.
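+
+As a rough illustration of the adjustment, the sketch below rescales a covariance matrix by n / (n - 1). The matrix `V` and the sample size are hypothetical values chosen for the example. The package applies the adjustment internally when `adj = true`.
+```julia
+V = [0.04 0.01; 0.01 0.09]   # hypothetical unadjusted covariance matrix
+n = 74                       # hypothetical number of observations (or clusters)
+
+V_adj = V * n / (n - 1)      # finite-sample adjustment
+```
+The factor approaches one as n grows, so the adjustment matters mostly in small samples or with few clusters.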
+
+Four subtypes are currently available:
+
+## `Homoscedastic`
+
+```julia
+Homoscedastic(method::String = "OIM")
+```
+Observations are independent and identically distributed. The optional argument `method` is only relevant for maximum-likelihood estimators. It controls the estimation of the covariance matrix: `"OIM"` uses the observed information matrix, whereas `"OPG"` uses the outer product of the gradient. Only linear and maximum-likelihood estimators support homoscedastic errors.
+
+## `Heteroscedastic`
+
+```julia
+Heteroscedastic()
+```
+Observations are independent, but they may differ in distribution. This structure leads to sandwich covariance matrices (a.k.a. Huber-Eicker-White).
+
+## `Clustered`
+
+```julia
+Clustered(DF::DataFrame, cluster::Symbol)
+```
+
+Observations are independent across clusters, but they may differ in their joint distribution within clusters. `cluster` specifies the column of the `DataFrame` to cluster on.
+
+## `CrossCorrelated`
+
+This structure accommodates other correlation structures. The first argument determines the precise pattern.
+
+### Two-way clustering
+
+```julia
+CrossCorrelated("Two-way clustering", DF::DataFrame, c₁::Symbol, c₂::Symbol)
+```
+If two observations share any cluster, they may be arbitrarily correlated.
+
+### Correlation across time
+
+```julia
+CrossCorrelated("Time",
+        DF::DataFrame,
+        time::Symbol,
+        bandwidth::Real,
+        kernel::Function = parzen
+    )
+```
+
+The maximum possible correlation between two observations declines with the time difference between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation (of type `Date`).
+
+The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
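+
+For reference, here is a minimal sketch of these four kernels, following the standard formulae in Andrews (1991). The function names carry a `_sketch` suffix to stress that they are illustrative and not the package's own implementations. The sketch also assumes that the argument `x` is the distance between two observations divided by the bandwidth, which the package may handle differently.
+```julia
+# Illustrative kernels (hypothetical names); x = distance / bandwidth
+truncated_sketch(x::Real)    = abs(x) <= 1 ? 1.0 : 0.0
+bartlett_sketch(x::Real)     = abs(x) <= 1 ? 1.0 - abs(x) : 0.0
+tukeyhanning_sketch(x::Real) = abs(x) <= 1 ? (1.0 + cos(pi * x)) / 2 : 0.0
+
+function parzen_sketch(x::Real)
+    a = abs(x)
+    if a <= 0.5
+        return 1.0 - 6.0 * a^2 + 6.0 * a^3
+    elseif a <= 1.0
+        return 2.0 * (1.0 - a)^3
+    else
+        return 0.0
+    end
+end
+
+# Weights for observations 0 to 5 periods apart with a bandwidth of 4
+bandwidth = 4.0
+w = [parzen_sketch(d / bandwidth) for d in 0:5]
+```
+All four kernels equal one at zero distance and vanish beyond the bandwidth. They differ in how quickly the weight decays in between.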
+
+!!! warning
+
+    The resulting covariance matrices differ from the Newey-West estimator, which assumes independence across units (though observations for the same unit may correlate across time).
+
+### Correlation across space
+
+```julia
+CrossCorrelated("Space",
+        DF::DataFrame,
+        latitude::Symbol,
+        longitude::Symbol,
+        bandwidth::Real,
+        kernel::Function = parzen
+    )
+```
+
+The maximum possible correlation between two observations declines with the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidth and the kernel function control the upper bound. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (of type `Float64`).
+
+The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.
+
+### Correlation across time and space
+
+```julia
+CrossCorrelated("Time and space",
+        DF::DataFrame,
+        time::Symbol,
+        bandwidth_time::Real,
+        latitude::Symbol,
+        longitude::Symbol,
+        bandwidth_space::Real,
+        kernel::Function = parzen
+    )
+```
+
+The maximum possible correlation between two observations declines with the time difference and the spatial distance between them. The actual correlation is arbitrary below that limit. (See [Conley (1999)](https://www.sciencedirect.com/science/article/pii/S0304407698000840).) The bandwidths and the kernel function control the upper bound. `time` specifies the column of `DF` that contains the date of each observation. `latitude` and `longitude` specify the columns of `DF` that contain the coordinates of each observation in radians (`Float64`).
+
+The following kernels are predefined for convenience: Bartlett (`bartlett`), Parzen (`parzen`), Truncated (`truncated`) and Tukey-Hanning (`tukeyhanning`). See [Andrews (1991)](http://jstor.org/stable/2938229) for formulae.