Commit 33ebe18

Pushing the docs to dev/ for branch: main, commit 5dbf795dd583119ae44cb91bd6faec3187d16e99
1 parent 8e18962 commit 33ebe18

1,326 files changed (+7010, -6919 lines)

dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 2b9911387a87939f5431ebd42de4a501
+config: 6512fbfd1f388b7413fda4f63f754eb4
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.

dev/_downloads/521b554adefca348463adbbe047d7e99/plot_linear_model_coefficient_interpretation.py

Lines changed: 31 additions & 0 deletions
@@ -306,6 +306,34 @@
 # Also, AGE, EXPERIENCE and EDUCATION are the three variables that most
 # influence the model.
 #
+# Interpreting coefficients: being cautious about causality
+# ---------------------------------------------------------
+#
+# Linear models are a great tool for measuring statistical association, but we
+# should be cautious when making statements about causality, after all
+# correlation doesn't always imply causation. This is particularly difficult in
+# the social sciences because the variables we observe only function as proxies
+# for the underlying causal process.
+#
+# In our particular case we can think of the EDUCATION of an individual as a
+# proxy for their professional aptitude, the real variable we're interested in
+# but can't observe. We'd certainly like to think that staying in school for
+# longer would increase technical competency, but it's also quite possible that
+# causality goes the other way too. That is, those who are technically
+# competent tend to stay in school for longer.
+#
+# An employer is unlikely to care which case it is (or if it's a mix of both),
+# as long as they remain convinced that a person with more EDUCATION is better
+# suited for the job, they will be happy to pay out a higher WAGE.
+#
+# This confounding of effects becomes problematic when thinking about some
+# form of intervention e.g. government subsidies of university degrees or
+# promotional material encouraging individuals to take up higher education.
+# The usefulness of these measures could end up being overstated, especially if
+# the degree of confounding is strong. Our model predicts a :math:`0.054699`
+# increase in hourly wage for each year of education. The actual causal effect
+# might be lower because of this confounding.
+#
 # Checking the variability of the coefficients
 # --------------------------------------------
 #
@@ -742,6 +770,9 @@
 # * Coefficients must be scaled to the same unit of measure to retrieve
 # feature importance. Scaling them with the standard-deviation of the
 # feature is a useful proxy.
+# * Interpreting causality is difficult when there are confounding effects. If
+# the relationship between two variables is also affected by something
+# unobserved, we should be careful when making conclusions about causality.
 # * Coefficients in multivariate linear models represent the dependency
 # between a given feature and the target, **conditional** on the other
 # features.
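
Illustration (not part of the commit): the confounding argument in the section added above can be checked with a small simulation. This is a minimal sketch under made-up assumptions: the `ability` variable, every coefficient, the noise levels and `TRUE_EFFECT` are hypothetical and are not taken from the wages dataset used in the example. It uses only the Python standard library. The naive regression of wage on education overstates the assumed causal effect, while partialling out the confounder (observable here only because we simulated it) recovers a value close to it.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - (mean_y + b * (xi - mean_x)) for xi, yi in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000
TRUE_EFFECT = 0.03  # ASSUMED causal effect of one year of education on wage

# Hypothetical data-generating process: ability drives both education and wage.
ability = [rng.gauss(0, 1) for _ in range(n)]
education = [12 + 2 * a + rng.gauss(0, 2) for a in ability]
wage = [
    1.0 + TRUE_EFFECT * e + 0.1 * a + rng.gauss(0, 0.3)
    for e, a in zip(education, ability)
]

# Naive estimate: regress wage on education only (ability is omitted).
naive = slope(education, wage)

# Adjusted estimate: remove the part of education and wage explained by
# ability, then regress residual on residual (Frisch-Waugh-Lovell).
adjusted = slope(residuals(ability, education), residuals(ability, wage))

print(f"assumed true effect: {TRUE_EFFECT:.4f}")
print(f"naive coefficient:   {naive:.4f}")
print(f"adjusted coefficient: {adjusted:.4f}")
```

Under these assumed parameters the omitted-variable bias is 0.1 * cov(ability, education) / var(education) = 0.1 * 2 / 8 = 0.025, so the naive coefficient is expected to land near 0.055 and the adjusted one near 0.03, up to sampling noise. In real data the confounder is unobserved, so the adjustment step is not available; that is the reason for the caution expressed in the added section.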
Binary file not shown.

dev/_downloads/cf0f90f46eb559facf7f63f124f61e04/plot_linear_model_coefficient_interpretation.ipynb

Lines changed: 2 additions & 2 deletions
@@ -303,7 +303,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now that the coefficients have been scaled, we can safely compare them.\n\n<div class=\"alert alert-danger\"><h4>Warning</h4><p>Why does the plot above suggest that an increase in age leads to a\n decrease in wage? Why the `initial pairplot\n <marginal_dependencies>` is telling the opposite?</p></div>\n\nThe plot above tells us about dependencies between a specific feature and\nthe target when all other features remain constant, i.e., **conditional\ndependencies**. An increase of the AGE will induce a decrease\nof the WAGE when all other features remain constant. On the contrary, an\nincrease of the EXPERIENCE will induce an increase of the WAGE when all\nother features remain constant.\nAlso, AGE, EXPERIENCE and EDUCATION are the three variables that most\ninfluence the model.\n\n## Checking the variability of the coefficients\n\nWe can check the coefficient variability through cross-validation:\nit is a form of data perturbation (related to\n[resampling](https://en.wikipedia.org/wiki/Resampling_(statistics))).\n\nIf coefficients vary significantly when changing the input dataset\ntheir robustness is not guaranteed, and they should probably be interpreted\nwith caution.\n\n"
+"Now that the coefficients have been scaled, we can safely compare them.\n\n<div class=\"alert alert-danger\"><h4>Warning</h4><p>Why does the plot above suggest that an increase in age leads to a\n decrease in wage? Why the `initial pairplot\n <marginal_dependencies>` is telling the opposite?</p></div>\n\nThe plot above tells us about dependencies between a specific feature and\nthe target when all other features remain constant, i.e., **conditional\ndependencies**. An increase of the AGE will induce a decrease\nof the WAGE when all other features remain constant. On the contrary, an\nincrease of the EXPERIENCE will induce an increase of the WAGE when all\nother features remain constant.\nAlso, AGE, EXPERIENCE and EDUCATION are the three variables that most\ninfluence the model.\n\n## Interpreting coefficients: being cautious about causality\n\nLinear models are a great tool for measuring statistical association, but we\nshould be cautious when making statements about causality, after all\ncorrelation doesn't always imply causation. This is particularly difficult in\nthe social sciences because the variables we observe only function as proxies\nfor the underlying causal process.\n\nIn our particular case we can think of the EDUCATION of an individual as a\nproxy for their professional aptitude, the real variable we're interested in\nbut can't observe. We'd certainly like to think that staying in school for\nlonger would increase technical competency, but it's also quite possible that\ncausality goes the other way too. That is, those who are technically\ncompetent tend to stay in school for longer.\n\nAn employer is unlikely to care which case it is (or if it's a mix of both),\nas long as they remain convinced that a person with more EDUCATION is better\nsuited for the job, they will be happy to pay out a higher WAGE.\n\nThis confounding of effects becomes problematic when thinking about some\nform of intervention e.g. government subsidies of university degrees or\npromotional material encouraging individuals to take up higher education.\nThe usefulness of these measures could end up being overstated, especially if\nthe degree of confounding is strong. Our model predicts a $0.054699$\nincrease in hourly wage for each year of education. The actual causal effect\nmight be lower because of this confounding.\n\n## Checking the variability of the coefficients\n\nWe can check the coefficient variability through cross-validation:\nit is a form of data perturbation (related to\n[resampling](https://en.wikipedia.org/wiki/Resampling_(statistics))).\n\nIf coefficients vary significantly when changing the input dataset\ntheir robustness is not guaranteed, and they should probably be interpreted\nwith caution.\n\n"
 ]
 },
 {
@@ -682,7 +682,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We observe that the AGE and EXPERIENCE coefficients are varying a lot\ndepending of the fold.\n\n## Wrong causal interpretation\n\nPolicy makers might want to know the effect of education on wage to assess\nwhether or not a certain policy designed to entice people to pursue more\neducation would make economic sense. While Machine Learning models are great\nfor measuring statistical associations, they are generally unable to infer\ncausal effects.\n\nIt might be tempting to look at the coefficient of education on wage from our\nlast model (or any model for that matter) and conclude that it captures the\ntrue effect of a change in the standardized education variable on wages.\n\nUnfortunately there are likely unobserved confounding variables that either\ninflate or deflate that coefficient. A confounding variable is a variable that\ncauses both EDUCATION and WAGE. One example of such variable is ability.\nPresumably, more able people are more likely to pursue education while at the\nsame time being more likely to earn a higher hourly wage at any level of\neducation. In this case, ability induces a positive [Omitted Variable Bias](https://en.wikipedia.org/wiki/Omitted-variable_bias) (OVB) on the EDUCATION\ncoefficient, thereby exaggerating the effect of education on wages.\n\nSee the `sphx_glr_auto_examples_inspection_plot_causal_interpretation.py`\nfor a simulated case of ability OVB.\n\n## Lessons learned\n\n* Coefficients must be scaled to the same unit of measure to retrieve\n feature importance. Scaling them with the standard-deviation of the\n feature is a useful proxy.\n* Coefficients in multivariate linear models represent the dependency\n between a given feature and the target, **conditional** on the other\n features.\n* Correlated features induce instabilities in the coefficients of linear\n models and their effects cannot be well teased apart.\n* Different linear models respond differently to feature correlation and\n coefficients could significantly vary from one another.\n* Inspecting coefficients across the folds of a cross-validation loop\n gives an idea of their stability.\n* Coefficients are unlikely to have any causal meaning. They tend\n to be biased by unobserved confounders.\n* Inspection tools may not necessarily provide insights on the true\n data generating process.\n\n"
+"We observe that the AGE and EXPERIENCE coefficients are varying a lot\ndepending of the fold.\n\n## Wrong causal interpretation\n\nPolicy makers might want to know the effect of education on wage to assess\nwhether or not a certain policy designed to entice people to pursue more\neducation would make economic sense. While Machine Learning models are great\nfor measuring statistical associations, they are generally unable to infer\ncausal effects.\n\nIt might be tempting to look at the coefficient of education on wage from our\nlast model (or any model for that matter) and conclude that it captures the\ntrue effect of a change in the standardized education variable on wages.\n\nUnfortunately there are likely unobserved confounding variables that either\ninflate or deflate that coefficient. A confounding variable is a variable that\ncauses both EDUCATION and WAGE. One example of such variable is ability.\nPresumably, more able people are more likely to pursue education while at the\nsame time being more likely to earn a higher hourly wage at any level of\neducation. In this case, ability induces a positive [Omitted Variable Bias](https://en.wikipedia.org/wiki/Omitted-variable_bias) (OVB) on the EDUCATION\ncoefficient, thereby exaggerating the effect of education on wages.\n\nSee the `sphx_glr_auto_examples_inspection_plot_causal_interpretation.py`\nfor a simulated case of ability OVB.\n\n## Lessons learned\n\n* Coefficients must be scaled to the same unit of measure to retrieve\n feature importance. Scaling them with the standard-deviation of the\n feature is a useful proxy.\n* Interpreting causality is difficult when there are confounding effects. If\n the relationship between two variables is also affected by something\n unobserved, we should be careful when making conclusions about causality.\n* Coefficients in multivariate linear models represent the dependency\n between a given feature and the target, **conditional** on the other\n features.\n* Correlated features induce instabilities in the coefficients of linear\n models and their effects cannot be well teased apart.\n* Different linear models respond differently to feature correlation and\n coefficients could significantly vary from one another.\n* Inspecting coefficients across the folds of a cross-validation loop\n gives an idea of their stability.\n* Coefficients are unlikely to have any causal meaning. They tend\n to be biased by unobserved confounders.\n* Inspection tools may not necessarily provide insights on the true\n data generating process.\n\n"
 ]
 }
 ],
