Commit 8db7cc4

Enforce positivity on sigmas in SEM model (#783)

* enforce positivity on sigmas in SEM model
* remove redundant advi argument
* updating with chris's feedback

Signed-off-by: Nathaniel <[email protected]>

1 parent acc0594 commit 8db7cc4

2 files changed: +1869 -1956 lines

examples/case_studies/CFA_SEM.ipynb (+1,812 -1,924): large diff not rendered by default.

examples/case_studies/CFA_SEM.myst.md (+57 -32):
@@ -23,7 +23,7 @@ kernelspec:
 
 > "Evidently, the notions of relevance and dependence are far more basic to human reasoning than the numerical values attached to probability judgments...the language used for representing probabilistic information should allow assertions about dependency relationships to be expressed qualitatively, directly, and explicitly" - Pearl in _Probabilistic Reasoning in Intelligent Systems_ {cite:t}`pearl1985prob`
 
-Measurement data is psychometrics is often derived from a strategically constructed survey aimed at a particular target phenomena. Some intuited, but not yet measured, concept that arguably plays a determining role in human action, motivation or sentiment. The relative “fuzziness” of the subject matter in psychometrics has had a catalyzing effect on the methodological rigour sought in the science.
+Measurement data in psychometrics is often derived from a strategically constructed survey aimed at a particular target phenomena. Some intuited, but not yet measured, concept that arguably plays a determining role in human action, motivation or sentiment. The relative “fuzziness” of the subject matter in psychometrics has had a catalyzing effect on the methodological rigour sought in the science.
 
 Survey designs are agonized over for correct tone and rhythm of sentence structure. Measurement scales are doubly checked for reliability and correctness. The literature is consulted and questions are refined. Analysis steps are justified and tested under a wealth of modelling routines. Model architectures are defined and refined to better express the hypothesized structures in the data-generating process. We will see how such due diligence leads to powerful and expressive models that grant us tractability on thorny questions of human affect.
 
@@ -96,7 +96,7 @@ A measurement model is a key component within the more general structural equati
 
 We'll start by fitting a "simple" CFA model in `PyMC` to demonstrate how the pieces fit together, we'll then expand our focus. Here we ignore the majority of our indicator variables and focus on the idea that there are two latent constructs: (1) Social Self-efficacy and (2) Life Satisfaction.
 
-We're aiming to articulate a mathematical structure where our indicator variables $x_{ij}$ are determined by a latent factor $\text{Ksi}_{j}$ through an estimated factor loading $\lambda_{ij}$. Functionally we have a set of equations with error terms $\psi_i$ for each individual.
+We're aiming to articulate a mathematical structure where our indicator variables $x_{ij}$ are determined by a latent factor $\text{Ksi}_{j}$ through an estimated factor loading $\lambda_{ij}$. We keep close to the notation used in Levy and Mislevy's Bayesian Psychometric Modelling. Functionally we have a set of equations with error terms $\psi_i$ for each individual.
 
 $$ x_{1} = \tau_{1} + \lambda_{11}\text{Ksi}_{1} + \psi_{1} \\
 x_{2} = \tau_{2} + \lambda_{21}\text{Ksi}_{1} + \psi_{2} \\
@@ -108,7 +108,7 @@ or more compactly
 
 $$ \mathbf{x} = \tau + \Lambda\text{Ksi} + \Psi $$
 
-The goal is to articulate the relationship between the different factors in terms of the covariances between these latent terms and estimate the relationships each latent factor has with the manifest indicator variables. At a high level, we're saying the joint distribution of the observed data can be represented through conditionalisation in the following schema.
+We have greek letters to highlight traditional model parameters and use $\text{Ksi}$ to highlight latent constructs as a distinct kind of parameter. The goal is to articulate the relationship between the different factors in terms of the covariances between these latent terms and estimate the relationships each latent factor has with the manifest indicator variables. At a high level, we're saying the joint distribution of the observed data can be represented through conditionalisation in the following schema.
 
 $$p(\mathbf{x_{i}}^{T}.....\mathbf{x_{q}}^{T} | \text{Ksi}, \Psi, \tau, \Lambda) \sim Normal(\tau + \Lambda\cdot \text{Ksi}, \Psi) $$
 
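As a side note on the hunk above: the generative story $\mathbf{x} = \tau + \Lambda\text{Ksi} + \Psi$ can be simulated directly. The intercepts, loadings, and error scale below are hypothetical illustration values, not estimates from the notebook's survey data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions: n individuals, 2 latent constructs (Ksi), 6 indicators (x).
n = 500
tau = np.array([3.0, 3.2, 2.8, 3.1, 2.9, 3.0])  # indicator intercepts (hypothetical)
Lambda = np.array(
    [
        [1.0, 0.0],  # first loading per construct fixed at 1 to set the scale
        [0.9, 0.0],
        [1.1, 0.0],
        [0.0, 1.0],
        [0.0, 0.8],
        [0.0, 1.2],
    ]
)
ksi = rng.normal(0.0, 1.0, size=(n, 2))  # latent construct scores per person
psi = rng.normal(0.0, 0.5, size=(n, 6))  # indicator-level error terms

# x = tau + Lambda @ Ksi + psi, vectorised over individuals
X = tau + ksi @ Lambda.T + psi
print(X.shape)  # (500, 6)
```

Each row of `X` is one respondent's six indicator scores, driven by their two latent construct scores.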
@@ -190,7 +190,7 @@ We can now see how the covariance structure among the latent constructs is integ
 az.summary(idata, var_names=["lambdas1", "lambdas2"])
 ```
 
-These factor loadings are generally important to interpret in terms of construct validity. Because we've specified one of the indicator variables to be fixed at 1, the other indicators which load on that factor should have a loading coefficient in broadly the same scale as the fixed point indicator that defines the construct scale. We're looking for consistency among the loadings to assess whether the indicators are reliable measures of the construct i.e. if the indicator loadings deviates too far from unit 1 then there is an argument to be made that these indicators don't belong in the same factor construct.
+These factor loadings are generally important to interpret in terms of construct validity. Because we've specified one of the indicator variables to be fixed at 1, the other indicators which load on that factor should have a loading coefficient in broadly the same scale as the fixed point indicator that defines the construct scale. We're looking for consistency among the loadings to assess whether the indicators are reliable measures of the construct i.e. if the indicator loadings deviates too far from unit 1 then there is an argument to be made that these indicators don't belong in the same factor construct. The closer the factor loading parameters within a construct are to 1 the better.
 
 ```{code-cell} ipython3
 idata
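A toy illustration of that reading (the loading values here are invented, not the notebook's posterior summaries): with the first loading fixed at 1 to set the construct scale, we can screen the remaining loadings for large deviations from unit 1.

```python
import numpy as np

# Hypothetical posterior-mean loadings for one construct; the first indicator
# is fixed at 1 to define the latent scale.
loadings = np.array([1.0, 0.95, 1.08, 0.40])

# Flag indicators whose loading strays far from the fixed-point scale
# (the 0.3 threshold is arbitrary, for illustration only).
flagged = np.flatnonzero(np.abs(loadings - 1.0) > 0.3)
print(flagged)  # [3]
```

Here the fourth indicator deviates strongly, which would argue against keeping it in the same factor construct.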
@@ -202,6 +202,18 @@ Let's plot the trace diagnostics to validate the sampler has converged well to t
 az.plot_trace(idata, var_names=["lambdas1", "lambdas2", "tau", "Psi", "ksi"]);
 ```
 
+We also examine the energy plot to assess sampling diagnostics. This looks quite healthy.
+
+```{code-cell} ipython3
+fig, axs = plt.subplots(1, 2, figsize=(20, 7))
+axs = axs.flatten()
+az.plot_energy(idata, ax=axs[0])
+az.plot_forest(idata, var_names=["lambdas1", "lambdas2"], combined=True, ax=axs[1])
+axs[1].set_title("Factor Loading Estimates")
+axs[0].set_title("Energy Plot")
+axs[1].axvline(1, color="black");
+```
+
 ### Sampling the Latent Constructs
 
 One thing to highlight in particular about the Bayesian manner of fitting CFA and SEM models is that we now have access to the posterior distribution of the latent quantities. These samples can offer insight into particular individuals in our survey that is harder to glean from the multivariate presentation of the manifest variables.
@@ -282,7 +294,7 @@ Which shows a relatively sound recovery of the observed data.
 
 ### Intermediate Cross-Loading Model
 
-The idea of a measurement model is maybe a little opaque when we only see models that fit well. Instead we want to briefly show how an in-apt measurement model gets reflected in the estimated parameters for the factor loadings. Here we specify a measurement model which attempts to couple the `se_social` and `sup_parents` indicators and bundle them into the same factor.
+The idea of a measurement model is maybe a little opaque when we only see models that fit well. Instead we want to briefly show how an inappropriate measurement model gets reflected in the estimated parameters for the factor loadings. Here we specify a measurement model which attempts to couple the `se_social` and `sup_parents` indicators and bundle them into the same factor.
 
 ```{code-cell} ipython3
 coords = {
@@ -387,7 +399,10 @@ This is similarly reflected in the diagnostic energy plots here too.
 fig, axs = plt.subplots(1, 2, figsize=(20, 9))
 axs = axs.flatten()
 az.plot_energy(idata, ax=axs[0])
-az.plot_forest(idata, var_names=["lambdas1"], combined=True, ax=axs[1]);
+az.plot_forest(idata, var_names=["lambdas1", "lambdas2"], combined=True, ax=axs[1])
+axs[1].axvline(1, color="black")
+axs[1].set_title("Factor Loadings Estimates")
+axs[0].set_title("Energy Plot");
 ```
 
 This hints at a variety of measurement model misspecification and should force us back to the drawing board. An appropriate measurement model maps the indicator variables to a consistently defined latent construct that plausibly reflects aspects of the realised indicator metrics.
@@ -741,7 +756,7 @@ def make_indirect_sem(priors):
     obs_idx = list(range(len(df)))
     with pm.Model(coords=coords) as model:
 
-        Psi = pm.InverseGamma("Psi", 5, 10, dims="indicators")
+        Psi = pm.Gamma("Psi", priors["gamma"], 0.5, dims="indicators")
         lambdas_ = pm.Normal(
             "lambdas_1", priors["lambda"][0], priors["lambda"][1], dims=("indicators_1")
         )
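The substantive change in this hunk is the prior on the residual scales `Psi`: an InverseGamma(5, 10) is swapped for a Gamma with rate 0.5. Both have strictly positive support, so either keeps `Psi` valid as a scale; if I read the parameterizations right, the prior means are 2.5 and (for `priors["gamma"] = 1`) 2.0 respectively. A quick NumPy sketch of the two priors (NumPy's `gamma` takes a scale = 1/rate, and an InverseGamma(a, b) draw is b over a Gamma(a, scale=1) draw):

```python
import numpy as np

rng = np.random.default_rng(42)

# New prior from the diff: Gamma(alpha=priors["gamma"], beta=0.5);
# with alpha = 1 the mean is alpha / beta = 2.  Rate 0.5 -> scale 2.
new_draws = rng.gamma(shape=1.0, scale=2.0, size=100_000)

# Old prior: InverseGamma(5, 10), mean b / (a - 1) = 2.5.
old_draws = 10.0 / rng.gamma(shape=5.0, scale=1.0, size=100_000)

# Both priors keep the indicator residual scales strictly positive.
assert (new_draws > 0).all() and (old_draws > 0).all()
print(round(new_draws.mean(), 1), round(old_draws.mean(), 1))  # 2.0 2.5
```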
@@ -772,9 +787,8 @@ def make_indirect_sem(priors):
         lambdas_5 = pm.Deterministic(
             "lambdas5", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_5")
         )
-        tau = pm.Normal("tau", 3, 0.5, dims="indicators")
         kappa = 0
-        sd_dist = pm.Exponential.dist(1.0, shape=2)
+        sd_dist = pm.Gamma.dist(priors["gamma"], 0.5, shape=2)
         chol, _, _ = pm.LKJCholeskyCov(
             "chol_cov", n=2, eta=priors["eta"], sd_dist=sd_dist, compute_corr=True
         )
@@ -784,25 +798,25 @@ def make_indirect_sem(priors):
         # Regression Components
         beta_r = pm.Normal("beta_r", 0, priors["beta_r"], dims="latent_regression")
         beta_r2 = pm.Normal("beta_r2", 0, priors["beta_r2"], dims="regression")
-        sd_dist1 = pm.Exponential.dist(1.0, shape=2)
+        sd_dist1 = pm.Gamma.dist(1, 0.5, shape=2)
         resid_chol, _, _ = pm.LKJCholeskyCov(
             "resid_chol", n=2, eta=3, sd_dist=sd_dist1, compute_corr=True
         )
-        _ = pm.Deterministic("resid_cov", chol.dot(chol.T))
-        sigmas_resid = pm.MvNormal("sigmas_resid", kappa, chol=resid_chol)
+        _ = pm.Deterministic("resid_cov", resid_chol.dot(resid_chol.T))
+        sigmas_resid = pm.MvNormal("sigmas_resid", 1, chol=resid_chol)
+        sigma_regr = pm.HalfNormal("sigma_regr", 1)
 
         # SE_ACAD ~ SUP_FRIENDS + SUP_PARENTS
         regression_se_acad = pm.Normal(
             "regr_se_acad",
             beta_r[0] * ksi[obs_idx, 0] + beta_r[1] * ksi[obs_idx, 1],
-            sigmas_resid[0],
+            pm.math.abs(sigmas_resid[0]),  # ensuring positivity
         )
         # SE_SOCIAL ~ SUP_FRIENDS + SUP_PARENTS
-
         regression_se_social = pm.Normal(
             "regr_se_social",
             beta_r[2] * ksi[obs_idx, 0] + beta_r[3] * ksi[obs_idx, 1],
-            sigmas_resid[1],
+            pm.math.abs(sigmas_resid[1]),  # ensuring positivity
         )
 
         # LS ~ SE_ACAD + SE_SOCIAL + SUP_FRIEND + SUP_PARENTS
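This hunk is the change the commit title refers to: `sigmas_resid` is drawn from an MvNormal (so the residual scales can be correlated via the LKJ Cholesky factor), but a Normal draw can be negative, which is invalid as a scale parameter. Wrapping the draws in `abs` folds them onto positive support. A NumPy sketch of the idea, with an illustrative Cholesky factor rather than the fitted one:

```python
import numpy as np

rng = np.random.default_rng(7)

# Correlated draws, analogous to MvNormal(mu=1, chol=resid_chol) in the diff.
resid_chol = np.array([[1.0, 0.0], [0.5, 0.8]])  # illustrative Cholesky factor
sigmas_resid = 1.0 + resid_chol @ rng.normal(size=(2, 10_000))

# Raw Normal draws stray below zero and would be invalid Normal scales...
assert (sigmas_resid < 0).any()

# ...so the fix takes abs() before using them as scales, ensuring positivity.
scales = np.abs(sigmas_resid)
assert (scales > 0).all()
```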
@@ -812,9 +826,10 @@ def make_indirect_sem(priors):
             + beta_r2[1] * regression_se_social
             + beta_r2[2] * ksi[obs_idx, 0]
             + beta_r2[3] * ksi[obs_idx, 1],
-            1,
+            sigma_regr,
         )
 
+        tau = pm.Normal("tau", 3, 0.5, dims="indicators")
         m0 = tau[0] + regression_se_acad * lambdas_1[0]
         m1 = tau[1] + regression_se_acad * lambdas_1[1]
         m2 = tau[2] + regression_se_acad * lambdas_1[2]
@@ -834,30 +849,36 @@ def make_indirect_sem(priors):
         mu = pm.Deterministic(
             "mu", pm.math.stack([m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14]).T
         )
+        # independent_residuals_cov = pm.Deterministic('cov_residuals', pt.diag(Psi))
+        # _ = pm.MvNormal('likelihood', mu, independent_residuals_cov, observed=df[drivers].values)
         _ = pm.Normal("likelihood", mu, Psi, observed=df[drivers].values)
 
-        idata = pm.sample(
-            10_000,
-            chains=4,
-            nuts_sampler="numpyro",
-            target_accept=0.99,
-            tune=2000,
-            idata_kwargs={"log_likelihood": True},
-            random_seed=110,
+        idata = pm.sample_prior_predictive()
+        idata.extend(
+            pm.sample(
+                2000,
+                chains=4,
+                nuts_sampler="numpyro",
+                target_accept=0.99,
+                tune=1000,
+                idata_kwargs={"log_likelihood": True},
+                nuts_sampler_kwargs={"chain_method": "vectorized"},
+                random_seed=110,
+            )
         )
         idata.extend(pm.sample_posterior_predictive(idata))
 
     return model, idata
 
 
 model_sem0, idata_sem0 = make_indirect_sem(
-    priors={"eta": 2, "lambda": [1, 1], "beta_r": 0.1, "beta_r2": 0.1}
+    priors={"eta": 1, "lambda": [1, 2], "beta_r": 3, "beta_r2": 3, "gamma": 1},
 )
 model_sem1, idata_sem1 = make_indirect_sem(
-    priors={"eta": 2, "lambda": [1, 1], "beta_r": 0.2, "beta_r2": 0.2}
+    priors={"eta": 3, "lambda": [1, 2], "beta_r": 2, "beta_r2": 2, "gamma": 2}
 )
 model_sem2, idata_sem2 = make_indirect_sem(
-    priors={"eta": 2, "lambda": [1, 1], "beta_r": 0.5, "beta_r2": 0.5}
+    priors={"eta": 1, "lambda": [1, 2], "beta_r": 10, "beta_r2": 10, "gamma": 3}
 )
 ```
 
@@ -880,11 +901,15 @@ Residual covariances
 SE_Academic ~~ SE_Social
 ```
 
++++
+
+Often you will see SEM models with a multivariate normal likelihood term, but here we've specified independent Normal distributions as the model doesn't call for a richly structured covariance matrix on the residuals terms. More complicated models are possible, but it's always a question of how much structure is needed?
+
 ```{code-cell} ipython3
 pm.model_to_graphviz(model_sem0)
 ```
 
-It's worth pausing to examine the nature of the dependencies sketched in this diagram. We can see here how we've replaced the simpler measurement model structure and added three regression functions that replace the draws from the multivariate normal $Ksi$. In other words we've expressed a dependency as a series of regressions all within the one model. Next we'll see how the parameter estimates change across our prior specifications for the model. Notice the relative stability of the factor loadings compared to the regression coefficients.
+It's worth pausing to examine the nature of the dependencies sketched in this diagram. We can see here how we've replaced the simpler measurement model structure and added three regression functions that replace the draws from the multivariate normal $Ksi$. In other words we've expressed a dependency as a series of regressions all within the one model. Next we'll see how the parameter estimates change across our prior specifications for the model. Notice the relative stability of the factor loadings and regression coefficients indicating a robustness in these parameter estimates.
 
 ```{code-cell} ipython3
 fig, ax = plt.subplots(figsize=(15, 15))
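The remark about the independent-Normal likelihood in the hunk above can be checked numerically: with a diagonal covariance, an MvNormal factorizes into independent Normals, so the two parameterizations give identical log-densities. A small sketch with toy numbers (unrelated to the survey data):

```python
import numpy as np

mu = np.array([0.5, -1.0, 2.0])  # toy means
psi = np.array([0.8, 1.2, 0.5])  # toy per-indicator residual scales
x = np.array([0.3, -0.7, 1.5])   # toy observation

# Sum of independent Normal log-densities
ll_indep = np.sum(-0.5 * np.log(2 * np.pi * psi**2) - (x - mu) ** 2 / (2 * psi**2))

# MvNormal log-density with cov = diag(psi**2)
cov = np.diag(psi**2)
d = x - mu
ll_mv = -0.5 * (
    len(x) * np.log(2 * np.pi)
    + np.linalg.slogdet(cov)[1]
    + d @ np.linalg.inv(cov) @ d
)

assert np.isclose(ll_indep, ll_mv)
```

So the richer MvNormal machinery only buys anything once off-diagonal residual covariances are actually modelled.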
@@ -949,7 +974,7 @@ Similar diagnostic results hold for the other models. We now continue to assess
 
 ### Indirect and Direct Effects
 
-We now turn the additional regression structures that we've encoded into the model graph. First we pull out the regression coefficients
+We now turn to the additional regression structures that we've encoded into the model graph. First we pull out the regression coefficients
 
 ```{code-cell} ipython3
 fig, axs = plt.subplots(1, 2, figsize=(20, 8))
@@ -961,7 +986,7 @@ axs[1].axvline(0, color="red", label="zero-effect")
 axs[1].legend();
 ```
 
-The coefficients indicate a smaller relative weight accorded to the effects of peer support than we see with parental support. This is borne out as we trace out the cumulative causal effects (direct and indirect) through our DAG or chain of regression coefficients.
+The coefficients indicate a strong effect of social self-efficacy on life satisfaction, and smaller relative weight accorded to the effects of peer support than we see with parental support. This is borne out as we trace out the cumulative causal effects (direct and indirect) through our DAG or chain of regression coefficients.
 
 ```{code-cell} ipython3
 def calculate_effects(idata, var="SUP_P"):
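The bookkeeping behind `calculate_effects` is just path tracing: the total effect of a driver is its direct coefficient plus the products of coefficients along every indirect path through the mediating regressions. With made-up coefficients (purely illustrative, not posterior values from the notebook):

```python
# Path tracing with hypothetical coefficients for SUP_PARENTS -> LS.
b_parents_acad = 0.4    # SUP_PARENTS -> SE_ACAD
b_parents_social = 0.5  # SUP_PARENTS -> SE_SOCIAL
b_acad_ls = 0.3         # SE_ACAD -> LS
b_social_ls = 0.6       # SE_SOCIAL -> LS
b_parents_ls = 0.2      # direct SUP_PARENTS -> LS

# Indirect effect: sum over mediated paths of the coefficient products.
indirect = b_parents_acad * b_acad_ls + b_parents_social * b_social_ls
total = b_parents_ls + indirect
print(round(indirect, 2), round(total, 2))  # 0.42 0.62
```

In the model itself the same arithmetic is carried out on the posterior draws of `beta_r` and `beta_r2`, giving a full distribution over the direct, indirect, and total effects.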
@@ -996,7 +1021,7 @@ def calculate_effects(idata, var="SUP_P"):
     )
 ```
 
-Importantly we see here the effect of priors on the implied relationships. As we pull our priors closer to 0 the total effects of parental support is pulled downwards away from .5, while the peer support remains relatively stable ~.10. However it remains clear that the impact of parental support dwarfs the effects due to peer support.
+It remains clear that the impact of parental support dwarfs the effects due to peer support.
 
 ```{code-cell} ipython3
 summary_p = pd.concat(
@@ -1007,7 +1032,7 @@ summary_p.index = ["SEM0", "SEM1", "SEM2"]
 summary_p
 ```
 
-The sensitivity of the estimated impact due to parental support varies strongly as a function of our prior on the variances. Here is a substantive example of the role of theory choice in model design. How strongly should believe that parental and peer effects have 0 effect on life-satisfaction? I'm inclined to believe we're too conservative if we try and shrink the effect toward zero and should prefer a less conservative model. However, the example here is not to dispute the issue, but demonstrate the importance of sensitivity checks.
+The sensitivity of the estimated impact due to parental support does not vary strongly as a function of our prior on the variances. However, the example here is not to dispute the issue at hand, but highlight the importance of sensitivity checks. We will not always find consistency of parameter identification across model specifications.
 
 ```{code-cell} ipython3
 summary_f = pd.concat(
