Help Needed with LatentClassConditionalLogit Output: Missing MNL Tables for Each Group #194

asghar13 · 2024-12-01T17:01:47Z


Hello everyone,

I apologize for taking your time with such a detailed question, but I would greatly appreciate your assistance.

I have a few questions regarding the output obtained after running the LatentClassConditionalLogit model. You can find my code at the end of this message. Specifically, I am looking for the following output:

My dataset has four alternatives, and ultimately, I would like to use "heat pump" as my reference category in an MNL model. Additionally, I consider three to be the optimal number of classes for my latent class analysis. Therefore, I expect to obtain nine tables in the output: three tables for the remaining alternatives across three classes (3 x 3 = 9). However, my file only contains three tables, each corresponding to a different class. For reference, I have attached the file named latent_class_conditional_logit_report.xlsx.

I would greatly appreciate any assistance in solving this issue or any documentation available regarding the functions that could help me better understand how to achieve the desired output.

Thank you very much in advance for your support.

''

Code for reading the data:

''

# Define the columns to retain for the analysis
kept_columns = [
    "ID", "Gender", "Age", "Education", "Occup",  # Demographic and individual information
    "Gas_Capital", "Gas_Annual", "Gas_Emission", "Gas_Work",  # Attributes of "Gas"
    "Electric_Capital", "Electric_Annual", "Electric_Emission", "Electric_Work",  # Attributes of "Electric"
    "Heatpump_Capital", "Heatpump_Annual", "Heatpump_Emission", "Heatpump_Work",  # Attributes of "Heatpump"
    "Solid_Capital", "Solid_Annual", "Solid_Emission", "Solid_Work",  # Attributes of "Solid fuel"
    "Gas_Av", "Electric_Av", "Heatpump_Av", "Solid_Av",  # Availability indicators
    "Choice", "Card",  # User choices and choice cards
]

# Filter the dataset to include only the defined columns
# (.copy() avoids pandas' SettingWithCopyWarning when columns are reassigned below)
crheating_df = data[kept_columns].copy()

# Display a preview of the filtered dataset
print("Filtered dataset:")
print(crheating_df.head())

# Display the unique values in the 'Choice' column (to verify mapping)
print("Unique values in 'Choice' column:")
print(crheating_df["Choice"].unique())

# Map categorical choices to integers for modeling
choice_mapping = {
    "Gas": 0,
    "Electric": 1,
    "Heatpump": 2,
    "Solid": 3,
}
crheating_df["Choice"] = crheating_df["Choice"].map(choice_mapping)

# Ensure the 'Choice' column is of integer type
crheating_df["Choice"] = crheating_df["Choice"].astype(int)

# Convert the dataset into the required format for choice modeling
from choice_learn.data import ChoiceDataset

dataset = ChoiceDataset.from_single_wide_df(
    df=crheating_df,  # Filtered dataset
    items_id=["Gas", "Electric", "Heatpump", "Solid"],  # Names of the alternatives
    choices_column="Choice",  # Column representing user choices
    choice_format="items_index",  # Encoding format for choices
    shared_features_columns=["Gender", "Age", "Education", "Occup"],  # Shared features across choices
    items_features_suffixes=["Capital", "Annual", "Emission", "Work"],  # Attributes for each item
    available_items_suffix="Av",  # Suffix for availability indicators
    delimiter="_",  # Delimiter used in column names
)

''
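Since from_single_wide_df is given item names, feature suffixes, and a delimiter, the wide DataFrame presumably needs one column per item/suffix pair (as in "Gas_Capital" above). Here is a minimal sketch for checking this before building the dataset. The helper names expected_wide_columns and missing_columns are hypothetical and not part of choice-learn, and the toy DataFrame is made up for the demo:

```python
import pandas as pd


def expected_wide_columns(items, feature_suffixes, availability_suffix, delimiter="_"):
    """Build the column names implied by the item names, suffixes and delimiter."""
    columns = [f"{item}{delimiter}{suffix}" for item in items for suffix in feature_suffixes]
    columns += [f"{item}{delimiter}{availability_suffix}" for item in items]
    return columns


def missing_columns(df, expected):
    """Return the expected column names that are absent from df."""
    return [name for name in expected if name not in df.columns]


items = ["Gas", "Electric", "Heatpump", "Solid"]
expected = expected_wide_columns(items, ["Capital", "Annual", "Emission", "Work"], "Av")

# Toy frame that deliberately lacks one column, to show what the check reports
toy_df = pd.DataFrame(columns=[name for name in expected if name != "Solid_Work"])
print(missing_columns(toy_df, expected))
```

Running the same check on the real DataFrame before calling from_single_wide_df makes a misspelled or missing column easier to spot.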

Code for LatentClassConditionalLogit:
''

# Ensure all numerical columns are explicitly converted to float32
numerical_columns = [
    "Gender", "Age", "Education", "Occup",  # Shared features
    "Gas_Capital", "Gas_Annual", "Gas_Emission", "Gas_Work",  # Item-specific features for Gas
    "Electric_Capital", "Electric_Annual", "Electric_Emission", "Electric_Work",  # Electric
    "Heatpump_Capital", "Heatpump_Annual", "Heatpump_Emission", "Heatpump_Work",  # Heatpump
    "Solid_Capital", "Solid_Annual", "Solid_Emission", "Solid_Work",  # Solid fuel
]

# Convert these columns to float32
crheating_df[numerical_columns] = crheating_df[numerical_columns].astype("float32")

# Convert Choice column to integer (if not already)
crheating_df["Choice"] = crheating_df["Choice"].astype("int32")

# Recreate the ChoiceDataset with correctly typed columns
dataset = ChoiceDataset.from_single_wide_df(
    df=crheating_df,
    items_id=["Gas", "Electric", "Heatpump", "Solid"],  # Names of the alternatives
    choices_column="Choice",  # Column representing user choices
    choice_format="items_index",  # Encoding format for choices
    shared_features_columns=["Gender", "Age", "Education", "Occup"],  # Shared features across choices
    items_features_suffixes=["Capital", "Annual", "Emission", "Work"],  # Attributes for each item
    available_items_suffix="Av",  # Suffix for availability indicators
    delimiter="_",  # Delimiter used in column names
)

# Define and fit the model
from choice_learn.models.latent_class_mnl import LatentClassConditionalLogit

# Initialize the model
lc_model_2 = LatentClassConditionalLogit(
    n_latent_classes=3,  # Number of latent classes
    fit_method="mle",  # Maximum Likelihood Estimation
    optimizer="lbfgs",  # Optimizer
    epochs=1000,  # Number of epochs
    lbfgs_tolerance=1e-20,  # Tolerance for convergence
)

# Add shared coefficients for item-specific features
lc_model_2.add_shared_coefficient(coefficient_name="Capital", feature_name="Capital", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Annual", feature_name="Annual", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Emission", feature_name="Emission", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Work", feature_name="Work", items_indexes=[0, 1, 2, 3])

# Add shared coefficients for demographic/shared features
lc_model_2.add_shared_coefficient(coefficient_name="Gender", feature_name="Gender", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Age", feature_name="Age", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Education", feature_name="Education", items_indexes=[0, 1, 2, 3])
lc_model_2.add_shared_coefficient(coefficient_name="Occup", feature_name="Occup", items_indexes=[0, 1, 2, 3])

# Fit the model to the dataset
hist2 = lc_model_2.fit(dataset, verbose=1)

# Print Latent Class Model results
print("Latent Class Model weights:")
print("Classes Logits:", lc_model_2.latent_logits)
for i in range(3):  # Assuming 3 latent classes
    print("\n")
    print(f"Model Nb {i}, weights:", lc_model_2.models[i].trainable_weights)

# Evaluate the model's Negative Log-Likelihood (NLL)
nll_2 = lc_model_2.evaluate(dataset) * len(dataset)
print(f"Negative Log-Likelihood: {nll_2}")

# Generate structured output as a DataFrame for further analysis
import pandas as pd

# Assumes trainable_weights are ordered as the coefficients were added to the model
feature_names = ["Capital", "Annual", "Emission", "Work", "Gender", "Age", "Education", "Occup"]

report_data = []
for class_idx, model in enumerate(lc_model_2.models):
    class_weights = model.trainable_weights
    for weight, feature_name in zip(class_weights, feature_names):
        coef_estimation = weight.numpy().flatten()[0]
        report_data.append({
            "Latent Class": class_idx + 1,
            "Feature": feature_name,
            "Coefficient": coef_estimation,
        })

# Convert the results into a DataFrame
report_df = pd.DataFrame(report_data)

# Save the report to a file
output_path = r"C:\Users\mohamma11\Downloads\latent_class_conditional_logit_report.csv"
report_df.to_csv(output_path, index=False)
print(f"Report saved to {output_path}")

''
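A note on the specification above, relevant to the question of a reference category: if a shared coefficient multiplies the same individual-level value (Gender, Age, ...) in every alternative's utility, that term cancels out in the softmax, so such a coefficient cannot be identified from the data. Alternative-specific coefficients, with the reference alternative fixed at zero, do change the probabilities. The sketch below illustrates this with plain NumPy; all utilities and coefficient values are hypothetical, and it does not call choice-learn at all:

```python
import numpy as np


def softmax(utilities):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = utilities - utilities.max(axis=1, keepdims=True)
    exp_u = np.exp(shifted)
    return exp_u / exp_u.sum(axis=1, keepdims=True)


# Hypothetical utilities for 2 decision makers and 4 alternatives
# (Gas, Electric, Heatpump, Solid), built from item-specific attributes only
base_utilities = np.array([
    [0.2, -0.1, 0.5, -0.7],
    [-0.3, 0.4, 0.1, 0.0],
])
age = np.array([[35.0], [52.0]])  # one value per decision maker

# Shared coefficient on a shared feature: the same term is added to every alternative
beta_age = 0.8  # hypothetical value
probas_without = softmax(base_utilities)
probas_shared = softmax(base_utilities + beta_age * age)
print(np.allclose(probas_without, probas_shared))

# Alternative-specific coefficients with Heatpump (index 2) as reference (fixed at 0)
beta_age_by_item = np.array([0.03, -0.01, 0.0, 0.02])  # hypothetical values
probas_specific = softmax(base_utilities + age * beta_age_by_item)
print(np.allclose(probas_without, probas_specific))
```

In choice-learn, the add_coefficients method with items_indexes=[0, 1, 3] looks like the way to declare such alternative-specific coefficients, but treat that as an assumption to check against the documentation. With three non-reference alternatives and three classes, this is what would yield the 3 x 3 sets of coefficients described in the question.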

The code below generates the report after running the LatentClassConditionalLogit model; it should provide MNL tables for each group.

''

import pandas as pd
import numpy as np
from google.colab import files

def generate_latent_class_report(model, choice_dataset, output_file):
    """
    Generate a detailed report for Latent Class Conditional Logit Model.

    Parameters:
    - model: Trained LatentClassConditionalLogit instance.
    - choice_dataset: ChoiceDataset used for training the model.
    - output_file: Path to save the Excel file.
    """
    try:
        # Extract latent class probabilities
        latent_class_probs = model.get_latent_classes_weights().numpy()

        # Initialize a writer for Excel output
        with pd.ExcelWriter(output_file) as writer:
            # Write an initial dummy sheet to ensure a valid workbook
            pd.DataFrame({"Status": ["Report Generation Started"]}).to_excel(writer, sheet_name="Status", index=False)

            # Summary Data
            n_classes = model.n_latent_classes
            summary_data = []

            # Iterate through each latent class
            for class_idx in range(n_classes):
                class_model = model.models[class_idx]

                # Extract coefficients (trainable weights)
                coefficients = [w.numpy().flatten()[0] for w in class_model.trainable_weights]
                features = ["Capital", "Annual", "Emission", "Work", "Gender", "Age", "Education", "Occup"]

                # Combine class-specific data
                class_df = pd.DataFrame({
                    "Feature": features,
                    "Coefficient": coefficients,
                    "Latent Class Probability": [latent_class_probs[class_idx]] * len(features),
                })

                # Save to a separate sheet
                class_df.to_excel(writer, sheet_name=f"Class_{class_idx+1}", index=False)

                # Add summary data
                summary_data.append({
                    "Latent Class": class_idx + 1,
                    "Probability": latent_class_probs[class_idx],
                })

            # Summary Metrics
            nll = model.evaluate(choice_dataset) * len(choice_dataset)  # Negative Log-Likelihood
            # Note: counts one parameter per weight tensor and leaves out the class-membership logits
            k = sum(len(m.trainable_weights) for m in model.models)  # Total parameters
            n = len(choice_dataset)  # Total observations
            aic = 2 * k - 2 * (-nll)
            bic = k * np.log(n) - 2 * (-nll)

            # Add AIC and BIC to the first summary row
            if summary_data:
                summary_data[0]["AIC"] = aic
                summary_data[0]["BIC"] = bic

            # Save summary
            summary_df = pd.DataFrame(summary_data)
            summary_df.to_excel(writer, sheet_name="Summary", index=False)

        print(f"Report saved to {output_file}")

        # Provide a download link for the file (Colab only)
        files.download(output_file)

    except Exception as e:
        print(f"An error occurred: {e}")
        raise

# Call the function with your trained model
output_path = "latent_class_conditional_logit_report.xlsx"  # Save in Colab's file system
generate_latent_class_report(lc_model_2, dataset, output_path)

''
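For reference, the AIC and BIC used in the report follow from the negative log-likelihood as AIC = 2k + 2·NLL and BIC = k·ln(n) + 2·NLL, since the log-likelihood is -NLL. A small worked sketch follows. The NLL and sample size are made-up numbers, the helper name information_criteria is hypothetical, and the parameter count assumes 8 coefficients per class plus K - 1 free class-membership parameters for K = 3 classes:

```python
import math


def information_criteria(nll, n_params, n_obs):
    """Return (AIC, BIC) computed from a total negative log-likelihood."""
    log_likelihood = -nll
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


# Assumed parameter count: 3 classes x 8 coefficients + (3 - 1) class-membership parameters
n_params = 3 * 8 + (3 - 1)

# Hypothetical NLL and number of observations, for illustration only
aic, bic = information_criteria(nll=1250.0, n_params=n_params, n_obs=2000)
print(f"k = {n_params}, AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Whether the class-membership parameters are counted changes both criteria, which matters when comparing models with different numbers of classes.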

asghar13 · 2024-12-01T17:02:10Z

Author

latent_class_conditional_logit_report.xlsx

0 replies

VincentAuriau · 2024-12-02T11:04:36Z

Maintainer

Hello,
To make sure I have understood you correctly: what you want is, for each class, the probability of each alternative (leading to 3 x 3 probabilities)?
Using model.get_latent_classes_weights() will return the class probabilities, not the alternative probabilities. If you want the alternative probabilities, you can use the predict_probas(choice_dataset) endpoint:

- you can use it with the latent class model to get the final probability, i.e. $\mathbb{P}(i) = \sum_{l \in \text{classes}} \mathbb{P}(l) \cdot \mathbb{P}(i | l)$
- you can use it on the MNL models that you retrieve with latent_model.models, and then you get the $\mathbb{P}(i | l)$ probability
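To make the relation between the two kinds of probabilities concrete, here is a small NumPy sketch of the mixture formula. All numbers are made up. With a fitted model, the class probabilities would come from model.get_latent_classes_weights() and each conditional matrix from predict_probas on the corresponding entry of the models list, as described above:

```python
import numpy as np

# P(l): hypothetical class probabilities for 3 latent classes
class_probas = np.array([0.5, 0.3, 0.2])

# P(i | l): hypothetical per-class choice probabilities,
# shape (n_classes, n_choice_situations, n_alternatives)
conditional_probas = np.array([
    [[0.10, 0.20, 0.60, 0.10], [0.25, 0.25, 0.25, 0.25]],
    [[0.40, 0.30, 0.20, 0.10], [0.70, 0.10, 0.10, 0.10]],
    [[0.25, 0.25, 0.25, 0.25], [0.10, 0.10, 0.10, 0.70]],
])

# P(i) = sum over classes l of P(l) * P(i | l), for every choice situation
marginal_probas = np.einsum("l,lni->ni", class_probas, conditional_probas)

print(marginal_probas[0])           # mixture probabilities for the first choice situation
print(marginal_probas.sum(axis=1))  # each row still sums to 1
```

The per-class matrices are the 3 x 3 style output (one set of probabilities per class), while the mixture is what the latent class model itself predicts.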

Let me know if it helps!

6 replies

asghar13 Dec 2, 2024
Author

Heatpump is the reference category, in fact!

VincentAuriau Dec 2, 2024
Maintainer

By "outputs", do you mean the weights / coefficient values of each model?

In this case what you can do is something like:

for mnl in model.models:
    report = mnl.compute_report(dataset)

report should be a pandas DataFrame with the coefficient names, values, estimated standard errors, and p-values.
The name of each coefficient should be the feature name plus, if you have an alternative-wise coefficient, an index that corresponds to the alternative index (same as in choices).
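Since the loop above overwrites report at each iteration, one option is to stack the per-class reports into a single table. This is a minimal sketch under stated assumptions. The helper collect_class_reports and the stand-in class _StubMNL are hypothetical, and the column names in the stub are made up for the demo, so the real compute_report output may use different ones:

```python
import pandas as pd


def collect_class_reports(class_models, dataset):
    """Stack the per-class report DataFrames, tagging each row with its class number."""
    reports = []
    for class_idx, mnl in enumerate(class_models, start=1):
        report = mnl.compute_report(dataset).copy()
        report.insert(0, "Latent Class", class_idx)
        reports.append(report)
    return pd.concat(reports, ignore_index=True)


class _StubMNL:
    """Hypothetical stand-in for a fitted per-class model, used only for this demo."""

    def __init__(self, coefficients):
        self._coefficients = coefficients

    def compute_report(self, dataset):
        return pd.DataFrame({
            "Coefficient Name": list(self._coefficients),
            "Coefficient Estimation": list(self._coefficients.values()),
        })


stubs = [
    _StubMNL({"Capital": -0.5, "Annual": -0.2}),
    _StubMNL({"Capital": -0.1, "Annual": -0.9}),
]
combined = collect_class_reports(stubs, dataset=None)
print(combined)
```

With a fitted model, the call would presumably be collect_class_reports(model.models, dataset), and the result can be written out with to_csv or to_excel.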

VincentAuriau Dec 2, 2024
Maintainer

Let me know whether this is indeed what you need!

asghar13 Dec 3, 2024
Author

Many thanks,
it works for me.

VincentAuriau Dec 3, 2024
Maintainer

Great!
I have opened issue #196 to create a cleaner endpoint and will keep this discussion open in the meantime.
I will be closing the other ones; feel free to use this one or open new ones if you have any other questions or issues =)

VincentAuriau · 2024-12-23T09:31:25Z

Maintainer

Hello,
I have integrated your suggestions in the /main branch.
Thanks again for reaching out.
If the package has helped you, consider citing us and starring the repository :)

0 replies
