Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs (EMNLP2025: Findings)
- Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs
- Catalogue
- Abstract
- 🛠️ Step1: Prepare all scenarios
- ⚖️ Step2: Judging
- 🗣️ Step3: Explaining
- 📚 More explanations of our work and repository
Large language models (LLMs) can lead to undesired consequences when misaligned with human values, especially in scenarios involving complex and sensitive social biases. Previous studies have revealed the misalignment of LLMs with human values using expert-designed or agent-based emulated bias scenarios. However, it remains unclear whether the alignment of LLMs with human values differs across different types of scenarios (e.g., scenarios containing negative vs. non-negative questions). In this study, we investigate the alignment of LLMs with human values regarding social biases (HVSB) in different types of bias scenarios. Through extensive analysis of 12 LLMs from four model families across four datasets, we demonstrate that LLMs with larger parameter scales do not necessarily have lower misalignment rates and attack success rates. Moreover, LLMs show a certain degree of alignment preference for specific types of scenarios, and LLMs from the same model family tend to have higher judgment consistency. In addition, we study the understanding capacity of LLMs through their explanations of HVSB. We find no significant differences in the understanding of HVSB across LLMs. We also find that LLMs prefer their own generated explanations. Additionally, we endow smaller language models (LMs) with the ability to explain HVSB. The generation results show that the explanations generated by the fine-tuned smaller LMs are more readable, but have relatively lower model agreeability.
Our scenarios come from four sources: the BBQ dataset (Q&A), the BiasDPO dataset (conversation), and the SS and CP datasets (emulated realistic scenarios). The scenarios are prepared using the following scripts. If you do not want to prepare all scenarios, you can skip this step and use the prepared scenarios in the data/ folder (go directly to ⚖️ Step2: Judging).
Before running the scripts, please set the following parameters in preprocess/preprocess_utils.py (see the sketch after this list):
- Set `bbq_dataset_path` to the path of the BBQ dataset.
- Set `ss_dataset_path` to the path of the SS dataset.
- Set `cp_dataset_path` to the path of the CP dataset.
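A minimal sketch of what these settings might look like in preprocess/preprocess_utils.py; the variable names follow the list above, while the path strings are placeholders you should replace with your local copies:

```python
# preprocess/preprocess_utils.py -- illustrative sketch only; adapt to the actual file.

# Local paths to the downloaded datasets (placeholder values).
bbq_dataset_path = "your/path/to/BBQ/{}.jsonl"       # '{}' is filled with a BBQ category name
ss_dataset_path = "your/path/to/SS/ss_dataset_file"  # single dataset file (placeholder name)
cp_dataset_path = "your/path/to/CP/cp_dataset_file"  # single dataset file (placeholder name)
```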
For BBQ dataset, please run the following script:
python preprocess/preprocess_bbq.pyFor BiasDPO dataset, please run the following script:
python preprocess/preprocess_biasdpo.pyThen, please set openai_api_key in preprocess/preprocess_utils.py to your OpenAI API key.
For SS dataset, please run the following script:
python preprocess/preprocess_ss.pyFor CP dataset, please run the following script:
python preprocess/preprocess_cp.pyMore details of our dataset can be found in Dataset explanation.
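If you want to sanity-check the preprocessing output, a few lines of Python are enough; the file name below is a placeholder for whichever JSON Lines file the preprocessing scripts write into the data/ folder, and the field names depend on the dataset:

```python
import json

# Print the first three prepared scenarios (placeholder path).
with open("data/bbq.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        print(json.loads(line))
        if i == 2:
            break
```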
Before running the judging script, please set your API keys as described in API keys.
To judge the BBQ/BiasDPO/SS/CP datasets using the ChatGPT or DeepSeek API, please run the following script:
```bash
python code/judgment/judge_api.py --dataset [bbq, biasdpo, ss, cp] --model_name_or_path [gpt-3.5-turbo, gpt-4o, gpt-4o-mini] --system_prompt ['', untargeted, targeted]
```
To judge the BBQ/BiasDPO/SS/CP datasets using the Qwen or Llama model family, please run the following script:
```bash
python code/judgment/judge_llm.py --dataset [bbq, biasdpo, ss, cp] --model_name_or_path [Qwen, Llama] --system_prompt ['', untargeted, targeted]
```
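For orientation, the API-based judging step boils down to one chat-completion call per scenario. The snippet below is an illustrative reconstruction, not the exact prompt or parsing logic in code/judgment/judge_api.py; the model name, key, and prompt wording are assumptions:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # for DeepSeek, pass base_url="https://api.deepseek.com"

def judge_scenario(scenario_text: str, system_prompt: str = "") -> str:
    """Ask the judge model whether a scenario reflects a social bias (illustrative prompt)."""
    messages = []
    if system_prompt:  # '', untargeted, or targeted, mirroring the --system_prompt options above
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        "content": f"{scenario_text}\n\nDoes this scenario reflect a social bias? Answer Yes or No.",
    })
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content
```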
Before running the explaining script, please use the following script to sample scenarios from the prepared scenarios in the data/ folder:
```bash
python preprocess/gather_all_then_sample.py
```
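Conceptually, this step draws a fixed-size random subset of the prepared scenarios for each dataset. The sketch below is only an illustration: the sample size of 500 is inferred from the *_tobe_exp_500 folder names used later, and the seed and file paths are assumptions rather than the repository's actual settings:

```python
import json
import random

random.seed(42)  # assumption: a fixed seed for reproducibility

# Placeholder paths; the repository script gathers all datasets and writes per-dataset samples.
with open("data/bbq.jsonl", encoding="utf-8") as f:
    scenarios = [json.loads(line) for line in f]

sampled = random.sample(scenarios, k=500)  # 500 inferred from the *_tobe_exp_500 folder names

with open("data/bbq_tobe_exp_500/bbq.jsonl", "w", encoding="utf-8") as f:
    for scenario in sampled:
        f.write(json.dumps(scenario, ensure_ascii=False) + "\n")
```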
Then, please set your API keys as described in API keys.
To generate explanations for the BBQ/BiasDPO/SS/CP datasets using the ChatGPT or DeepSeek API, please run the following script:
```bash
python code/explanation/explain_api.py \
    --dataset [bbq, biasdpo, ss, cp] \
    --model_name_or_path [gpt-3.5-turbo, gpt-4o, gpt-4o-mini] \
    --file_path [data/bbq_tobe_exp_500/bbq.jsonl, data/biasdpo_tobe_exp_500/biasdpo.jsonl, data/ss_tobe_exp_500/ss.jsonl, data/cp_tobe_exp_500/cp.jsonl]
```
To generate explanations for the BBQ/BiasDPO/SS/CP datasets using the Qwen or Llama model family, please run the following script:
```bash
python code/explanation/explain_llm.py \
    --dataset [bbq, biasdpo, ss, cp] \
    --model_name_or_path [Qwen, Llama] \
    --file_path [data/bbq_tobe_exp_500/bbq.jsonl, data/biasdpo_tobe_exp_500/biasdpo.jsonl, data/ss_tobe_exp_500/ss.jsonl, data/cp_tobe_exp_500/cp.jsonl]
```
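As a rough sketch of what the local-model explanation step involves, the snippet below loads a Hugging Face chat model and generates a free-text explanation for one scenario. The checkpoint name, prompt wording, and generation settings are illustrative assumptions; the repository script handles the actual prompting, batching, and output format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder checkpoint; substitute the model you evaluate
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

scenario = "..."  # one scenario text loaded from, e.g., data/bbq_tobe_exp_500/bbq.jsonl
messages = [{"role": "user",
             "content": f"{scenario}\n\nExplain whether and why this scenario reflects a social bias."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
explanation = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(explanation)
```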
Before running the model agreeability script, please set your API keys as described in API keys.
To evaluate model agreeability on the BBQ/BiasDPO/SS/CP datasets using the ChatGPT or DeepSeek API, please run the following script:
```bash
python code/agreeability/agree_api.py \
    --dataset <DATASET_NAME> \
    --model_name_or_path [gpt-3.5-turbo, gpt-4o, gpt-4o-mini] \
    --file_path data/<DATASET_NAME>_explanation/<MODEL_NAME>/<DATASET_NAME>.jsonl
```
To evaluate model agreeability on the BBQ/BiasDPO/SS/CP datasets using the Qwen or Llama model family, please run the following script:
```bash
python code/agreeability/agree_llm.py \
    --dataset <DATASET_NAME> \
    --model_name_or_path [Qwen, Llama] \
    --file_path data/<DATASET_NAME>_explanation/<MODEL_NAME>/<DATASET_NAME>.jsonl
```
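Conceptually, the agreeability step asks a judge model whether it agrees with a previously generated explanation. The sketch below is a hedged illustration of such a call, not the exact prompt or scoring logic in code/agreeability/; the model name, key, and answer format are assumptions:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # placeholder key; DeepSeek works via base_url

def rate_agreement(scenario_text: str, explanation: str) -> str:
    """Ask a judge model whether it agrees with an explanation of a scenario (illustrative prompt)."""
    prompt = (
        f"Scenario:\n{scenario_text}\n\n"
        f"Explanation:\n{explanation}\n\n"
        "Do you agree with this explanation of the social bias in the scenario? Answer Agree or Disagree."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```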
We use the functions get_bbq_dataset_path(), get_ss_dataset_path(), and get_cp_dataset_path() to return the paths of the BBQ, SS, and CP datasets, so please set the dataset paths in preprocess/preprocess_utils.py.
For the BBQ dataset, the path format is your/path/to/BBQ/{}.jsonl; for the SS and CP datasets, the path points directly to the corresponding dataset file.
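A minimal sketch of what one of these helpers might look like, assuming they simply return the configured paths (the real implementations may differ):

```python
# Illustrative sketch of a path helper in preprocess/preprocess_utils.py.
bbq_dataset_path = "your/path/to/BBQ/{}.jsonl"  # '{}' is later filled with a BBQ category name

def get_bbq_dataset_path() -> str:
    """Return the BBQ path template; get_ss_dataset_path() and get_cp_dataset_path() are analogous."""
    return bbq_dataset_path
```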
Please set the following parameters in code/judgment/judge_utils.py, code/explanation/explain_utils.py, and code/agreeability/agree_utils.py (see the sketch after this list):
- Set `openai_api_key` to your OpenAI API key.
- Set `deepseek_api_key` to your DeepSeek API key.
- Set `hf_token` to your Hugging Face token.
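A hedged sketch of how these values might look in the *_utils.py files; the variable names follow the list above, and the string values are placeholders for your own credentials:

```python
# Illustrative sketch: credentials expected by the judging/explaining/agreeability utilities.
openai_api_key = "sk-..."    # placeholder OpenAI API key
deepseek_api_key = "sk-..."  # placeholder DeepSeek API key
hf_token = "hf_..."          # placeholder Hugging Face access token (e.g., for gated Llama/Qwen checkpoints)
```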
```bibtex
@article{liu2025llms,
  title={Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs},
  author={Liu, Yang and Chu, Chenhui},
  journal={arXiv preprint arXiv:2509.13869},
  year={2025}
}
```
