Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs (EMNLP 2025 Findings)


Abstract

Large language models (LLMs) can lead to undesired consequences when misaligned with human values, especially in scenarios involving complex and sensitive social biases. Previous studies have revealed the misalignment of LLMs with human values using expert-designed or agent-based emulated bias scenarios. However, it remains unclear whether the alignment of LLMs with human values differs across different types of scenarios (e.g., scenarios containing negative vs. non-negative questions). In this study, we investigate the alignment of LLMs with human values regarding social biases (HVSB) in different types of bias scenarios. Through extensive analysis of 12 LLMs from four model families and four datasets, we demonstrate that LLMs with large parameter scales do not necessarily have lower misalignment rates and attack success rates. Moreover, LLMs show a certain degree of alignment preference for specific types of scenarios, and LLMs from the same model family tend to have higher judgment consistency. In addition, we study the understanding capacity of LLMs through their explanations of HVSB. We find no significant differences in the understanding of HVSB across LLMs. We also find that LLMs prefer their own generated explanations. Additionally, we endow smaller language models (LMs) with the ability to explain HVSB. The generation results show that the explanations generated by the fine-tuned smaller LMs are more readable, but have relatively lower model agreeability.

🛠️ Step1: Prepare all scenarios

Our scenarios come from four sources: the BBQ dataset (Q&A), the BiasDPO dataset (conversation), and the SS and CP datasets (emulated realistic scenarios). The scenarios are prepared using the following scripts. If you do not want to prepare all scenarios, you can skip this step and use the prepared scenarios in the data/ folder (go directly to ⚖️ Step2: Judging).

Before running the scripts, please set the following parameters in preprocess/preprocess_utils.py:

  • Set bbq_dataset_path to the path of the BBQ dataset.
  • Set ss_dataset_path to the path of the SS dataset.
  • Set cp_dataset_path to the path of the CP dataset.
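
For illustration, the relevant part of preprocess/preprocess_utils.py might look like the following (a minimal sketch; the variable names come from this README, but the concrete file names are hypothetical):

# preprocess/preprocess_utils.py -- illustrative values only
bbq_dataset_path = "your/path/to/BBQ/{}.jsonl"   # template; {} is filled per category (see Dataset explanation)
ss_dataset_path = "your/path/to/SS/ss.json"      # hypothetical file name
cp_dataset_path = "your/path/to/CP/cp.csv"       # hypothetical file name
openai_api_key = "sk-..."                        # used by the SS/CP preprocessing scripts below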

For the BBQ dataset, please run the following script:

python preprocess/preprocess_bbq.py

For the BiasDPO dataset, please run the following script:

python preprocess/preprocess_biasdpo.py

Then, please set openai_api_key in preprocess/preprocess_utils.py to your OpenAI API key.

For the SS dataset, please run the following script:

python preprocess/preprocess_ss.py

For the CP dataset, please run the following script:

python preprocess/preprocess_cp.py

More details of our dataset can be found in Dataset explanation.

⚖️ Step2: Judging

Before running the judging script, please set API keys according to API keys.

To judge the BBQ/BiasDPO/SS/CP datasets using the ChatGPT or DeepSeek API, please run the following script:

python code/judgment/judge_api.py --dataset [bbq, biasdpo, ss, cp] --model_name_or_path [gpt-3.5-turbo, gpt-4o, gpt-4o-mini] --system_prompt ['', untargeted, targeted]
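
For example, to judge the BBQ dataset with gpt-4o and the untargeted system prompt (choose one value from each bracketed list):

python code/judgment/judge_api.py --dataset bbq --model_name_or_path gpt-4o --system_prompt untargeted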

To judge the BBQ/BiasDPO/SS/CP datasets using the Qwen or Llama model family, please run the following script:

python code/judgment/judge_llm.py --dataset [bbq, biasdpo, ss, cp] --model_name_or_path [Qwen, Llama] --system_prompt ['', untargeted, targeted]
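
For example, assuming the script accepts the family names listed above verbatim as model identifiers:

python code/judgment/judge_llm.py --dataset ss --model_name_or_path Qwen --system_prompt targeted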

🗣️ Step3: Explaining

Before running the explaining script, please use the following script to sample scenarios from the prepared scenarios in the data/ folder:

python preprocess/gather_all_then_sample.py 

Then, please set the API keys according to API keys.

To explain the BBQ/BiasDPO/SS/CP datasets using the ChatGPT or DeepSeek API, please run the following script:

python code/explanation/explain_api.py \
--dataset [bbq, biasdpo, ss, cp] \
--model_name_or_path [gpt-3.5-turbo, gpt-4o, gpt-4o-mini] \
--file_path [data/bbq_tobe_exp_500/bbq.jsonl, data/biasdpo_tobe_exp_500/biasdpo.jsonl, data/ss_tobe_exp_500/ss.jsonl, data/cp_tobe_exp_500/cp.jsonl] 
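
For example, to explain the sampled SS scenarios with gpt-4o-mini:

python code/explanation/explain_api.py \
--dataset ss \
--model_name_or_path gpt-4o-mini \
--file_path data/ss_tobe_exp_500/ss.jsonl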

To explain the BBQ/BiasDPO/SS/CP datasets using the Qwen or Llama model family, please run the following script:

python code/explanation/explain_llm.py \
--dataset [bbq, biasdpo, ss, cp] \
--model_name_or_path [Qwen, Llama] \
--file_path [data/bbq_tobe_exp_500/bbq.jsonl, data/biasdpo_tobe_exp_500/biasdpo.jsonl, data/ss_tobe_exp_500/ss.jsonl, data/cp_tobe_exp_500/cp.jsonl]
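
For example, again assuming the family name is accepted verbatim:

python code/explanation/explain_llm.py \
--dataset cp \
--model_name_or_path Llama \
--file_path data/cp_tobe_exp_500/cp.jsonl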

Model agreeability

Before running the model agreeability script, please set API keys according to API keys.

To evaluate the agreeability of the generated explanations using the ChatGPT or DeepSeek API, please run the following script:

python code/agreeability/agree_api.py \
--dataset <DATASET_NAME> \
--model_name_or_path [gpt-3.5-turbo, gpt-4o, gpt-4o-mini] \ 
--file_path data/<DATASET_NAME>_explanation/<MODEL_NAME>/<DATASET_NAME>.jsonl 
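
For example, to score the BBQ explanations generated by gpt-4o (substituting both placeholders, and assuming <MODEL_NAME> is the model that generated the explanations):

python code/agreeability/agree_api.py \
--dataset bbq \
--model_name_or_path gpt-4o \
--file_path data/bbq_explanation/gpt-4o/bbq.jsonl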

To evaluate the agreeability of the generated explanations using the Qwen or Llama model family, please run the following script:

python code/agreeability/agree_llm.py \
--dataset <DATASET_NAME> \
--model_name_or_path [Qwen, Llama] \ 
--file_path data/<DATASET_NAME>_explanation/<MODEL_NAME>/<DATASET_NAME>.jsonl 
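
For example, with Qwen scoring the BiasDPO explanations (same placeholder assumptions as above):

python code/agreeability/agree_llm.py \
--dataset biasdpo \
--model_name_or_path Qwen \
--file_path data/biasdpo_explanation/Qwen/biasdpo.jsonl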

📚 More explanations of our work and repository

Dataset explanation

We use the functions get_bbq_dataset_path(), get_ss_dataset_path(), and get_cp_dataset_path() to return the paths of the BBQ, SS, and CP datasets, so please set the dataset paths in preprocess/preprocess_utils.py. For the BBQ dataset, the path format is your/path/to/BBQ/{}.jsonl; for the SS and CP datasets, the path is the respective dataset file path.
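
For instance, the BBQ getter could be as simple as the following sketch (the actual implementation in preprocess/preprocess_utils.py may differ):

def get_bbq_dataset_path():
    # bbq_dataset_path is a template such as "your/path/to/BBQ/{}.jsonl";
    # callers fill {} with a bias category, e.g. .format("Age") -> ".../BBQ/Age.jsonl"
    return bbq_dataset_path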

API keys

Please set the following parameters in code/judgment/judge_utils.py, code/explanation/explain_utils.py, and code/agreeability/agree_utils.py:

  • Set openai_api_key to your OpenAI API key.
  • Set deepseek_api_key to your DeepSeek API key.
  • Set hf_token to your Huggingface token.
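
For example, near the top of code/judgment/judge_utils.py (a sketch only; the same three variables are assumed to exist in explain_utils.py and agree_utils.py):

openai_api_key = "sk-..."      # OpenAI API key
deepseek_api_key = "sk-..."    # DeepSeek API key
hf_token = "hf_..."            # Hugging Face access token (presumably for downloading the Qwen/Llama checkpoints)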

Citation:

@article{liu2025llms,
  title={Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs},
  author={Liu, Yang and Chu, Chenhui},
  journal={arXiv preprint arXiv:2509.13869},
  year={2025}
}
