
LLMs Can Get "Brain Rot"!

This repository provides the training, evaluation, and analysis pipelines for the paper LLMs Can Get "Brain Rot"!.



Figure 1. Outline of our work: (i) inspired by the concept of brain rot, we establish the LLM Brain Rot Hypothesis; (ii) we construct junk and control data from Twitter/X posts for intervention; (iii) we benchmark four cognitive functions of the intervened LLMs; (iv) we analyze the results to identify the failure modes caused by brain rot; and (v) we show that brain rot persists after various mitigations.

🔍 Key Highlights

  • Controlled Data Intervention: We simulate “brain rot” by continually pretraining LLMs on junk versus clean datasets under matched training conditions to isolate the impact of low-quality data.

  • Cognitive Degradation: Junk exposure leads to significant declines in reasoning, long-context understanding, and safety, while inducing thought-skipping—shallow, impulsive reasoning patterns.

  • Persistent Effects: Even after reflective reasoning, instruction tuning, or additional clean pretraining, performance recovers only partially, revealing irreversible representational drift.

  • Cognitive Hygiene for LLMs: The study reframes data curation as mental health maintenance for models, advocating periodic “cognitive checkups” to prevent long-term degradation.

📰 News

  • [2025/10/15] 🔥We released LLMs Can Get “Brain Rot”!, a large-scale study revealing how continual exposure to junk data leads to lasting cognitive degradation in large language models. Explore our paper and website for more details.

🚀 Installation

First, clone and install the dependencies from LLaMA-Factory:

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation

📦 Dataset Configuration

Next, update the dataset configuration file (./data/dataset_info.json) in the LLaMA-Factory repository to include the control/junk training data under ./datasets. Here’s an example configuration for the M1 junk dataset named junk_tweet_1m_en:

"junk_tweet_1m_en": {
    "file_name": "./datasets/M1/train_data_low_quality_1m_en.json",
    "columns": {
      "prompt": "text"
    }
},

This tells LLaMA-Factory where to locate the dataset and how to interpret its fields.
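As a quick sanity check, you can verify that the declared file exists and that every record carries the declared prompt column. The snippet below is an illustrative helper of ours, run from the LLaMA-Factory root; it is not part of the repository:

import json

# Load the entry added above and check the records against its declared column.
with open("./data/dataset_info.json") as f:
    info = json.load(f)["junk_tweet_1m_en"]

with open(info["file_name"]) as f:
    records = json.load(f)

column = info["columns"]["prompt"]
missing = [i for i, r in enumerate(records) if column not in r]
print(f"{len(records)} records, {len(missing)} missing the '{column}' field")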

The source code for generating the M1 control and junk datasets is provided at datasets/preprocess/M1/M1_data_process.py. We also provide the filtered raw dataset containing only English samples, which can be downloaded here. For the preprocessing of M2 data, use GPT to classify the tweets into junk and high-quality categories based on the provided filtered dataset:

python datasets/preprocess/M2/M2_data_process_gpt.py \
    --input-file [FILTERED_DATA_DIR] \
    --save-dir [SAVE_DIR] \
    --api-key [OPENAI_API_KEY]
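Under the hood, the script asks a GPT judge to label each tweet. A minimal sketch of such a classification call (the rubric wording and the gpt-4o-mini model are illustrative assumptions, not the script's exact choices):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_tweet(text: str) -> str:
    # Ask the judge for a one-word junk/high-quality verdict.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; substitute your own
        messages=[
            {"role": "system", "content": "Label the tweet as 'junk' (clickbait, "
             "engagement bait, low effort) or 'high-quality'. Answer with one word."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip().lower()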

🤗 Model

We release our CPT+IT models across four LLMs (Llama3-8B-Instruct, Qwen3-4B-Thinking-2507, Qwen2.5-7B-Instruct, Qwen2.5-0.5B-Instruct) on HuggingFace:
👉 Models Collection

Each model comes in 2 metric variants and 5 junk-ratio settings.

Metric naming:

  • M1 metric → model names ending with en-sft
  • M2 metric → model names ending with en-gpt-sft

Junk-ratio settings (see the mixing sketch below):

  • control: 0% junk ratio
  • mix-low: 20% junk ratio
  • mix-mid: 50% junk ratio
  • mix-high: 80% junk ratio
  • junk: 100% junk ratio
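For intuition, a junk ratio of p means that a fraction p of the training samples is drawn from the junk corpus and the remainder from the control corpus. A minimal sketch of assembling such a mixture (the file names and the helper itself are hypothetical, not from the repository):

import json
import random

def mix_corpora(junk_path: str, control_path: str, junk_ratio: float,
                total: int, seed: int = 0) -> list[dict]:
    # Sample the requested share from each corpus, then shuffle.
    rng = random.Random(seed)
    with open(junk_path) as f:
        junk = json.load(f)
    with open(control_path) as f:
        control = json.load(f)
    n_junk = round(total * junk_ratio)
    mixed = rng.sample(junk, n_junk) + rng.sample(control, total - n_junk)
    rng.shuffle(mixed)
    return mixed

# e.g. the mix-mid setting (50% junk):
mixed = mix_corpora("junk.json", "control.json", junk_ratio=0.5, total=100_000)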

🏋️ Training

To begin training, create a configuration file for your dataset. For the junk_tweet_1m_en dataset above, an example configuration is provided at ./datasets/llama3_lora_pretrain_junk_1m_en.yaml.

Below is the content of the example configuration file:

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
trust_remote_code: true

### method
stage: pt
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json  # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

### dataset
dataset: junk_tweet_1m_en
cutoff_len: 2048
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: ./outputs/llama3-8b-pretrain-junk-tweet-1m-en
logging_steps: 1
save_strategy: epoch
save_total_limit: 1
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
packing: false

Once the configuration file is ready, you can launch training using the LLaMA-Factory training script:

CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train ./datasets/llama3_lora_pretrain_junk_1m_en.yaml

To perform instruction tuning, use the same training command as above, but replace the dataset with alpaca_en_demo and update the output_dir accordingly.
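If you prefer to derive the instruction-tuning config programmatically, here is a minimal sketch; the output paths are our assumptions, and the stage: sft switch is how LLaMA-Factory selects supervised fine-tuning (the README itself only mandates swapping the dataset and output_dir):

import yaml

# Start from the pretraining config and override the fields that change.
with open("./datasets/llama3_lora_pretrain_junk_1m_en.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["dataset"] = "alpaca_en_demo"
cfg["stage"] = "sft"  # LLaMA-Factory's supervised fine-tuning stage
cfg["output_dir"] = "./outputs/llama3-8b-sft-junk-tweet-1m-en"  # hypothetical

with open("./datasets/llama3_sft_junk_1m_en.yaml", "w") as f:  # hypothetical name
    yaml.safe_dump(cfg, f)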

📊 Evaluation

To evaluate the performance of the trained models, refer to the corresponding benchmark repositories:

For reasoning and long-context understanding benchmarks, i.e., ARC and RULER, refer to the official lm-evaluation-harness documentation. For instance, to evaluate a model's performance on ARC:

lm_eval --model hf \
    --model_args pretrained=[MODEL_DIR] \
    --tasks ai2_arc \
    --device cuda:0 \
    --batch_size 8 \
    --apply_chat_template
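Recent versions of the harness also expose a Python API, which can be convenient for sweeping over the released checkpoints; a minimal equivalent of the CLI call above (the checkpoint path is a placeholder):

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/model,dtype=bfloat16",  # placeholder path
    tasks=["ai2_arc"],
    batch_size=8,
    apply_chat_template=True,
)
print(results["results"])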

For personality analysis using TRAIT, follow the official setup and usage guide from the TRAIT repository:

CUDA_VISIBLE_DEVICES=0  python ./trait/src/run.py \
    --model_name [MODEL_DIR] \
    --model_name_short [MODEL_SHORT_NAME] \
    --inference_type chat \
    --prompt_type 1

For safety/jailbreak evaluation, we use the same method as LLMs-Finetuning-Safety. You can use the code in the eval-adv directory to measure the Attack Success Rate (ASR):

# export your OpenAI API key or set it in utils.py first
cd eval-adv

python -m eval \
--eval_examples 100 \
--n_shots 0 \
--save_dir [RESULTS_DIR] \
--model_name_or_path [MODEL_DIR] \
--eval \
--metric gpt4o \
--eval_batch_size 1 
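For reference, ASR is simply the fraction of adversarial prompts whose responses the judge marks as successful attacks. A minimal sketch, assuming one judge verdict per JSONL line with a boolean jailbroken field (a hypothetical schema, not the repository's):

import json

def attack_success_rate(path: str) -> float:
    # Fraction of examples the judge labeled as successful attacks.
    verdicts = [json.loads(line) for line in open(path)]
    return sum(v["jailbroken"] for v in verdicts) / len(verdicts)

print(f"ASR: {attack_success_rate('results.jsonl'):.1%}")  # placeholder path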

🧠 Failure Mode Analysis

Use eval-arc-llama-cot.py to evaluate LLaMA models on the ARC-Challenge dataset with zero-shot chain-of-thought prompting (a sketch of the prompting setup follows the argument list):

python eval-arc-llama-cot.py [--model MODEL_ID] [--output OUTPUT_FILE]

Arguments:

  • --model: Hugging Face model ID or path to your locally trained model directory (default: meta-llama/Meta-Llama-3-8B-Instruct)
  • --output: Output JSONL file (default: eval-arc-llama.jsonl)
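For intuition, here is a sketch of the zero-shot CoT setup (an illustration of the idea, not the repository script): prompt the model to reason step by step and finish with a boxed letter choice.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A sample ARC-style multiple-choice question, used here for illustration.
question = "Which property of a mineral can be determined just by looking at it?"
choices = {"A": "luster", "B": "mass", "C": "weight", "D": "hardness"}
options = "\n".join(f"{k}. {v}" for k, v in choices.items())
prompt = (
    f"{question}\n{options}\n\n"
    "Let's think step by step, then give the final answer as \\boxed{letter}."
)

inputs = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)
out = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))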

Then use ./extract_acc_gpt.py to score the outputs with GPT:

python extract_acc_gpt.py \
    --input_file [OUTPUT_JSONL_FILE] \
    --gpt_model gpt-4.1-2025-04-14 \
    --api_key [OPENAI_API_KEY] \
    --boxed
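The --boxed flag indicates grading the letter inside the response's final \boxed{...}. A hedged sketch of that extraction step (our helper, not the script's):

import re

def extract_boxed_answer(response: str) -> str | None:
    # Return the content of the last \boxed{...} span, if any.
    matches = re.findall(r"\\boxed\{([^}]*)\}", response)
    return matches[-1].strip() if matches else None

assert extract_boxed_answer(r"... so the answer is \boxed{A}.") == "A"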

To analyze failure modes on the ARC dataset:

python analysis/analyze_arc_failures.py --input_file [PATH_TO_OUTPUT_ACC_FILE]
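The analysis buckets wrong answers into failure modes; the paper highlights thought-skipping in particular. A sketch of tallying such labels, assuming one JSON object per line with a failure_mode field (a hypothetical schema):

import json
from collections import Counter

# Count how often each failure mode appears in the analysis output.
labels = Counter(
    json.loads(line).get("failure_mode", "unknown")
    for line in open("arc_failures.jsonl")  # placeholder path
)
for mode, count in labels.most_common():
    print(f"{mode}: {count}")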

🧩 Training-free Mitigation

🔁 Self-Reflect

First, conduct failure mode analysis:

python analyze_arc_judge_llama.py \
    --model [MODEL_DIR] \
    --input_file [PATH_TO_OUTPUT_ACC_FILE] \
    --output [OUTPUT_JUDGE_FILE]

Then perform self-reflection on ARC results using guided critiques derived from failure analysis:

python self-reflect-arc-failure-guided.py \
    --model [MODEL_DIR] \
    --input_file [OUTPUT_JUDGE_FILE] \
    --output [OUTPUT_FILE]

Finally, re-evaluate the reflected outputs using ./extract_acc_gpt.py.
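Conceptually, each reflection turn replays the model's own answer together with the judge's critique and asks for a revision. A minimal sketch of how such a turn could be assembled (the prompt wording is our assumption; self-reflect-arc-failure-guided.py defines the actual one):

def build_reflection_messages(question: str, prior_answer: str,
                              critique: str) -> list[dict]:
    # Replay the original exchange, then ask for a critique-guided revision.
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": prior_answer},
        {"role": "user", "content": (
            "A reviewer raised the following critique of your answer:\n"
            f"{critique}\n\n"
            "Reconsider the problem step by step and give a revised final "
            "answer as \\boxed{letter}."
        )},
    ]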

🌐 Ext-Reflect

For extended reflection using GPT-based feedback:

python analyze_arc_judge_gpt.py \
    --api_key [API_KEY] \
    --input_file [INPUT_FILE] \
    --output [OUTPUT_JUDGE_FILE]

Then follow the same reflection and re-evaluation steps as in Self-Reflect to generate critiques and measure the resulting performance.

📖 Citation

We hope this code is helpful to your work. If you use our code or extend our work, please consider citing our paper:

@article{xing2025brainrot,
    title={LLMs Can Get "Brain Rot"!},
    author={Xing, Shuo and Hong, Junyuan and Wang, Yifan and Chen, Runjin and Zhang, Zhenyu and Grama, Ananth and Tu, Zhengzhong and Wang, Zhangyang},
    journal={arXiv preprint arXiv:2510.13928},
    year={2025},
}
