# Reward Is Enough: LLMs Are In-Context Reinforcement Learners

This repository contains the code for reproducing the results in the paper "Reward Is Enough: LLMs Are In-Context Reinforcement Learners" (ICLR 2026).
## Repository Structure

```
├── experiments/                                  # Core ICRL experiments
│   ├── game24/                                   # Game of 24
│   ├── creative_writing/                         # Creative Writing
│   ├── math/                                     # Math Competitions (AIME/HMMT)
│   └── sciworld/                                 # ScienceWorld
├── analysis/                                     # Analysis experiments & visualization
│   ├── attention_analysis/                       # Reward-sensitive attention heads
│   ├── beyond_parametric_knowledge/              # ArXiv abstract generation
│   └── data_analysis/                            # Plotting & post-processing
├── requirements/                                 # Dependencies
│   ├── requirements_sciworld_math.txt            # ScienceWorld & Math experiments
│   └── requirements_creative_writing_game24.txt  # Creative Writing & Game of 24
└── README.md
```
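Most of the experiments below call the OpenAI API, so a key is needed before any run. Each run script expects the key to be configured in the file; if you also keep the key in the standard `OPENAI_API_KEY` environment variable (an assumption here — check the script you are running; this helper is not part of the repository), a quick sanity check before a long run looks like:

```python
import os

# Assumes the key is exported as the standard OPENAI_API_KEY environment
# variable; the experiment scripts may instead expect it set inside the file.
key = os.environ.get("OPENAI_API_KEY", "")
if key.startswith("sk-"):
    print("OPENAI_API_KEY is set")
else:
    print("OPENAI_API_KEY is missing or malformed")
```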
## Installation

Install the dependencies. We recommend using uv.

For ScienceWorld and Math experiments:

```shell
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements/requirements_sciworld_math.txt
```

For Creative Writing and Game of 24 experiments:

```shell
uv pip install -r requirements/requirements_creative_writing_game24.txt
```

## Game of 24

Configure the OpenAI API key and specify which ICRL method or ablation to run in the file, then run:
```shell
cd experiments/game24
python llm_game24_api.py
```

Run the Reflexion baseline:

```shell
python llm_game24_api_reflexion.py
```

Run the Self-Refine baseline:

```shell
python llm_game24_api_self-refine.py
```

Run the Best-of-N baseline:

```shell
python llm_game24_api_rejection.py
```

Run the long CoT baseline:

```shell
python llm_game24_api_CoT.py
```

## Creative Writing

Configure the OpenAI API key and specify which ICRL method or ablation to run in the file, then run:
```shell
cd experiments/creative_writing
python llm_creative_writing_api.py
```

Run the Reflexion baseline:

```shell
python llm_creative_writing_api_reflexion.py
```

Run the Self-Refine baseline:

```shell
python llm_creative_writing_api_self-refine.py
```

Run the long CoT baseline:

```shell
python llm_creative_writing_api_CoT.py
```

## ScienceWorld

Make sure you have Java 1.8+ installed:
```shell
javac -version
```

Clone the ScienceWorld repository and install it:

```shell
git clone https://github.com/allenai/ScienceWorld.git
cd ScienceWorld
pip install -e .
```

Change to the experiment directory:

```shell
cd experiments/sciworld
```

Run the ICRL preset:

```shell
python3 sciworld.py icrl_mode=ICRL num_envs=29
```

Run ICRL ablations, e.g. `explore_only`:

```shell
python3 sciworld.py icrl_mode=ICRL num_envs=29 explore_only=true
```

Run other baselines, e.g. random sampling:

```shell
python3 sciworld.py icrl_mode=RANDOM_SAMPLING num_envs=29 max_env_steps=200
```

For all other available options, including the ablations and baselines, refer to the `SciWorldConfig` class in `experiments/sciworld/sciworld.py`.
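The `key=value` arguments above are overrides applied on top of config defaults. As a rough sketch of how such overrides map onto a config class — hypothetical field names and defaults mirroring the commands above, not the actual `SciWorldConfig` implementation:

```python
from dataclasses import dataclass, fields

@dataclass
class SciWorldConfig:
    # Hypothetical subset of options; see experiments/sciworld/sciworld.py
    # for the authoritative definitions and defaults.
    icrl_mode: str = "ICRL"
    num_envs: int = 29
    max_env_steps: int = 200
    explore_only: bool = False

def parse_overrides(argv: list[str]) -> SciWorldConfig:
    """Apply key=value command-line overrides to the default config."""
    cfg = SciWorldConfig()
    types = {f.name: f.type for f in fields(SciWorldConfig)}
    for arg in argv:
        key, _, raw = arg.partition("=")
        if types[key] is bool:
            value = raw.lower() in ("true", "1")  # bool("false") would be True
        else:
            value = types[key](raw)
        setattr(cfg, key, value)
    return cfg

print(parse_overrides(["icrl_mode=RANDOM_SAMPLING", "explore_only=true"]))
```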
## Math

```shell
cd experiments/math
python math_bench.py
```

## Beyond Parametric Knowledge

```shell
cd analysis/beyond_parametric_knowledge
```

Run ICRL:

```shell
python beyond_parametric_knowledge.py
```

Run the Best-of-N baseline:

```shell
python beyond_parametric_knowledge.py --rejection_sampling
```

Run the exploitation-only ablation:

```shell
python beyond_parametric_knowledge.py --exploitation_only
```

Run the exploration-only ablation:

```shell
python beyond_parametric_knowledge.py --explore_only
```

## Attention Analysis

This analysis examines attention patterns in Qwen3-32B to identify reward-sensitive attention heads. It requires 2 GPUs.
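Before launching, it can be worth confirming that two GPUs are actually visible. A small stdlib-only check that parses `nvidia-smi -L` output (a hypothetical helper for convenience — it is not part of the analysis scripts):

```python
import shutil
import subprocess

def count_gpus(listing: str) -> int:
    """Count GPUs in `nvidia-smi -L` output: one 'GPU <n>: ...' line per device."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))

if shutil.which("nvidia-smi"):
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    n = count_gpus(result.stdout)
    print(f"{n} GPU(s) visible; the attention analysis expects 2")
else:
    print("nvidia-smi not found; no NVIDIA GPUs visible")
```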
```shell
cd analysis/attention_analysis
```

Run the initial analysis (layers -1 to -4, 64 heads each):

```shell
bash test_layers_heads.sh <path_to_output_list.json>
```

Run the extended analysis across all 32 layers:

```shell
bash run_all_layers.sh <path_to_output_list.json>
```

Generate the significant-heads figure:

```shell
python plot_significant_heads_bar.py
```

## Acknowledgement

We have borrowed code from the ScienceWorld, ARMAP, and CLIN repositories.
## Citation

```bibtex
@inproceedings{song2026reward,
  title={Reward Is Enough: LLMs Are In-Context Reinforcement Learners},
  author={Kefan Song and Amir Moeini and Peng Wang and Lei Gong and Rohan Chandra and Shangtong Zhang and Yanjun Qi},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026},
  eprint={2506.06303},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2506.06303},
}
```