Add Termination/Truncation #303

Open
Aditya-Gupta26 wants to merge 1 commit into 3.0_beta from aditya/add_truncations
Conversation

@Aditya-Gupta26

Reimplementation of previously merged work.

Separates truncation from termination so the RL policy can use a bootstrapped value in the truncation case, potentially aiding training.

Small modification:

  • In the case of the STOP/REMOVE collision-behavior setting, we mark the episode terminated for the corresponding agent
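The distinction matters when computing return targets: on a true termination the future value is zero, while on a truncation the episode was cut off artificially and should be bootstrapped. A minimal sketch of the idea (hypothetical function and names, not the PR's actual code; it bootstraps with the truncated step's own value estimate as a proxy, since auto-reset environments do not expose the final observation):

```python
import numpy as np

def return_targets(rewards, values, next_value, terminals, truncations, gamma=0.99):
    """One-step discounted return targets, bootstrapping through truncations.

    Illustrative sketch only: on termination the future return is zeroed;
    on truncation we bootstrap from the step's own value estimate as a
    proxy for the unseen post-truncation state.
    """
    T = len(rewards)
    returns = np.zeros(T)
    next_ret = next_value
    for t in reversed(range(T)):
        if terminals[t]:
            next_ret = 0.0          # true episode end: no future reward
        elif truncations[t]:
            next_ret = values[t]    # artificial cutoff: bootstrap from V proxy
        returns[t] = rewards[t] + gamma * next_ret
        next_ret = returns[t]
    return returns
```

With a terminal at the last step the final target is just the reward; with a truncation instead, the critic's value is folded in, so the agent is not penalized for the timeout.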

@greptile-apps

greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR implements proper separation of termination and truncation signals to enable bootstrapped value estimation in RL training. The changes allow the policy to distinguish between true episode endings (reaching goals, collisions) and artificial timeouts (episode length limits), which can improve training by using bootstrapped values on truncations.

Key Changes

  • Added truncations array tracking throughout C and Python code
  • Set terminal flags for goal-reaching and collision events (when using STOP/REMOVE behaviors)
  • Set truncation flags when episode reaches time limit or all agents have respawned
  • Implemented truncation bootstrapping in pufferl.py using previous step's value as proxy for terminal state value
  • Updated config comment clarifying goal_radius behavior with reward randomization
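The flag-setting logic these changes describe can be sketched as a single decision per agent (hypothetical helper with made-up names; the actual implementation writes into per-agent terminals/truncations arrays in C inside drive.h):

```python
def done_flags(timestep, episode_length, collided, goal_reached, collision_behavior):
    """Return (terminal, truncated) for one agent after a step.

    Illustrative only: real endings (collision under STOP/REMOVE, goal
    reached) set terminal; hitting the time limit sets truncated.
    """
    terminal = False
    truncated = False
    # Collision is a real ending only under STOP/REMOVE behaviors.
    if collided and collision_behavior in ("STOP", "REMOVE"):
        terminal = True
    # Reaching the goal is a real ending.
    if goal_reached:
        terminal = True
    # The time limit is an artificial cutoff, not a real ending.
    if not terminal and timestep >= episode_length:
        truncated = True
    return terminal, truncated
```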

Issues Found

  • Critical logic error in the timestep comparison on line 2635 of drive.h: it uses (env->timestep + 1) >= env->episode_length instead of env->timestep >= env->episode_length, causing episodes to truncate one step earlier than intended

Confidence Score: 3/5

  • Contains a logic bug that will cause incorrect episode truncation timing
  • The off-by-one error in the timestep check will cause all episodes to end one step early, affecting training behavior and metrics. The core truncation/termination separation logic is sound, but this bug needs fixing before merge.
  • Fix the timestep comparison in pufferlib/ocean/drive/drive.h line 2635
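The off-by-one is easy to see concretely. Assuming the timestep is incremented before the check (as the flowchart below indicates) and episode_length = 3, the buggy condition fires after only 2 steps:

```python
# Compare when each condition first fires as the timestep counts 1, 2, 3, ...
episode_length = 3

# Buggy: (timestep + 1) >= episode_length
buggy_fire = next(t for t in range(1, 10) if (t + 1) >= episode_length)
# Fixed: timestep >= episode_length
fixed_fire = next(t for t in range(1, 10) if t >= episode_length)

print(buggy_fire, fixed_fire)  # buggy fires at timestep 2, fixed at timestep 3
```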

Important Files Changed

  • pufferlib/ocean/drive/drive.h — Added truncation tracking and terminal flags for collisions/goals. Contains a logic error in the timestep comparison that causes early truncation.
  • pufferlib/pufferl.py — Implements truncation bootstrapping using the previous value as a proxy. Logic appears correct for handling auto-reset environments.
  • pufferlib/ocean/env_binding.h — Uncommented the truncations array binding. Simple change to enable truncation tracking.
  • pufferlib/ocean/drive/drive.py — Added truncations reset in the reset() and step() methods. Straightforward change.
  • pufferlib/config/ocean/drive.ini — Updated comment to clarify goal_radius is only active when reward_randomization = 0.

Flowchart

flowchart TD
    A[c_step starts] --> B[Increment timestep<br/>Reset terminals and truncations arrays]
    B --> C[Move agents and compute collision states]
    C --> D{Collision detected?}
    D -->|Vehicle/Offroad collision<br/>with STOP/REMOVE behavior| E[Set terminal flag]
    D -->|No collision| F[Continue]
    E --> F
    F --> G{Goal reached?}
    G -->|GOAL_RESPAWN mode| H[Set terminal flag<br/>and respawn_agent]
    G -->|GOAL_STOP mode| I[Set terminal flag<br/>and stop agent]
    G -->|GOAL_GENERATE_NEW| J[sample_new_goal<br/>Set terminal flag]
    G -->|Not reached| K[Check episode termination]
    H --> K
    I --> K
    J --> K
    K --> L{timestep+1 >= episode_length<br/>OR all agents respawned?}
    L -->|Yes| M[Set ALL truncation flags<br/>add_log and c_reset]
    L -->|No| N[compute_observations<br/>Return to Python]
    N --> O[PufferRL: Bootstrap on truncations<br/>Add gamma times V from previous step]

Last reviewed commit: 615d812


@greptile-apps bot left a comment

5 files reviewed, 1 comment


```c
        break;
    }
}
int reached_time_limit = (env->timestep + 1) >= env->episode_length;
```
Off-by-one error: this should be env->timestep >= env->episode_length (without the +1). The original code used ==; the added +1 causes truncation one step early.

Suggested change

```c
- int reached_time_limit = (env->timestep + 1) >= env->episode_length;
+ int reached_time_limit = env->timestep >= env->episode_length;
```
