Add Termination/Truncation by Aditya-Gupta26 · Pull Request #303 · Emerge-Lab/PufferDrive

Aditya-Gupta26 · 2026-02-17T21:29:32Z

Reimplementation of already merged work.

Distinguishes truncation from termination so the RL policy can use a bootstrapped value when an episode is truncated, potentially aiding training.

Small modification: when the collision behavior is set to STOP/REMOVE, we mark the episode as terminated for the corresponding agent.

greptile-apps · 2026-02-17T21:33:17Z

Greptile Summary

This PR implements proper separation of termination and truncation signals to enable bootstrapped value estimation in RL training. The changes allow the policy to distinguish between true episode endings (reaching goals, collisions) and artificial timeouts (episode length limits), which can improve training by using bootstrapped values on truncations.

Key Changes

Added truncations array tracking throughout C and Python code
Set terminal flags for goal-reaching and collision events (when using STOP/REMOVE behaviors)
Set truncation flags when episode reaches time limit or all agents have respawned
Implemented truncation bootstrapping in pufferl.py using previous step's value as proxy for terminal state value
Updated config comment clarifying goal_radius behavior with reward randomization
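
The bootstrapping idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in pufferl.py: the function name `bootstrap_truncations`, the per-agent list layout, and the exact indexing of the "previous step" value are all assumptions made for the example.

```python
def bootstrap_truncations(rewards, values, truncations, gamma=0.99):
    """Return a copy of `rewards` with gamma * V(previous step) added on truncated steps.

    All three inputs are equal-length per-step lists for a single agent
    (hypothetical layout). values[t] is the critic estimate for the
    observation the policy saw at step t.
    """
    adjusted = list(rewards)
    for t, truncated in enumerate(truncations):
        if truncated and t > 0:
            # With auto-reset, the observation returned on a truncated step
            # already belongs to the next episode, so the value of the true
            # final state is unavailable. values[t - 1] stands in for it.
            adjusted[t] += gamma * values[t - 1]
    return adjusted


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    values = [0.5, 1.0, 3.0]
    truncations = [False, False, True]
    print(bootstrap_truncations(rewards, values, truncations, gamma=0.9))
```

Folding the bootstrap term into the reward lets the truncated step still be treated as an episode boundary in the advantage computation, while a terminated step (goal or collision) receives no such term.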

Issues Found

Critical logic error in timestep comparison on line 2635 of drive.h - uses (env->timestep + 1) >= env->episode_length instead of env->timestep >= env->episode_length, causing episodes to truncate one step earlier than intended

Confidence Score: 3/5

Contains a logic bug that will cause incorrect episode truncation timing
The off-by-one error in the timestep check will cause all episodes to end one step early, affecting training behavior and metrics. The core truncation/termination separation logic is sound, but this bug needs fixing before merge.
Fix the timestep comparison in pufferlib/ocean/drive/drive.h line 2635
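
To make the off-by-one claim concrete, here is a small simulation of the two comparisons. It assumes, as the flowchart in this review states, that the timestep is incremented at the start of each step, and additionally assumes the timestep starts at 0 after a reset; `steps_before_truncation` is a hypothetical helper, not a function in drive.h.

```python
def steps_before_truncation(episode_length, plus_one_check):
    """Count how many steps run before the time-limit check fires."""
    timestep = 0  # assumed value right after a reset
    steps = 0
    while True:
        timestep += 1  # the step increments the timestep first
        steps += 1
        if plus_one_check:
            reached = (timestep + 1) >= episode_length  # check as written in the PR
        else:
            reached = timestep >= episode_length  # check suggested by the review
        if reached:
            return steps


if __name__ == "__main__":
    print(steps_before_truncation(10, plus_one_check=True))
    print(steps_before_truncation(10, plus_one_check=False))
```

Under these assumptions the `+ 1` variant ends an episode of configured length 10 after 9 steps, while the suggested comparison runs all 10.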

Important Files Changed

Filename	Overview
pufferlib/ocean/drive/drive.h	Added truncation tracking and terminal flags for collisions/goals. Contains logic error in timestep comparison that causes early truncation.
pufferlib/pufferl.py	Implements truncation bootstrapping using previous value as proxy. Logic appears correct for handling auto-reset environments.
pufferlib/ocean/env_binding.h	Uncommented truncations array binding. Simple change to enable truncation tracking.
pufferlib/ocean/drive/drive.py	Added truncations reset in reset() and step() methods. Straightforward change.
pufferlib/config/ocean/drive.ini	Updated comment to clarify goal_radius only active when reward_randomization = 0.

Flowchart

flowchart TD
    A[c_step starts] --> B[Increment timestep<br/>Reset terminals and truncations arrays]
    B --> C[Move agents and compute collision states]
    C --> D{Collision detected?}
    D -->|Vehicle/Offroad collision<br/>with STOP/REMOVE behavior| E[Set terminal flag]
    D -->|No collision| F[Continue]
    E --> F
    F --> G{Goal reached?}
    G -->|GOAL_RESPAWN mode| H[Set terminal flag<br/>and respawn_agent]
    G -->|GOAL_STOP mode| I[Set terminal flag<br/>and stop agent]
    G -->|GOAL_GENERATE_NEW| J[sample_new_goal<br/>Set terminal flag]
    G -->|Not reached| K[Check episode termination]
    H --> K
    I --> K
    J --> K
    K --> L{timestep+1 >= episode_length<br/>OR all agents respawned?}
    L -->|Yes| M[Set ALL truncation flags<br/>add_log and c_reset]
    L -->|No| N[compute_observations<br/>Return to Python]
    N --> O[PufferRL: Bootstrap on truncations<br/>Add gamma times V from previous step]
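
The flag logic in the flowchart can be summarized in a short sketch. It is an illustration under assumptions, not the C implementation: `step_flags`, its arguments, and the `IGNORE` behavior name are hypothetical, and the time-limit check is passed in as a precomputed boolean.

```python
STOP, REMOVE, IGNORE = "STOP", "REMOVE", "IGNORE"  # IGNORE is a hypothetical placeholder


def step_flags(collided, reached_goal, collision_behavior,
               time_limit_reached, all_respawned):
    """Return (terminals, truncations) as per-agent boolean lists."""
    n = len(collided)
    terminals = [False] * n
    truncations = [False] * n
    for i in range(n):
        # Collisions are terminal only under the STOP/REMOVE behaviors.
        if collided[i] and collision_behavior in (STOP, REMOVE):
            terminals[i] = True
        # Reaching the goal is terminal in every goal mode shown in the flowchart.
        if reached_goal[i]:
            terminals[i] = True
    # The time limit (or all agents having respawned) truncates every agent at once.
    if time_limit_reached or all_respawned:
        truncations = [True] * n
    return terminals, truncations


if __name__ == "__main__":
    print(step_flags([True, False, False], [False, True, False], STOP, False, False))
```

Terminal flags are set per agent, whereas truncation is set for all agents together, which matches the "Set ALL truncation flags" node.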

Last reviewed commit: 615d812

greptile-apps

5 files reviewed, 1 comment


greptile-apps · 2026-02-17T21:33:21Z

pufferlib/ocean/drive/drive.h

+            break;
+        }
+    }
+    int reached_time_limit = (env->timestep + 1) >= env->episode_length;


Off-by-one error: the check should be env->timestep >= env->episode_length (without the +1). The original code used ==; the +1 version causes truncation one step early.

Suggested change

-    int reached_time_limit = (env->timestep + 1) >= env->episode_length;
+    int reached_time_limit = env->timestep >= env->episode_length;

greptile-apps bot reviewed Feb 17, 2026


Adding truncations/proper terminations

c4a0e02

Aditya-Gupta26 force-pushed the aditya/add_truncations branch from 615d812 to c4a0e02 Compare February 23, 2026 23:19

Aditya-Gupta26 wants to merge 1 commit into 3.0_beta from aditya/add_truncations
