Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased 

Added

log_std schedule for fresh training: linearly anneals the policy’s log_std from init to end across total_timesteps via a new LogStdScheduleCallback, mitigating the stochastic-vs-deterministic train/eval gap that arises when SB3’s default log_std_init=0 (σ=1.0 over normalized actions) lets the actor mean drift to a noise-dependent attractor — i.e. a policy that scores well on noisy rollouts but fails under deterministic eval. Configured via the log_std_schedule block in train/config/rl_config.yaml (keys init, end); comment out the block to disable. log_std is frozen at training start so the schedule fully controls action noise. Applies only to fresh training (--m t); --m c and --m f continue to use the existing transfer_reset_log_std knob. Scheduled target is logged to wandb as train/log_std_scheduled. Default n_eval_episodes bumped from 5 to 15 to reduce variance in eval metrics when verifying the fix.
Domain randomization for vehicle parameters: per-episode multiplicative Gaussian perturbation θ' = θ · X, X ~ N(1, σ) clipped at ±dr_clip_k·σ. Configured via the domain_randomization gym key as a flat {param: sigma} dict; wired into training configs only (eval uses nominal params). Coupled-param support: randomizing lf/s_max/sv_max auto-derives lr/s_min/sv_min to preserve wheelbase and symmetry. See docs/plan/DOMAIN_RANDOMIZATION.md.
First-order steering actuator lag: optional exact zero-order-hold first-order lag on SteeringAngleAction, gated by the T_steer YAML key (seconds). When enabled, replaces the bang-bang controller with the exact closed-form solution of δ̇ = (δ_ref − δ)/T_steer, encoded as a steering velocity so the existing constant-sv integrator reproduces it without any dynamics-model changes. Cascades correctly with the existing steer_buffer dead-time and steering_constraint rate-clip. Currently enabled for the STD model (f1tenth_std.yaml, T_steer = 0.025 as a starting estimate); other vehicle YAMLs ship with a commented hint for opt-in once bench-identified. Omitting or zeroing the key preserves bang-bang behaviour. See docs/plan/FIRST_ORDER_ACT_LAG.md for derivation, sim2real rationale, and the wheel-marker bench-ID protocol.
drift_config preset: public starter config for STD-model drift environments — from gymkhana import drift_config; gym.make("gymkhana:gymkhana-v0", config=drift_config(map="Drift")). Bundles the physics, control, observation, normalization, and reset settings used for drift training. Keyword args override individual fields. Available to PyPI users without access to train/config/. train/config/env_config.py::_base_config now layers training-specific keys on top of this preset to keep a single source of truth.
render_config gym option: pass a dict of field overrides for the packaged gymkhana/envs/rendering/rendering.yaml (e.g. {"window_size": 1200, "render_type": "pygame", "show_ctr_debug": True}) directly through gym.make(config={...}). Omitted fields keep their yaml defaults. Makes renderer fields (window size, render backend, debug toggles, vehicle palette) tunable without editing the packaged file — needed for PyPI users who can’t modify the installed source.
Explicit YAML packaging: pyproject.toml now declares include = [{path = "gymkhana/**/*.yaml", ...}] so vehicle parameter and rendering YAMLs are guaranteed to ship in built wheels regardless of git tracking state.
Optimal raceline generation: maps/extract_raceline.py produces an optimized racing line for any map via ForzaETH’s fork of TUM’s global_racetrajectory_optimization (mincurv_iqp). The emitted <map>_raceline.csv now carries the optimizer’s full corner-aware vx/ax profile (previously overridden with constants), which Raceline.from_raceline_file and examples/waypoint_follow.py already consume as Pure Pursuit target speeds.
STP (Single Track Pacejka) model: dynamic single-track bicycle with lateral-only Pacejka Magic Formula tire model, ported from the ForzaETH f110-simulator. Shares ST’s 7-element state layout. Selectable via model='stp' with parameters from GKEnv.f1tenth_stp_vehicle_params() (f1tenth_stp.yaml). Supports drift_st observation type alongside ST.
Aggregated min/max observation tracking across parallelized environments (works with both normalize_obs=True and normalize_obs=False) for tuning normalization bounds. Now backed by an ObsMinMaxSnapshotCallback that periodically writes merged per-subproc trackers to outputs/config/<run_id>/obs_min_max.yaml and streams cumulative per-feature bounds-violation magnitudes to wandb under obs_bounds/<feature>/over and .../under for live monitoring during training.
Configurable instability prevention: opt-in via prevent_instability gym config flag. When enabled, post-RK4 sanity checks on the standardized state revert blow-ups and truncate the episode; cumulative event count is logged to wandb under instability/total via a new InstabilityCountCallback, with end-of-run per-env breakdown printed to stdout. Detection bounds are exposed as instability_yaw_rate_bound and instability_slip_bound.

Changed

ObsMinMaxSnapshotCallback logs via SB3 logger: bounds-violation metrics now go through self.logger.record + dump instead of direct wandb.log, aligning steps with PPO’s own writes.
Per-step Frenet projection cache: cartesian_to_frenet was called 2–3× per env step from independent sites (observation, boundary check, recovery success, reward, done). Now projected once per agent in GKEnv._update_frenet_cache immediately after sim.step, and all consumers read (s, ey, ephi) from self._frenet_cache. Measured ~20% end-to-end speedup (3.03 → 2.44 ms/step) with observation.observe cost halved (647 → 325 µs) on the drift training config. Behaviour preserved — cached values are identical to the previous independent recomputations.

1.2.0 - 2026-04-11

Added

Control debug panel: real-time steering/throttle visualization (PyQt6 renderer)
Observation debug overlay: live observation values overlay on the map (PyQt6 renderer)
ONNX export: convert SB3 models to ONNX for sim-to-real transfer via OnnxPolicyRunner
Norm bounds extraction: save normalization bounds to config for reuse in external packages
Custom neural network architecture: configurable layer sizes via net_arch in RL config
Morris sensitivity analysis: parameter sensitivity analysis script for STD model
Observation sensitivity analysis: script for analyzing observation feature importance
Regression tests: tests for action ordering and other previously encountered bugs
Trajectory comparison script: compare KS, ST, STD model trajectories for sim2real validation

Fixed

Action ordering bug where control_input order affected action array mapping
Rendering rgb_array mode bugfix
Kinematic model yaw_rate bugfix
Removed unused abstractmethod decorator on CarAction.act method

Changed

Refactored docstrings for consistency, correctness, and RTD readability
Clarified supported Python versions in README
Reorganized README for clearer table of contents
Made unit test suite more efficient
Moved commonroad submodule to analysis subfolder

Removed

Docker support (Poetry and PyPI are sufficient)
Euclidean reward option for recovery learning

1.1.1 - 2026-03-15

Fixed

Version bump only (packaging fix)

1.1.0 - 2026-03-15

Added

Recovery training: dedicated training mode (training_mode: "recover") with recovery-specific rewards, resets, and evaluation
Curriculum learning: CurriculumLearningCallback that gradually expands recovery state initialization ranges as agent success rate improves
Transfer learning: transfer pretrained model weights to new tasks with optional critic reset and log_std re-initialization
MPC controllers: Kinematic MPC (KMPC) and Single-Track MPC (STMPC) via acados integration
Multi-map training: train across multiple maps in parallel environments
Sparse width observations: sparse_width_obs config to reduce observation dimensionality when track width varies little
P/PD controllers: simple proportional controllers for benchmarking recovery performance
Performance metrics: beta-r phase plane analysis, recovery trajectory plotting, controller comparison metrics
PyPI publishing via publish.yml GitHub Actions workflow (triggered on tag push)
ReadTheDocs documentation site with Sphinx RTD theme
Logos, favicon, and branding assets

Changed

Refactored package for PyPI publishing (renamed f1tenth_gym to gymkhana)
Migrated from Black to Ruff for linting, formatting, and import sorting with pre-commit hooks
Refactored controllers with abstract base class
Refactored training scripts with shared train_common.py and train_utils.py
Updated map loading to use maintained source for better compatibility
Reorganized project structure (analysis, figures, plans into dedicated directories)

Fixed

Mexico City track naming issue
Eval env should use curriculum max ranges

1.0.0 - 2026-01-30

Added

Drift dynamics: Single Track Drift (STD) model with PAC2002 tire physics
Reverse/random driving direction: track_direction config option (normal, reverse, random) for balanced cornering training
State reset: env.reset(options={"states": ...}) for full 7-d state initialization (STD model)
Arc-length visualization: render arc-length annotations along the centerline for Frenet coordinate debugging
Observation types: FeaturesObservation (drift) with slip angle, lookahead curvatures/widths; observation and action normalization
Predictive collision: TTC-based collision checking as alternative to Frenet-based
Wall deflection mode: configurable track boundary behavior (boundary vs. wall collision)
Progress reward gain: configurable progress_gain multiplier for forward progress reward
Frenet projection debugging: debug_frenet_projection visualization option
Lookahead curvature visualization: render lookahead sampling points ahead of the vehicle
Wandb integration: experiment tracking, model logging, download, and resume from wandb
Performance metrics: beta-r phase plane analysis, recovery trajectory plotting
PPO training infrastructure with SB3 and parallel environments

Fixed

Wraparound bug in progress tracking (robust fix with unit tests)
Normalization logic when both actions and observations are normalized
Reverse direction reset bug where random reset could point vehicle the wrong way due to cached reference line
Reverse direction normalization fix where min/max curvature bounds need to be symmetric

Changed

Configuration logic simplified and centralized in train/config/
Moved plotting files to analysis/ folder

Changelog

Unreleased

Added

Changed

1.2.0 - 2026-04-11

Added

Fixed

Changed

Removed

1.1.1 - 2026-03-15

Fixed

1.1.0 - 2026-03-15

Added

Changed

Fixed

1.0.0 - 2026-01-30

Added

Fixed

Changed

Unreleased 