Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
Added
log_std schedule for fresh training: linearly anneals the policy’s log_std from init to end across total_timesteps via a new LogStdScheduleCallback, mitigating the stochastic-vs-deterministic train/eval gap that arises when SB3’s default log_std_init=0 (σ=1.0 over normalized actions) lets the actor mean drift to a noise-dependent attractor, i.e. a policy that scores well on noisy rollouts but fails under deterministic eval. Configured via the log_std_schedule block in train/config/rl_config.yaml (keys init, end); comment out the block to disable. log_std is frozen at training start so the schedule fully controls action noise. Applies only to fresh training (--m t); --m c and --m f continue to use the existing transfer_reset_log_std knob. Scheduled target is logged to wandb as train/log_std_scheduled. Default n_eval_episodes bumped from 5 to 15 to reduce variance in eval metrics when verifying the fix.
Domain randomization for vehicle parameters: per-episode multiplicative Gaussian perturbation θ' = θ · X, X ~ N(1, σ) clipped at ±dr_clip_k·σ. Configured via the domain_randomization gym key as a flat {param: sigma} dict; wired into training configs only (eval uses nominal params). Coupled-param support: randomizing lf/s_max/sv_max auto-derives lr/s_min/sv_min to preserve wheelbase and symmetry. See docs/plan/DOMAIN_RANDOMIZATION.md.
First-order steering actuator lag: optional exact zero-order-hold first-order lag on SteeringAngleAction, gated by the T_steer YAML key (seconds). When enabled, replaces the bang-bang controller with the exact closed-form solution of δ̇ = (δ_ref − δ)/T_steer, encoded as a steering velocity so the existing constant-sv integrator reproduces it without any dynamics-model changes. Cascades correctly with the existing steer_buffer dead-time and steering_constraint rate-clip. Currently enabled for the STD model (f1tenth_std.yaml, T_steer = 0.025 as a starting estimate); other vehicle YAMLs ship with a commented hint for opt-in once bench-identified. Omitting or zeroing the key preserves bang-bang behaviour. See docs/plan/FIRST_ORDER_ACT_LAG.md for derivation, sim2real rationale, and the wheel-marker bench-ID protocol.
drift_config preset: public starter config for STD-model drift environments. Usage: from gymkhana import drift_config; gym.make("gymkhana:gymkhana-v0", config=drift_config(map="Drift")). Bundles the physics, control, observation, normalization, and reset settings used for drift training. Keyword args override individual fields. Available to PyPI users without access to train/config/. train/config/env_config.py::_base_config now layers training-specific keys on top of this preset to keep a single source of truth.
render_config gym option: pass a dict of field overrides for the packaged gymkhana/envs/rendering/rendering.yaml (e.g. {"window_size": 1200, "render_type": "pygame", "show_ctr_debug": True}) directly through gym.make(config={...}). Omitted fields keep their yaml defaults. Makes renderer fields (window size, render backend, debug toggles, vehicle palette) tunable without editing the packaged file, which PyPI users need because they can’t modify the installed source.
Explicit YAML packaging: pyproject.toml now declares include = [{path = "gymkhana/**/*.yaml", ...}] so vehicle parameter and rendering YAMLs are guaranteed to ship in built wheels regardless of git tracking state.
Optimal raceline generation: maps/extract_raceline.py produces an optimized racing line for any map via ForzaETH’s fork of TUM’s global_racetrajectory_optimization (mincurv_iqp). The emitted <map>_raceline.csv now carries the optimizer’s full corner-aware vx/ax profile (previously overridden with constants), which Raceline.from_raceline_file and examples/waypoint_follow.py already consume as Pure Pursuit target speeds.
STP (Single Track Pacejka) model: dynamic single-track bicycle with lateral-only Pacejka Magic Formula tire model, ported from the ForzaETH f110-simulator. Shares ST’s 7-element state layout. Selectable via model='stp' with parameters from GKEnv.f1tenth_stp_vehicle_params() (f1tenth_stp.yaml). Supports drift_st observation type alongside ST.
Aggregated min/max observation tracking across parallelized environments (works with both normalize_obs=True and normalize_obs=False) for tuning normalization bounds. Now backed by an ObsMinMaxSnapshotCallback that periodically writes merged per-subproc trackers to outputs/config/<run_id>/obs_min_max.yaml and streams cumulative per-feature bounds-violation magnitudes to wandb under obs_bounds/<feature>/over and .../under for live monitoring during training.
Configurable instability prevention: opt-in via prevent_instability gym config flag. When enabled, post-RK4 sanity checks on the standardized state revert blow-ups and truncate the episode; cumulative event count is logged to wandb under instability/total via a new InstabilityCountCallback, with end-of-run per-env breakdown printed to stdout. Detection bounds are exposed as instability_yaw_rate_bound and instability_slip_bound.
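The domain randomization and steering lag entries above each state a closed-form rule, so a small sketch may help when reading them. This is not the project's implementation: the function names, argument names, and the default clip factor are hypothetical, σ is assumed to be a standard deviation, and the clip is assumed to be applied to X around its mean of 1.

```python
import math
import random


def sample_perturbed(theta, sigma, clip_k=3.0, rng=random):
    """Return theta * X with X ~ N(1, sigma), X clipped to 1 +/- clip_k * sigma.

    clip_k stands in for the dr_clip_k setting; 3.0 is a placeholder default.
    """
    x = rng.gauss(1.0, sigma)
    lo, hi = 1.0 - clip_k * sigma, 1.0 + clip_k * sigma
    return theta * min(max(x, lo), hi)


def lag_steering_velocity(delta, delta_ref, t_steer, dt):
    """Constant steering velocity that reproduces the exact lag over one step.

    With delta_ref held constant over the step (zero-order hold), the solution
    of d(delta)/dt = (delta_ref - delta) / t_steer after time dt is
        delta_next = delta_ref + (delta - delta_ref) * exp(-dt / t_steer).
    Integrating the returned velocity at a constant rate for dt lands on
    delta_next.
    """
    if t_steer <= 0.0 or dt <= 0.0:
        raise ValueError("t_steer and dt must be positive")
    delta_next = delta_ref + (delta - delta_ref) * math.exp(-dt / t_steer)
    return (delta_next - delta) / dt
```

Because the lag is expressed as a velocity, a downstream rate limit can still clip it before integration, which is one way to read the note about cascading with the existing rate-clip.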
Changed
ObsMinMaxSnapshotCallback logs via SB3 logger: bounds-violation metrics now go through self.logger.record + dump instead of direct wandb.log, aligning steps with PPO’s own writes.
Per-step Frenet projection cache: cartesian_to_frenet was called 2–3× per env step from independent sites (observation, boundary check, recovery success, reward, done). Now projected once per agent in GKEnv._update_frenet_cache immediately after sim.step, and all consumers read (s, ey, ephi) from self._frenet_cache. Measured ~20% end-to-end speedup (3.03 → 2.44 ms/step) with observation.observe cost halved (647 → 325 µs) on the drift training config. Behaviour preserved: cached values are identical to the previous independent recomputations.
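The caching pattern in the Frenet entry above can be sketched generically. The class, method names, and the straight-line projection below are hypothetical stand-ins, not the GKEnv code; the only idea taken from the entry is "project once per agent after the step, then let every consumer read the cached tuple".

```python
class FrenetCacheSketch:
    """Project each agent's pose once per step and serve cached reads."""

    def __init__(self, project):
        # project: callable (x, y, yaw) -> (s, ey, ephi); assumed to be costly
        self._project = project
        self._frenet_cache = {}

    def update(self, poses):
        # Call once per step, right after the simulator has advanced.
        self._frenet_cache = {
            idx: self._project(*pose) for idx, pose in enumerate(poses)
        }

    def frenet(self, agent_idx):
        # Observation, reward, and termination code all read from here.
        return self._frenet_cache[agent_idx]


calls = {"n": 0}


def straight_line_projection(x, y, yaw):
    # Toy centerline along the x-axis: s = x, ey = y, ephi = yaw.
    calls["n"] += 1
    return (x, y, yaw)


cache = FrenetCacheSketch(straight_line_projection)
cache.update([(1.0, 0.2, 0.05), (4.0, -0.1, 0.0)])
readings = [cache.frenet(0) for _ in range(3)] + [cache.frenet(1)]
```

After the four reads, the projection has run only twice, once per agent, and the cached values match what a fresh call would return for the same poses.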
1.2.0 - 2026-04-11
Added
Control debug panel: real-time steering/throttle visualization (PyQt6 renderer)
Observation debug overlay: live observation values overlay on the map (PyQt6 renderer)
ONNX export: convert SB3 models to ONNX for sim-to-real transfer via OnnxPolicyRunner
Norm bounds extraction: save normalization bounds to config for reuse in external packages
Custom neural network architecture: configurable layer sizes via net_arch in RL config
Morris sensitivity analysis: parameter sensitivity analysis script for STD model
Observation sensitivity analysis: script for analyzing observation feature importance
Regression tests: tests for action ordering and other previously encountered bugs
Trajectory comparison script: compare KS, ST, STD model trajectories for sim2real validation
Fixed
Action ordering bug where control_input order affected action array mapping
Rendering rgb_array mode bugfix
Kinematic model yaw_rate bugfix
Removed unused abstractmethod decorator on CarAction.act method
Changed
Refactored docstrings for consistency, correctness, and RTD readability
Clarified supported Python versions in README
Reorganized README for clearer table of contents
Made unit test suite more efficient
Moved commonroad submodule to analysis subfolder
Removed
Docker support (Poetry and PyPI are sufficient)
Euclidean reward option for recovery learning
1.1.1 - 2026-03-15
Fixed
Version bump only (packaging fix)
1.1.0 - 2026-03-15
Added
Recovery training: dedicated training mode (training_mode: "recover") with recovery-specific rewards, resets, and evaluation
Curriculum learning: CurriculumLearningCallback that gradually expands recovery state initialization ranges as agent success rate improves
Transfer learning: transfer pretrained model weights to new tasks with optional critic reset and log_std re-initialization
MPC controllers: Kinematic MPC (KMPC) and Single-Track MPC (STMPC) via acados integration
Multi-map training: train across multiple maps in parallel environments
Sparse width observations: sparse_width_obs config to reduce observation dimensionality when track width varies little
P/PD controllers: simple proportional controllers for benchmarking recovery performance
Performance metrics: beta-r phase plane analysis, recovery trajectory plotting, controller comparison metrics
PyPI publishing via publish.yml GitHub Actions workflow (triggered on tag push)
ReadTheDocs documentation site with Sphinx RTD theme
Logos, favicon, and branding assets
Changed
Refactored package for PyPI publishing (renamed f1tenth_gym to gymkhana)
Migrated from Black to Ruff for linting, formatting, and import sorting with pre-commit hooks
Refactored controllers with abstract base class
Refactored training scripts with shared train_common.py and train_utils.py
Updated map loading to use maintained source for better compatibility
Reorganized project structure (analysis, figures, plans into dedicated directories)
Fixed
Mexico City track naming issue
Eval env now uses curriculum max ranges
1.0.0 - 2026-01-30
Added
Drift dynamics: Single Track Drift (STD) model with PAC2002 tire physics
Reverse/random driving direction: track_direction config option (normal, reverse, random) for balanced cornering training
State reset: env.reset(options={"states": ...}) for full 7-d state initialization (STD model)
Arc-length visualization: render arc-length annotations along the centerline for Frenet coordinate debugging
Observation types: FeaturesObservation (drift) with slip angle, lookahead curvatures/widths; observation and action normalization
Predictive collision: TTC-based collision checking as alternative to Frenet-based
Wall deflection mode: configurable track boundary behavior (boundary vs. wall collision)
Progress reward gain: configurable progress_gain multiplier for forward progress reward
Frenet projection debugging: debug_frenet_projection visualization option
Lookahead curvature visualization: render lookahead sampling points ahead of the vehicle
Wandb integration: experiment tracking, model logging, download, and resume from wandb
Performance metrics: beta-r phase plane analysis, recovery trajectory plotting
PPO training infrastructure with SB3 and parallel environments
Fixed
Wraparound bug in progress tracking (robust fix with unit tests)
Normalization logic when both actions and observations are normalized
Reverse direction reset bug where random reset could point vehicle the wrong way due to cached reference line
Reverse direction normalization fix where min/max curvature bounds need to be symmetric
Changed
Configuration logic simplified and centralized in train/config/
Moved plotting files to analysis/ folder