This project implements a machine learning pipeline that predicts vehicle lane change intent from the NGSIM (US-101 and I-80) trajectory datasets. The objective is to classify upcoming lane changes as "Left" or "Right" before they occur, using a 5-second prediction horizon. The data can be downloaded from https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Vehicle-Trajector/8ect-6jqj/about_data
- `data/`: Raw NGSIM trajectory data (US-101 and I-80).
- `preprocess.py`: Loads raw data, adds lane change intent labels with a 5s horizon, and applies physical lane-boundary constraints.
- `models/`:
  - `rfclassifier.py`: OOP-based Random Forest implementation including threshold optimization and model persistence.
  - `ltsm.py`: Sequential trajectory modeling (Next Steps).
- `notebooks/`: Playground area for Jupyter notebooks and data exploration.
- `results/`: Logged runs containing Feature Importance plots, Confusion Matrices, and `metrics_report.png`.
- `utils/`:
  - `data_prep.py`: Handles vehicle-based splitting and `sampling_keep_factor` to prevent data leakage.
  - `visualize.py`: BEV (Bird’s Eye View) trajectory visualization.
- `main.py`: Orchestration script for the end-to-end pipeline.
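
The labeling and splitting steps listed above could look roughly like the sketch below. It is not the code in `preprocess.py` or `data_prep.py`. The function names `label_intent` and `split_by_vehicle` are hypothetical, the 50-frame horizon assumes a 10 Hz frame rate, and the Left/Right rule assumes lane numbers increase from left to right. The lane-boundary constraints and `sampling_keep_factor` are not covered.

```python
import random

HORIZON_FRAMES = 50  # assumption: 5 s horizon at a 10 Hz frame rate


def label_intent(lane_ids, horizon=HORIZON_FRAMES):
    """Label each frame of ONE vehicle: 0 = None, 1 = Left, 2 = Right.

    lane_ids must be ordered by frame. Assumes lane numbers increase
    from left to right, so a smaller future lane number means "Left".
    """
    labels = []
    for i, lane in enumerate(lane_ids):
        label = 0
        # Scan only the frames that fall inside the prediction horizon.
        for future_lane in lane_ids[i + 1 : i + 1 + horizon]:
            if future_lane != lane:
                label = 1 if future_lane < lane else 2
                break
        labels.append(label)
    return labels


def split_by_vehicle(vehicle_ids, test_fraction=0.2, seed=0):
    """Assign whole vehicles to train or test so no trajectory spans both."""
    unique_ids = sorted(set(vehicle_ids))
    random.Random(seed).shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    train_ids = set(unique_ids[n_test:])
    test_ids = set(unique_ids[:n_test])
    return train_ids, test_ids
```

Splitting on vehicle IDs rather than on individual frames keeps near-identical consecutive frames of the same trajectory from landing on both sides of the split, which is the leakage the vehicle-based split is meant to prevent.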
The model underwent iterative tuning to address the high noise floor in the NGSIM dataset and significant class imbalance.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| None (0) | 0.98 | 0.99 | 0.98 | 1,698,261 |
| Left (1) | 0.49 | 0.43 | 0.46 | 38,805 |
| Right (2) | 0.35 | 0.26 | 0.30 | 14,462 |
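
As a quick consistency check on the table, the F1 column can be recomputed from precision and recall (F1 is their harmonic mean), and the support column gives each class's share of the frames. The snippet below only uses the numbers from the table; the variable names are illustrative.

```python
# (precision, recall, support) per class, copied from the table above
report = {
    "None": (0.98, 0.99, 1_698_261),
    "Left": (0.49, 0.43, 38_805),
    "Right": (0.35, 0.26, 14_462),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(support for _, _, support in report.values())

# Map each class to (F1 rounded to 2 places, share of frames in percent).
summary = {
    name: (round(f1_score(p, r), 2), round(100 * s / total, 1))
    for name, (p, r, s) in report.items()
}
```

The recomputed F1 values match the table, and the shares (roughly 2.2% Left and 0.8% Right) show how strongly the "None" class dominates.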
- Optimal Thresholds: Left: 0.8052 | Right: 0.8442.
- Best Weights: `{0: 1.0, 1: 3.0, 2: 10.0}`.
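
One way the per-class thresholds could be applied to predicted probabilities is sketched below. The decision rule is an assumption, not necessarily what `rfclassifier.py` does: a frame is labeled Left or Right only if that class's probability reaches its threshold, and it falls back to None otherwise. The function name `predict_with_thresholds` is hypothetical.

```python
# Thresholds reported above, keyed by class index (1 = Left, 2 = Right).
THRESHOLDS = {1: 0.8052, 2: 0.8442}


def predict_with_thresholds(proba_row, thresholds=THRESHOLDS):
    """Turn one row of class probabilities [p_none, p_left, p_right] into a label.

    Assumed rule: keep only the classes whose probability reaches their
    threshold, pick the most probable of those, otherwise return 0 (None).
    """
    candidates = [
        (proba_row[cls], cls)
        for cls, threshold in thresholds.items()
        if proba_row[cls] >= threshold
    ]
    return max(candidates)[1] if candidates else 0
```

With thresholds this high, a frame whose most likely class is Left at a probability of 0.6 is still reported as None, which trades recall for precision on the minority classes. If the classifier is scikit-learn's `RandomForestClassifier`, a weights dictionary of the shape shown above is what its `class_weight` parameter accepts.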
In highway trajectory data, Left lane changes are typically more aggressive (overtaking), resulting in higher lateral velocity (
- The Sensor Paradox: Raw, unclipped `v_lat` yielded better precision than filtered data, as the Random Forest utilized sensor "spikes" as early indicators of boundary crossing.
- Contextual Logic: Adding `Lane_ID` and binary flags like `can_go_right` significantly improved precision by acting as physical logic gates.
- Temporal Features: The addition of `v_lat_lag
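
The contextual flags and lagged features mentioned in this list might be derived along the following lines. This is a sketch under stated assumptions: lane 1 is taken to be the leftmost lane, `num_lanes` is a hypothetical parameter for the number of mainline lanes, `can_go_left` is an assumed counterpart to `can_go_right`, and `lagged` is an illustrative helper rather than the project's actual feature code.

```python
def can_go_left(lane_id):
    """1 if a lane exists to the left (assumes lane 1 is the leftmost lane)."""
    return int(lane_id > 1)


def can_go_right(lane_id, num_lanes):
    """1 if a lane exists to the right; num_lanes is a hypothetical parameter."""
    return int(lane_id < num_lanes)


def lagged(values, k, fill=0.0):
    """Shift ONE vehicle's frame-ordered series back by k frames.

    The first k entries have no history, so they are padded with `fill`.
    """
    n = len(values)
    return [fill] * min(k, n) + list(values[: max(n - k, 0)])
```

For example, `lagged([0.1, 0.2, 0.3, 0.4], 2)` gives `[0.0, 0.0, 0.1, 0.2]`. Lags should be computed per vehicle so that values from one trajectory never spill into the next.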
To move beyond the current 0.46 (Left) and 0.30 (Right) F1-scores, the project is transitioning to a Long Short-Term Memory (LSTM) model to:
- Process trajectories as continuous temporal sequences rather than independent frames.
- Utilize hidden states to maintain long-term driving context.
- Improve "Interaction Awareness" by modeling how surrounding vehicle gaps influence intent over time.
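
Moving from independent frames to sequences means reshaping each trajectory into fixed-length windows before it reaches an LSTM. A minimal, framework-free sketch of that step is shown below. The function name `make_windows` and the default window length of 30 frames are assumptions for illustration, not settings taken from the project.

```python
def make_windows(features, labels, window=30):
    """Slice ONE vehicle's frame-ordered features into overlapping windows.

    Returns (X, y): X[i] is a list of `window` consecutive feature rows and
    y[i] is the intent label of the last frame in that window.
    """
    X, y = [], []
    for end in range(window, len(features) + 1):
        X.append(features[end - window : end])
        y.append(labels[end - 1])
    return X, y
```

Trajectories shorter than the window produce no samples. Building windows per vehicle keeps a single sequence from mixing frames of two different trajectories.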