output.mp4
This project was was done for OpenCV under the guidance of Gary Bradski and Reza Amayeh supported by Google Summer of Code(GSOC) 2025. This project presents a methodological approach to designing, implementing, and documenting a Simultaneous Localization and Mapping(SLAM) framework from scratch in Python. The primary objective is to build a fully functional, OpenCV-based SLAM system with clear documentation to facilitate reproducibility, future extensions, and adoption by the broader computer vision community. Traditional SLAM systems that rely on hand-crafted features (e.g., ORB, SIFT) often exhibit reduced robustness under challenging conditions such as viewpoint shifts, illumination changes, or motion blur. To address these limitations, we integrate modern learned features—ALIKED keypoints and descriptors (Zhao et al., 2022) combined with the LightGlue matcher (Lindenberger et al., 2023)—into a streamlined SLAM pipeline. This integration improves tracking stability and relocalization performance, especially in environments with significant photometric and geometric variations.
In addition, the system extends feature extraction and matching using LightGlue (Lindenberger et al., 2023) and leverages PyCeres for non-linear optimization, enabling efficient bundle adjustment and graph optimization within the pipeline.
Python feature-based SLAM / visual odometry experiments built around OpenCV, ALIKED + LightGlue, PyCeres, and Open3D.
The current primary runtime path is slam/monocular/main_revamped.py. It implements an incremental monocular pipeline with delayed two-view bootstrap, frame-to-map tracking, keyframe insertion, triangulation, local bundle adjustment, and live visualization. Older scripts are still present in the repository as legacy or experimental variants.
This repository has evolved beyond the original standalone SfM prototypes. The code that best reflects the current project state is:
slam/monocular/main_revamped.py: main monocular SLAM entrypointslam/core/*.py: reusable tracking, mapping, bootstrap, BA, and visualization modulestests/: focused unit tests for geometry and helper utilities
The project is still experimental and educational in nature. It is useful for understanding and iterating on a Python SLAM pipeline, but it is not a production-ready SLAM system.
slam/monocular/main_revamped.py performs the following steps:
- Load a dataset, camera intrinsics, and ground truth if available.
- Extract features using either:
- OpenCV detectors and matchers (
ORB,SIFT,AKAZE,BF,FLANN), or ALIKED + LightGlue
- OpenCV detectors and matchers (
- Delay initialization until a good two-view pair is found.
- Bootstrap the initial map by competing homography vs. fundamental-matrix models.
- Track each new frame against the map using:
- constant-velocity pose prediction
- reprojection of active landmarks
- small-window 2D-3D matching
solvePnPRansac
- Fall back to frame-to-frame 2D-2D tracking if PnP fails.
- Decide whether the current frame should become a new keyframe.
- Triangulate new landmarks between the latest keyframes.
- Run local bundle adjustment when enough new landmarks were added.
- Visualize:
- track-debug reprojection overlays
- current frame plus recent keyframes
- frame-to-frame feature matches
- 2D trajectory
(x-z)against ground truth when GT is available - 3D map view with Open3D
At the end of a run, the script always saves a trajectory image named trajectory_<dataset>.png.
.
├── slam/
│ ├── core/
│ │ ├── ba_utils.py # local/global bundle-adjustment helpers
│ │ ├── dataloader.py # sequence, calibration, and GT loaders
│ │ ├── features_utils.py # feature extraction and matching
│ │ ├── keyframe_utils.py # keyframe policy + thumbnail helpers
│ │ ├── landmark_utils.py # Map / MapPoint data structures
│ │ ├── pnp_utils.py # frame-to-map tracking and PnP helpers
│ │ ├── pose_utils.py # SE(3) pose conversions/utilities
│ │ ├── trajectory_utils.py # GT alignment helpers
│ │ ├── triangulation_utils.py # keyframe-to-keyframe triangulation
│ │ ├── two_view_bootstrap.py # monocular initialization
│ │ ├── visualization_utils.py # 2D/3D visualization utilities
│ │ └── visualize_ba.py # BA visualization helpers
│ ├── monocular/
│ │ ├── main_revamped.py # current monocular entrypoint
│ │ ├── main.py # older monocular pipeline
│ │ ├── main4.py # alternate legacy monocular path
│ │ └── Notes.txt
│ └── stereo/
│ └── ROUGHstereo_tracker.py # early stereo experiment
├── config/
│ ├── calibrate_camera/ # scripts and example files for custom calibration
│ ├── monocular.yaml
│ └── stereo.yaml
├── docs/
│ └── system_design.md
├── scripts/
│ └── run_tracker_visualization.sh # example launch commands
├── tests/ # utility and geometry-focused tests
├── tools/ # misc tooling scripts
├── refrences/ # older reference / prototype code
├── requirements.txt
└── setup.py
Recommended:
- Python
3.10+ opencv-pythonnumpyscipymatplotlibtqdmtorchlightgluelz4open3dpycerespycolmap
Important notes about the current codebase:
slam/core/features_utils.pyimportsLightGlueandALIKEDat module import time, soLightGluemust currently be installed even if you intend to run with OpenCV features only.slam/core/ba_utils.pyimportspyceresandpycolmapat module import time, so those packages are also required formain_revamped.py.Open3Dis optional in practice if you always use--no_viz3dor--headless; the visualizer degrades gracefully whenopen3dis unavailable.
Create an environment and install the package:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .If lightglue==0.0 does not resolve cleanly in your environment, install it from upstream and then continue:
pip install "lightglue @ git+https://github.com/cvg/LightGlue.git"The current loader in slam/core/dataloader.py expects specific folder names and, for some datasets, specific hardcoded sequences.
<base_dir>/kitti/
├── 05/
│ └── image_0/
│ └── *.png
└── poses/
└── 05.txt
Notes:
- The current monocular loader is hardcoded to KITTI sequence
05.
<base_dir>/malaga/
├── malaga-urban-dataset-extract-07_rectified_800x600_Images/
│ ├── *_left.jpg
│ └── *_right.jpg
└── malaga-urban-dataset-extract-07_all-sensors_GPS.txt
Notes:
- The monocular path uses the left images.
- The current loader is hardcoded to Malaga extract
07.
<base_dir>/tum-rgbd/
└── rgbd_dataset_freiburg3_long_office_household/
├── rgb/
│ └── *.png
├── rgb.txt
└── groundtruth.txt
Notes:
- The current loader is hardcoded to
rgbd_dataset_freiburg3_long_office_household. - The monocular pipeline only uses the RGB frames.
<base_dir>/custom/
├── custom_compress.mp4
└── calibration.pkl
Notes:
calibration.pklis loaded byslam/core/dataloader.pyand is expected to contain the camera matrix as the first item in the pickled tuple.- The scripts in
config/calibrate_camera/are intended to help generate the calibration files for custom data.
The recommended way to run the current code is from the repository root:
python3 -m slam.monocular.main_revamped \
--dataset kitti \
--base_dir ./Dataset \
--use_lightglueThis disables all GUI windows and saves the trajectory plot at the end.
python3 -m slam.monocular.main_revamped \
--dataset kitti \
--base_dir ./Dataset \
--use_lightglue \
--headless \
--no_viz3dpython3 -m slam.monocular.main_revamped \
--dataset tum-rgbd \
--base_dir ./Dataset \
--detector akaze \
--matcher bf \
--no_viz3dpython3 -m slam.monocular.main_revamped \
--dataset custom \
--base_dir ./Dataset \
--use_lightglue \
--no_viz3dThe shell script scripts/run_tracker_visualization.sh also contains example launch commands.
The main entrypoint is slam/monocular/main_revamped.py. Key options:
| Argument | Meaning |
|---|---|
--dataset {kitti,malaga,tum-rgbd,custom} |
Select dataset loader |
--base_dir PATH |
Root folder that contains the dataset subdirectory |
--use_lightglue |
Use ALIKED + LightGlue instead of OpenCV detectors/matchers |
--detector {orb,sift,akaze} |
OpenCV detector when not using LightGlue |
--matcher {bf,flann} |
OpenCV matcher when not using LightGlue |
--max_features INT |
Max features / keypoints for OpenCV detectors and ALIKED |
--min_conf FLOAT |
Minimum LightGlue confidence |
--ransac_thresh FLOAT |
Shared RANSAC reprojection threshold |
--pnp_min_inliers INT |
Minimum inliers required to accept PnP |
--proj_radius FLOAT |
Small-window matching radius for landmark reprojection |
--kf_min_inliers INT |
Keyframe insertion threshold on matched inliers |
--kf_min_ratio FLOAT |
Minimum inlier ratio to previous keyframe |
--kf_max_disp FLOAT |
Keyframe insertion threshold based on image displacement |
--kf_min_rot_deg FLOAT |
Keyframe insertion threshold based on rotation |
--local_ba_window INT |
Local BA window size |
--local_ba_min_new_points INT |
Only run local BA when enough new landmarks were triangulated |
--local_ba_max_points INT |
Cap landmarks used by local BA |
--local_ba_max_iters INT |
Max Ceres iterations for local BA |
--no_viz3d |
Disable the Open3D 3D map window |
--headless |
Disable all interactive visualization and only save final trajectory |
--gba_every INT |
Global BA milestone hook; the full GBA call is currently scaffolded in code but not actively executed in the main loop |
When not running headless, the current pipeline may open:
Track debug: current-frame reprojection overlayStrip: img2 + last 3 KFs: current frame plus recent keyframe thumbnailsimg2 + prev→cur matches: frame-to-frame matched featuresTrajectory 2D (x-z): estimated trajectory over ground truth when GT existsSLAM Map: Open3D view of landmarks and camera path
At the end of a run:
- the trajectory figure is saved as
trajectory_<dataset>.png - in non-headless mode, the final trajectory figure is also shown interactively
The repository includes focused utility and geometry tests under tests/. A representative example:
pytest -q tests/test_pnp_utils.pyThe tests are mainly unit-level checks for geometry helpers and matching / pose utilities, not a full end-to-end dataset regression suite.
Not every file in the repository is part of the active path:
slam/monocular/main.pyandslam/monocular/main4.pyare older monocular variants.slam/stereo/ROUGHstereo_tracker.pyis an early stereo experiment.refrences/contains reference and prototype scripts.- Some files in
docs/andtools/are placeholders or work-in-progress.
If you are extending the current code, start from:
- Python Prototyping: Great for algorithmic experimentation but can be slower for large-scale or real-time tasks.
- Production-Grade: Offload heavy steps (bundle adjustment, large-scale optimization) to C++.
- Loop Closure & Full SLAM: These scripts focus on Visual Odometry. Future expansions may include place recognition, pose graph optimization, etc.