GitHub - IS2AI/MMHA-28: MMHA-28: Human Action Recognition Across RGB, Depth, Thermal, and Event Modalities

MMHAR-28: Human Action Recognition Across RGB, Depth, Thermal, and Event Modalities

This repository provides the official implementation, dataset, and training scripts for the paper "MMHAR-28: Human Action Recognition Across RGB, Depth, Thermal, and Event Modalities".

This repository contains two model pipelines for the same MMHAR-28 dataset:

videomamba/: the VideoMamba-based implementation
tsm_timeSformer/: an MMAction2-based TSM/TimeSformer training pipeline for MMHAR-28

This repository builds on top of VideoMamba and includes the training code, evaluation scripts, and the local mamba and causal-conv1d dependencies required by the model implementation.

Our MMHAR-28 Dataset

The project expects MMHA-28 data to be organized by split, session, subject, and modality. The CSV files in videomamba/video_sm/data show the expected path format.

Examples:

data/train/session_1/sub_18/d_rgb/28/rgb_images,13
data/train/session_1/sub_7/d_rgb/26/depth_images,12
data/train/session_1/sub_33/thermal/9_1_0,8
data/train/session_1/sub_55/event-streams/15,7
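As an illustration of the format above, the following minimal sketch splits one line of a split file into its path and trailing integer, and guesses the modality from the path. The names `SplitEntry`, `infer_modality`, and `parse_split_line` are hypothetical, and the modality markers are inferred only from the four example lines; this is not code from the repository. The meaning of the integer field is not stated here, so it is returned unchanged.

```python
from typing import NamedTuple

# Path components seen in the example lines above (assumed, not exhaustive).
MODALITY_MARKERS = {
    "rgb_images": "rgb",
    "depth_images": "depth",
    "thermal": "thermal",
    "event-streams": "event",
}


class SplitEntry(NamedTuple):
    path: str      # relative sample path, e.g. data/train/session_1/...
    value: int     # trailing integer field, kept as-is
    modality: str  # modality guessed from the path


def infer_modality(path: str) -> str:
    """Return the modality whose marker appears as a whole path component."""
    parts = path.split("/")
    for marker, name in MODALITY_MARKERS.items():
        if marker in parts:
            return name
    raise ValueError(f"no known modality marker in: {path}")


def parse_split_line(line: str) -> SplitEntry:
    """Split 'path,integer' on the last comma and attach the modality."""
    path, _, value = line.strip().rpartition(",")
    return SplitEntry(path, int(value), infer_modality(path))
```

Matching on whole path components keeps the shared `d_rgb` folder from being mistaken for the RGB modality, since both RGB and depth samples live under it in the examples.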

Available split files include:

videomamba/video_sm/data/train.csv
videomamba/video_sm/data/val.csv
videomamba/video_sm/data/test.csv
modality-specific test files under videomamba/video_sm/data

1. Download the MMHA-28 dataset from its official source on Hugging Face:

hf download issai/MMHA_28 --repo-type dataset --local-dir .

Alternatively, a mini-sample version is available, containing data from one subject in session_1 and session_2 across all human actions. This is a good option for testing and visualization: mini-mmha-28

hf download tomirisss/mini-mmha --repo-type dataset
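The same downloads can also be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `download_kwargs` and `download_dataset` are hypothetical, and the repository IDs are copied from the commands above.

```python
from typing import Optional

FULL_REPO = "issai/MMHA_28"        # full dataset, from the command above
MINI_REPO = "tomirisss/mini-mmha"  # mini-sample, from the command above


def download_kwargs(repo_id: str, local_dir: Optional[str] = None) -> dict:
    """Build the keyword arguments for snapshot_download."""
    kwargs = {"repo_id": repo_id, "repo_type": "dataset"}
    if local_dir is not None:
        # Mirrors --local-dir; without it the files go to the Hugging Face cache.
        kwargs["local_dir"] = local_dir
    return kwargs


def download_dataset(repo_id: str = MINI_REPO, local_dir: Optional[str] = None) -> str:
    """Download a dataset snapshot and return the local folder path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(repo_id, local_dir))
```

For example, `download_dataset(FULL_REPO, local_dir=".")` would correspond to the first command, and `download_dataset()` to the mini-sample command.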

2. Visualization

To visualize data from the mini-sample, run the following script with appropriate parameters:

   python vis.py --path PATH_TO_DATA --session session_1 --exp_num EXP_NUMBER

VideoMamba Pipeline

Installation

1. Install the dependencies:

pip install -r requirements.txt
pip install -e ./mamba
pip install -e ./causal-conv1d

2. Optional: pull the Docker image

If you use the provided container workflow, pull the published image:

docker pull mmhm28/mmha-28:latest

3. Training

Training is launched from videomamba/video_sm/run.py.

Before running training, review these values in that file:

--nproc_per_node to match the number of GPUs on your machine
MODEL_PATH if you want to start from a specific checkpoint
DATA_PATH and PREFIX if your dataset is stored outside the default relative layout
output directories such as OUTPUT_DIR

Start training with:

cd videomamba/video_sm
python run.py
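Before launching, the values listed above can be sanity-checked with a few lines of Python. This is a minimal sketch under stated assumptions: `preflight` and `visible_gpu_count` are hypothetical helpers, not part of run.py, and the paths are expected to be copied in by hand from the values you set in that file. The GPU count relies on PyTorch being installed.

```python
from pathlib import Path
from typing import List, Optional


def preflight(data_path: str, output_dir: str,
              model_path: Optional[str] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not Path(data_path).is_dir():
        problems.append(f"DATA_PATH is not a directory: {data_path}")
    if model_path and not Path(model_path).is_file():
        problems.append(f"MODEL_PATH is not a file: {model_path}")
    out = Path(output_dir)
    if out.exists() and not out.is_dir():
        problems.append(f"OUTPUT_DIR exists but is not a directory: {output_dir}")
    return problems


def visible_gpu_count() -> int:
    """Number of CUDA devices PyTorch can see, to compare with --nproc_per_node."""
    import torch  # imported lazily; assumes PyTorch is installed

    return torch.cuda.device_count()
```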

4. Evaluation

To test a pretrained model, first download the final multimodal VideoMamba checkpoint, MV-Mamba, from Hugging Face or with this command:

   huggingface-cli download tomirisss/MV-Mamba --local-dir .

MV-Mamba is the final multimodal model. The checkpoint filename indicates the number of frames used during training (e.g., MV-Mamba_f16.pth was trained with --num_frames=16). Set --num_frames to match the checkpoint, point MODEL_PATH at the downloaded file, and then run the test script:

   python3 run_test.py
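Because the frame count is encoded in the checkpoint filename, it can be read programmatically instead of typed by hand. The sketch below assumes every checkpoint follows the `_f<N>.pth` suffix seen in MV-Mamba_f16.pth; the function name `frames_from_checkpoint` is hypothetical.

```python
import re
from pathlib import Path

# Matches a trailing "_f<digits>" in the filename stem, e.g. "MV-Mamba_f16".
_FRAMES_RE = re.compile(r"_f(\d+)$")


def frames_from_checkpoint(path: str) -> int:
    """Extract the --num_frames value from a checkpoint filename."""
    stem = Path(path).stem  # drops the directory and the ".pth" extension
    match = _FRAMES_RE.search(stem)
    if match is None:
        raise ValueError(f"no '_f<N>' suffix in checkpoint name: {path}")
    return int(match.group(1))
```

For MV-Mamba_f16.pth this returns 16, the value to use for --num_frames.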

TSM / TimeSformer Pipeline

Installation

The tsm directory is a local MMAction2-based codebase. Install it from this repository, not from a separate external checkout.

Recommended setup:

cd tsm
python -m venv .venv
source .venv/bin/activate
pip install -U pip setuptools wheel
pip install -r requirements/build.txt
pip install -r requirements/mminstall.txt
pip install -e .

Important dependency notes:

local MMAction version: 1.2.0
required mmcv: >=2.0.0rc4,<2.2.0
required mmengine: >=0.7.1,<1.0.0
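The version ranges above can be checked against the active environment. The following is a simplified sketch using only the standard library: it compares release numbers and the a/b/rc pre-release stages, but ignores dev, post, and local version segments and does not apply every PEP 440 rule, so the `packaging` library is the better choice for a strict check. The helper names, and the assumption that the packages are installed under the distribution names `mmcv` and `mmengine`, are illustrative.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

_STAGES = {"a": 0, "b": 1, "rc": 2}  # pre-release stages sort before a final release
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?")

# Lower bound (inclusive) and upper bound (exclusive), from the notes above.
REQUIRED = {"mmcv": ("2.0.0rc4", "2.2.0"), "mmengine": ("0.7.1", "1.0.0")}


def version_key(version: str) -> Tuple[Tuple[int, ...], int, int]:
    """Turn '2.0.0rc4' into a sortable key: (release, stage, stage number)."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version}")
    release = tuple(int(p) for p in m.group(1).split("."))
    release += (0,) * (4 - len(release))  # pad so 2.0 and 2.0.0 compare equal
    stage = _STAGES[m.group(2)] if m.group(2) else 3  # 3 = final release
    number = int(m.group(3)) if m.group(3) else 0
    return release, stage, number


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper under the simplified ordering."""
    return version_key(lower) <= version_key(version) < version_key(upper)


def check_installed() -> Dict[str, Optional[bool]]:
    """Map each package to True/False, or None when it is not installed."""
    report: Dict[str, Optional[bool]] = {}
    for name, (lower, upper) in REQUIRED.items():
        try:
            report[name] = in_range(metadata.version(name), lower, upper)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```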

If you already have PyTorch installed, make sure it is compatible with your CUDA setup before installing the MMAction stack.

Main entrypoints:

Run TSM training:

cd tsm
source .venv/bin/activate
python tools/train.py configs/recognition/tsm/tsm_multimodal_mmha28_32.py

Run TSM evaluation:

cd tsm
source .venv/bin/activate
python tools/test.py configs/recognition/tsm/tsm_multimodal_mmha28_32.py work_dirs/tsm_multimodal_mmha28_32/best_acc_top1_epoch_95.pth
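The epoch number in the best-checkpoint filename depends on the training run, so the path in the evaluation command may differ on your machine. The sketch below looks for files named like `best_acc_top1_epoch_95.pth` in a work directory and returns the one with the highest epoch. The helper name `find_best_checkpoint` is hypothetical, and the filename pattern is assumed from the single example above.

```python
import re
from pathlib import Path
from typing import Optional

_BEST_RE = re.compile(r"^best_acc_top1_epoch_(\d+)\.pth$")


def find_best_checkpoint(work_dir: str) -> Optional[Path]:
    """Return the best_acc_top1 checkpoint with the highest epoch, if any."""
    candidates = []
    for path in Path(work_dir).glob("best_acc_top1_epoch_*.pth"):
        match = _BEST_RE.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare by epoch number only, so epoch_95 beats epoch_9.
    return max(candidates, key=lambda item: item[0])[1]
```

Calling `find_best_checkpoint("work_dirs/tsm_multimodal_mmha28_32")` yields a path that can be passed as the second argument to tools/test.py.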

MMAction2 reference:

OpenMMLab/mmaction2

If you use the dataset/source code/pre-trained models in your research, please cite our work:

@ARTICLE{11447325,
  author={Rakhimzhanova, Tomiris and Kuzdeuov, Askat and Muratov, Artur and Varol, Huseyin Atakan},
  journal={IEEE Transactions on Biometrics, Behavior, and Identity Science}, 
  title={MMHAR-28: Human Action Recognition Across RGB, Thermal, Depth, and Event Modalities}, 
  year={2026},
  volume={},
  number={},
  pages={1-1},
  keywords={Videos;Cameras;Event detection;Thermal sensors;Sensors;Web sites;Video on demand;Three-dimensional displays;Software;Lighting;Human action recognition (HAR);multimodal learning;RGB;depth;thermal;event-based camera;multimodal dataset;video classification;deep learning},
  doi={10.1109/TBIOM.2026.3675639}}

