The curated dataset for Tracker contains:
- 9,780 total sequences
- 9,758 AMASS sequences
- 22 LAFAN1 sequences
The curated dataset for RobotMDAR contains:
- 8,285 total sequences
- 6,209 training sequences
- 2,076 validation sequences
- 105,395 seconds of total motion duration
The following steps describe how we obtain our dataset.
Download AMASS and BABEL-TEACH data from their websites.
Use GMR to retarget the whole AMASS dataset to the Unitree G1:
- Install GMR and the additional dependency `joblib`.
- Run `dataset/smplx_to_robot_dataset.py`, a slightly modified script:

```shell
python dataset/smplx_to_robot_dataset.py --src_folder <path_to_dir_of_smplx_data> --tgt_folder <path_to_dir_to_save_robot_data> --robot unitree_g1
```

Post-process the retargeted data:
- This script interpolates the motion data from 30 fps to 50 fps, computes the feet contact mask, and saves the data in the PBHC pkl format.
```shell
python dataset/process_retarget_data.py --input_dir <Path To Retargeted Data Dir> --output_dir <Path To Output Dir> --robot_config "TextOpRobotMDAR/robotmdar/config/skeleton/g1.yaml"
```

We manually remove some AMASS data that is unsuitable for RL training.
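The interpolation and contact-mask computation above can be sketched as follows. This is a minimal illustration, assuming linear per-channel interpolation and a simple foot-height threshold; `process_retarget_data.py` may use a different method, and the 0.02 m threshold here is an assumption, not the script's actual value.

```python
import numpy as np

def resample_motion(frames: np.ndarray, src_fps: int = 30, tgt_fps: int = 50) -> np.ndarray:
    """Linearly interpolate per-frame data of shape (T, D) from src_fps to tgt_fps."""
    n_src = frames.shape[0]
    duration = (n_src - 1) / src_fps
    t_src = np.linspace(0.0, duration, n_src)
    t_tgt = np.linspace(0.0, duration, int(round(duration * tgt_fps)) + 1)
    # Interpolate each feature channel independently over the new timestamps.
    return np.stack(
        [np.interp(t_tgt, t_src, frames[:, d]) for d in range(frames.shape[1])],
        axis=1,
    )

def feet_contact_mask(foot_heights: np.ndarray, threshold: float = 0.02) -> np.ndarray:
    """Mark a foot as in contact when its height is below the threshold (assumed heuristic)."""
    return (foot_heights < threshold).astype(np.float32)
```

Note that 31 frames at 30 fps span exactly one second, so resampling yields 51 frames at 50 fps.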
Pack the motion directory into a single file:

```shell
cd TextOpTracker
python scripts/motion_package.py <folder_with_pkl_files>
```

Transform the data format to meet Tracker's requirements:
- You can also use `dataset/pkl_to_npz.py` to transform the files one by one.
```shell
# Activate the Tracker's environment
python scripts/pklpack_to_npz.py --input_file /path/to/aaa.pkl \
    --output_dir ./artifacts/unpacked_motions --input_fps 50 --output_fps 50
```
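A per-sequence conversion in the spirit of this step might look like the sketch below. The packed-pickle layout (a dict mapping motion names to dicts of numpy arrays) and the `motion.npz` output naming are assumptions for illustration; the actual pack format used by the scripts may differ.

```python
import pickle
from pathlib import Path

import numpy as np

def unpack_pkl_to_npz(input_file: str, output_dir: str) -> None:
    """Split a packed pickle of motions into one motion.npz per sequence.

    Assumed layout: {motion_name: {array_name: np.ndarray, ...}, ...}.
    """
    with open(input_file, "rb") as f:
        motions = pickle.load(f)
    for name, arrays in motions.items():
        out = Path(output_dir) / name
        out.mkdir(parents=True, exist_ok=True)
        # One compressed-free npz archive per motion directory.
        np.savez(out / "motion.npz", **arrays)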
We also select and add some high-quality motions from LAFAN1. These data can be converted from .csv to .npz format by:
```shell
python scripts/csv_to_npz.py --input_file LAFAN/dance1_subject2.csv --input_fps 30 --frame_range 122 722 \
    --output_file ./artifacts/dance1_subject2/motion.npz --output_fps 50
```

Organize the data files as follows for Tracker to load:
```
TextOpTracker/artifacts/
├── Dataset/
│   ├── motion1/
│   │   └── motion.npz
│   ├── motion2/
│   │   └── motion.npz
│   └── ...
└── ...
```
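With the layout above, a loader can discover every sequence by globbing for `motion.npz` files under `Dataset/`. This is a hypothetical sketch, not Tracker's actual loading code:

```python
from pathlib import Path

import numpy as np

def discover_motions(artifacts_root: str) -> dict:
    """Map each motion directory name to its loaded motion.npz arrays."""
    root = Path(artifacts_root) / "Dataset"
    motions = {}
    # Each immediate subdirectory is expected to hold a single motion.npz.
    for npz_path in sorted(root.glob("*/motion.npz")):
        motions[npz_path.parent.name] = dict(np.load(npz_path))
    return motions
```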
Pack the motion and text-label dataset to meet RobotMDAR's requirements:
- It splits the dataset into a training and a validation set according to the annotations from BABEL. The training set `train.pkl` comprises 6,209 sequences, the validation set `val.pkl` contains 2,076 sequences, and the entire dataset has a total duration of 105,395 seconds.

```shell
python dataset/pack_dataset.py --amass_robot <Path To Retargeted Data Dir> --babel <Path to BABEL Dir>
```

Calculate data sampling weights for RobotMDAR training:
```shell
python dataset/cal_action_statistics.py --data_folder <Path To Packaged Data Dir> --trg_filename <Path To Save Json File>
```
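The exact weighting scheme used by `cal_action_statistics.py` is not shown here; a common choice is inverse-frequency normalization over action categories, sketched below. The function names and the JSON output layout are assumptions for illustration.

```python
import json

def action_sampling_weights(action_counts: dict) -> dict:
    """Weight each action inversely to its frequency, so rare actions are sampled more often.

    action_counts maps an action label to its sequence count; the returned
    weights sum to 1.0 (an assumed convention, not the script's guaranteed output).
    """
    inv = {action: 1.0 / count for action, count in action_counts.items() if count > 0}
    total = sum(inv.values())
    return {action: weight / total for action, weight in inv.items()}

def save_weights(weights: dict, trg_filename: str) -> None:
    """Write the weights to the target JSON file."""
    with open(trg_filename, "w") as f:
        json.dump(weights, f, indent=2)
```

For example, an action seen 25 times receives four times the sampling weight of one seen 100 times.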