
Update lerobot recorder#145

Merged
yuecideng merged 10 commits into main from yueci/update-lerobot-recorder
Feb 21, 2026
Conversation

@yuecideng
Contributor

@yuecideng yuecideng commented Feb 21, 2026

Description

This PR updates the LerobotDataset recorder in the following ways:

  • Removed the observation key from the LeRobot dataset functor parameters; the observation space is now taken directly from the environment.
  • Added support for recording more data (object pose, velocity, and mass; robot qvel and qf).

Type of change

  • Enhancement (non-breaking change which improves an existing functionality)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have run the black . command to format the code base.
  • I have made corresponding changes to the documentation.
  • I have added tests that prove my fix is effective or that my feature works.
  • Dependencies have been updated, if applicable.

Copilot AI review requested due to automatic review settings February 21, 2026 10:57

Copilot AI left a comment


Pull request overview

This PR refactors the LeRobot dataset recorder to improve its flexibility and data collection capabilities. The main goal is to remove the hardcoded observation configuration from robot_meta and instead directly use the environment's observation space, enabling automatic recording of additional data types like object poses, velocities, masses, and robot joint velocities and forces.

Changes:

  • Refactored dataset feature building to use environment observation space directly instead of robot_meta configuration
  • Added support for recording additional observation types including object velocities (new function), robot qvel, and qf
  • Enhanced observation function get_rigid_object_pose with optional matrix/quaternion return format

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
embodichain/lab/gym/utils/gym_utils.py Added torch.uint8 and torch.uint16 type mappings to support additional image/data formats
embodichain/lab/gym/envs/managers/observations.py Added to_matrix parameter to get_rigid_object_pose and new get_rigid_object_velocity function for extended data recording
embodichain/lab/gym/envs/managers/datasets.py Major refactoring: removed robot_meta observation config dependency, rebuilt feature construction to use env observation space, added extra episode metadata support, improved progress feedback with tqdm
embodichain/lab/gym/envs/embodied_env.py Moved dataset_manager initialization to init after observation space is available, removed unnecessary warning message
embodichain/lab/gym/envs/base_env.py Simplified observation space properties by removing num_envs==1 special case, adjusted reset order to set _elapsed_steps after episode initialization


Copilot AI review requested due to automatic review settings February 21, 2026 11:58

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.



Comment on lines +413 to +420
 color_data = obs["sensor"][sensor_name]["color"]
 color_img = color_data[:, :, :3].cpu()
 frame[f"{sensor_name}.color"] = color_img

 if is_stereo:
-    color_right_data = obs["sensor"][camera_name]["color_right"]
-    if isinstance(color_right_data, torch.Tensor):
-        color_right_img = color_right_data[:, :, :3].cpu().numpy()
-    else:
-        color_right_img = np.array(color_right_data)[:, :, :3]
-
-    if color_right_img.dtype in [np.float32, np.float64]:
-        color_right_img = (color_right_img * 255).astype(np.uint8)
-
-    frame[get_right_name(camera_name)] = color_right_img
+    color_right_data = obs["sensor"][sensor_name]["color_right"]
+    color_right_img = color_right_data[:, :, :3].cpu()
+    frame[f"{sensor_name}.color_right"] = color_right_img

Copilot AI Feb 21, 2026


Images are now stored as tensors on CPU instead of numpy arrays. LeRobot datasets typically expect numpy uint8 arrays for images. The tensors returned from .cpu() will be torch tensors, which may not be compatible with LeRobot's internal image processing. Consider converting to numpy and ensuring uint8 dtype: color_img.cpu().numpy().astype(np.uint8) if the tensor is normalized float, or just .cpu().numpy() if already uint8.
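The conversion the comment asks for can be sketched as a small helper. This is a hypothetical function, not part of the PR; `to_lerobot_image` and its behavior for normalized float frames are assumptions, mirroring the numpy-conversion logic the diff removed:

```python
import numpy as np
import torch


def to_lerobot_image(color):
    """Convert a color frame to the numpy uint8 HWC array LeRobot expects.

    Accepts a torch.Tensor (possibly on GPU, possibly normalized float)
    or an array-like; keeps only the first three channels and rescales
    normalized floats to uint8.
    """
    if isinstance(color, torch.Tensor):
        img = color[:, :, :3].detach().cpu().numpy()
    else:
        img = np.asarray(color)[:, :, :3]
    if img.dtype in (np.float32, np.float64):
        # Assume values in [0, 1]; clip before scaling to avoid overflow.
        img = (np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)
    return img
```

With this guard in place, both already-uint8 frames and normalized float tensors end up as uint8 numpy arrays before being handed to the dataset.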

Comment on lines +376 to +386
names = key
if "vel" in key:
    names = ["lin_x", "lin_y", "lin_z", "ang_x", "ang_y", "ang_z"]
elif "pose" in key:
    names = ["x", "y", "z", "qw", "qx", "qy", "qz"]

features[f"observation.{key}"] = {
    "dtype": str(space.dtype),
    "shape": space.shape,
    "names": names,
}

Copilot AI Feb 21, 2026


The feature names assignment uses a simple string for observation keys (line 376), but specific patterns for 'vel' and 'pose' keys (lines 377-380). This heuristic approach could lead to incorrect feature names if observation keys don't follow these exact naming patterns. Consider documenting the expected naming conventions or making this more robust by checking the shape of the observation space to determine appropriate names.
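The shape check suggested here can be sketched as follows. `feature_names` is a hypothetical helper, not code from the PR; it mirrors the substring heuristic above but only applies the twist/pose names when the observation shape actually matches:

```python
def feature_names(key, shape):
    """Pick per-dimension names for a 1-D observation feature.

    Keys containing 'vel' with 6 elements get twist component names,
    keys containing 'pose' with 7 elements get position+quaternion
    names; anything else falls back to the key itself, so a mislabeled
    or oddly-shaped key never receives the wrong component names.
    """
    if "vel" in key and shape == (6,):
        return ["lin_x", "lin_y", "lin_z", "ang_x", "ang_y", "ang_z"]
    if "pose" in key and shape == (7,):
        return ["x", "y", "z", "qw", "qx", "qy", "qz"]
    return key
```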

Comment on lines +342 to +360
for frame_name, space in value.items():
    # TODO: Support depth (uint16) and mask (also uint16 or uint8)
    if frame_name not in ["color", "color_right"]:
        logger.log_error(
            f"Only support 'color' frame for vision sensors, but got '{frame_name}' in sensor '{sensor_name}'"
        )

    features[f"{sensor_name}.{frame_name}"] = {
        "dtype": "video" if self.use_videos else "image",
        "shape": (sensor.cfg.height, sensor.cfg.width, 3),
        "names": ["height", "width", "channel"],
    }

    if is_stereo:
        features[f"{sensor_name}.{frame_name}_right"] = {
            "dtype": "video" if self.use_videos else "image",
            "shape": (sensor.cfg.height, sensor.cfg.width, 3),
            "names": ["height", "width", "channel"],
        }

Copilot AI Feb 21, 2026


The stereo camera handling logic appears incorrect. When iterating over frames at line 342, for stereo cameras the observation space will contain both 'color' and 'color_right'. The code then checks for stereo at line 355 and tries to add a '_right' suffix, which would create 'sensor_name.color_right' again (redundant) and also create an incorrect 'sensor_name.color_right_right' when frame_name is already 'color_right'. The stereo check should skip the _right suffix addition if frame_name already contains '_right', or the loop should only process 'color' frames and handle right frames separately.
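The key-generation bug described above can be demonstrated in isolation. `stereo_feature_keys` is a hypothetical sketch of the suggested fix, not code from the PR: skipping the `_right` suffix for frame names that already end in `_right` prevents both the duplicate and the `color_right_right` key:

```python
def stereo_feature_keys(sensor_name, frame_names, is_stereo):
    """Return the dataset feature keys for a vision sensor.

    For stereo sensors, only non-right frames spawn a '_right'
    counterpart, so a 'color_right' frame already present in the
    observation space is kept as-is and never doubled.
    """
    keys = []
    for frame_name in frame_names:
        keys.append(f"{sensor_name}.{frame_name}")
        if is_stereo and not frame_name.endswith("_right"):
            keys.append(f"{sensor_name}.{frame_name}_right")
    # Deduplicate in case the space lists both eyes explicitly.
    return sorted(set(keys))
```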

Comment on lines +435 to +437
 action = action[self._joint_ids]
-if isinstance(action, torch.Tensor):
-    action_data = action.cpu().numpy()
-elif isinstance(action, np.ndarray):
-    action_data = action
+action_data = action.cpu()

Copilot AI Feb 21, 2026


The action indexing on line 435 assumes action is a tensor, but the following code checks if action is a torch.Tensor or dict. If action is already a dict, attempting to index it with self._joint_ids will raise a TypeError. The indexing should only happen after verifying action is a tensor, or should be done after extracting the tensor from the dict.
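The ordering fix suggested here can be sketched as a standalone helper. `extract_action` is hypothetical, not part of the PR: the tensor is extracted from a dict first, and only then indexed with the joint ids, so a dict action can never hit the TypeError:

```python
import numpy as np
import torch


def extract_action(action, joint_ids):
    """Select recorded joints from an action that may be a tensor or a dict.

    Indexing with joint_ids happens only after a tensor has been
    extracted, mirroring the guard the review comment asks for.
    """
    if isinstance(action, dict):
        # Take the first tensor-valued entry, as the recorder's loop does.
        action = next(v for v in action.values() if isinstance(v, torch.Tensor))
    if not isinstance(action, torch.Tensor):
        action = torch.as_tensor(np.asarray(action))
    return action[joint_ids].cpu().numpy()
```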

Comment on lines 438 to 453
@@ -372,13 +448,7 @@ def _convert_frame_to_lerobot(
             break

-    if isinstance(action_tensor, torch.Tensor):
-        action_data = action_tensor.cpu().numpy()
-    elif isinstance(action_tensor, np.ndarray):
-        action_data = action_tensor
-    else:
-        action_data = np.array(action_tensor)
-else:
-    action_data = np.array(action)
+    action_data = action_tensor.cpu()

 frame["action"] = action_data

Copilot AI Feb 21, 2026


The action_data variable is used in line 453 but may be undefined if action_tensor is None or not a torch.Tensor. If the action is a dict and no valid tensor is found, or if action_tensor is not a torch.Tensor, the code will raise an UnboundLocalError. Add an else clause or error handling to ensure action_data is always defined before use.
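One way to guarantee the variable is always bound is to centralize the conversion, as the removed branches did. `to_action_array` is a hypothetical sketch, not code from the PR: every branch assigns or raises, so an unconvertible value produces a clear error instead of an UnboundLocalError at the point of use:

```python
import numpy as np
import torch


def to_action_array(action_tensor):
    """Normalize an extracted action into a numpy array, never unbound.

    Covers the tensor, ndarray, missing, and generic array-like cases
    explicitly so the caller can always assign the result.
    """
    if isinstance(action_tensor, torch.Tensor):
        return action_tensor.detach().cpu().numpy()
    if isinstance(action_tensor, np.ndarray):
        return action_tensor
    if action_tensor is None:
        raise ValueError("no tensor-valued action found to record")
    return np.asarray(action_tensor)
```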

Comment on lines +423 to +432
 frame["observation.qpos"] = obs["robot"]["qpos"][self._joint_ids].cpu()
 frame["observation.qvel"] = obs["robot"]["qvel"][self._joint_ids].cpu()
 frame["observation.qf"] = obs["robot"]["qf"][self._joint_ids].cpu()

 # Add extra observation features if they exist
 for key in obs:
     if key in ["robot", "sensor"]:
         continue

-frame["observation.state"] = state_data
+    frame[f"observation.{key}"] = obs[key].cpu()

Copilot AI Feb 21, 2026


Robot observation data (qpos, qvel, qf) and extra observations are stored as torch tensors on CPU instead of numpy arrays. LeRobot datasets expect numpy arrays for all data. These tensors should be converted to numpy arrays using .cpu().numpy() instead of just .cpu() to ensure compatibility with LeRobot's dataset format.
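A single conversion helper keeps this consistent across qpos, qvel, qf, and the extra keys. `tensor_to_frame_value` is a hypothetical sketch, not code from the PR: it finishes the `.cpu()` call with `.numpy()` so every frame value reaching the dataset is a numpy array:

```python
import numpy as np
import torch


def tensor_to_frame_value(value):
    """Move an observation value off the GPU and into numpy.

    .cpu() alone leaves a torch.Tensor; .numpy() yields the ndarray
    the dataset format expects. Non-tensor values pass through asarray.
    """
    if isinstance(value, torch.Tensor):
        return value.detach().cpu().numpy()
    return np.asarray(value)
```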

@@ -405,8 +406,6 @@ def _initialize_episode(
     mode="save",
     env_ids=successful_env_ids_tensor,
 )

Copilot AI Feb 21, 2026


The removed log message "No successful episodes to save" provided useful debugging information when save operations are triggered but no data needs to be saved. Consider adding equivalent logging in the parent if-statement to maintain visibility into save operations, or document why this logging was removed.

Suggested change
-    )
+    )
+else:
+    # Log when a save is requested but there are no successful episodes
+    logger.debug(
+        "Save requested but no successful episodes to save for env_ids=%s",
+        env_ids_to_process,
+    )

]

if successful_env_ids:


Copilot AI Feb 21, 2026


The extra blank line at line 400 appears to be an unintentional formatting change. Consider removing it to maintain consistent code formatting.

Suggested change

Comment on lines 579 to +581
# Reset hook for user to perform any custom reset logic.
self._initialize_episode(reset_ids, **options)
self._elapsed_steps[reset_ids] = 0

Copilot AI Feb 21, 2026


The reset of _elapsed_steps was moved to after _initialize_episode. This changes the semantics: previously, elapsed_steps would be 0 during _initialize_episode for the reset environments. Now it retains the old value during _initialize_episode and only gets reset afterwards. This could affect any logic in _initialize_episode that depends on elapsed_steps being 0 for newly reset environments.

Suggested change
-# Reset hook for user to perform any custom reset logic.
-self._initialize_episode(reset_ids, **options)
-self._elapsed_steps[reset_ids] = 0
+# Reset elapsed steps for the environments being reset before initialization.
+self._elapsed_steps[reset_ids] = 0
+# Reset hook for user to perform any custom reset logic.
+self._initialize_episode(reset_ids, **options)

@yuecideng yuecideng merged commit b2c08ef into main Feb 21, 2026
5 checks passed
@yuecideng yuecideng deleted the yueci/update-lerobot-recorder branch February 21, 2026 13:46