Conversation
Pull request overview
This PR refactors the LeRobot dataset recorder to improve its flexibility and data collection capabilities. The main goal is to remove the hardcoded observation configuration from robot_meta and instead directly use the environment's observation space, enabling automatic recording of additional data types like object poses, velocities, masses, and robot joint velocities and forces.
Changes:
- Refactored dataset feature building to use environment observation space directly instead of robot_meta configuration
- Added support for recording additional observation types including object velocities (new function), robot qvel, and qf
- Enhanced observation function `get_rigid_object_pose` with an optional matrix/quaternion return format
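The matrix form of the new `to_matrix` option is not shown in the diff; as a rough illustration (the conversion math is standard, but the function name `pose_to_matrix` and the wxyz quaternion layout are assumptions, not the PR's actual API), converting a 7-element pose into a 4x4 homogeneous matrix might look like:

```python
import numpy as np

def pose_to_matrix(pos, quat_wxyz):
    """Hypothetical sketch: convert an (x, y, z) position plus a unit
    quaternion (qw, qx, qy, qz) into a 4x4 homogeneous transform."""
    w, x, y, z = quat_wxyz
    # Standard quaternion-to-rotation-matrix formula for a unit quaternion.
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = pos
    return T
```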
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| embodichain/lab/gym/utils/gym_utils.py | Added torch.uint8 and torch.uint16 type mappings to support additional image/data formats |
| embodichain/lab/gym/envs/managers/observations.py | Added to_matrix parameter to get_rigid_object_pose and new get_rigid_object_velocity function for extended data recording |
| embodichain/lab/gym/envs/managers/datasets.py | Major refactoring: removed robot_meta observation config dependency, rebuilt feature construction to use env observation space, added extra episode metadata support, improved progress feedback with tqdm |
| embodichain/lab/gym/envs/embodied_env.py | Moved dataset_manager initialization to init after observation space is available, removed unnecessary warning message |
| embodichain/lab/gym/envs/base_env.py | Simplified observation space properties by removing num_envs==1 special case, adjusted reset order to set _elapsed_steps after episode initialization |
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.
```diff
  color_data = obs["sensor"][sensor_name]["color"]
  color_img = color_data[:, :, :3].cpu()
  frame[f"{sensor_name}.color"] = color_img

  if is_stereo:
-     color_right_data = obs["sensor"][camera_name]["color_right"]
-     if isinstance(color_right_data, torch.Tensor):
-         color_right_img = color_right_data[:, :, :3].cpu().numpy()
-     else:
-         color_right_img = np.array(color_right_data)[:, :, :3]
-
-     if color_right_img.dtype in [np.float32, np.float64]:
-         color_right_img = (color_right_img * 255).astype(np.uint8)
-
-     frame[get_right_name(camera_name)] = color_right_img
+     color_right_data = obs["sensor"][sensor_name]["color_right"]
+     color_right_img = color_right_data[:, :, :3].cpu()
+     frame[f"{sensor_name}.color_right"] = color_right_img
```
Images are now stored as tensors on CPU instead of numpy arrays. LeRobot datasets typically expect numpy uint8 arrays for images. The tensors returned from `.cpu()` will be torch tensors, which may not be compatible with LeRobot's internal image processing. Consider converting to numpy and ensuring uint8 dtype: `color_img.cpu().numpy().astype(np.uint8)` if the tensor is normalized float, or just `.cpu().numpy()` if already uint8.
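A minimal sketch of the dtype normalization this comment suggests, using numpy arrays in place of torch tensors for brevity; the helper name `to_uint8_image` and the assumption that float images are normalized to [0, 1] are illustrative, not part of the PR:

```python
import numpy as np

def to_uint8_image(img):
    """Normalize an image array to HxWx3 uint8, the layout LeRobot expects."""
    img = np.asarray(img)[:, :, :3]  # drop alpha channel if present
    if img.dtype in (np.float32, np.float64):
        # Assumed: float images are normalized to [0, 1].
        img = (img * 255).astype(np.uint8)
    return img.astype(np.uint8)
```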
```python
names = key
if "vel" in key:
    names = ["lin_x", "lin_y", "lin_z", "ang_x", "ang_y", "ang_z"]
elif "pose" in key:
    names = ["x", "y", "z", "qw", "qx", "qy", "qz"]

features[f"observation.{key}"] = {
    "dtype": str(space.dtype),
    "shape": space.shape,
    "names": names,
}
```
The feature names assignment uses a simple string for observation keys (line 376), but specific patterns for 'vel' and 'pose' keys (lines 377-380). This heuristic approach could lead to incorrect feature names if observation keys don't follow these exact naming patterns. Consider documenting the expected naming conventions or making this more robust by checking the shape of the observation space to determine appropriate names.
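One way to make the naming more robust, as the comment suggests, is to key off the observation shape instead of key substrings. This sketch is hypothetical (the helper `feature_names_for` is not in the PR, and the shape-to-layout mapping is an assumption):

```python
def feature_names_for(key, shape):
    """Derive LeRobot feature names from the observation shape rather than
    substring matching on the key; fall back to the key itself."""
    if shape == (7,):
        # Assumed layout: position + wxyz quaternion.
        return ["x", "y", "z", "qw", "qx", "qy", "qz"]
    if shape == (6,):
        # Assumed layout: linear + angular velocity.
        return ["lin_x", "lin_y", "lin_z", "ang_x", "ang_y", "ang_z"]
    return key
```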
```python
for frame_name, space in value.items():
    # TODO: Support depth (uint16) and mask (also uint16 or uint8)
    if frame_name not in ["color", "color_right"]:
        logger.log_error(
            f"Only support 'color' frame for vision sensors, but got '{frame_name}' in sensor '{sensor_name}'"
        )

    features[f"{sensor_name}.{frame_name}"] = {
        "dtype": "video" if self.use_videos else "image",
        "shape": (sensor.cfg.height, sensor.cfg.width, 3),
        "names": ["height", "width", "channel"],
    }

    if is_stereo:
        features[f"{sensor_name}.{frame_name}_right"] = {
            "dtype": "video" if self.use_videos else "image",
            "shape": (sensor.cfg.height, sensor.cfg.width, 3),
            "names": ["height", "width", "channel"],
        }
```
The stereo camera handling logic appears incorrect. When iterating over frames at line 342, for stereo cameras the observation space will contain both 'color' and 'color_right'. The code then checks for stereo at line 355 and tries to add a '_right' suffix, which would create 'sensor_name.color_right' again (redundant) and also create an incorrect 'sensor_name.color_right_right' when frame_name is already 'color_right'. The stereo check should skip the _right suffix addition if frame_name already contains '_right', or the loop should only process 'color' frames and handle right frames separately.
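A sketch of the fix the comment proposes, skipping frames that already carry the `_right` suffix so stereo cameras never produce a `color_right_right` key; the helper and its signature are illustrative only:

```python
def stereo_feature_keys(sensor_name, frame_names, is_stereo):
    """Build feature keys, emitting each right frame alongside its left
    counterpart instead of iterating over it directly."""
    keys = []
    for frame_name in frame_names:
        if frame_name.endswith("_right"):
            continue  # already covered by the stereo branch below
        keys.append(f"{sensor_name}.{frame_name}")
        if is_stereo:
            keys.append(f"{sensor_name}.{frame_name}_right")
    return keys
```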
```diff
  action = action[self._joint_ids]
- if isinstance(action, torch.Tensor):
-     action_data = action.cpu().numpy()
- elif isinstance(action, np.ndarray):
-     action_data = action
+ action_data = action.cpu()
```
The action indexing on line 435 assumes action is a tensor, but the following code checks if action is a torch.Tensor or dict. If action is already a dict, attempting to index it with self._joint_ids will raise a TypeError. The indexing should only happen after verifying action is a tensor, or should be done after extracting the tensor from the dict.
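The reordering the comment asks for can be sketched as follows, extracting the tensor from a dict action before any joint indexing happens; numpy stands in for torch here, and `extract_action` is a hypothetical helper, not code from the PR:

```python
import numpy as np

def extract_action(action, joint_ids):
    """Index joints only after the array is extracted, so dict actions
    don't raise a TypeError on fancy indexing."""
    if isinstance(action, dict):
        # Assumed: take the first entry as the action payload.
        action = next(iter(action.values()))
    return np.asarray(action)[joint_ids]
```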
```diff
@@ -372,13 +448,7 @@ def _convert_frame_to_lerobot(
              break

-         if isinstance(action_tensor, torch.Tensor):
-             action_data = action_tensor.cpu().numpy()
-         elif isinstance(action_tensor, np.ndarray):
-             action_data = action_tensor
-         else:
-             action_data = np.array(action_tensor)
-     else:
-         action_data = np.array(action)
+         action_data = action_tensor.cpu()

      frame["action"] = action_data
```
The action_data variable is used in line 453 but may be undefined if action_tensor is None or not a torch.Tensor. If the action is a dict and no valid tensor is found, or if action_tensor is not a torch.Tensor, the code will raise an UnboundLocalError. Add an else clause or error handling to ensure action_data is always defined before use.
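A sketch of the guard the comment asks for, making sure `action_data` is always bound before use; numpy stands in for torch, and the helper name is illustrative:

```python
import numpy as np

def safe_action_data(action):
    """Return an array for the action, failing loudly instead of hitting
    an UnboundLocalError when a dict contains no array-like entry."""
    if isinstance(action, dict):
        for value in action.values():
            if isinstance(value, np.ndarray):
                return value
        # Every path defines a result: raise rather than fall through.
        raise TypeError(f"No array-like action found in dict keys: {list(action)}")
    return np.asarray(action)
```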
```diff
  frame["observation.qpos"] = obs["robot"]["qpos"][self._joint_ids].cpu()
  frame["observation.qvel"] = obs["robot"]["qvel"][self._joint_ids].cpu()
  frame["observation.qf"] = obs["robot"]["qf"][self._joint_ids].cpu()

  # Add extra observation features if they exist
  for key in obs:
      if key in ["robot", "sensor"]:
          continue

- frame["observation.state"] = state_data
+     frame[f"observation.{key}"] = obs[key].cpu()
```
Robot observation data (qpos, qvel, qf) and extra observations are stored as torch tensors on CPU instead of numpy arrays. LeRobot datasets expect numpy arrays for all data. These tensors should be converted to numpy arrays using .cpu().numpy() instead of just .cpu() to ensure compatibility with LeRobot's dataset format.
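A small duck-typed helper along the lines this comment suggests, converting tensor-like values to numpy before they reach LeRobot; the `to_numpy` name and the duck typing are assumptions for illustration:

```python
import numpy as np

def to_numpy(value):
    """Convert a torch-tensor-like value (anything exposing .cpu().numpy())
    to a numpy array; pass plain array-likes through np.asarray."""
    if hasattr(value, "cpu"):
        value = value.cpu().numpy()
    return np.asarray(value)
```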
```diff
@@ -405,8 +406,6 @@ def _initialize_episode(
              mode="save",
              env_ids=successful_env_ids_tensor,
          )
```
The removed log message "No successful episodes to save" provided useful debugging information when save operations are triggered but no data needs to be saved. Consider adding equivalent logging in the parent if-statement to maintain visibility into save operations, or document why this logging was removed.
Suggested change:

```diff
      )
+ else:
+     # Log when a save is requested but there are no successful episodes
+     logger.debug(
+         "Save requested but no successful episodes to save for env_ids=%s",
+         env_ids_to_process,
+     )
```
```python
]

if successful_env_ids:
```
The extra blank line at line 400 appears to be an unintentional formatting change. Consider removing it to maintain consistent code formatting.
```python
# Reset hook for user to perform any custom reset logic.
self._initialize_episode(reset_ids, **options)
self._elapsed_steps[reset_ids] = 0
```
The reset of _elapsed_steps was moved to after _initialize_episode. This changes the semantics: previously, elapsed_steps would be 0 during _initialize_episode for the reset environments. Now it retains the old value during _initialize_episode and only gets reset afterwards. This could affect any logic in _initialize_episode that depends on elapsed_steps being 0 for newly reset environments.
Suggested change:

```diff
- # Reset hook for user to perform any custom reset logic.
- self._initialize_episode(reset_ids, **options)
- self._elapsed_steps[reset_ids] = 0
+ # Reset elapsed steps for the environments being reset before initialization.
+ self._elapsed_steps[reset_ids] = 0
+ # Reset hook for user to perform any custom reset logic.
+ self._initialize_episode(reset_ids, **options)
```
Description
This PR updates the LeRobotDataset recorder in the following aspects:
Type of change
Checklist
Run the `black .` command to format the code base.