add video saving and uploading support to train_* scripts #524

Conversation
AdamGleave
left a comment
Took a quick look, only skimmed as it's still in draft mode. Seems like a useful feature; a couple of suggestions.
Codecov Report
@@            Coverage Diff             @@
##           master     #524      +/-   ##
==========================================
- Coverage   96.95%   96.93%   -0.03%
==========================================
  Files          84       84
  Lines        7460     7369      -91
==========================================
- Hits         7233     7143      -90
+ Misses        227      226       -1
    )
    callback_objs.append(save_policy_callback)

    if _config["train"]["videos"]:
Here we need to initialize a video_wrapper.SaveVideoCallback instead of using train.save_video like the other scripts do. A bit unsatisfying.
An alternative could be passing a save_video partial function into the callback.
Yes, this is strange, why is that the case? I would advocate for using the callback class everywhere or using a partial / closure+wrapper defined in this file for this specific instance. Currently the existence of the class is confusing and not documented.
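One way to realize the partial/closure approach suggested above. This is a minimal sketch, not the actual imitation API: `save_video`, `make_video_callback`, and the callback signature are all hypothetical stand-ins, and the `saved` list takes the place of actually writing a file.

```python
from functools import partial
from typing import Callable, List

saved: List[str] = []  # stands in for files written to disk

def save_video(policy, venv, save_path: str) -> None:
    # Hypothetical stand-in for train.save_video: record `policy` acting
    # in `venv` and write the result to `save_path`.
    saved.append(save_path)

def make_video_callback(save_fn: Callable[[str], None]) -> Callable[[int], None]:
    # Wrap a pre-bound save function into a simple per-round callback,
    # so the training loop never needs to know about policy/venv details.
    def callback(round_num: int) -> None:
        save_fn(f"videos/round-{round_num:06d}.mp4")
    return callback

# Bind policy and venv up front via functools.partial; the trainer only
# ever sees a plain `Callable[[int], None]`.
video_cb = make_video_callback(partial(save_video, "dummy_policy", "dummy_venv"))
video_cb(3)
```

This keeps all scripts on a single call style without needing a dedicated callback class.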
    rl_algo.set_logger(custom_logger)
    rl_algo.learn(total_timesteps, callback=callback)

    with common.make_venv(num_vec=1, log_dir=None) as eval_venv:
Create an eval_venv:
- with num_vec=1,
- without creating monitors from here, by setting log_dir=None.
I'm still a bit backlogged, @Rocamonde could you review this please?
    total_timesteps = int(1e6)  # total number of environment timesteps
    total_comparisons = 5000  # total number of comparisons to elicit
-   num_iterations = 5  # Arbitrary, should be tuned for the task
+   num_iterations = 50  # Arbitrary, should be tuned for the task
Apologies if this has been discussed, but why are you doing this?
    cross_entropy_loss_kwargs = {}
    reward_trainer_kwargs = {
        "epochs": 3,
        "weight_decay": 0.0,
I'll have to remember to change this, as I have a PR that replaces weight decay with a general regularization API (#481). @AdamGleave what do you think, should we merge my PR or this one first?
Probably best to merge your PR first, though really it depends on which one is ready earlier.
    total_timesteps: int,
    total_comparisons: int,
-   callback: Optional[Callable[[int], None]] = None,
+   callback: Optional[Callable[[int, int], None]] = None,
Probably should add in the docstring what the callback type signature represents.
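For instance, the docstring could spell out what each callback argument means. A sketch only: the function body is a toy loop, and the argument names `iteration` and `timesteps_so_far` are guesses for illustration, not what the real implementation passes.

```python
from typing import Callable, Optional

def train_preference_comparisons_sketch(
    total_timesteps: int,
    total_comparisons: int,
    callback: Optional[Callable[[int, int], None]] = None,
) -> None:
    """Sketch of the trainer entry point with a documented callback.

    Args:
        total_timesteps: total number of environment timesteps.
        total_comparisons: total number of comparisons to elicit.
        callback: if given, called once per training iteration with
            ``(iteration, timesteps_so_far)``. (Hypothetical names; the
            real docstring should state what its two ints represent.)
    """
    timesteps_per_iteration = total_timesteps // 2
    for iteration in range(2):  # toy loop standing in for real training
        if callback is not None:
            callback(iteration, (iteration + 1) * timesteps_per_iteration)

calls = []
train_preference_comparisons_sketch(1000, 10, callback=lambda i, t: calls.append((i, t)))
```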
    @train_ingredient.capture
    def save_video(
When you call this function, it reads as if the video were always saved, but a flag indicating whether this should happen is magically injected through a decorator. I don't have an immediately better alternative, but perhaps a more explanatory function name could help.
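To make the confusion concrete, here is a minimal, self-contained imitation of Sacred-style config injection (this is not the real sacred API; `CONFIG`, `capture`, and the `videos` flag are toy stand-ins):

```python
import functools

CONFIG = {"videos": False}  # hypothetical ingredient config

def capture(fn):
    """Toy stand-in for sacred's @ingredient.capture: fills in missing
    keyword arguments from the config dict at call time."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("videos", CONFIG["videos"])
        return fn(*args, **kwargs)
    return wrapper

@capture
def save_video(policy_name, videos):
    # Despite the name, this is a no-op unless the injected flag is set.
    return f"saved {policy_name}" if videos else None

# The call site never mentions `videos`, which is why the name misleads:
assert save_video("ppo") is None
CONFIG["videos"] = True
assert save_video("ppo") == "saved ppo"
```

A name like `maybe_save_video` would at least hint that the call can be a no-op.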
        round_str: str,
    ) -> None:
        """Save discriminator and generator."""
        save_path = os.path.join(log_dir, "checkpoints", round_str)
I have a PR replacing os.path with pathlib in most places, but we might as well keep this consistent for now until that's merged.
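For reference, the eventual pathlib form of that line would be something like the following (a sketch; the `log_dir` and `round_str` values are made up, and on POSIX both styles produce the same string):

```python
import os
import pathlib

log_dir, round_str = "output/train", "000042"

# Current style:
save_path = os.path.join(log_dir, "checkpoints", round_str)

# pathlib style after the refactor:
save_path_p = pathlib.Path(log_dir) / "checkpoints" / round_str

assert str(save_path_p) == save_path
```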
        """
        super().__init__(env)
-       self.episode_id = 0
+       self._episode_id = 0
        directory=video_dir,
        **(video_kwargs or dict()),
    )
    sample_until = rollout.make_sample_until(min_timesteps=None, min_episodes=1)
I understand where the name of this function is coming from ("make the function called sample_until"), but how it actually reads IMO is "make the sample (until...?)". I think that refactoring this to something like "get_stopping_conditions_callback" or "get_sampling_termination_fn" would be much more readable.
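For context, the function builds a termination predicate over the trajectories collected so far. A self-contained sketch of that behavior (not the actual imitation implementation; trajectories are represented here simply as lists of steps):

```python
from typing import Callable, Optional, Sequence

def make_sample_until(
    min_timesteps: Optional[int], min_episodes: Optional[int]
) -> Callable[[Sequence[list]], bool]:
    """Return a predicate that is True once enough data has been sampled."""
    def sample_until(trajectories: Sequence[list]) -> bool:
        if min_episodes is not None and len(trajectories) < min_episodes:
            return False  # not enough full episodes yet
        if min_timesteps is not None:
            if sum(len(t) for t in trajectories) < min_timesteps:
                return False  # not enough total timesteps yet
        return True
    return sample_until

sample_until = make_sample_until(min_timesteps=None, min_episodes=1)
assert not sample_until([])        # no episodes yet: keep sampling
assert sample_until([[0, 1, 2]])   # one episode collected: stop
```

Reading it this way, a name like `get_sampling_termination_fn` does describe the return value more directly.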
    sample_until = rollout.make_sample_until(min_timesteps=None, min_episodes=1)
    # "video.{:06}.mp4".format(VideoWrapper.episode_id) will be saved within
    # rollout.generate_trajectories()
    rollout.generate_trajectories(policy, video_venv, sample_until)
For some reason I was expecting that the saved video would be one of the real training trajectories instead of a newly sampled one.
Closing in favor of #597
Description

Closes #523.

Problem

Video saving and uploading is not supported in scripts.train_rl, scripts.train_preference_comparisons, scripts.train_adversarial and scripts.train_bc.

Solution

- Added a record_and_save_video() function in imitation.util.video_wrapper that takes in a policy, an eval_venv, and a logger, and saves the video of the policy evaluated on an environment to a designated path.
- Modified WandbOutputFormat.write() to also upload the saved videos.

Testing

- tests/scripts/test_scripts.py
- tests/util/test_wb_logger.py
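A self-contained sketch of what a record_and_save_video()-style helper does. This is illustrative only: the real implementation lives in imitation.util.video_wrapper and relies on a video-recording env wrapper; here the env is a toy object and no frames are actually written.

```python
import os
from typing import Callable

def record_and_save_video(policy: Callable, eval_env, log_dir: str) -> str:
    """Roll out `policy` in `eval_env` and return the path the video would
    be saved under. In the real helper, `eval_env` would be a VecEnv wrapped
    in a video recorder that writes frames as a side effect of stepping."""
    video_dir = os.path.join(log_dir, "videos")
    os.makedirs(video_dir, exist_ok=True)
    obs, done = eval_env.reset(), False
    while not done:
        obs, done = eval_env.step(policy(obs))
    # The wrapper would have written the frames here; return the path only.
    return os.path.join(video_dir, "video.000000.mp4")

class _DummyEnv:
    """Minimal env stub: the episode ends after three steps."""
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return self.t, self.t >= 3

path = record_and_save_video(lambda obs: 0, _DummyEnv(), log_dir="out")
```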