Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning


Abstract

Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task distributions that are very narrow. For example, a commonly used meta-reinforcement learning benchmark uses different running velocities for a simulated robot as different tasks. When policies are meta-trained on such narrow task distributions, they cannot possibly generalize to more quickly acquire entirely new tasks. Therefore, if the aim of these methods is to enable faster acquisition of entirely new behaviors, we must evaluate them on task distributions that are sufficiently broad to enable generalization to new behaviors. In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation environments, with the aim of making it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 7 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn multiple tasks at the same time, even with as few as nine distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.


About

In multi-task learning and meta-learning, the goal is not just to learn a single skill, but to learn a number of skills. In multi-task RL, we want to learn a fixed set of skills with minimal data, while in meta-RL, we want to use experience from a set of training skills so that we can learn to solve new skills quickly.

To evaluate state-of-the-art multi-task and meta-learning algorithms, we need a diverse yet structured set of skills to evaluate them on. The Meta-World benchmark contains 50 manipulation tasks, designed to be diverse yet carry shared structure that can be leveraged for efficient multi-task RL and for transfer to new tasks via meta-RL. The benchmark offers six evaluation modes, at three levels of difficulty, described below.
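
All of these modes are exposed programmatically by the open-source metaworld package. As a quick orientation, here is a minimal sketch assuming the package's published benchmark API (attribute and environment names may differ slightly across versions):

```python
import metaworld

# List the manipulation environments available to the ML1 benchmark.
# ENV_NAMES is an attribute of the benchmark classes in the metaworld package.
print(metaworld.ML1.ENV_NAMES)
```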

Meta-Learning 1 (ML1)

Meta-Learning 1 (ML1) is our easiest meta-learning evaluation mode. The setting is a single manipulation task family: reaching, pushing, or picking and placing an object at varying goal positions. At meta-test time, we present goals not seen during training.
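
A minimal usage sketch, assuming the metaworld benchmark API (the exact environment name, e.g. 'pick-place-v2', and the Gym step signature vary across versions):

```python
import random
import metaworld

# Construct ML1 for one task family; this samples training and
# held-out test goal configurations for that task.
ml1 = metaworld.ML1('pick-place-v2')

env = ml1.train_classes['pick-place-v2']()    # instantiate the environment
env.set_task(random.choice(ml1.train_tasks))  # pick a training goal

obs = env.reset()
# Newer gymnasium-based versions return a 5-tuple
# (obs, reward, terminated, truncated, info) instead.
obs, reward, done, info = env.step(env.action_space.sample())

# At meta-test time, adaptation is evaluated on goals unseen during training.
env.set_task(random.choice(ml1.test_tasks))
```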

Multi-Task 1 (MT1)

Multi-Task 1 (MT1) is our easiest multi-task evaluation mode. The setting is the same single manipulation task family, reaching, pushing, or picking and placing an object at varying goal positions, but without testing generalization to unseen goals.
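
MT1 is constructed analogously; a sketch under the same API assumptions, with no held-out test set:

```python
import random
import metaworld

mt1 = metaworld.MT1('pick-place-v2')          # single task family, varying goals
env = mt1.train_classes['pick-place-v2']()
env.set_task(random.choice(mt1.train_tasks))  # every goal is a training goal
obs = env.reset()
```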

Meta-Learning 10 (ML10)

ML10 is a harder meta-learning mode: we meta-train on 10 manipulation tasks and are given 5 entirely new tasks at meta-test time.
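
In code (same assumptions as above), ML10 exposes separate training and held-out test environment classes:

```python
import random
import metaworld

ml10 = metaworld.ML10()

# 10 meta-training task families and 5 held-out meta-test families.
print(len(ml10.train_classes), len(ml10.test_classes))  # 10 5

# Instantiate one held-out environment for meta-test evaluation.
name, env_cls = next(iter(ml10.test_classes.items()))
env = env_cls()
env.set_task(random.choice([t for t in ml10.test_tasks if t.env_name == name]))
```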

Multi-Task 10 (MT10)

MT10 tests multi-task learning; that is, learning a single policy that can succeed on 10 diverse training tasks, without testing generalization to new tasks.
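
A sketch of building all 10 training environments, under the same assumed API:

```python
import random
import metaworld

mt10 = metaworld.MT10()

training_envs = []
for name, env_cls in mt10.train_classes.items():
    env = env_cls()
    # Assign each environment one of its own goal configurations.
    task = random.choice([t for t in mt10.train_tasks if t.env_name == name])
    env.set_task(task)
    training_envs.append(env)

# A multi-task policy would now be trained across all 10 environments.
for env in training_envs:
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
```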

Meta-Learning 45 (ML45)

ML45 is our hardest meta-learning mode, with 45 training environments and 5 held-out test environments.

Multi-Task 50 (MT50)

Finally, MT50 evaluates the ability to efficiently learn all 50 of the above manipulation environments with a single policy.
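
The two largest benchmarks follow the same construction pattern as ML10 and MT10; only the constructor changes (again assuming the metaworld API):

```python
import metaworld

ml45 = metaworld.ML45()  # 45 training task families, 5 held-out test families
mt50 = metaworld.MT50()  # all 50 task families, no held-out evaluation
```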

Authors

Tianhe Yu
Stanford University

Deirdre Quillen
UC Berkeley

Zhanpeng He
Columbia University

Ryan Julian
University of Southern California

Avnish Narayan
University of Southern California

Hayden Shively
University of Southern California

Adithya Bellathur
University of Southern California

Karol Hausman
Robotics at Google

Chelsea Finn
Stanford University

Sergey Levine
UC Berkeley