Tucker Hermans, Robotics Ph.D. Student
January 29, 2013 (Tuesday)
Autonomous robots deployed in complex, natural human environments such as homes and offices need to manipulate numerous objects throughout their lifetimes. For an autonomous robot to operate effectively in such a setting without requiring excessive training on the part of a human operator, it should be capable of discovering how to reliably manipulate novel objects in the environment. We characterize the possible methods by which a robot can act on an object using the concept of affordances. Psychologist J.J. Gibson originally defined affordances as the action possibilities available in the environment to an agent. In the context of this work, we define affordance-based behaviors as object manipulation strategies available to the robot, which correspond to specific semantic actions over which a task-level planner or end user of the robot can operate.
This thesis proposal develops a representation of these affordance-based behaviors, along with learning algorithms that make use of this representation. Specifically, we identify two learning problems. The first asks which affordance-based behaviors the robot can successfully apply to a given object, including objects seen for the first time. The second asks how the robot can improve its manipulation capabilities over time when operating on a specific object.
We claim that by decomposing affordance-based behaviors into three separate components (a control policy, a perceptual proxy, and a behavior primitive), an autonomous robot can efficiently transfer knowledge of successful manipulation strategies between different objects, as well as between qualitatively different affordance-based manipulation behaviors on a single object. By sharing information between objects of similar shape and appearance, we can more quickly discover which affordance-based behaviors perform effectively on a novel object. We can further accelerate affordance learning by transferring knowledge about objects that produce similar outcomes when manipulated. Additionally, for a single object, once the robot has successfully performed an action with a given component, say a specific perceptual proxy, it can try that component first when attempting different affordance-based behaviors, say with a different behavior primitive and controller. Finally, policy-gradient reinforcement learning methods enable the robot to improve the performance of particular control policies, allowing for specialized adaptation to specific objects over time.
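As a rough illustration of the three-component decomposition and of trying previously successful components first, the following Python sketch may help. It is not the representation or algorithm from the proposal: the class `AffordanceBehavior`, the function `rank_candidates`, the component names, and the simple shared-component count used for ordering are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AffordanceBehavior:
    """One behavior as a triple of components (hypothetical structure)."""
    controller: str
    perceptual_proxy: str
    behavior_primitive: str


def rank_candidates(controllers, proxies, primitives, successes):
    """Order untried component combinations for a single object.

    Combinations that reuse more components from behaviors that already
    succeeded on this object are tried first. The shared-component count
    is an assumed heuristic, chosen only for illustration.
    """
    seen_controllers = {b.controller for b in successes}
    seen_proxies = {b.perceptual_proxy for b in successes}
    seen_primitives = {b.behavior_primitive for b in successes}

    def score(b):
        # Count how many of the three components have worked before.
        return ((b.controller in seen_controllers)
                + (b.perceptual_proxy in seen_proxies)
                + (b.behavior_primitive in seen_primitives))

    candidates = [AffordanceBehavior(c, p, m)
                  for c, p, m in product(controllers, proxies, primitives)]
    untried = [b for b in candidates if b not in successes]
    # sorted() is stable, so ties keep their enumeration order.
    return sorted(untried, key=score, reverse=True)


# Hypothetical component names, used only to exercise the sketch.
controllers = ["centroid_controller", "spin_compensation"]
proxies = ["centroid", "bounding_box"]
primitives = ["overhead_push", "gripper_sweep"]

# Suppose one combination has already succeeded on this object.
successes = [AffordanceBehavior("centroid_controller", "centroid", "overhead_push")]

ranked = rank_candidates(controllers, proxies, primitives, successes)
for behavior in ranked:
    print(behavior)
```

In this sketch the combinations that differ from the known success in only one component are attempted first, and the combination sharing no components with it is attempted last. A fuller treatment would also weigh similarity between objects and observed manipulation outcomes, which this example leaves out.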