Selective Perception

Optimizing State Descriptions with Reinforcement Learning for Language Model Actors

Kolby Nottingham Yasaman Razeghi Kyungmin Kim

JB Lanier Pierre Baldi Roy Fox Sameer Singh

Large language models (LLMs) have been widely applied as actors for sequential decision-making tasks such as robotics and games, utilizing their general world knowledge and planning abilities. However, previous work does little to explore what environment state information is provided to LLM actors via language. Exhaustively describing high-dimensional states can impair performance and raise inference costs for LLM actors. To avoid this, previous LLM actors rely on hand-engineered, task-specific protocols to determine which features to communicate about a state and which to leave out. In this work, we propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions by learning a value function for optimal task-conditioned state feature sets. We evaluate BLINDER on the challenging video game NetHack and on a robotic manipulation task, improving LLM actor success rate while reducing the size of LLM input.


BLINDER

We present BLINDER, a method for automatically selecting state descriptions from a set of state features for LLM actors. Rather than using manually constructed language observations that require painstaking prompt engineering, we propose using a learned value function for state descriptions. Using this value function and the set of all language features for the current state, BLINDER constructs inputs for an LLM actor that maximize task performance. The resulting state descriptions are relevant and intuitive, generalize well, and improve both task performance and computational efficiency by shrinking the LLM input.
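To make the idea of "constructing inputs with a value function" concrete, here is a minimal sketch of one possible selection loop. It assumes a hypothetical scorer `value_fn(task, feature_set)` and builds the description greedily, adding whichever feature most increases the estimated value and stopping when no feature helps. The greedy strategy, the stopping rule, the `max_features` cap, and the `toy_value` scorer are all illustrative assumptions, not BLINDER's published procedure.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical value function: scores a (task, set of state features) pair.
ValueFn = Callable[[str, FrozenSet[str]], float]


def select_description(task: str, features: Sequence[str], value_fn: ValueFn,
                       max_features: int = 10) -> List[str]:
    """Greedily grow a state description while the estimated value keeps increasing."""
    selected: List[str] = []
    current: FrozenSet[str] = frozenset()
    best_value = value_fn(task, current)  # value of the empty description
    remaining = list(features)
    while remaining and len(selected) < max_features:
        # Score every one-feature extension of the current description.
        candidates = [(value_fn(task, current | {f}), f) for f in remaining]
        value, feature = max(candidates, key=lambda c: c[0])
        if value <= best_value:  # no remaining feature improves the estimate
            break
        selected.append(feature)
        current = current | {feature}
        best_value = value
        remaining.remove(feature)
    return selected


def toy_value(task: str, feats: FrozenSet[str]) -> float:
    """Hypothetical stand-in scorer: rewards word overlap with the task, penalizes length."""
    task_words = set(task.lower().split())
    relevant = sum(1 for f in feats if task_words & set(f.lower().split()))
    return relevant - 0.1 * len(feats)
```

With `toy_value`, the task "pick up the key" and the features "a key is on the floor", "you see a fountain", and "hp is 12" yield only the key feature, because each extra feature lowers the score. A learned value function would replace `toy_value` here without changing the loop.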

State Value Function

We annotate sampled state descriptions with the likelihood that a frozen LLM actor assigns to expert actions. These likelihoods serve as rewards for training BLINDER's value function for state descriptions. At test time, BLINDER generalizes to new tasks and new LLM actors: we can train using smaller actors and then deploy with LLM actors that are orders of magnitude larger.
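The reward signal described above can be sketched as follows. The sketch assumes a hypothetical `token_logprobs(prompt, action)` function that returns the frozen actor's per-token log-probabilities for the expert action. The reward is the action's likelihood, meaning the exponential of the summed log-probabilities. The prompt template is an assumption. Purely for illustration, a linear model over feature indicators, fit by stochastic gradient descent on squared error, stands in for BLINDER's value function, which this page does not specify.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface (an assumption, not BLINDER's API): per-token
# log-probabilities of `action` given `prompt` under a frozen LLM actor.
TokenLogProbFn = Callable[[str, str], List[float]]


def expert_action_reward(task: str, description: Sequence[str], expert_action: str,
                         token_logprobs: TokenLogProbFn) -> float:
    """Reward for a sampled description: likelihood of the expert action."""
    # Illustrative prompt template.
    prompt = f"Task: {task}\nState: {' '.join(description)}\nAction:"
    # Summed token log-probs give the action's log-likelihood; exp gives the likelihood.
    return math.exp(sum(token_logprobs(prompt, expert_action)))


def train_linear_value(samples: Sequence[Tuple[Sequence[str], float]],
                       epochs: int = 200, lr: float = 0.1) -> Tuple[float, Dict[str, float]]:
    """Fit V(description) = bias + sum of per-feature weights to rewards by SGD.

    A linear stand-in for illustration only; it ignores the task text.
    """
    bias = 0.0
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for description, reward in samples:
            pred = bias + sum(weights.get(f, 0.0) for f in description)
            err = pred - reward
            # Gradient step on 0.5 * err**2 for the bias and each active feature.
            bias -= lr * err
            for f in description:
                weights[f] = weights.get(f, 0.0) - lr * err
    return bias, weights
```

In practice, the samples would be (description, reward) pairs gathered by sampling feature subsets from states along expert trajectories. A feature whose presence makes the expert action more likely ends up with a larger learned weight.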

Robotic Demo

In addition to performing well on NetHack, BLINDER improves the performance of LLM actors on real-world robotic tasks. We use visual object detection to label the objects in a scene and the spatial relationships between them, which serve as state features ("The donut is to the left and behind the ball"). We then use BLINDER to select which of these object relationships to provide to the LLM actor, given the current task ("Place the objects in the order: ball, soda, doughnut"). The LLM actor then issues high-level actions that the robot executes to complete the task ("Move the ball to position A"). See our paper for additional details.
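Detector output can be turned into relationship sentences like the one quoted above in many ways. This minimal sketch assumes each detected object has already been reduced to a hypothetical (x, y) tabletop position, where x increases to the right and y increases away from the camera. It uses an arbitrary `margin` to ignore near-ties. None of these names or conventions come from the paper.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def relation_features(objects: Dict[str, Tuple[float, float]],
                      margin: float = 0.05) -> List[str]:
    """Describe left/right and front/behind relations for every ordered pair of objects."""
    features: List[str] = []
    for a, b in permutations(objects, 2):
        (ax, ay), (bx, by) = objects[a], objects[b]
        parts: List[str] = []
        # Horizontal relation (x grows to the right in this assumed frame).
        if ax < bx - margin:
            parts.append("to the left of")
        elif ax > bx + margin:
            parts.append("to the right of")
        # Depth relation (y grows away from the camera in this assumed frame).
        if ay > by + margin:
            parts.append("behind")
        elif ay < by - margin:
            parts.append("in front of")
        if parts:
            features.append(f"The {a} is {' and '.join(parts)} the {b}")
    return features
```

For example, `relation_features({"donut": (0.1, 0.8), "ball": (0.5, 0.3)})` produces "The donut is to the left of and behind the ball" and "The ball is to the right of and in front of the donut". With n objects, this yields up to n * (n - 1) sentences, so the candidate set grows quickly. That growth is why selecting a relevant subset matters.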
