MIT Researchers Use LLMs to Enhance Robots’ Comprehension of Vague Instructions

Envision a future scenario in a warehouse or office where you are tasked with training a new robotic colleague. The process involves demonstrating tasks physically while explaining them. For instance, when instructing the robot to place coffee on your desk without disrupting a Zoom meeting, it’s crucial to train it with clear data on the entire task to prevent it from getting too close to you or the laptop.

Traditionally, explaining manipulation tasks to robots has required numerous demonstrations or detailed written instructions. Without that level of detail, robots may misinterpret the task. To address this, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have automated part of the teaching process. Their “Masked Inverse Reinforcement Learning” (Masked IRL) method uses a large language model (LLM) to clarify ambiguous instructions with the help of demonstration data, significantly reducing the amount of data needed.

MIT PhD student and lead author Minyoung Hwang explains that this method minimizes human effort by helping robots understand user intentions without detailed instructions. Masked IRL assists robots in navigating environments where critical elements may not be explicitly stated, such as avoiding obstacles while fetching items.

To facilitate learning, Masked IRL uses the robot’s sensors to record its surroundings and logs movements from kinesthetic demonstrations, in which a human physically guides the robot through a task. The system then uses an LLM to compare the demonstrated motion sequences with the optimal path and, based on that comparison, clarifies ambiguous prompts.
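
To make the idea of comparing a demonstration with an optimal path more concrete, here is a minimal sketch in Python. It is not the researchers' implementation: it assumes the "optimal path" is simply a straight line from start to goal, and the names `straight_line_path` and `max_deviation` are hypothetical. A large deviation hints that the demonstrator steered around something the instruction never mentioned, which is the kind of signal that could then be summarized for an LLM.

```python
import math

def straight_line_path(start, goal, n_points):
    """Evenly spaced waypoints on the straight line from start to goal.

    Assumption: the straight line stands in for the 'optimal path'.
    """
    return [
        tuple(s + (g - s) * i / (n_points - 1) for s, g in zip(start, goal))
        for i in range(n_points)
    ]

def max_deviation(demo, reference):
    """Largest distance between matching waypoints of the two paths."""
    return max(math.dist(p, q) for p, q in zip(demo, reference))

# Hypothetical 2D demonstration that bows away from the direct route.
demo = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 0.0)]
reference = straight_line_path(demo[0], demo[-1], len(demo))
deviation = max_deviation(demo, reference)
print(f"max deviation from direct route: {deviation:.2f}")
```

In this toy case the demonstration detours by one unit at its midpoint, even though the start and goal could be connected directly.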

A second LLM assesses environmental details, ignoring irrelevant aspects and prioritizing essential ones. This masking technique allowed Masked IRL to outperform other methods by teaching robots to prioritize information, enabling them to skillfully navigate around obstacles and address user preferences more accurately.
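
The masking idea can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the feature names, the weights, the linear reward, and the hand-written mask, which stands in for the relevance judgment that the article attributes to an LLM. The point is only that a masked-out feature cannot influence the reward, whatever value it takes.

```python
def masked_reward(state, weights, mask):
    """Linear reward over state features; masked-out features are ignored."""
    return sum(weights[name] * value
               for name, value in state.items()
               if mask.get(name, False))

# Hypothetical features of one robot state.
state = {"dist_to_laptop": 0.2, "dist_to_human": 0.5, "table_color": 0.9}
weights = {"dist_to_laptop": 2.0, "dist_to_human": 1.0, "table_color": 5.0}

# Hand-written stand-in for the mask; True means "relevant to the instruction".
mask = {"dist_to_laptop": True, "dist_to_human": True, "table_color": False}

reward = masked_reward(state, weights, mask)

# Changing an irrelevant feature leaves the reward unchanged.
other_state = dict(state, table_color=0.1)
other_reward = masked_reward(other_state, weights, mask)
print(reward == other_reward)
```

Because `table_color` is masked out, the two states receive the same reward even though that feature carries the largest weight.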

CSAIL researchers found Masked IRL required fewer demonstrations to learn tasks such as moving a mug, and robots performed better with clearer instructions. This approach also worked effectively in real-world scenarios, with robots executing tasks not seen during training, like moving a cup without hitting a computer or delivering chips while avoiding humans and tables.

Future enhancements may include equipping robots with cameras to dynamically assess and focus on specific elements in their environment. This initiative, supported by the Tata Group and the Department of Defense, will be presented at the 2026 IEEE International Conference on Robotics and Automation.

Original Source: news.mit.edu
