@inproceedings{TEMP,
author = {},
title = {TEMP},
booktitle = {TEMP},
year = {2025},
}
Recent advances have showcased the opportunity of leveraging the broad semantic understanding learned by vision-language models (VLMs) in robot learning; however, connecting VLMs effectively to robot control remains an open question, since physical robot data is relatively sparse and narrow compared to internet-scale VLM training data. We propose STEER, a system for bridging this gap by learning flexible, low-level manipulation skills that can be modulated or repurposed to adapt to new situations. We show that training low-level learned policies on structured, dense re-annotation of existing robot datasets exposes an intuitive and flexible interface for humans or VLMs to guide them in unfamiliar scenarios or to perform new tasks using common-sense reasoning. We demonstrate that the skills learned via STEER can be combined to synthesize novel behaviors to achieve held-out tasks without additional training.
We show that we can automate STEER with an off-the-shelf VLM (Gemini 1.5 Pro in our case). All of our experiments use the same system prompt, provided below. In the results section, we show the VLM outputs, which are automatically parsed for code that is subsequently executed on the real robot.
You are a helpful robot with one right arm. You are equipped with a large parallel jaw gripper end-effector. You will be asked to perform different tasks that involve interacting with the objects in the workspace. You are provided with an API to execute actions in the physical world to complete the task. These are the only actions you can perform. The procedure to perform a task is as follows:
class RobotAPI(object):
    def reset(self):
        '''
        Robot will reset, meaning it will open its gripper and return its arm to a retracted position.
        '''
    def grasp_object(self, object_name: str, grasp_approach: str):
        '''
        Robot will attempt to grasp the object using the approach specified in grasp_approach.
        Args:
            object_name: The name of the object to grasp. Objects should be referred to by some defining feature (e.g. color, brand, texture, etc.) and object type (e.g. cup, can, bowl, bag, etc.).
            grasp_approach: One of "top-down", "from the side" or "diagonally".
                "top-down" means the robot will descend from above the object and grasp. The object will be held with a vertical gripper orientation, with the fingers pointing down (i.e. 6pm on a clock).
                "from the side" means the robot will approach the object from the right side and grasp. The object will be held with the fingers oriented horizontally pointing to the left (i.e. 9pm on a clock).
                "diagonally" means the robot will approach the object neither perfectly top-down or from the side, the fingers will be pointed diagonally.
        '''
    def reorient(self, desired_gripper_orientation: str):
        '''
        Robot will attempt to reorient the object by turning its end-effector to the desired_gripper_orientation while maintaining its grasp on the object.
        If the robot's gripper is vertical and reorients 90 degrees to horizontal, the object will also be reoriented by 90 degrees clockwise.
        If the robot's gripper is horizontal and reorients 90 degrees to vertical, the object will also be reoriented by 90 degrees counterclockwise.
        Args:
            desired_gripper_orientation: One of "vertical" or "horizontal".
                "vertical" means having its fingers on the same plane, parallel to the left and right walls, pointing straight down (i.e. 6pm on a clock).
                "horizontal" means having its fingers on the same plane, parallel to the ground, and pointing to the left (i.e. 9pm on a clock).
        '''
    def place_object(self, object_name: str, location: str = "here"):
        '''
        Robot will attempt to place the object at the specified location.
        Args:
            object_name: The name of the object to place.
            location: One of "here", "left", "right", "front", "back", "center".
                Default is "here" meaning the robot will set the object straight down where the arm currently is, releasing it from its grasp.
                If one of [left/right/front/back], the robot will move the object to the specified edge (or center) and then release the object there.
        '''
    def lift_object(self, object_name: str):
        '''
        Robot will maintain its grasp on the object and lift it, maintaining the x-y position and orientation of the object.
        '''
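The text above says that VLM outputs are automatically parsed for code that is then executed on the robot, but the parser itself is not shown. The following is a minimal sketch of one way this could be done, not the authors' implementation. It assumes calls always appear as `robot_api.<method>(...)` with literal string arguments; the names `extract_calls`, `execute`, `RecordingRobot`, and `ALLOWED_METHODS` are hypothetical, and only the five method names come from the API listing.

```python
import ast
import re

# Method names are taken from the RobotAPI listing; everything else is hypothetical.
ALLOWED_METHODS = {"reset", "grasp_object", "reorient", "place_object", "lift_object"}
CALL_START = re.compile(r"robot_api\.(\w+)\(")


def _find_call_end(text, open_idx):
    """Return the index just past the ')' closing the '(' at open_idx, or -1.

    Parentheses inside quoted strings are ignored. Escaped quotes are not
    handled, which is an accepted limitation of this sketch.
    """
    depth = 0
    quote = None
    for i in range(open_idx, len(text)):
        ch = text[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


def extract_calls(response):
    """Return [(method, args, kwargs), ...] for each robot_api call in the text."""
    calls = []
    for match in CALL_START.finditer(response):
        end = _find_call_end(response, match.end() - 1)
        if end == -1:
            continue  # unbalanced call, skip it
        try:
            node = ast.parse(response[match.start():end], mode="eval").body
            args = [ast.literal_eval(a) for a in node.args]
            kwargs = {k.arg: ast.literal_eval(k.value) for k in node.keywords}
        except (SyntaxError, ValueError):
            continue  # not a call with plain literal arguments
        calls.append((match.group(1), args, kwargs))
    return calls


class RecordingRobot:
    """Stand-in for the real robot: records calls instead of moving hardware."""

    def __init__(self):
        self.log = []

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the API methods.
        if name not in ALLOWED_METHODS:
            raise AttributeError(name)

        def record(*args, **kwargs):
            self.log.append((name, args, kwargs))

        return record


def execute(calls, robot):
    """Dispatch parsed calls to the robot, rejecting unknown method names."""
    for name, args, kwargs in calls:
        if name not in ALLOWED_METHODS:
            raise ValueError(f"unknown API method: {name}")
        getattr(robot, name)(*args, **kwargs)


response = (
    "The robot should grasp the apple from the top and lift it up.\n"
    'robot_api.grasp_object(object_name="red apple", grasp_approach="top-down")'
    'robot_api.lift_object(object_name="red apple")Let me know.'
)
robot = RecordingRobot()
execute(extract_calls(response), robot)
```

Parsing arguments with `ast.literal_eval` rather than `exec` means only plain literals from the model's text reach the robot. Scanning for balanced parentheses rather than splitting on lines lets the sketch cope with calls that run together on a single line.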
I see the orange flower pot and will pick it up carefully.
Here are the steps I will take:
Here is the code:
robot_api.grasp_object(object_name="orange flower pot", grasp_approach="from the side")
robot_api.lift_object(object_name="orange flower pot")
The pot is centered on the table. I will grab the pot from the side, lift it up, and hold.
Here are the steps to complete the task:
robot_api.grasp_object(object_name='orange pot', grasp_approach='from the side')
robot_api.lift_object(object_name='orange pot')
That's it! I am now holding the orange pot.
The flower pot appears to be made of clay and is centrally located on the table. I will approach the pot carefully from the top and grab it, making sure I do not touch the plant.
Steps:
robot_api.grasp_object("orange flower pot", "top-down")
robot_api.lift_object("orange flower pot")
The robot needs to grasp the flower pot from the side, as a top-down grasp would crush the plant. Once the pot is grasped, the robot can lift it up.
robot_api.grasp_object(object_name="orange flower pot", grasp_approach="from the side")
robot_api.lift_object(object_name="orange flower pot")
robot_api.grasp_object(object_name="orange flower pot", grasp_approach="from the side")
robot_api.lift_object(object_name="orange flower pot")
The flower pot has a lip around the top, so I can grasp it from the outside. I will grasp the pot from the side, lift it slightly, then hold it up.
Steps:
robot_api.grasp_object(object_name="orange flower pot", grasp_approach="from the side")
robot_api.lift_object(object_name="orange flower pot")
That's it! The robot should now be holding up the orange flower pot.
The flower pot is centered on the table. I will grasp the pot from the side and lift it up.
Here are the steps to complete the task:
robot_api.grasp_object(object_name='orange flower pot', grasp_approach='from the side')
robot_api.lift_object(object_name='orange flower pot')
To pick up the flower pot without disturbing the plant, I need to grasp the pot firmly from the sides. Since I only have one arm, I cannot stabilize the plant from the top.
robot_api.grasp_object(object_name="orange flower pot", grasp_approach="from the side")
robot_api.lift_object(object_name="orange flower pot")
The robot should grasp the orange pot from the sides and lift it up.
Here are the steps to complete the task:
robot_api.grasp_object(object_name='orange flower pot', grasp_approach='from the side')
robot_api.lift_object(object_name='orange flower pot')
That's it! Let me know how it goes.
The robot should grasp the pot from the sides and lift it up.
Here are the steps:
robot_api.grasp_object(object_name='orange flower pot', grasp_approach='from the side')
robot_api.lift_object(object_name='orange flower pot')
The robot should be able to grasp the orange flower pot from the sides and lift it up.
robot_api.grasp_object(object_name="orange flower pot", grasp_approach="from the side")
robot_api.lift_object(object_name="orange flower pot")
The apple is in the center of the table, with a few plants surrounding it. I will grasp the apple from the top and lift it up.
Here are the steps to complete the task:
robot_api.grasp_object(object_name='red apple', grasp_approach='top-down')
robot_api.lift_object(object_name='red apple')
That's it! The apple should now be held up by the robot.
The grapes are in the center of the table, with a plant in a white pot to the left and a plant in an orange pot to the right.
Here are the steps to complete the task:
robot_api.grasp_object(object_name='grapes', grasp_approach='top-down')
robot_api.lift_object(object_name='grapes')
To hold the orange up, I need to grasp it and then lift it. Since my gripper is large, I need to make sure to approach the orange carefully to avoid hitting the nearby plants. A top-down grasp would be best to ensure a stable grip.
robot_api.grasp_object(object_name="orange", grasp_approach="top-down")
robot_api.lift_object(object_name="orange")
The robot should grasp the apple from the top and lift it up.
Here are the steps to complete the task:
robot_api.grasp_object(object_name="red apple", grasp_approach="top-down")
robot_api.lift_object(object_name="red apple")
Let me know if you would like to proceed with the execution.
To hold the orange up, I need to grasp it and then lift it. Since I only have one arm, I need to make sure the grasp is secure before lifting. A top-down grasp is best for this. I also need to make sure I don’t knock over the plants on either side of the orange.
robot_api.grasp_object(object_name='orange', grasp_approach='top-down')
robot_api.lift_object(object_name='orange')
I see the bunch of grapes in the center of the table, between the three white pots to the left and the white pot to the right. I will grasp the grapes from the top and lift them up.
Here are the steps to complete this task:
robot_api.grasp_object(object_name="grapes", grasp_approach="top-down")
After this step, the robot will be grasping the grapes with its gripper oriented vertically, the fingers pointing down.
robot_api.lift_object(object_name="grapes")
After this step, the robot will be holding the grapes in the air. All other objects will remain in their initial state.
To hold the apple up, I need to grasp it and then lift it. Since the apple is unobstructed from the top, a top-down grasp is best.
robot_api.grasp_object(object_name='red apple', grasp_approach='top-down')
robot_api.lift_object(object_name='red apple')
The orange is in the center of the table, with two plants on either side of it and two gardening tools in front of it. To complete this task, I will grasp the orange from the top, lift it up, and hold it there.
Here are the steps I will take:
robot_api.grasp_object(object_name='orange', grasp_approach='top-down')
robot_api.lift_object(object_name='orange')
To hold the grapes up, I need to grasp them and then lift my arm. Since I only have one arm, I need to make sure the grapes are the only object in my grasp. To avoid hitting the plants or the tools, I should approach the grapes from the side.
robot_api.grasp_object(object_name='grapes', grasp_approach='from the side')
robot_api.lift_object(object_name='grapes')
To hold the apple up, I need to grasp it and then lift it. I will use a top-down grasp to pick up the apple directly.
robot_api.grasp_object(object_name='red apple', grasp_approach='top-down')
robot_api.lift_object(object_name='red apple')
The robot should grasp the orange from the top and lift it up while avoiding the other objects.
robot_api.grasp_object(object_name="orange", grasp_approach="top-down")
robot_api.lift_object(object_name="orange")
The grapes are in the center of the table, with a kettle to the left, a whisk to the right, and a mug at the top. The drawer is currently closed.
robot_api.grasp_object(object_name='grapes', grasp_approach='top-down')
robot_api.lift_object(object_name='grapes')
The apple is in the center of the table, surrounded by other objects. To grasp it, the robot needs to approach from the top to avoid collisions. Once grasped, the apple needs to be lifted to be considered "held up".
robot_api.grasp_object(object_name="red apple", grasp_approach="top-down")
robot_api.lift_object(object_name="red apple")
The orange is in the center of the table, surrounded by plants and gardening tools. I will grasp the orange from the top and lift it up.
robot_api.grasp_object(object_name='orange', grasp_approach='top-down')
robot_api.lift_object(object_name='orange')
To hold the grapes up, I need to grasp them and then lift them. Since the grapes are in the middle of the table, I need to make sure to grasp them from the top to avoid hitting other objects.
robot_api.grasp_object(object_name="grapes", grasp_approach="top-down")
robot_api.lift_object(object_name="grapes")
The kettle appears to be upright with its handle oriented to the left.
Here are the steps to pick up the kettle:
robot_api.grasp_object(object_name='white kettle with black handle', grasp_approach='top-down')
robot_api.lift_object(object_name='white kettle with black handle')
Let me know how I did!
robot_api.grasp_object(object_name='black and white kettle', grasp_approach='top-down')
robot_api.lift_object(object_name='black and white kettle')
The robot can complete this task. Here are the steps to pick up the kettle:
robot_api.grasp_object(object_name="black and white kettle", grasp_approach="top-down")
robot_api.lift_object(object_name="black and white kettle")
Let me know when you would like me to proceed.
The kettle is sitting on the table. My end-effector is also above the table, so I can reach down to grasp it from the top and lift it up to hold it.
Here are the steps to complete the task:
robot_api.grasp_object(object_name="black and white kettle", grasp_approach="top-down")
robot_api.lift_object(object_name="black and white kettle")
The kettle is sitting upright on the table. I will grasp the kettle from the top and lift it up.
Here are the steps I will take:
robot_api.grasp_object(object_name='black and white kettle', grasp_approach='top-down')
robot_api.lift_object(object_name='black and white kettle')
The kettle's handle is sticking up, so it would be easiest to grasp the kettle from the side and then reorient to have the handle upright.
Steps:
grasp_object(object_name="black and white kettle", grasp_approach="from the side"): The robot will move its arm to the kettle and grasp the body of it from the side. The kettle's handle will be horizontal, and spout facing forward.
reorient(desired_gripper_orientation="vertical"): The robot will maintain its grasp on the kettle and rotate its gripper to be vertical. The kettle will now be held upright with the spout facing to the right and handle to the back.
lift_object(object_name="black and white kettle"): The robot will lift the kettle straight up, maintaining the kettle's orientation.
robot_api.grasp_object(object_name="black and white kettle", grasp_approach="from the side")
robot_api.reorient(desired_gripper_orientation="vertical")
robot_api.lift_object(object_name="black and white kettle")
The kettle is sitting upright on a table. I will grasp the kettle from the top, then I will lift it up.
Here are the steps:
robot_api.grasp_object(object_name="black and white kettle", grasp_approach="top-down")
robot_api.lift_object(object_name="black and white kettle")
The robot needs to grasp the kettle from above and lift it up.
robot_api.grasp_object(object_name='black and white kettle', grasp_approach='top-down')
robot_api.lift_object(object_name='black and white kettle')
The robot should be able to grasp the kettle from the top and lift it up.
robot_api.grasp_object(object_name="black and white kettle", grasp_approach="top-down")
robot_api.lift_object(object_name="black and white kettle")
The robot needs to grasp the kettle from the top and lift it up. The handle orientation doesn't matter since the task is just to hold it up.
grasp_object("black and white kettle", "top-down")
lift_object("black and white kettle")
robot_api.grasp_object("black and white kettle", "top-down")
robot_api.lift_object("black and white kettle")
Sounds good, I can help with that! I will carefully grasp the pink cup, tilt it over so that the contents fall out, and then place the cup back on the table upright. Here are the steps I will take:
robot_api.grasp_object(object_name='pink cup', grasp_approach='from the side')
robot_api.lift_object(object_name='pink cup')
robot_api.reorient(desired_gripper_orientation='vertical')
# Since there is no shake function in the API, we can try to achieve the desired outcome by reorienting the gripper back and forth.
robot_api.reorient(desired_gripper_orientation='horizontal')
robot_api.reorient(desired_gripper_orientation='vertical')
robot_api.reorient(desired_gripper_orientation='horizontal')
robot_api.place_object(object_name='pink cup', location='here')
To pour the contents of the pink cup, we need to grasp it, reorient it until the opening is facing down, and then turn it back upright and place it on the table.
robot_api.grasp_object(object_name="pink cup", grasp_approach="from the side")
robot_api.reorient(desired_gripper_orientation="vertical")
robot_api.reorient(desired_gripper_orientation="horizontal")
robot_api.place_object(object_name="pink cup", location="here")
This sequence of actions should successfully pour the contents of the pink cup and place it back on the table.
Here are the steps to complete this task:
robot_api.grasp_object(object_name="pink cup", grasp_approach="from the side")
robot_api.lift_object(object_name="pink cup")
robot_api.reorient(desired_gripper_orientation="vertical")
robot_api.reorient(desired_gripper_orientation="horizontal")
robot_api.place_object(object_name="pink cup", location="here")
robot_api.grasp_object(object_name='pink cup', grasp_approach='from the side') # Cup is grasped from the side, fingers oriented horizontally.
robot_api.lift_object(object_name='pink cup') # Cup is now held in the air.
robot_api.reorient(desired_gripper_orientation='vertical') # Cup is now tilted, contents poured out.
robot_api.reorient(desired_gripper_orientation='horizontal') # Cup is now upright again.
robot_api.place_object(object_name='pink cup', location='here') # Cup is placed back on the table.
While a "top-down" grasp might seem intuitive for picking up a cup, it would require more complex manipulations to achieve the desired pouring angle, potentially increasing the risk of spilling or dropping the cup.
I see the pink cup and understand the task. Here are the steps to accomplish it:
robot_api.grasp_object(object_name="pink cup", grasp_approach="from the side")
robot_api.lift_object(object_name="pink cup")
robot_api.reorient(desired_gripper_orientation="vertical")
robot_api.reorient(desired_gripper_orientation="horizontal")
robot_api.reorient(desired_gripper_orientation="vertical")
robot_api.place_object(object_name="pink cup", location="here")
The robot should be able to pour the contents out of the pink cup. However, without knowing what the contents of the cup are, it is difficult to predict how they will behave when the cup is tilted. Assuming the contents are loose, like marbles or small toys, the robot should be able to pour them out.
Here are the steps to complete the task:
robot_api.grasp_object(object_name='pink cup', grasp_approach='from the side')
robot_api.lift_object(object_name='pink cup')
robot_api.reorient(desired_gripper_orientation='vertical')
robot_api.reorient(desired_gripper_orientation='horizontal')
robot_api.place_object(object_name='pink cup', location='here')
robot_api.reset()
I can do that. I will need to grasp the cup, turn it upside down, and then place it back on the table. Here are the steps I will take:
robot_api.grasp_object(object_name="pink cup", grasp_approach="from the side")
robot_api.reorient(desired_gripper_orientation="vertical")
robot_api.reorient(desired_gripper_orientation="horizontal")
robot_api.place_object(object_name="pink cup", location="here")
Here are the steps to complete the task:
Here is the python code to execute these steps:
robot_api.grasp_object(object_name='pink cup', grasp_approach='from the side')
robot_api.reorient(desired_gripper_orientation='vertical')
robot_api.reorient(desired_gripper_orientation='horizontal')
robot_api.place_object(object_name='pink cup', location='here')
Okay, here's how I'll approach this:
Here's the Python code using the provided API:
robot_api.grasp_object(object_name="pink cup", grasp_approach="from the side")
robot_api.reorient(desired_gripper_orientation="vertical")
# Assuming there's no specific API call for pouring,
# the following is a hypothetical way to achieve it:
robot_api.reorient(desired_gripper_orientation="horizontal")
robot_api.reorient(desired_gripper_orientation="vertical")
robot_api.place_object(object_name="pink cup", location="here")
Here are the steps to pour the contents out of the pink cup and place it back upright on the table:
Here is the Python code using the provided API:
robot_api.grasp_object(object_name='pink cup', grasp_approach='from the side')
robot_api.lift_object(object_name='pink cup')
robot_api.reorient(desired_gripper_orientation='horizontal')
robot_api.reorient(desired_gripper_orientation='vertical')
robot_api.place_object(object_name='pink cup', location='here')
robot_api.reset()
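The pouring responses differ in how many times they call `reorient` and in which orientation they end. The sketch below is one way to check where a plan leaves the object, using only the rules stated in the API docstrings: a "top-down" grasp gives a vertical gripper, a "from the side" grasp gives a horizontal one, vertical to horizontal turns the object 90 degrees clockwise, and horizontal to vertical turns it 90 degrees counterclockwise. The function name, the `(method, kwargs)` plan format, and the sign convention (clockwise positive) are assumptions made for illustration. A "diagonally" grasp is treated as an unknown orientation.

```python
# Gripper orientation implied by each grasp approach, per the API docstrings.
# "diagonally" is deliberately absent: its orientation is not specified.
GRASP_TO_ORIENTATION = {"top-down": "vertical", "from the side": "horizontal"}


def net_object_rotation(plan):
    """Return the object's net rotation in degrees (clockwise positive).

    `plan` is a list of (method_name, kwargs) pairs. Rotation is measured
    relative to the object's pose at the most recent grasp.
    """
    orientation = None
    rotation = 0
    for name, kwargs in plan:
        if name == "grasp_object":
            orientation = GRASP_TO_ORIENTATION.get(kwargs["grasp_approach"])
            rotation = 0
        elif name == "reorient":
            target = kwargs["desired_gripper_orientation"]
            if orientation is None:
                raise ValueError("gripper orientation unknown before reorient")
            if orientation == "vertical" and target == "horizontal":
                rotation += 90  # vertical -> horizontal: 90 degrees clockwise
            elif orientation == "horizontal" and target == "vertical":
                rotation -= 90  # horizontal -> vertical: 90 degrees counterclockwise
            orientation = target  # same-orientation requests change nothing
    return rotation


# Plan that tilts the cup and then tilts it back before placing.
tilt_and_return = [
    ("grasp_object", {"object_name": "pink cup", "grasp_approach": "from the side"}),
    ("reorient", {"desired_gripper_orientation": "vertical"}),
    ("reorient", {"desired_gripper_orientation": "horizontal"}),
    ("place_object", {"object_name": "pink cup", "location": "here"}),
]

# Plan whose first reorient matches the current orientation and whose
# second reorient is never undone.
tilt_only = [
    ("grasp_object", {"object_name": "pink cup", "grasp_approach": "from the side"}),
    ("lift_object", {"object_name": "pink cup"}),
    ("reorient", {"desired_gripper_orientation": "horizontal"}),
    ("reorient", {"desired_gripper_orientation": "vertical"}),
    ("place_object", {"object_name": "pink cup", "location": "here"}),
]

print(net_object_rotation(tilt_and_return))  # 0
print(net_object_rotation(tilt_only))        # -90
```

Under these assumptions, a net rotation of 0 at placement means the cup is set down in the pose it was grasped in, while a nonzero value means it is placed still tilted.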