Imagine this: you pick up a full cup of coffee and walk from the kitchen to your desk. Your brain continuously adjusts grip force, wrist tilt angle, and walking speed — all subconsciously. Now, teach a robot to do the same. This is not just a grasping problem, but a force control problem — delicate force regulation so the object does not tilt, shake, or spill.
In the previous post — Grasping with RL — we learned how to grasp objects. Now we level up: the robot must grasp a cup of water and keep it balanced throughout the motion.
## The Cup-of-Water Problem

### Why is Force Control Hard?
Force control is fundamentally different from position control:
| Criterion | Position Control | Force Control |
|---|---|---|
| Objective | Move end-effector to position X | Maintain force/torque at value Y |
| Feedback | Encoder (joint position) | Force/Torque sensor |
| Sensitivity | Tolerant to a few mm | Sensitive to 0.1N |
| Contact dynamics | Less important | Critically important |
| Stability | Easy to stabilize | Prone to oscillation |
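The last row deserves a concrete illustration. Here is a toy 1-D sketch (all numbers illustrative, not from any real controller): a loop regulates contact force against a stiff surface by nudging position, and it is only stable while environment stiffness times correction gain stays below 2 — stiff contact turns a modest gain into runaway oscillation.

```python
# Toy 1-D contact loop: measured force is f = k_env * x, and the
# controller nudges position by gain * (f_des - f) each step. The
# resulting update f <- f + k_env * gain * (f_des - f) is stable only
# for k_env * gain < 2.
def force_loop(k_env, gain, f_des=1.0, steps=30):
    x, forces = 0.0, []
    for _ in range(steps):
        f = k_env * x
        x += gain * (f_des - f)       # position-based force correction
        forces.append(k_env * x)
    return forces

soft = force_loop(k_env=100.0, gain=0.005)   # k_env*gain = 0.5: converges
stiff = force_loop(k_env=100.0, gain=0.025)  # k_env*gain = 2.5: blows up
print(round(soft[-1], 3))    # 1.0 -- settled on the 1 N target
print(abs(stiff[-1]) > 1e3)  # True -- oscillation grows without bound
```

The same instability mechanism is why impedance control, introduced below, deliberately shapes the robot's apparent stiffness instead of chasing a force setpoint directly.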
When carrying a cup of water, the robot must simultaneously:
- Grip with just the right force — too strong crushes the cup, too weak drops it
- Keep the cup upright — tilt > 15 degrees = water spills
- Move smoothly — sudden acceleration = water sloshes out
- Minimize vibration — high jerk = water surface oscillates
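The kinematic constraints above can be collapsed into a single validity predicate. A minimal sketch: the 15-degree tilt limit comes from the list, while `MAX_ACCEL` and `MAX_JERK` are illustrative placeholder bounds (grip force is a separate check, handled later).

```python
import numpy as np

MAX_TILT = np.radians(15)  # from the list above
MAX_ACCEL = 2.0            # m/s^2, assumed bound on "sudden acceleration"
MAX_JERK = 10.0            # m/s^3, assumed bound on vibration

def motion_ok(tilt, accel, jerk):
    """True if a motion sample violates none of the cup constraints."""
    return abs(tilt) < MAX_TILT and accel <= MAX_ACCEL and jerk <= MAX_JERK

print(motion_ok(np.radians(10), 1.0, 5.0))  # True: gentle move
print(motion_ok(np.radians(20), 1.0, 5.0))  # False: the cup would spill
```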
## Impedance Control Baseline
Before using RL, let us understand impedance control — the traditional method for force control. Impedance control models the robot as a spring-damper system:
$$F = K(x_{desired} - x) + D(\dot{x}_{desired} - \dot{x})$$
where $K$ is stiffness and $D$ is damping.
```python
import numpy as np

class ImpedanceController:
    """Variable impedance controller for cup balancing."""

    def __init__(self, kp_pos=100.0, kd_pos=20.0,
                 kp_rot=50.0, kd_rot=10.0):
        self.kp_pos = kp_pos  # Position stiffness
        self.kd_pos = kd_pos  # Position damping
        self.kp_rot = kp_rot  # Rotation stiffness
        self.kd_rot = kd_rot  # Rotation damping

    def compute_action(self, current_pos, desired_pos,
                       current_vel, desired_vel,
                       current_rot, desired_rot,
                       current_angvel):
        """Compute the commanded force/torque."""
        # Position and velocity errors
        pos_error = desired_pos - current_pos
        vel_error = desired_vel - current_vel
        # Force command: F = K * e_pos + D * e_vel
        force = self.kp_pos * pos_error + self.kd_pos * vel_error
        # Rotation error (simplified: drive the cup back upright,
        # damp any remaining angular velocity)
        rot_error = desired_rot - current_rot
        torque = self.kp_rot * rot_error - self.kd_rot * current_angvel
        return np.concatenate([force, torque])

    def compute_grasp_force(self, object_mass, tilt_angle):
        """Grasp force from object mass and current tilt angle."""
        # Minimum force so two fingers with mu = 0.5 do not slip:
        # F_min = m * g / (2 * mu)
        min_force = object_mass * 9.81 / (2 * 0.5)
        # Grip harder when tilted to compensate the gravity component
        safety_factor = 1.5 + 0.5 * abs(tilt_angle) / np.pi
        return min_force * safety_factor
```
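A quick numeric sanity check of the impedance law, restated standalone with the class's default translational gains (kp = 100, kd = 20): a 1 cm position error combined with a 5 cm/s velocity overshoot produces zero net force, because the damper exactly cancels the spring pull.

```python
import numpy as np

# F = K (x_d - x) + D (xdot_d - xdot) with kp_pos = 100, kd_pos = 20
kp, kd = 100.0, 20.0
pos_error = np.array([0.01, 0.0, 0.0])   # 1 cm short of the target
vel_error = np.array([-0.05, 0.0, 0.0])  # moving 5 cm/s faster than desired
force = kp * pos_error + kd * vel_error
print(force[0])  # 0.0 -- the damper exactly cancels the spring
```

This balance point is exactly the critically damped behavior one tunes for: the arm decelerates into the target instead of overshooting and sloshing the water.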
Impedance control works well for simple trajectories but cannot adapt to:
- Changing water mass (drinking gradually)
- External perturbations (someone bumps the cup)
- Rough/slippery surfaces during movement
This is why we need RL.
## Reward Design for Cup Balancing

### Multi-Objective Reward
The reward function for cup balancing must balance multiple competing objectives:
```python
class CupBalanceReward:
    """Reward function for the cup-of-water balancing task."""

    def __init__(self):
        self.tilt_threshold = np.radians(15)   # Max tolerated tilt: 15 degrees
        self.spill_threshold = np.radians(30)  # Beyond this, water spills
        self.jerk_weight = 0.1

    def compute(self, cup_tilt, cup_angular_vel, ee_vel,
                ee_accel, goal_dist, action, grasping):
        """
        Args:
            cup_tilt: Cup tilt angle from vertical (rad)
            cup_angular_vel: Cup angular velocity [3]
            ee_vel: End-effector velocity [3]
            ee_accel: End-effector acceleration [3]
            goal_dist: Distance to goal (m)
            action: Action vector
            grasping: Bool, currently grasping
        """
        if not grasping:
            return -50.0, {'spill': True}  # Large penalty if dropped

        rewards = {}

        # 1. TILT PENALTY -- keep the cup upright
        tilt_magnitude = abs(cup_tilt)
        spill = tilt_magnitude > self.spill_threshold
        if spill:
            rewards['tilt'] = -20.0  # Water spilled!
        else:
            rewards['tilt'] = -5.0 * (tilt_magnitude / self.tilt_threshold) ** 2

        # 2. ANGULAR VELOCITY PENALTY -- reduce oscillation
        ang_vel_mag = np.linalg.norm(cup_angular_vel)
        rewards['angular_vel'] = -2.0 * np.tanh(3.0 * ang_vel_mag)

        # 3. JERK PENALTY -- smooth motion
        jerk = np.linalg.norm(ee_accel)
        rewards['jerk'] = -self.jerk_weight * np.tanh(jerk)

        # 4. PROGRESS REWARD -- move toward the goal
        rewards['progress'] = 2.0 * (1.0 - np.tanh(3.0 * goal_dist))

        # 5. SPEED REWARD -- fast, but brake near the goal
        speed = np.linalg.norm(ee_vel)
        if goal_dist > 0.1:
            rewards['speed'] = 0.5 * min(speed, 0.3) / 0.3
        else:
            rewards['speed'] = -1.0 * speed

        # 6. SUCCESS BONUS
        if goal_dist < 0.05 and tilt_magnitude < self.tilt_threshold:
            rewards['success'] = 20.0
        else:
            rewards['success'] = 0.0

        # 7. ACTION SMOOTHNESS
        rewards['action_smooth'] = -0.01 * np.sum(action ** 2)

        total = sum(rewards.values())
        rewards['spill'] = spill  # info flag only; excluded from the sum
        return total, rewards
```
### Analyzing the Trade-offs
This reward clearly demonstrates multi-objective trade-offs:
- Speed vs stability: The robot is rewarded for reaching the goal quickly (progress and speed terms) but penalized for rough motion (jerk penalty)
- Tight grip vs gentle: Gripping too tightly causes vibration, too loosely risks dropping
- Upright vs moving: The cup wants to stay vertical, but turning requires slight tilting
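The first trade-off can be put in numbers using the shaping terms from `CupBalanceReward`, restated standalone below. Because `jerk_weight = 0.1` and `tanh` saturates at 1, the jerk penalty is bounded by 0.1 — it tie-breaks between otherwise similar states rather than overruling progress toward the goal.

```python
import numpy as np

# Shaped terms from the reward above, as standalone functions
def progress_term(goal_dist):
    return 2.0 * (1.0 - np.tanh(3.0 * goal_dist))

def jerk_term(jerk, weight=0.1):
    return -weight * np.tanh(jerk)  # bounded below by -weight

fast_jerky = progress_term(0.05) + jerk_term(8.0)   # near goal, shaking
slow_smooth = progress_term(0.20) + jerk_term(0.5)  # farther away, smooth
print(fast_jerky > slow_smooth)  # True: progress dominates at this weight
```

If the cup still shakes after training, raising `jerk_weight` shifts this balance; the bounded `tanh` shaping keeps the penalty from exploding on a single bad transition.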
## MuJoCo Environment: Cup with Liquid Approximation
MuJoCo does not support fluid simulation directly, but we can approximate it using rigid body dynamics:
```python
import mujoco
import numpy as np

CUP_BALANCE_XML = """
<mujoco model="cup_balance">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  <worldbody>
    <light pos="0 0 3" dir="0 0 -1"/>
    <geom type="plane" size="2 2 0.1" rgba="0.9 0.9 0.9 1"/>

    <!-- Table -->
    <body name="table" pos="0.5 0 0.4">
      <geom type="box" size="0.6 0.6 0.02" rgba="0.6 0.4 0.2 1" mass="100"/>
    </body>

    <!-- Robot arm (simplified 5-DOF) -->
    <body name="base" pos="0 0 0.42">
      <joint name="j0" type="hinge" axis="0 0 1" range="-3.14 3.14" damping="2"/>
      <geom type="cylinder" size="0.05 0.04" rgba="0.3 0.3 0.3 1"/>
      <body name="l1" pos="0 0 0.08">
        <joint name="j1" type="hinge" axis="0 1 0" range="-1.57 1.57" damping="2"/>
        <geom type="capsule" fromto="0 0 0 0.3 0 0" size="0.035" rgba="0.7 0.7 0.7 1"/>
        <body name="l2" pos="0.3 0 0">
          <joint name="j2" type="hinge" axis="0 1 0" range="-2.5 2.5" damping="1.5"/>
          <geom type="capsule" fromto="0 0 0 0.25 0 0" size="0.03" rgba="0.7 0.7 0.7 1"/>
          <body name="l3" pos="0.25 0 0">
            <joint name="j3" type="hinge" axis="0 0 1" range="-3.14 3.14" damping="1"/>
            <geom type="capsule" fromto="0 0 0 0.1 0 0" size="0.025" rgba="0.5 0.5 0.5 1"/>
            <body name="wrist" pos="0.1 0 0">
              <joint name="j4" type="hinge" axis="1 0 0" range="-1.57 1.57" damping="1"/>
              <site name="ee" pos="0 0 0" size="0.01"/>
              <!-- Gripper fingers -->
              <body name="fl" pos="0 0.025 0">
                <joint name="jfl" type="slide" axis="0 1 0" range="0 0.035" damping="5"/>
                <geom type="box" size="0.008 0.004 0.04" rgba="0.8 0.2 0.2 1"
                      contype="1" conaffinity="1" friction="2 0.5 0.01"/>
              </body>
              <body name="fr" pos="0 -0.025 0">
                <joint name="jfr" type="slide" axis="0 -1 0" range="0 0.035" damping="5"/>
                <geom type="box" size="0.008 0.004 0.04" rgba="0.8 0.2 0.2 1"
                      contype="1" conaffinity="1" friction="2 0.5 0.01"/>
              </body>
            </body>
          </body>
        </body>
      </body>
    </body>

    <!-- Cup -->
    <body name="cup" pos="0.45 0 0.44">
      <freejoint name="cup_free"/>
      <site name="cup_top" pos="0 0 0.06" size="0.005"/>
      <!-- Cup walls (hollow cylinder approximation) -->
      <geom name="cup_bottom" type="cylinder" size="0.03 0.003" pos="0 0 0"
            rgba="0.9 0.9 1 0.8" mass="0.05" contype="1" conaffinity="1"/>
      <geom name="cup_wall1" type="box" size="0.003 0.03 0.03" pos="0.03 0 0.03"
            rgba="0.9 0.9 1 0.8" mass="0.01"/>
      <geom name="cup_wall2" type="box" size="0.003 0.03 0.03" pos="-0.03 0 0.03"
            rgba="0.9 0.9 1 0.8" mass="0.01"/>
      <geom name="cup_wall3" type="box" size="0.03 0.003 0.03" pos="0 0.03 0.03"
            rgba="0.9 0.9 1 0.8" mass="0.01"/>
      <geom name="cup_wall4" type="box" size="0.03 0.003 0.03" pos="0 -0.03 0.03"
            rgba="0.9 0.9 1 0.8" mass="0.01"/>
      <!-- Liquid approximation: damped sliding mass inside the cup -->
      <body name="liquid" pos="0 0 0.02">
        <joint name="liquid_x" type="slide" axis="1 0 0" range="-0.02 0.02" damping="5"/>
        <joint name="liquid_y" type="slide" axis="0 1 0" range="-0.02 0.02" damping="5"/>
        <geom name="liquid_ball" type="sphere" size="0.02" rgba="0.2 0.5 1 0.6"
              mass="0.2" contype="0" conaffinity="0"/>
      </body>
    </body>

    <!-- Goal position -->
    <body name="goal" pos="0.5 0.3 0.55">
      <geom type="sphere" size="0.03" rgba="0 1 0 0.3" contype="0" conaffinity="0"/>
      <site name="goal_site" pos="0 0 0" size="0.01"/>
    </body>
  </worldbody>

  <actuator>
    <position name="a0" joint="j0" kp="200"/>
    <position name="a1" joint="j1" kp="200"/>
    <position name="a2" joint="j2" kp="200"/>
    <position name="a3" joint="j3" kp="100"/>
    <position name="a4" joint="j4" kp="100"/>
    <position name="afl" joint="jfl" kp="80"/>
    <position name="afr" joint="jfr" kp="80"/>
  </actuator>
</mujoco>
"""
```
```python
class CupBalanceEnv:
    """Environment for the cup balancing task."""

    def __init__(self):
        self.model = mujoco.MjModel.from_xml_string(CUP_BALANCE_XML)
        self.data = mujoco.MjData(self.model)
        self.reward_fn = CupBalanceReward()
        self.max_steps = 300
        self.step_count = 0
        self.goal_pos = np.array([0.5, 0.3, 0.55])
        self.ee_site = mujoco.mj_name2id(
            self.model, mujoco.mjtObj.mjOBJ_SITE, "ee")
        mujoco.mj_forward(self.model, self.data)
        self.prev_ee_pos = self.data.site_xpos[self.ee_site].copy()
        self.prev_ee_vel = np.zeros(3)

    def get_cup_tilt(self):
        """Cup tilt angle from vertical, in radians."""
        # qpos layout: 7 arm/gripper joints, then the cup freejoint
        # (3 position + 4 quaternion entries), then 2 liquid slides
        cup_quat = self.data.qpos[10:14]
        rot = np.zeros(9)
        mujoco.mju_quat2Mat(rot, cup_quat)
        cup_up = rot.reshape(3, 3)[:, 2]  # cup's local z-axis in world frame
        return np.arccos(np.clip(cup_up[2], -1.0, 1.0))

    def get_liquid_offset(self):
        """Position of the liquid ball relative to the cup."""
        return self.data.qpos[14:16].copy()

    def step(self, action):
        joint_delta = action[:5] * 0.03  # Small deltas for smoothness
        gripper = (action[5] + 1) / 2 * 0.035
        self.data.ctrl[:5] = self.data.qpos[:5] + joint_delta
        self.data.ctrl[5] = gripper
        self.data.ctrl[6] = gripper

        for _ in range(10):  # 10 x 2 ms = 20 ms control step
            mujoco.mj_step(self.model, self.data)
        self.step_count += 1

        dt = 0.002 * 10
        ee_pos = self.data.site_xpos[self.ee_site]
        ee_vel = (ee_pos - self.prev_ee_pos) / dt
        ee_accel = (ee_vel - self.prev_ee_vel) / dt
        cup_tilt = self.get_cup_tilt()
        cup_angular_vel = self.data.qvel[10:13]  # cup freejoint angular velocity
        goal_dist = np.linalg.norm(ee_pos - self.goal_pos)
        grasping = self._check_grasp()

        reward, info = self.reward_fn.compute(
            cup_tilt, cup_angular_vel, ee_vel,
            ee_accel, goal_dist, action, grasping)

        self.prev_ee_vel = ee_vel.copy()
        self.prev_ee_pos = ee_pos.copy()
        done = info.get('spill', False) or self.step_count >= self.max_steps
        return self._get_obs(), reward, done, info

    # _check_grasp() and _get_obs() are omitted here for brevity
```
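The qpos indexing above is easy to get wrong, so here is a standalone check of the addressing (no MuJoCo required), plus a pure-NumPy version of the tilt computation: the z-component of the rotated z-axis is the matrix entry $R_{22} = 1 - 2(q_x^2 + q_y^2)$, so a quaternion alone is enough to verify the math.

```python
import numpy as np

# qpos layout implied by the XML: 5 arm hinges + 2 gripper slides
# (1 entry each), the cup freejoint (3 position + 4 quaternion),
# then 2 liquid slides.
addr, spans = 0, {}
for name, width in [("arm_gripper", 7), ("cup_pos", 3),
                    ("cup_quat", 4), ("liquid", 2)]:
    spans[name] = (addr, addr + width)
    addr += width
print(spans["cup_quat"])  # (10, 14) -> cup orientation is qpos[10:14]
print(spans["liquid"])    # (14, 16) -> liquid slides are qpos[14:16]

# Tilt from a quaternion without MuJoCo
def tilt_from_quat(q):
    w, x, y, z = q
    up_z = 1.0 - 2.0 * (x * x + y * y)  # z-component of the rotated z-axis
    return np.arccos(np.clip(up_z, -1.0, 1.0))

print(tilt_from_quat([1.0, 0.0, 0.0, 0.0]))  # 0.0: identity quat, upright
s = np.sqrt(0.5)  # 90-degree rotation about x
print(round(np.degrees(tilt_from_quat([s, s, 0.0, 0.0]))))  # 90: on its side
```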
## Training with SAC
```python
from stable_baselines3 import SAC

# cup_env must expose the Gymnasium API (reset/step plus
# observation_space and action_space) for stable-baselines3
model = SAC(
    "MlpPolicy",
    cup_env,
    learning_rate=1e-4,   # Lower LR for stability
    buffer_size=500_000,
    batch_size=512,
    tau=0.001,            # Slow target-network update
    gamma=0.995,          # Higher than usual: long-horizon smoothness matters
    train_freq=2,
    gradient_steps=2,
    ent_coef="auto",
    target_entropy="auto",
    verbose=1,
    policy_kwargs=dict(
        net_arch=[256, 256, 128],  # Larger network for contact dynamics
    ),
)
model.learn(total_timesteps=3_000_000)
```
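To produce numbers like the table below, the trained policy has to be evaluated over many rollouts. A sketch of a spill-rate evaluation loop — `policy` and `env` are stand-ins here (in the real setup, `policy` would wrap `model.predict` and `env` would be `CupBalanceEnv`); the deterministic stub at the end only exists to make the loop runnable.

```python
import numpy as np

def spill_rate(policy, env, episodes=100):
    """Fraction of episodes in which the water spills."""
    spills = 0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done, info = env.step(policy(obs))
            if info.get("spill"):
                spills += 1
                break
    return spills / episodes

# Stub env for illustration: one-step episodes, every fifth one "spills"
class StubEnv:
    def __init__(self):
        self.n = 0
    def reset(self):
        self.n += 1
        return np.zeros(3)
    def step(self, action):
        return np.zeros(3), 0.0, True, {"spill": self.n % 5 == 0}

rate = spill_rate(lambda obs: None, StubEnv(), episodes=100)
print(rate)  # 0.2 -- every fifth stub episode spills
```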
## RL vs Impedance Control Comparison
| Metric | Impedance Control | SAC (RL) |
|---|---|---|
| Max tilt (avg) | 12.3 deg | 6.8 deg |
| Spill rate | 18% | 4% |
| Avg travel time | 8.2s | 5.1s |
| Jerk (smoothness) | 15.6 | 8.3 |
| Adapts to new cups | Requires retuning | Self-adapts |
| Adapts to perturbation | Poor | Good |
RL clearly outperforms, especially in adaptability. The learned policy slows down before turning, tilts the cup slightly to compensate for centrifugal force, and reacts quickly to perturbations.
## Advanced Technique: Variable Impedance RL
A powerful approach is to combine impedance control with RL — the RL policy does not directly control joint commands, but instead controls impedance parameters ($K$, $D$):
```python
class VariableImpedancePolicy:
    """RL policy that outputs impedance parameters, not joint commands."""

    def __init__(self, base_controller):
        self.controller = base_controller

    def act(self, state, rl_output):
        """
        state: dict with the current end-effector 'pos', 'vel',
               'rot' (up-axis), and 'angvel'
        rl_output: [kp_x, kp_y, kp_z, kd_x, kd_y, kd_z, dx, dy, dz];
                   gain entries scaled to [-2.3, 2.3], offsets to [-1, 1]
        """
        # RL selects stiffness and damping on a log scale (always positive);
        # exp([-2.3, 2.3]) * 50 spans roughly [5, 500]
        kp = np.exp(rl_output[:3]) * 50  # stiffness, ~[5, 500]
        kd = np.exp(rl_output[3:6]) * 5  # damping, ~[0.5, 50]
        # RL selects a desired-position offset around the nominal trajectory
        desired_offset = rl_output[6:9] * 0.02  # max 2 cm per axis

        # Per-axis gains, applied elementwise by the base controller
        self.controller.kp_pos = kp
        self.controller.kd_pos = kd
        # get_trajectory_point(): nominal trajectory lookup, defined elsewhere
        current_desired = self.get_trajectory_point() + desired_offset

        return self.controller.compute_action(
            state['pos'], current_desired,
            state['vel'], np.zeros(3),
            state['rot'], np.array([0, 0, 1]),  # target: upright
            state['angvel']
        )
```
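The log-scale gain mapping is worth checking numerically. Assuming the policy's gain outputs are squashed to roughly [-2.3, 2.3] (for instance via `2.3 * tanh` of the raw logits — an assumption about the output scaling, not something fixed by the code above), `exp(out) * 50` covers about [5, 500]:

```python
import numpy as np

# Stiffness range induced by exp-scaling a bounded policy output
out = np.array([-2.3, 0.0, 2.3])  # assumed output bounds and midpoint
kp = np.exp(out) * 50
print(kp)  # roughly 5, 50, and 500
```

The exponential keeps gains strictly positive and makes equal policy-output steps correspond to equal *relative* stiffness changes, which is usually what you want when spanning two orders of magnitude.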
This approach has a major advantage for sim-to-real transfer — the impedance controller provides safety bounds while RL provides adaptability. For details on sim-to-real for force control, see Domain Randomization.
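A minimal sketch of what that domain randomization looks like for this task. The ranges below are illustrative assumptions, not values from this post: resample physical properties every episode so the policy cannot overfit a single cup.

```python
import numpy as np

def sample_cup_params(rng):
    """Randomized physical parameters for one training episode."""
    return {
        "liquid_mass": rng.uniform(0.05, 0.35),    # near-empty to full, kg
        "finger_friction": rng.uniform(1.0, 3.0),  # around the XML's 2.0
        "wall_mass": rng.uniform(0.005, 0.02),     # kg per cup wall
    }

params = sample_cup_params(np.random.default_rng(0))
```

In the MuJoCo environment these values would be written into the loaded model (e.g. the `body_mass` and `geom_friction` arrays of `MjModel`) at the start of each episode, before stepping the simulation.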
## References

- Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks — Martín-Martín et al., 2019
- Learning Variable Impedance Control — Buchli et al., 2011
- Reinforcement Learning for Contact-Rich Manipulation — Survey, 2023
## Next in the Series
Next up — Precision Pick-and-Place: Position & Orientation Control — we tackle placing objects with sub-cm accuracy, including orientation alignment. Hindsight Experience Replay (HER) will be the star.