Building humanoid robots is often framed as a data problem: collect enough real-world examples, and systems will learn to operate in human environments.
But more data has not translated into consistent real-world execution yet. The industry is not bottlenecked by data collection; it’s bottlenecked by turning that data into reliable physical action. Robot makers’ ambitions to build general-purpose humanoids — rather than single-purpose machines — make that challenge even harder, requiring systems to operate across unpredictable environments and tasks.
Despite the surge in data collection, there has not been a convergence on which training methods actually work.
Three of the most prominent strategies — scaling first-person data, capturing teleoperated demonstrations, and learning from deployed systems — highlight the tradeoffs between scale, precision, and real-world performance.
One of the fastest-growing approaches relies on egocentric, first-person data — video captured from wearable cameras as people perform everyday tasks.
Companies like Objectways and Micro1 are building large-scale egocentric data pipelines and supplying the resulting footage to humanoid robotics companies. Figure AI is collecting similar data to train Helix, the AI system powering its humanoid robots.
“When you say general-purpose, you need generalist data — you need everyday actions,” VP of Robotics Data at Micro1 Arian Sadeghi said.
The strategy mirrors approaches used by Tesla and others in building self-driving systems, where scaling real-world data has been central to improving performance. Compared to robot-collected data, egocentric capture is faster and cheaper, enabling the collection of large volumes of diverse, real-world behavior.
But much of that footage is difficult to use as captured.
“Structuring the data is very important to us,” Sadeghi said. About 70% of footage is usable on the first pass, rising to roughly 95% after iterative review and feedback from Micro1’s human data managers.
Without that process, large portions of a dataset become unusable.
Raw footage often lacks key signals, such as depth or consistent visual quality, which makes it difficult to interpret details like hand position or movement. Variability in lighting, camera placement, and motion adds further inconsistency that must be filtered out.
“You could be paying hundreds of thousands of dollars…to capture data that you did not realize is useless,” Cosima du Pasquier, a postdoctoral research fellow at Stanford’s CHARM Lab, said.
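One way to picture the filtering step described above is a simple quality gate applied to each clip before it enters a training set. The sketch below is illustrative only: the `Clip` fields, the scores, and the thresholds are hypothetical and are not drawn from Micro1's or any other company's pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    """Hypothetical metadata for one egocentric video clip."""
    clip_id: str
    has_depth: bool        # whether a depth signal was recorded
    sharpness: float       # assumed visual quality score in [0, 1]
    hands_visible: float   # assumed fraction of frames with hands in view


def is_usable(clip: Clip, min_sharpness: float = 0.5, min_hands: float = 0.8) -> bool:
    """Keep a clip only if it has depth and clears both (assumed) thresholds."""
    return (
        clip.has_depth
        and clip.sharpness >= min_sharpness
        and clip.hands_visible >= min_hands
    )


def usable_fraction(clips: List[Clip]) -> float:
    """Share of clips that pass the gate; 0.0 for an empty batch."""
    if not clips:
        return 0.0
    return sum(is_usable(c) for c in clips) / len(clips)
```

Tracking a figure like `usable_fraction` per batch is one plausible way a team could notice early that a capture setup is producing footage it cannot use.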
More fundamentally, egocentric data captures what a task looks like — not how to execute it.
It lacks information about force, control, and physical interaction, which makes it difficult to translate human behavior into reliable robot actions. The footage helps robots understand what to do, but not how to do it.
A second approach centers on teleoperation — humans control robots remotely while every movement, force signal, and sensor input is recorded.
These demonstrations capture not just what a task looks like, but how it is physically executed on the robot itself, making teleoperation some of the highest-fidelity data available.
“In some circles, it’s considered the best kind of data…the one you need the least data to successfully train your robot,” du Pasquier said.
The advantage is precision. Engineers can train models directly on robot-specific actions, avoiding the translation gap between human behavior and machine control.
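As a rough illustration of why teleoperated data maps so directly onto training, the sketch below turns a recorded episode into observation and action pairs, the shape of data commonly used for imitation-style learning. The `TeleopStep` fields and the pairing scheme are assumptions made for this example; the companies mentioned here have not described their formats.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TeleopStep:
    """Hypothetical record of one timestep in a teleoperated episode."""
    timestamp_s: float
    joint_positions: List[float]    # robot joint readings
    wrist_force_n: List[float]      # force sensor readings, in newtons
    operator_command: List[float]   # what the human operator commanded


def to_training_pairs(
    episode: List[TeleopStep],
) -> List[Tuple[List[float], List[float]]]:
    """Pair what the robot sensed with what the operator commanded at each step."""
    pairs = []
    for step in episode:
        observation = step.joint_positions + step.wrist_force_n
        pairs.append((observation, step.operator_command))
    return pairs
```

Because both the observation and the action come from the robot's own sensors and controls, no translation from human motion is needed. The same property ties each pair to the specific robot that recorded it.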
The tradeoff is scale and diversity. Each hour of teleoperated data requires both a human operator and a physical robot, making it difficult to collect the volume and variety needed for general-purpose behavior.
But in structured environments such as warehouses and manufacturing lines, that limitation is less severe. Because teleoperation data is so precise, relatively small numbers of demonstrations can still produce reliable performance when tasks remain repetitive and conditions stable.
Companies such as Boston Dynamics and 1X use human operators to guide robots through tasks while collecting training data for automation. But because both companies are targeting highly variable environments (homes and field operations), teleoperation alone is not enough to generalize behavior. Boston Dynamics, for example, is combining teleoperation with other approaches rather than relying on it alone.
Beyond the cost of collection, teleoperation data is tightly coupled to a specific robot and setup.
“Teleoperation is very limited to the exact robot your operator is controlling,” du Pasquier said.
Teleoperation works well for structured tasks where consistency matters more than adaptability, but struggles when robots must operate across new environments and conditions.
This method teaches a robot how to perform a task, but not how to adapt when conditions change.
A third approach focuses on data collected from robots operating in real-world environments.
Companies like Agility Robotics, which deploys its humanoid Digit in live warehouse environments, use telemetry as their primary improvement loop. Sensor readings, joint movement, force data, and task outcomes feed directly into training and system updates.
“The constraint hasn’t been data volume — it’s meeting production-grade reliability and safety standards,” Agility CTO Pras Velagapudi said.
Rather than relying on demonstrations or human video, this approach uses real-world operation as the primary feedback loop, generating data from successes, failures, and edge cases that are difficult to capture in controlled settings.
“Physical AI requires reliable sensing, control, and safety systems,” Velagapudi said. “Egocentric data alone doesn’t solve the long-tail edge cases found in real warehouses.”
The advantage is realism — but it comes with a tradeoff.
Deployment data is inherently reactive: it captures problems only after they occur. This slows the rate at which systems improve, particularly in early deployment, when data is still limited.
Each of these approaches captures a different slice of the problem.
Together, they reveal the central challenge: getting humanoid systems to execute tasks reliably as objects shift, environments vary, and interactions unfold unpredictably.
Once collected, the data is translated into training examples that help humanoid systems generalize behaviors across new environments, a process the field is still working out.
The deeper issue is that humanoid systems are improving faster at observing the world than physically adapting to it. Reliable execution still depends on dexterity, force control, recovery behavior, and the ability to respond when conditions deviate from expectation.
Despite growing investment in data pipelines, the industry has not converged on a clear answer.
“There are a lot of people who are posturing and saying that they have answered the question,” du Pasquier said. “But the academic consensus…is that there isn’t one.”
Companies are pursuing multiple approaches in parallel — collecting egocentric data, running teleoperation rigs, and deploying robots in controlled environments.
“They’re just trying everything,” du Pasquier said.
At the same time, not all failures are data problems. Some are still basic control and hardware limitations — such as locomotion, dexterity, and physical interaction. “You don’t need data for that…it’s kinematics,” du Pasquier said.
The result is an industry still searching for a way to produce humanoids that can reliably operate in their intended environments.
Until the gap between observing the world and acting in it is closed, humanoid robotics will be limited less by how intelligent the models are, and more by whether robots can operate reliably over long periods in messy, unpredictable environments.
The challenge is no longer getting humanoids to imitate human behavior once; it’s getting them to keep working when the real world stops matching the training data.