Why Humanoid Network

Why the real bottleneck in Physical AI is motion data, not models, and how we close the gap.

The hard part of teaching a robot to move has never been the model. It's the data. Language models learned to reason because the internet already existed: a near-infinite, openly available record of how people write and think. Robots have nothing like it. There is no shared library of how a humanoid keeps its balance, plants a foot, or carries a load, and a robot can't learn a motion it has never seen. Humanoid Network exists to build that library, one verified motion at a time.

The data gap in Physical AI

Physical AI inherited the model breakthroughs of the last decade. It did not inherit the data. A robot has to act in a physical world that has no index, no crawl, no corpus you can simply download. The motion that matters, the actual joint-by-joint record of a body moving through space and staying physically valid, is scattered across private labs, locked to specific hardware, or never recorded at all.

So the field keeps scaling the part that's already strong: better architectures, bigger policies, faster compute. But a stronger model trained on thin, brittle, or physically implausible motion is still a model trained on bad data. The constraint that holds Physical AI back is not how well robots can learn. It's that there is almost nothing trustworthy for them to learn from. Close the motion-data gap and the rest of the stack finally has something worth training on.

Where the data falls short

The gap isn't one problem. It's three, and each one compounds the others.

Scarcity. Collecting real robot motion is slow and expensive. It needs hardware, an operator, a controlled space, and a teardown for every failed run. Most teams never gather enough motion to generalize, and what they do gather rarely leaves the building. The world's supply of shareable humanoid motion is a rounding error next to what training needs.
Physical validity. A motion that looks right on screen can still be impossible: feet slide across the floor, contacts violate the physics they're supposed to obey, balance holds only because nothing pushed back. Train on motion like that and a robot learns habits that fall apart the moment a real body and real gravity are involved. Without a physics gate, more data just means more ways to be wrong.
Fragmentation. Motion is usually captured for one robot, in one stack, under one set of assumptions, and stays there. There's no shared frame that lets a behavior collected for one embodiment inform another. Every team starts cold, and the field never compounds.

These aren't details to optimize later. They're the reason a general motion foundation for robots doesn't exist yet.

Our answer: a motion data layer

Humanoid Network is the motion data layer for Physical AI. Instead of waiting for an internet of robot motion to appear, we built the place where it gets made, validated, and kept honest. The idea is simple: open contribution at the front, a physics gate in the middle, a verified library at the end. Here's how each piece closes a specific part of the gap.

Anyone can contribute, from a browser. You generate a robot motion in a browser-based simulator. No hardware to buy, no GPU of your own, no lab to book. That removes the cost wall that keeps motion scarce and turns data collection into something a global community can do, not just a handful of funded teams. The barrier to producing real motion drops to a tab.

Hydra makes sure only valid motion gets in. Every motion you generate runs through Hydra, our validation layer. Hydra scores it against physics on five signals: stability, tracking quality, foot skate, duration, and validation tier. Only motions that pass earn a place in the verified library. This is the physics gate. A Hydra score is a claim anyone can inspect, not a number we assert, which is what keeps the library trustworthy as it grows.
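As a rough illustration of what a gate over those five signals could look like, here is a minimal sketch. It is not Hydra's implementation: the class names, units, value ranges, and every threshold below are assumptions made up for the example. Only the five signal names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionSignals:
    """The five signals named above; units and ranges are assumptions."""
    stability: float          # assumed 0..1, higher is better
    tracking_quality: float   # assumed 0..1, higher is better
    foot_skate_m: float       # assumed mean foot slip during contact, in metres
    duration_s: float         # clip length in seconds
    validation_tier: int      # assumed ordinal tier, higher means stricter checks


@dataclass(frozen=True)
class GateThresholds:
    """Hypothetical cut-offs, chosen only to make the example concrete."""
    min_stability: float = 0.8
    min_tracking_quality: float = 0.7
    max_foot_skate_m: float = 0.02
    min_duration_s: float = 2.0
    min_validation_tier: int = 1


def failed_checks(s: MotionSignals, t: GateThresholds = GateThresholds()) -> list[str]:
    """Return the name of every signal that misses its threshold."""
    checks = {
        "stability": s.stability >= t.min_stability,
        "tracking_quality": s.tracking_quality >= t.min_tracking_quality,
        "foot_skate": s.foot_skate_m <= t.max_foot_skate_m,
        "duration": s.duration_s >= t.min_duration_s,
        "validation_tier": s.validation_tier >= t.min_validation_tier,
    }
    return [name for name, ok in checks.items() if not ok]


def passes_gate(s: MotionSignals, t: GateThresholds = GateThresholds()) -> bool:
    """A motion is accepted only if no check fails."""
    return not failed_checks(s, t)


# Two hypothetical motions: identical except for how much the feet slide.
clean = MotionSignals(0.93, 0.88, 0.004, 6.5, 2)
sliding = MotionSignals(0.93, 0.88, 0.09, 6.5, 2)
print(passes_gate(clean), failed_checks(sliding))
```

Returning the list of failed checks, rather than a bare pass or fail, is one way to keep a score inspectable: a rejected motion says which signal it missed.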

The Foundation Dataset gives that judgment context. Hydra doesn't score in a vacuum. Behind it sits the Foundation Dataset: an independent corpus of 46,893 episodes, grouped into behavior families and embedded in a shared 3D behavior space. It gives Hydra the grounding to place a brand-new motion among known behavior instead of judging it blind. In the app it is read-only reference material: not a feed of contributor work and not a data-export portal.
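One simple way to picture "placing a motion among known behavior" is a nearest-centroid lookup in the 3D space. The sketch below assumes each behavior family is summarized by a single 3D point and that a new motion has already been embedded into the same space. The family names, coordinates, and function name are invented for illustration, and the source does not say how Hydra actually does this.

```python
import math

# Hypothetical family centroids in the shared 3D behavior space.
FAMILY_CENTROIDS = {
    "walk": (0.0, 0.0, 0.0),
    "squat": (1.0, 0.0, 0.5),
    "carry": (0.0, 1.0, 1.0),
}


def nearest_family(embedding, centroids=FAMILY_CENTROIDS):
    """Return (family, distance) for the centroid closest to the embedding.

    Uses plain Euclidean distance; a real system might use a different
    metric or compare against individual episodes rather than centroids.
    """
    family = min(centroids, key=lambda name: math.dist(embedding, centroids[name]))
    return family, math.dist(embedding, centroids[family])


# A hypothetical embedding for a newly generated motion.
family, distance = nearest_family((0.9, 0.1, 0.4))
print(family, round(distance, 3))
```

The returned distance is as useful as the label: a motion far from every family is one the reference corpus says little about.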

The library is open to inspect. Accepted motions join a verified library you can browse: the motion itself, its Hydra score, its replay, its contributor. The shared behavior space is what fights fragmentation: a common frame where motion from many contributors becomes one growing, comparable record instead of scattered private piles.

Provenance is anchored on-chain. During testing, accepted motions can receive a proof record on Base Sepolia, our rehearsal chain. The point is a verifiable trail for who contributed what, so credit and provenance don't depend on trusting us.
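The format of the proof record isn't specified here. A common pattern for this kind of trail is to anchor a content digest rather than the data itself, so anyone holding the motion record can recompute the digest and compare. The sketch below shows only that hashing step, under assumptions: the record fields, the function name, and the choice of SHA-256 over canonical JSON are illustrative, and nothing in it talks to a chain.

```python
import hashlib
import json


def motion_digest(record: dict) -> str:
    """Hash a motion record deterministically.

    Keys are sorted and whitespace is fixed so the same content always
    serializes to the same bytes, and therefore to the same digest.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record; the field names are assumptions for the example.
record = {"motion_id": "demo-001", "contributor": "alice", "hydra_score": 0.91}
reordered = {"hydra_score": 0.91, "contributor": "alice", "motion_id": "demo-001"}
tampered = {**record, "contributor": "mallory"}

print(motion_digest(record) == motion_digest(reordered))  # same content, same digest
print(motion_digest(record) == motion_digest(tampered))   # changed content, new digest
```

Because the digest depends only on the record's content, a verifier needs the record and the anchored value, not any trust in whoever stored them.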

Put together, that's the loop: you create a motion in the browser, Hydra scores it against physics, and valid motion joins a verified library that Physical AI can train on. Any robot, any task, one network: a single, growing, physically valid record of motion, contributed by everyone.
