
How Hydra Scores Motions

A plain-language tour of the physics check that decides which motions earn a place in the verified library.

Humanoid Network is the motion data layer for Physical AI. You generate a robot motion in a browser simulator, and the motions that survive a physics check join a verified library that Physical AI can train on. Hydra is that check. It is the reason the library is worth training on at all: a Hydra score is a claim anyone can inspect, not a number we ask you to take on faith.

This page is the methodology behind that claim. If you want the product-level walkthrough of how a single motion gets scored and proofed, start with Hydra Proof. If you want to understand what Hydra actually measures and why those measurements keep the library trustworthy, you are in the right place.

Why physics-scoring matters

A motion can look right and still be impossible. A clip can render a humanoid walking smoothly across your screen while, under the hood, its feet glide without traction, its joints snap to positions no real actuator could reach, or its balance only holds because nothing is pushing back. Eyeballing a motion tells you whether it is pleasant. It tells you nothing about whether a real robot could perform it.

Physical AI does not learn from how a motion looks. It learns from the underlying trajectory: where every joint was, frame by frame, and whether the body stayed upright while it got there. If that trajectory is physically incoherent, every model that trains on it inherits the lie. So the gate cannot be aesthetic. It has to be physical. Hydra replays each motion against a simulated body and a simulated world and asks one question: would this actually hold up? Motions that pass keep the library trustworthy. Motions that do not are kept out, and that exclusion is the whole point.

The five signals

Hydra reads five signals off the replay. Each one isolates a different way a motion can be physically wrong, and together they decide a motion's fate. The definitions below describe what each signal means; the thresholds Hydra applies are fixed, so the same motion always scores the same way.

Stability

Did the body stay up? Hydra runs the motion as a physics rollout and records whether the humanoid survived the attempt rather than toppling, collapsing, or being thrown off balance. It also tracks how far the body's height drifts from where it should be over the run. A motion that cannot keep itself standing fails here first, because nothing downstream matters if the robot is on the floor.
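A minimal sketch of the two stability readings, assuming the rollout exposes a per-frame root height in metres. The names `stability`, `reference_height`, and `fall_height` are hypothetical, as is the rule that a root dropping below `fall_height` counts as a fall; Hydra's actual fall criterion is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StabilityResult:
    survived: bool           # True if the root never dropped below the fall height
    max_height_drift: float  # largest |height - reference| seen during the run, metres


def stability(root_heights, reference_height, fall_height):
    """Hypothetical stability check over a per-frame list of root heights (metres)."""
    if not root_heights:
        raise ValueError("rollout produced no frames")
    # Assumed fall rule: the body counts as down once its root sinks below fall_height.
    survived = all(h >= fall_height for h in root_heights)
    # Drift: how far height strayed from where it should be, at its worst.
    drift = max(abs(h - reference_height) for h in root_heights)
    return StabilityResult(survived=survived, max_height_drift=drift)
```

Reporting worst-case drift rather than a mean is a choice made for this sketch: a mean would let a brief collapse hide inside an otherwise long, stable run.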

Tracking quality

How faithfully did the simulated body reproduce the intended motion? Hydra measures the whole-body mean per-joint position error (MPJPE) in millimetres: across the run, how far each joint landed from where the target trajectory said it should be, averaged over the body. Lower is better. A tight number means the motion is one a real articulated body can track; a loose one means the trajectory drifted away from anything physically reachable. This is the signal that separates a clean, high-fidelity motion from a sloppy approximation, and it carries most of the weight in a passing score.
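The whole-body mean per-joint position error described above can be written as a short function. This sketch assumes joint positions arrive as (x, y, z) tuples in metres, one list per frame, and that the simulated and target trajectories are already time-aligned; the function name `mpjpe_mm` is illustrative, not Hydra's.

```python
import math


def mpjpe_mm(predicted, target):
    """Mean per-joint position error, in millimetres.

    predicted, target: lists of frames; each frame is a list of (x, y, z)
    joint positions in metres. Both must have the same shape.
    """
    if not predicted or len(predicted) != len(target):
        raise ValueError("trajectories must be non-empty and the same length")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("joint count mismatch between trajectories")
        for p, t in zip(p_frame, t_frame):
            total += math.dist(p, t)  # Euclidean distance for this joint, metres
            count += 1
    # Average over every joint in every frame, then convert metres to millimetres.
    return 1000.0 * total / count
```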

Foot skate

Are the feet cheating? Foot skate measures how much a planted foot slides along the ground when it should be still. Some sliding is normal contact noise. A lot of it means the motion is faking traction: the body stays upright, but only because its feet are gliding across the floor in a way no real surface would allow. Hydra treats runaway foot skate as gaming the physics and rejects it outright, even if the robot never fell.
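One common way to quantify foot skate is to sum the horizontal distance a foot travels while it is in ground contact. The sketch below assumes z-up positions in metres and decides contact with a simple height threshold; both the threshold value and the name `foot_skate_m` are hypothetical, and Hydra may detect contact differently.

```python
import math


def foot_skate_m(foot_positions, contact_height=0.05):
    """Total horizontal slide (metres) of one foot while it is planted.

    foot_positions: list of (x, y, z) per frame, z up.
    A pair of consecutive frames counts as planted when the foot is below
    contact_height in both (an assumed contact heuristic).
    """
    slide = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(foot_positions, foot_positions[1:]):
        if z0 < contact_height and z1 < contact_height:
            # Only horizontal motion counts; lifting the foot is not skating.
            slide += math.hypot(x1 - x0, y1 - y0)
    return slide
```

Comparing the result against a fixed limit gives the outright rejection described above: an upright body with a large slide value is still rejected.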

Duration

Did the motion last long enough to mean anything? Hydra records how many seconds of the attempt it could track before things fell apart. A run that collapses almost immediately is not a motion but a stumble, and it carries no usable trajectory. Very short runs are scored down or rejected; a motion has to stay coherent long enough to be worth learning from.
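Tracked duration reduces to counting the frames that stayed coherent and dividing by the frame rate. In this sketch, `fell_at_frame` (the first frame at which tracking failed, or `None` for a full run) and the `min_useful_s` cut-off are hypothetical inputs; the page does not state Hydra's actual cut-offs.

```python
from typing import Optional


def tracked_duration_s(fell_at_frame: Optional[int], total_frames: int, fps: float) -> float:
    """Seconds of the attempt that were tracked before things fell apart."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # A run that never failed is tracked for its full length.
    frames = total_frames if fell_at_frame is None else min(fell_at_frame, total_frames)
    return frames / fps


def too_short(duration_s: float, min_useful_s: float = 1.0) -> bool:
    """Hypothetical gate: runs shorter than min_useful_s carry no usable trajectory."""
    return duration_s < min_useful_s
```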

Validation tier

The validation tier is a discrete grade the underlying evaluator assigns to the run, summarising how strict a standard the motion cleared. Hydra carries this tier through alongside the other signals rather than re-deriving it, so the strictness of the original physics evaluation travels with the motion into the library. It is the signal that lets two motions with similar surface scores still be told apart by how rigorously each was checked.
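Carrying the tier through "rather than re-deriving it" can be modelled as copying a field verbatim from the evaluator's output into Hydra's record. The tier names, the `EvaluatorOutput` and `HydraRecord` types, and `attach_tier` below are all hypothetical; the page does not list the real grades.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical grade names; higher means a stricter standard was cleared.
    BASIC = 1
    STANDARD = 2
    STRICT = 3


@dataclass(frozen=True)
class EvaluatorOutput:
    survived: bool
    mpjpe_mm: float
    foot_skate_m: float
    duration_s: float
    tier: Tier  # assigned by the underlying physics evaluator


@dataclass(frozen=True)
class HydraRecord:
    score: float
    tier: Tier  # copied from the evaluator, never recomputed


def attach_tier(evaluator: EvaluatorOutput, score: float) -> HydraRecord:
    """Pair Hydra's score with the evaluator's tier, passed through unchanged."""
    return HydraRecord(score=score, tier=evaluator.tier)
```

Using `IntEnum` keeps tiers ordered, so two motions with similar scores can still be compared by strictness, for example `Tier.STRICT > Tier.STANDARD`.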

From signals to accept or reject

Hydra does not average these signals into a vague vibe. It applies them in order, and the first hard failure wins. A motion that falls almost immediately is rejected before anything else is considered. A motion whose feet slide past the allowed limit is rejected as physics-gaming even if it stayed upright. A motion that survives but only briefly is scored down for being too short to learn from. Only once a motion clears those gates does tracking quality set how high it scores, on a scale that rewards trajectories a real body could follow closely and penalises ones that drifted.
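The ordering above is a short-circuit chain: each gate either returns a verdict or lets the motion through to the next. The sketch below follows that order. Every threshold, the linear mapping from mean per-joint position error (MPJPE) to score, and the short-run penalty are hypothetical placeholders; the page says short runs are "scored down or rejected", and this sketch picks scoring down.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, for illustration only.
MIN_SURVIVAL_S = 0.5      # shorter than this counts as falling almost immediately
MAX_FOOT_SKATE_M = 0.30   # more slide than this is treated as physics-gaming
MIN_USEFUL_S = 2.0        # surviving runs shorter than this are scored down
MPJPE_BEST_MM = 20.0      # error at or below this earns full tracking credit
MPJPE_WORST_MM = 200.0    # error at or above this earns no tracking credit
SHORT_RUN_PENALTY = 0.5   # multiplier applied to short runs


@dataclass(frozen=True)
class Verdict:
    accepted: bool
    reason: str
    score: Optional[float] = None  # only set for accepted motions


def tracking_credit(mpjpe_mm: float) -> float:
    """Map tracking error onto [0, 1] linearly; lower error earns more credit."""
    credit = (MPJPE_WORST_MM - mpjpe_mm) / (MPJPE_WORST_MM - MPJPE_BEST_MM)
    return min(1.0, max(0.0, credit))


def judge(duration_s: float, foot_skate_m: float, mpjpe_mm: float) -> Verdict:
    # Gate 1: a motion that falls almost immediately is rejected first.
    if duration_s < MIN_SURVIVAL_S:
        return Verdict(False, "fell_immediately")
    # Gate 2: runaway foot skate is rejected even if the body stayed upright.
    if foot_skate_m > MAX_FOOT_SKATE_M:
        return Verdict(False, "foot_skate")
    # Past the hard gates, tracking quality sets the score.
    score = tracking_credit(mpjpe_mm)
    # Gate 3: a brief survivor is accepted but scored down.
    if duration_s < MIN_USEFUL_S:
        return Verdict(True, "short_run", score * SHORT_RUN_PENALTY)
    return Verdict(True, "ok", score)
```

Because each gate returns immediately, a motion that trips an early gate is never scored on tracking quality, which matches the "first hard failure wins" rule.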

The scoring is deterministic by design. The same motion fed in twice produces a byte-identical result, because the verified library depends on the score meaning the same thing every time and to everyone. A Hydra score is not an opinion that can quietly shift; it is a reproducible measurement that anyone re-running the check would land on.
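One way to make "byte-identical" checkable is to serialise every result in a canonical form, so two runs can be compared as raw bytes. This sketch uses Python's standard `json` module with sorted keys and fixed separators; the function name `canonical_bytes` is hypothetical, and canonical output only helps if the scoring that feeds it is itself free of randomness.

```python
import json


def canonical_bytes(result: dict) -> bytes:
    """Serialise a score result so equal results always produce equal bytes."""
    return json.dumps(
        result,
        sort_keys=True,          # key order no longer depends on insertion order
        separators=(",", ":"),   # no incidental whitespace
        allow_nan=False,         # NaN has no stable meaning, so refuse it
    ).encode("utf-8")
```

Two re-runs of the check can then be compared with a plain `==` on the returned bytes.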

The outcome is what determines whether a motion becomes part of the verified library. A motion that clears the gates earns its quality score and its validation tier, and those travel with it as metadata. That is what "verified" means here: not that we liked the motion, but that it passed a physics check whose rules are fixed and inspectable.

How the Foundation Dataset gives Hydra context

A score in isolation tells you a motion is internally coherent. It does not tell you where that motion sits in the wider space of things a humanoid can do. That context comes from the Foundation Dataset: an independent corpus of 46,893 episodes, grouped into behavior families and embedded in a shared 3D behavior space.

Because the dataset is laid out as a navigable space rather than a flat list, a new motion can be placed relative to what already exists, near the families it resembles and far from the ones it does not. That turns scoring from a blind, one-off measurement into a judgement made with a map. Hydra is not asking only "is this motion physically valid?" in a vacuum; the Foundation Dataset gives it the surrounding terrain to judge a new motion against, rather than scoring it with no sense of where it belongs. The dataset is read-only context inside the app, not a data-export portal: it exists to give every new motion a frame of reference, not to be shipped out.
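Placing a new motion "near the families it resembles" can be sketched as a nearest-centroid lookup in the shared 3D space. This assumes each behavior family is summarised by a single centroid and that proximity is Euclidean distance; the page does not say how Hydra actually measures proximity, and `nearest_families` and the family names in the test are invented for illustration.

```python
import math


def nearest_families(embedding, family_centroids, k=3):
    """Rank behavior families by distance from a motion's 3D embedding.

    embedding: (x, y, z) position of the new motion in the behavior space.
    family_centroids: dict mapping family name -> (x, y, z) centroid.
    Returns up to k (name, distance) pairs, nearest first.
    """
    ranked = sorted(
        ((name, math.dist(embedding, c)) for name, c in family_centroids.items()),
        key=lambda pair: (pair[1], pair[0]),  # tie-break on name so order is deterministic
    )
    return ranked[:k]
```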

Proof for the motions that pass

Accepted motions can receive a proof record on Base Sepolia, the rehearsal chain, during testing. That record is a tamper-evident anchor for the motion's score and validation metadata, so a passing claim is something external parties can point at rather than something we merely assert internally. This is env-gated and tied to the current testing phase. For the full picture of how a score becomes a proof, see Hydra Proof.
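A tamper-evident anchor usually works by publishing a cryptographic digest of the data rather than the data itself. The sketch below computes a SHA-256 digest over a canonical encoding of a motion's score and tier with Python's standard `hashlib` and `json`; the field names and `proof_digest` are hypothetical, and the sketch stops short of submitting anything to Base Sepolia, since the page does not describe that transaction.

```python
import hashlib
import json


def proof_digest(motion_id: str, score: float, tier: int) -> str:
    """Hex SHA-256 over a canonical encoding of a motion's score metadata.

    Changing motion_id, score, or tier changes the digest, which is what
    lets an anchored copy expose later tampering.
    """
    payload = json.dumps(
        {"motion_id": motion_id, "score": score, "tier": tier},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```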
