X Square Robot open sources its robot-free data collection framework

Companies building robots for physical work spend large amounts of time and money operating machines by hand to gather training examples. Teleoperating a physical robot yields only a small number of demonstrations per day, which slows the growth of the datasets used to train embodied AI. Human demonstrators who work without a robot offer a cheaper source of data, and X Square Robot has publicly released a system built around this approach.

The Shenzhen company released XRZero-G0, a hardware and software framework for collecting robot training data from human operators, generating policies, and testing them on physical robots. The code carries an MIT license and sits on GitHub alongside G0-Dataset, a multimodal dataset built with the framework.

Matching human demonstrations to robot perception

Physical robots read their surroundings through several cameras. A head-mounted camera supplies wider context, and wrist-mounted cameras capture detailed hand and object movement. Many human-operated collection setups rely only on wrist views, which creates a gap between the data and the way a robot sees a task during deployment.

XRZero-G0 uses a head-mounted camera together with two wrist cameras to record both wide context and close interaction. The system maps these synchronized views into a shared representation that matches robot perception. A wearable VR interface and interchangeable grippers let one operator produce demonstrations that carry over to different robot bodies.
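One way to picture the synchronization step is a nearest-timestamp match across the three camera streams. The sketch below is illustrative only and is not taken from the XRZero-G0 code: the `Frame` type, the camera names, the `synchronize` function, and the 20 ms tolerance are all assumptions made for the example.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Frame:
    camera: str       # hypothetical names: "head", "left_wrist", "right_wrist"
    timestamp: float  # capture time in seconds


def nearest(frames: List[Frame], t: float) -> Frame:
    """Return the frame in a time-sorted stream whose timestamp is closest to t."""
    if not frames:
        raise ValueError("camera stream is empty")
    times = [f.timestamp for f in frames]
    i = bisect_left(times, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = frames[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda f: abs(f.timestamp - t))


def synchronize(
    head: List[Frame],
    left: List[Frame],
    right: List[Frame],
    tolerance: float = 0.02,  # assumed 20 ms window, not a documented value
) -> List[Tuple[Frame, Frame, Frame]]:
    """Pair each head frame with its nearest wrist frames.

    A head frame is dropped when either wrist partner lies outside the tolerance.
    """
    triples = []
    for h in head:
        l = nearest(left, h.timestamp)
        r = nearest(right, h.timestamp)
        if (abs(l.timestamp - h.timestamp) <= tolerance
                and abs(r.timestamp - h.timestamp) <= tolerance):
            triples.append((h, l, r))
    return triples


head = [Frame("head", t) for t in (0.0, 0.1, 0.2)]
left = [Frame("left_wrist", t) for t in (0.005, 0.105, 0.26)]
right = [Frame("right_wrist", t) for t in (0.0, 0.09, 0.21)]
triples = synchronize(head, left, right)
print(len(triples))  # 2: the head frame at 0.2 s has no left-wrist partner within 20 ms
```

Anchoring on the head camera is a design choice made for this sketch; a real pipeline might rely on hardware triggering or interpolation instead.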

Quality control built into the pipeline

Data from human demonstrators has often carried quality problems that limit its value for training. XRZero-G0 runs a closed loop of collection, inspection, training, and evaluation to control which samples reach training.

At the observation level, multi-view geometric consistency reduces misalignment between images and motion. At the kinematic level, whole-body inverse kinematics with collision and joint-limit constraints removes invalid trajectories. At the policy level, playback on a physical robot serves as the final check. X Square Robot reports an effective data yield near 85 percent under controlled settings.
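The three checks described above can be read as a chain of gates, where an episode must pass each one in order. The following is a minimal sketch under stated assumptions, not the framework's implementation: the `Episode` fields, the 2-pixel reprojection threshold, and the gate functions are hypothetical stand-ins for the real geometric, kinematic, and playback checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Episode:
    episode_id: str
    reprojection_error_px: float  # hypothetical multi-view consistency score
    ik_feasible: bool             # hypothetical outcome of the IK and collision check
    replay_success: bool          # hypothetical outcome of playback on a robot


Gate = Tuple[str, Callable[[Episode], bool]]

# Gates run in order; the threshold is an assumed value for illustration.
GATES: List[Gate] = [
    ("observation", lambda e: e.reprojection_error_px <= 2.0),
    ("kinematic", lambda e: e.ik_feasible),
    ("policy", lambda e: e.replay_success),
]


def filter_episodes(
    episodes: List[Episode],
) -> Tuple[List[Episode], Dict[str, int], float]:
    """Return the kept episodes, rejections per gate, and the effective yield."""
    kept: List[Episode] = []
    rejected = {name: 0 for name, _ in GATES}
    for ep in episodes:
        for name, check in GATES:
            if not check(ep):
                rejected[name] += 1  # count only the first gate that fails
                break
        else:
            kept.append(ep)  # no gate failed
    data_yield = len(kept) / len(episodes) if episodes else 0.0
    return kept, rejected, data_yield


episodes = [
    Episode("a", 0.8, True, True),
    Episode("b", 5.0, True, True),   # fails the observation gate
    Episode("c", 1.1, False, True),  # fails the kinematic gate
    Episode("d", 1.5, True, True),
]
kept, rejected, data_yield = filter_episodes(episodes)
print(rejected, data_yield)
```

Counting rejections per gate makes it easy to see where data is lost, which is the kind of figure an effective-yield number would be derived from.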

A 10-to-1 data ratio

The company reports that robot-free data and real-robot data are complementary. In the tasks measured, combining about 10 human-collected episodes with 1 real-robot episode reaches performance comparable to datasets made entirely from real-robot recordings. Human-collected data supplies broad behavioral coverage, and the small portion of real-robot data anchors physical factors such as motor latency and friction. According to the company, this mix lowers real-robot data requirements by up to a factor of 20 under the conditions tested.
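As a rough illustration of what a 10-to-1 mix could look like in a training loop, the sketch below interleaves the two sources in a fixed pattern. It is an assumption-laden example rather than the company's training recipe: the `mixed_stream` function and the episode labels are hypothetical, and a real trainer might sample randomly with matching weights instead.

```python
from itertools import cycle, islice
from typing import Iterator, List


def mixed_stream(
    human: List[str],
    robot: List[str],
    human_per_robot: int = 10,
) -> Iterator[str]:
    """Yield episodes endlessly: human_per_robot human episodes, then one robot episode."""
    if not human or not robot:
        raise ValueError("both episode lists must be non-empty")
    human_iter = cycle(human)  # loop over each source so the stream never ends
    robot_iter = cycle(robot)
    while True:
        for _ in range(human_per_robot):
            yield next(human_iter)
        yield next(robot_iter)


human_episodes = ["h0", "h1", "h2"]  # placeholder identifiers
robot_episodes = ["r0"]
batch = list(islice(mixed_stream(human_episodes, robot_episodes), 22))
print(sum(ep.startswith("r") for ep in batch))  # 2 robot episodes out of 22
```

Because the smaller real-robot set is cycled, each real-robot episode is revisited more often than each human-collected one, which is worth keeping in mind when reasoning about overfitting to the robot data.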

The G0-Dataset

G0-Dataset contains over 2,000 hours of validated demonstrations across vision, tactile, and audio modalities. It covers 3,000 distinct manipulation tasks, which range from basic operations to fine-grained semantic actions and follow a long-tail distribution. Operators reached a peak collection speed of 93.2 episodes per hour. The dataset is intended to support large-scale pretraining and cross-embodiment transfer research.

X Square Robot reports that policies trained with the framework generalize across collection environments with varying robot poses, table heights, and viewpoints. The policies also show zero-shot transfer to robot platforms outside the training set, performing tasks without platform-specific fine-tuning.