Refiner aims to clean up robotics data for AI agents

Former members of Hugging Face’s pre-training team released Refiner, a library for refining robotics data. It can take in many robotics data formats, including Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot.

It also supports common preparation steps such as tracking hands in video, annotating the smaller steps within a task, and scoring data with a reward model. For teams building AI agents that control robots, the main value is that messy training data becomes easier to prepare and reuse.
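The article does not show Refiner's API. The following is therefore only a minimal Python sketch of the general idea: a chain of preparation steps applied to one recorded episode. The `Episode` type, the `segment_task` and `score_reward` steps, and the `gripper` and `x` fields are all hypothetical. The "reward model" here is a toy heuristic, not a learned model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical episode record; Refiner's real data model is not described in the article.
@dataclass
class Episode:
    frames: List[Dict[str, float]]  # one dict of sensor readings per timestep
    annotations: Dict[str, object] = field(default_factory=dict)


Step = Callable[[Episode], Episode]


def segment_task(ep: Episode) -> Episode:
    # Hypothetical stand-in for annotating the smaller steps of a task:
    # start a new segment wherever the gripper state changes.
    boundaries = [0]
    for i in range(1, len(ep.frames)):
        if ep.frames[i]["gripper"] != ep.frames[i - 1]["gripper"]:
            boundaries.append(i)
    ep.annotations["segments"] = boundaries
    return ep


def score_reward(ep: Episode) -> Episode:
    # Hypothetical stand-in for a reward model: a toy heuristic that scores
    # the episode higher the closer its final x position is to a goal of 1.0.
    if not ep.frames:
        return ep
    ep.annotations["reward"] = -abs(1.0 - ep.frames[-1]["x"])
    return ep


def run_pipeline(ep: Episode, steps: List[Step]) -> Episode:
    # Apply each preparation step in order, passing the episode along.
    for step in steps:
        ep = step(ep)
    return ep


if __name__ == "__main__":
    demo = Episode(frames=[
        {"x": 0.0, "gripper": 0.0},
        {"x": 0.5, "gripper": 1.0},
        {"x": 0.75, "gripper": 1.0},
        {"x": 0.75, "gripper": 0.0},
    ])
    result = run_pipeline(demo, [segment_task, score_reward])
    print(result.annotations)  # {'segments': [0, 1, 3], 'reward': -0.25}
```

Keeping each step as a plain function with the same input and output type makes steps easy to reorder, drop, or reuse across datasets. That kind of reuse is the benefit the article attributes to a library like Refiner.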

Key points

  • Refiner is an open-source library for robotics data refinement.
  • It supports Parquet, HDF5, MCAP, Zarr, RLDS, and LeRobot formats.
  • It includes flows for hand tracking, task annotation, and reward model runs.
  • It may lower development effort for teams preparing robotics data.

Quick term guide

Hugging Face
A company and online platform where AI models and datasets are shared.
pre-training
The early training step where an AI model learns general patterns before a specific task.
reward model
A model that gives scores to help an AI learn which actions or results are better.
training data
The collection of information used to teach an AI how to recognize patterns and answer questions.
infrastructure
The technical systems that keep a website or app running.
data pipeline
An automated set of steps that moves, cleans, combines, or exports data on its way from where it is collected to where it is needed, such as an AI model.
open-source
Software whose code is shared publicly so others can inspect, use, or change it.