Google has introduced Gemini Robotics On-Device, an AI model that can be tailored to different types of robots. Developed by Google DeepMind, this vision-language-action (VLA) model can learn new tasks from 50 to 100 demonstrations and runs entirely on the local device. It is intended for latency-sensitive applications and works without a network connection.
Gemini Robotics On-Device is designed to run efficiently on the robot itself. Google is also offering an SDK that lets developers evaluate the model on their own tasks and settings, test it in the MuJoCo physics simulator, and adapt it to new domains with as few as 50 to 100 demonstrations. Developers can obtain the SDK by enrolling in the trusted tester program.
Gemini Robotics On-Device requires minimal computational resources and can be adapted to new tasks through fine-tuning. It runs locally with low-latency inference and shows strong visual, semantic, and behavioural generalisation across a variety of test settings. It follows natural-language instructions and completes dexterous tasks such as unzipping bags and folding garments, all directly on the robot.
According to Google, Gemini Robotics On-Device outperforms other on-device alternatives on more difficult tasks and complex multi-step instructions. For developers who want state-of-the-art results without on-device limitations, Google also offers its larger Gemini Robotics model.
Gemini Robotics On-Device is the first VLA model that Google has made available for fine-tuning, allowing developers to optimise performance for their own applications. According to Google, the model outperforms the current best on-device VLA when fine-tuned on tasks such as manipulating a lunchbox, drawing a card, and pouring salad dressing. It can also apply its foundational knowledge to new tasks, demonstrating how well it generalises.
The Gemini Robotics On-Device model can be adapted to a variety of robot embodiments, including the ALOHA robots and the bi-arm Franka FR3 robot. On the Franka, the model follows general-purpose instructions, handles previously unseen objects and scenes, completes dexterous tasks, and carries out industrial belt-assembly tasks that require precision and dexterity.
“On the Apollo humanoid, we adapt the model to a significantly different embodiment. The same generalist model can follow natural language instructions and manipulate different objects, including previously unseen objects, in a general manner,” Google said.