P R I M E

A framework for efficient, globally distributed AI model training over the internet

"With the technology at our disposal, the possibilities are unbounded"

In Prime, we’ve introduced a new distributed abstraction called ElasticDeviceMesh, which manages dynamic global process groups for fault-tolerant communication across the internet.
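The core idea can be sketched as follows. This is a minimal, illustrative Python sketch under assumed semantics, not the actual ElasticDeviceMesh API: membership changes bump a generation counter, so a collective issued against a stale membership can be detected and retried instead of hanging the whole group.

```python
# Hypothetical sketch of dynamic process-group membership (names are
# illustrative, not the real ElasticDeviceMesh interface). Every join or
# leave bumps a generation counter; collectives record the generation they
# were issued under and are retried if membership changed underneath them.

class ElasticMeshSketch:
    def __init__(self):
        self.members = set()   # currently live worker ids
        self.generation = 0    # bumped on any membership change

    def join(self, worker_id):
        self.members.add(worker_id)
        self.generation += 1

    def leave(self, worker_id):
        self.members.discard(worker_id)
        self.generation += 1

    def start_collective(self):
        # Snapshot the generation this collective was issued under.
        return self.generation

    def is_stale(self, issued_generation):
        # True if a join/leave happened since the collective started,
        # meaning it must be retried against the new membership.
        return issued_generation != self.generation
```

For example, if a peer drops mid-collective, `is_stale` flags the in-flight operation so the remaining workers can re-form the group rather than block forever.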

Collectively advancing the frontier of co-owned AI models

Int8 quantization reduced the all-reduce payload by 4x with no accuracy loss. We built a custom C++ ring-reduce kernel, replacing slow Torch ops with multithreaded uint8 ops, boosting speed by 60x and fully utilizing our 4 Gbps bandwidth.
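The payload reduction comes from mapping each 4-byte fp32 value to a single uint8 code before communication. The following is a pure-Python sketch of one simple affine quantization scheme, with illustrative function names; it is not the actual kernel, which operates on tensors in multithreaded C++.

```python
# Illustrative sketch of affine uint8 quantization for an all-reduce
# payload: each fp32 value (4 bytes) becomes one uint8 code (1 byte),
# a 4x reduction on the wire. Names and scheme are assumptions, not the
# production C++ kernel.

def quantize_uint8(values):
    # Map [min, max] of the tensor onto the 256 uint8 codes.
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # avoid div-by-zero on constant input
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize_uint8(codes, lo, scale):
    # Recover approximate fp32 values from the uint8 codes.
    return [lo + scale * c for c in codes]
```

The round-trip error is bounded by half the quantization step, which is why a per-tensor (or per-shard) scale keeps the reduced gradients accurate enough for training.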

We improved bandwidth utilization between nodes in similar data-center environments by up to 40x over our OpenDiLoCo release, achieving connections of up to 4 Gbps across data centers nationwide.