Prime Intellect has announced INTELLECT-2, a 32-billion-parameter language model trained with fully asynchronous reinforcement learning across a decentralized network of untrusted compute contributors. Central to its operation is the PRIME-RL training framework, which decouples rollout generation, policy updates, and model broadcasting into independent components. Notably, INTELLECT-2 employs SHARDCAST for distributing model weights and TOPLOC for verifying results from untrusted workers. The model was trained on 285,000 math and coding tasks and outperforms its predecessors on relevant benchmarks, making it a significant advance in RL training methodology.
INTELLECT-2 improves upon previous models by using a decentralized, asynchronous reinforcement learning approach, which keeps training and inference efficient even when the participating machines are untrusted.
The system is built on a new training framework called PRIME-RL, which distributes policy updates across the network and pairs SHARDCAST, for broadcasting model weights, with TOPLOC, for verifying the inference results contributed by workers.
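The key architectural idea is that rollout generation and policy updates run asynchronously: workers generate trajectories with whatever policy version they last received, while the trainer publishes new weights without waiting for workers to synchronize. The following is a minimal sketch of that decoupling using threads and a queue; all names and structures here are illustrative, not the actual PRIME-RL implementation.

```python
import queue
import threading

# Illustrative sketch of asynchronous RL decoupling (not the real PRIME-RL API):
# rollout workers push trajectories tagged with the policy version they used,
# while the trainer consumes batches and publishes new weights independently.

rollouts = queue.Queue()      # trajectories flowing from workers to the trainer
published_version = [0]       # latest broadcast policy version
lock = threading.Lock()

def rollout_worker(worker_id, n_rollouts):
    for _ in range(n_rollouts):
        with lock:
            version = published_version[0]   # may be stale: no sync barrier
        # "generate" a trajectory under that (possibly stale) policy
        rollouts.put({"worker": worker_id, "policy_version": version})

def trainer(n_updates, batch_size):
    for _ in range(n_updates):
        batch = [rollouts.get() for _ in range(batch_size)]
        # ... compute a policy-gradient update on `batch` (omitted) ...
        with lock:
            published_version[0] += 1        # "broadcast" the new weights

workers = [threading.Thread(target=rollout_worker, args=(i, 10)) for i in range(3)]
t = threading.Thread(target=trainer, args=(5, 6))
for w in workers:
    w.start()
t.start()
for w in workers:
    w.join()
t.join()
print("final policy version:", published_version[0])
```

Because workers never block on the trainer, some trajectories are generated under outdated policy versions; the trainer must tolerate this staleness, which is the defining trade-off of the asynchronous design.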