Nvidia acquires SchedMD, developer of workload manager Slurm
Briefly

"Slurm is used to schedule computing tasks and allocate resources within large server clusters in research, industry, and government. SchedMD was founded in 2010 by the original developers of Slurm. The company not only focuses on the further development of the software, but also provides commercial support and advice to organizations that use Slurm in production. According to SiliconANGLE, SchedMD serves several hundred customers, including government agencies, banks, and organizations in the healthcare sector."
"Slurm is designed for environments in which large numbers of parallel tasks are performed. The system determines which computing resources are used and when, preventing workloads from being unnecessarily slowed down by poorly distributed resources. In practice, this means, among other things, that GPUs do not remain unused while others are overloaded. Slurm can manage clusters with more than 100,000 GPUs, making it suitable for both supercomputers and large-scale AI training."
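The allocation behavior described above is typically driven by a batch script submitted to Slurm. A minimal sketch of such a script, using standard `sbatch` directives (the job name, node counts, time limit, and training command are illustrative assumptions, not from the article):

```shell
#!/bin/bash
# Minimal Slurm batch script (illustrative; all values are assumptions).
#SBATCH --job-name=train-demo     # hypothetical job name
#SBATCH --nodes=2                 # request two nodes from the cluster
#SBATCH --gres=gpu:4              # four GPUs per node, so allocated GPUs do not sit idle
#SBATCH --time=01:00:00           # wall-clock limit after which Slurm reclaims the resources

# srun launches the tasks on whatever resources Slurm allocated
srun python train.py
```

Submitted with `sbatch script.sh`, after which Slurm decides which nodes and GPUs run the job and when, as the excerpt describes.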
"In AI environments, Slurm is often compared to Kubernetes, which is also used for cluster management. Both platforms can schedule and distribute workloads, but Slurm is more focused on HPC-like scenarios with strict requirements for performance and scalability. For example, Slurm offers more options for fine-grained scheduling, such as placing tasks that exchange a lot of data physically close to each other on the cluster. Kubernetes can perform similar optimizations, but often requires additional extensions to do so."
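One concrete instance of the fine-grained placement mentioned above is Slurm's `--switches` option, which asks the scheduler to place a job's nodes under a limited number of network switches so tasks that exchange a lot of data stay physically close. A sketch under assumed job details (node count, wait time, and application name are hypothetical):

```shell
#!/bin/bash
# Topology-aware placement sketch (illustrative values).
#SBATCH --nodes=8
#SBATCH --switches=1@10:00   # prefer all 8 nodes under a single leaf switch;
                             # wait up to 10 minutes before relaxing the constraint

srun ./mpi_app               # hypothetical tightly coupled application
```

In Kubernetes, comparable locality constraints generally require pod affinity rules or scheduler extensions, which is the gap the excerpt alludes to.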
Nvidia acquired SchedMD, the company that develops and maintains Slurm, an open-source workload manager used across research, industry, and government. SchedMD, founded in 2010 by Slurm's original developers, provides software development, commercial support, and consulting to several hundred customers including government agencies, banks, and healthcare organizations. Slurm schedules tasks and allocates resources across large server clusters, optimizing parallel workloads to prevent GPUs from sitting idle and supporting clusters with over 100,000 GPUs for supercomputers and large-scale AI training. Slurm emphasizes fine-grained, HPC-style scheduling and performance. SchedMD also maintains Slinky, enabling Slurm to run on top of Kubernetes and reducing the need for separate clusters.
Read at Techzine Global