#llm-serving

Alibaba reveals 82 percent GPU resource savings

Titled "Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market", the paper [PDF] opens by pointing out that model-mart Hugging Face lists over a million AI models, although customers mostly run just a few of them. Alibaba Cloud nonetheless offers many models but found it had to dedicate 17.7 percent of its GPU fleet to serving just 1.35 percent of customer requests.

Artificial intelligence

Growth hacking

from InfoQ

10 months ago

Scaling Large Language Model Serving Infrastructure at Meta

LLM serving is evolving into a foundational technology similar to an operating system.
