#llm-serving-systems

from HackerNoon
6 months ago

vAttention: Efficacy of Physical Memory Allocation for LLMs | HackerNoon

In contrast, vAttention needs to invoke CUDA's kernel driver while mapping a new physical page in a request's KV-cache.