#model-quantization

Gadgets
from InfoQ
2 weeks ago

Cactus v1: Cross-Platform LLM Inference on Mobile with Zero Latency and Full Privacy

Cactus delivers fast, energy-efficient on-device AI inference with sub-50 ms latency, and ships with cross-platform SDKs, privacy by default, model versioning, and optional cloud fallback.
Artificial intelligence
from The Register
4 months ago

How to run LLMs on PC at home using Llama.cpp

Running LLMs locally is practical on modest hardware with Llama.cpp, which offers solid performance, flexible CPU/GPU layer assignment, quantization support, and improved privacy without cloud costs.
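The knobs the article describes are also exposed by llama-cpp-python, the Python bindings for Llama.cpp: a quantized GGUF model file, an `n_gpu_layers` parameter for splitting layers between CPU and GPU, and CPU thread control. A minimal sketch, assuming `pip install llama-cpp-python` and an already-downloaded Q4_K_M-quantized model (the file path is hypothetical):

```python
# Local LLM inference via llama-cpp-python (bindings for Llama.cpp).
# The model path below is a placeholder for whatever GGUF file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # 4-bit quantized weights
    n_ctx=4096,        # context window in tokens
    n_threads=8,       # CPU threads for layers kept on the CPU
    n_gpu_layers=20,   # offload this many layers to the GPU; 0 = CPU-only
)

out = llm(
    "Q: Why quantize an LLM for local inference? A:",
    max_tokens=128,
    stop=["Q:"],       # stop before the model invents a follow-up question
)
print(out["choices"][0]["text"])
```

The Q4_K_M quantization trades a small quality loss for roughly a 4x memory reduction versus FP16, which is what lets 7-8B-parameter models fit in the RAM or VRAM of modest consumer hardware; raising `n_gpu_layers` shifts more of the work to the GPU when memory allows.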