Alibaba Cloud reveals uptime and efficiency secrets
Briefly

Alibaba Cloud reveals uptime and efficiency secrets
"a fast failure recovery service that ensures global bypass in large-scale cloud networks within seconds."
"As a result, tenants are forced to develop their own recovery solutions, which typically involve redundant resources or protocol stack modifications, thereby increasing capital and operating expenses,"
"reduces daily worker hangs by 99.8 percent and lowers the unit cost of L7 LB infrastructure by 18.9 percent."
"rely on I/O event notification mechanisms such as epoll to dispatch connections from the kernel to userspace workers,"
Alibaba achieved a 92.71 percent reduction in cumulative network outage time by deploying ZooRoute, a fast failure recovery service that constantly probes for viable routes and switches traffic to working paths within seconds. The approach eliminates the need for tenants to provision redundant resources or modify protocol stacks to handle failures. Alibaba also deployed Hermes to reduce daily worker hangs by 99.8 percent and to lower layer-7 load-balancer unit costs by 18.9 percent. Hermes addresses bottlenecks from I/O event notification mechanisms such as epoll by leveraging eBPF. Performance improvements also include SmartNIC workload offloading to idle infrastructure.
Read at Theregister
Unable to calculate read time
[
|
]