AWS launches Flexible Training Plans for inference endpoints in SageMaker AI
Briefly

"However, the auto-scaling nature of these inference endpoints might not be enough for several situations that enterprises may encounter, including workloads that require low latency and consistent high performance, critical testing and pre-production environments where resource availability must be guaranteed, and any situation where a slow scale-up time is not acceptable and could harm the application or business."
"According to AWS, FTPs for inferencing workloads aim to address this by enabling enterprises to reserve instance types and required GPUs, since automatic scaling up doesn't guarantee instant GPU availability due to high demand and limited supply."
"FTPs support for SageMaker AI inference is available in US East (N. Virginia), US West (Oregon), and US East (Ohio), AWS said."
Auto-scaling inference endpoints can be insufficient for workloads that require low latency, consistent high performance, guaranteed resource availability, or where slow scale-up is unacceptable and could harm applications or business operations. FTPs for inferencing workloads enable enterprises to reserve specific instance types and required GPUs to address the limitation that automatic scaling does not guarantee instant GPU availability due to high demand and limited supply. Reserved capacity ensures predictable resource availability for critical testing, pre-production, and production workloads. FTP support for SageMaker AI inference is available in US East (N. Virginia), US West (Oregon), and US East (Ohio).
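As a rough illustration of the reservation workflow described above, the sketch below builds request payloads in the shape of the SageMaker Training Plans API (`search_training_plan_offerings` to find reservable capacity, then `create_training_plan` to book it). This is a minimal sketch, not AWS's documented inference flow: the `"endpoint"` target-resource value, the field values, and the usage shown in comments are assumptions for illustration.

```python
# Hedged sketch: reserving GPU capacity for a SageMaker inference endpoint
# with a Flexible Training Plan. Request shapes mirror the boto3 SageMaker
# Training Plans API; the "endpoint" target resource and all concrete
# values are assumptions, not verified against the inference launch.
from datetime import datetime, timedelta, timezone


def build_offering_search(instance_type: str, instance_count: int,
                          duration_hours: int) -> dict:
    """Payload for sagemaker.search_training_plan_offerings()."""
    start = datetime.now(timezone.utc)
    return {
        "InstanceType": instance_type,      # e.g. a GPU instance type
        "InstanceCount": instance_count,    # how many instances to reserve
        "StartTimeAfter": start,            # earliest acceptable start
        "EndTimeBefore": start + timedelta(days=7),
        "DurationHours": duration_hours,    # length of the reservation
        "TargetResources": ["endpoint"],    # assumed value for inference
    }


def build_plan_request(plan_name: str, offering_id: str) -> dict:
    """Payload for sagemaker.create_training_plan()."""
    return {
        "TrainingPlanName": plan_name,
        "TrainingPlanOfferingId": offering_id,
    }


# Usage (requires boto3 and AWS credentials; not executed here):
# sm = boto3.client("sagemaker", region_name="us-east-1")
# offerings = sm.search_training_plan_offerings(
#     **build_offering_search("ml.p5.48xlarge", 2, 72))
# offering_id = offerings["TrainingPlanOfferings"][0]["TrainingPlanOfferingId"]
# sm.create_training_plan(**build_plan_request("inference-capacity", offering_id))
```

The point of reserving up front, per the article, is that a scale-out event cannot stall on GPU availability: the capacity is already committed for the plan's duration.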
Read at InfoWorld