Amazon SageMaker now offers OpenAI-compatible API support for inference endpoints

Amazon SageMaker Inference now supports OpenAI-compatible APIs, allowing seamless integration with existing tools by simply changing the endpoint URL.

Amazon SageMaker Inference has introduced support for OpenAI-compatible APIs, enabling users to leverage familiar tools and frameworks such as the OpenAI SDK, LangChain, and Strands Agents for direct connection to SageMaker endpoints. This update simplifies the process by requiring only a change in the endpoint URL, eliminating the need for custom integration code, SDK wrappers, or extensive rewrites.

With this enhancement, users are not required to adopt a different API format or alter their authentication methods. By simply adjusting the endpoint URL, existing SDK calls, streaming logic, and framework integrations will continue to function seamlessly. This flexibility allows users to select their preferred GPU instances, maintain data within their own Virtual Private Cloud (VPC), run any open-source or fine-tuned models, and scale using auto-scaling policies optimized for their specific workloads.

Authentication is streamlined through existing AWS credentials, which include automatic token refresh, minimizing additional management tasks in production environments.

This new capability is currently available in several regions, including US East (N. Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Europe (Ireland), Europe (Frankfurt), South America (São Paulo), Asia Pacific (Tokyo), Asia Pacific (Seoul), Europe (London), Asia Pacific (Singapore), Asia Pacific (Sydney), and Canada (Central). For more detailed information and to begin utilizing this feature, users are encouraged to read the launch blog or consult the SageMaker Inference documentation.