Amazon Bedrock enhances observability with new metrics for latency and quota usage

Amazon Bedrock now supports two new CloudWatch metrics, TimeToFirstToken and EstimatedTPMQuotaUsage, enhancing visibility into latency and quota consumption. These metrics help monitor performance and manage quota usage efficiently.

Amazon Bedrock, a fully managed service for building generative AI applications with foundation models from leading AI providers, now supports two new CloudWatch metrics: TimeToFirstToken and EstimatedTPMQuotaUsage. The metrics give users more insight into inference performance and quota consumption.

The TimeToFirstToken metric measures the latency from when a request is sent until the first token is received. It applies to streaming APIs such as ConverseStream and InvokeModelWithResponseStream. With this metric, users can set CloudWatch alarms on latency and establish service level agreement (SLA) baselines without implementing any client-side instrumentation.
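As an illustration, a latency alarm on this metric could be defined with boto3 along the following lines. This is a minimal sketch, not an official recipe: the `AWS/Bedrock` namespace is the one Bedrock uses for its runtime metrics, but the `ModelId` dimension, the millisecond unit, the p90 statistic, and the alarm name are assumptions chosen for the example, and the helper names are hypothetical.

```python
def ttft_alarm_params(alarm_name: str, model_id: str, threshold_ms: float) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Assumptions (not confirmed by the announcement): the metric is
    published per model under a "ModelId" dimension and is reported
    in milliseconds.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Bedrock",
        "MetricName": "TimeToFirstToken",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        # Percentile statistic so a few slow requests do not dominate.
        "ExtendedStatistic": "p90",
        # The metric is updated every minute, so use 60-second periods.
        "Period": 60,
        # Alarm when 3 of the last 5 one-minute datapoints breach.
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 3,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        # Minutes with no traffic produce no datapoints; do not alarm on them.
        "TreatMissingData": "notBreaching",
    }


def create_ttft_alarm(alarm_name: str, model_id: str, threshold_ms: float,
                      region: str = "us-east-1") -> None:
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above runs without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **ttft_alarm_params(alarm_name, model_id, threshold_ms)
    )
```

Calling `create_ttft_alarm("bedrock-ttft-p90", "<your-model-id>", 2000.0)` would then raise an alarm when p90 time to first token stays above the chosen threshold; the 2000 value is only a placeholder to be replaced with a baseline observed for the workload.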

The EstimatedTPMQuotaUsage metric tracks estimated Tokens Per Minute (TPM) quota consumption, including cache write tokens and output burndown multipliers. It applies across all inference APIs, including Converse, InvokeModel, ConverseStream, and InvokeModelWithResponseStream. With this metric, users can set proactive alarms before reaching quota limits, monitor quota consumption across models, and request additional quota before usage becomes restricted.
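A proactive quota alarm might be sketched as follows. The announcement does not specify the metric's unit, dimensions, or recommended statistic, so this example assumes the metric reports tokens counted against the TPM quota, that it carries a `ModelId` dimension, and that the per-minute `Sum` is the value to compare against the quota. The helper names and the 80% warning level are hypothetical choices for illustration.

```python
def quota_alarm_threshold(tpm_quota: int, warn_fraction: float = 0.8) -> float:
    """Return the token count at which the alarm should fire."""
    if not 0.0 < warn_fraction <= 1.0:
        raise ValueError("warn_fraction must be in (0, 1]")
    return tpm_quota * warn_fraction


def quota_alarm_params(alarm_name: str, model_id: str, tpm_quota: int,
                       warn_fraction: float = 0.8) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Assumptions: "ModelId" dimension, and a per-minute Sum that is
    directly comparable to the account's TPM quota for the model.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Bedrock",
        "MetricName": "EstimatedTPMQuotaUsage",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Sum",
        "Period": 60,  # one-minute periods to match a per-minute quota
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 2,
        "Threshold": quota_alarm_threshold(tpm_quota, warn_fraction),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_quota_alarm(alarm_name: str, model_id: str, tpm_quota: int,
                       region: str = "us-east-1") -> None:
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builders above run without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **quota_alarm_params(alarm_name, model_id, tpm_quota)
    )
```

Alarming below the full quota leaves time to request an increase before requests start being limited. The `tpm_quota` argument is the account's own limit for the model, which this sketch expects the caller to supply.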

Both metrics are available in all commercial Bedrock regions and apply to models accessed through cross-region inference profiles as well as in-region inference. They are updated every minute for successfully completed requests and are available in CloudWatch with no API changes or opt-in required. Users pay only for the model inference they use.

For more information about these metrics, including how to use them, see the Monitoring Amazon Bedrock documentation page.