Amazon ElastiCache introduces new CloudWatch metrics for enhanced network and engine diagnostics
Amazon ElastiCache has introduced thirteen new CloudWatch metrics for network and engine diagnostics, available at no additional cost.
Amazon ElastiCache now publishes thirteen new Amazon CloudWatch metrics for node-based clusters, aimed at network capacity planning and engine diagnostics. With them, customers can identify issues such as network throttling, memory fragmentation, and connection exhaustion without running INFO commands on individual nodes or deriving baselines from raw byte counters.
Network Capacity: Four metrics, NetworkBaselineUsageInPercentage, NetworkBaselineUsageOutPercentage, NetworkBaselineMaxUsageInPercentage, and NetworkBaselineMaxUsageOutPercentage, report inbound and outbound network utilization as a percentage of the instance's baseline bandwidth. Because the values are normalized to the baseline, alarms built on them stay valid when you change instance types. A value above 100 percent means the host is consuming burst credits; sustained bursting can exhaust those credits and lead to throttling. The Max variants capture per-second peaks that averaged metrics can smooth over.
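Because the baseline metrics are percentages, a single alarm definition with a threshold of 100 can follow a node across instance-type changes. The sketch below only builds the keyword arguments for the boto3 CloudWatch client's `put_metric_alarm` call; it does not contact AWS. The helper name, the `CacheClusterId`/`CacheNodeId` dimension pairing, the 60-second period, and the five evaluation periods are illustrative assumptions, not documented defaults.

```python
# Hypothetical helper: returns kwargs for boto3's CloudWatch put_metric_alarm.
# It only builds a dict; it makes no AWS calls.

def baseline_alarm_kwargs(cluster_id: str, node_id: str, direction: str = "In",
                          use_max: bool = False, evaluation_periods: int = 5) -> dict:
    """Alarm when network usage exceeds 100 percent of the instance baseline."""
    if direction not in ("In", "Out"):
        raise ValueError("direction must be 'In' or 'Out'")
    # Max variants expose per-second peaks; the plain variants show sustained load.
    infix = "MaxUsage" if use_max else "Usage"
    metric = f"NetworkBaseline{infix}{direction}Percentage"
    return {
        "AlarmName": f"{cluster_id}-{node_id}-{metric}-over-baseline",
        "Namespace": "AWS/ElastiCache",
        "MetricName": metric,
        # Assumed dimensions, matching how ElastiCache publishes node metrics.
        "Dimensions": [
            {"Name": "CacheClusterId", "Value": cluster_id},
            {"Name": "CacheNodeId", "Value": node_id},
        ],
        "Statistic": "Maximum" if use_max else "Average",
        "Period": 60,  # illustrative choice
        "EvaluationPeriods": evaluation_periods,
        # Above 100 percent the host is spending burst credits.
        "Threshold": 100.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
```

In use, the result would be passed as `boto3.client("cloudwatch").put_metric_alarm(**baseline_alarm_kwargs("my-cluster-0001-001", "0001"))`, where the cluster and node IDs are placeholders. Since the threshold is a percentage rather than a byte rate, nothing in the definition needs to change after a node type change.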
Memory Health: UsedMemoryDataset reports the memory used by stored data, excluding engine overhead. AllocatorFragmentationBytes and AllocatorFragmentationRatio measure allocator-level memory fragmentation, which the activedefrag parameter can help reclaim. MajorPageFaults captures operating-system-level major page faults, a sign of memory pressure that the engine itself cannot detect.
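One way to decide whether fragmentation is worth addressing with activedefrag is to require both fragmentation signals to be high. The sketch below is a hypothetical heuristic: the function name is invented, it assumes AllocatorFragmentationRatio and AllocatorFragmentationBytes correspond to the engine's allocator fragmentation ratio and byte count, and its default floors mirror open-source Redis's `active-defrag-ignore-bytes` (100mb) and `active-defrag-threshold-lower` (10 percent) settings. Treat those values as assumptions to check against your parameter group.

```python
MIB = 1024 * 1024

def activedefrag_worth_considering(frag_ratio: float, frag_bytes: float,
                                   ignore_bytes: float = 100 * MIB,
                                   threshold_lower_pct: float = 10.0) -> bool:
    """Hypothetical check: are both fragmentation signals above the assumed floors?

    frag_ratio -- latest AllocatorFragmentationRatio value
    frag_bytes -- latest AllocatorFragmentationBytes value
    """
    # Ignore small absolute waste, even if the ratio looks high.
    over_bytes = frag_bytes > ignore_bytes
    # A ratio of 1.10 corresponds to 10 percent fragmentation.
    over_ratio = frag_ratio > 1.0 + threshold_lower_pct / 100.0
    return over_bytes and over_ratio
```

Requiring both conditions avoids reacting to a high ratio on a nearly empty node, where the wasted bytes are negligible. Defragmentation addresses allocator waste only; a rising MajorPageFaults value points to OS-level memory pressure and calls for a separate investigation.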
Connectivity Health: BlockedConnections counts clients waiting on blocking commands, and RejectedConnections counts connections refused because the maxclients limit was reached. A non-zero RejectedConnections value suggests raising the maxclients limit or investigating client-side connection pool leaks.
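Since any non-zero RejectedConnections value is actionable, a simple check is to list the periods in which rejections occurred. This sketch assumes the input is shaped like the response of the boto3 CloudWatch client's `get_metric_statistics` call for `MetricName="RejectedConnections"` with `Statistics=["Sum"]`; the function name is hypothetical.

```python
def rejection_windows(response: dict) -> list:
    """Return (timestamp, count) pairs for periods with rejected connections.

    `response` is assumed to look like a get_metric_statistics result:
    {"Datapoints": [{"Timestamp": ..., "Sum": ...}, ...]}.
    """
    hits = [
        (dp["Timestamp"], dp["Sum"])
        for dp in response.get("Datapoints", [])
        if dp.get("Sum", 0) > 0  # non-zero means the maxclients limit was hit
    ]
    # CloudWatch may not return datapoints in time order, so sort them.
    return sorted(hits, key=lambda pair: pair[0])
```

Correlating the returned timestamps with application deploys or traffic spikes can help distinguish a genuinely undersized limit from a client pool that leaks connections.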
Pub/Sub Workloads: PubSubChannels and PubSubShardChannels report the number of active classic and sharded channels on each node. If the classic channel count grows alongside node utilization, consider moving to sharded pub/sub, which scales horizontally across shards.
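"Growing alongside utilization" can be made concrete by checking that the classic channel count rises over a window and correlates with a utilization metric. The sketch below is a hypothetical heuristic, not an ElastiCache recommendation: it assumes EngineCPUUtilization as the utilization signal, uses the Pearson correlation coefficient, and picks an arbitrary 0.8 cutoff.

```python
import math

def suggest_sharded_pubsub(classic_channels: list, engine_cpu: list,
                           min_correlation: float = 0.8) -> bool:
    """Hypothetical heuristic: classic channel count rising together with CPU.

    classic_channels -- PubSubChannels samples, oldest first
    engine_cpu       -- utilization samples (assumed EngineCPUUtilization), same times
    """
    n = len(classic_channels)
    if n < 3 or n != len(engine_cpu):
        return False  # too little data, or misaligned series
    if classic_channels[-1] <= classic_channels[0]:
        return False  # channel count is not growing
    mean_c = sum(classic_channels) / n
    mean_u = sum(engine_cpu) / n
    cov = sum((c - mean_c) * (u - mean_u) for c, u in zip(classic_channels, engine_cpu))
    var_c = sum((c - mean_c) ** 2 for c in classic_channels)
    var_u = sum((u - mean_u) ** 2 for u in engine_cpu)
    if var_c == 0 or var_u == 0:
        return False  # correlation is undefined for a flat series
    r = cov / math.sqrt(var_c * var_u)  # Pearson correlation coefficient
    return r >= min_correlation
```

A correlation test is used instead of a fixed channel-count limit because the relevant signal is the relationship between channels and load, not the absolute number of channels.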
Command Throughput: ProcessedCommands reports total command throughput across all command types.
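To compare throughput across dashboards that use different periods, it can help to normalize to commands per second. This sketch assumes, without confirmation from this announcement, that each ProcessedCommands datapoint's `Sum` is the number of commands processed during one period; the function name is hypothetical.

```python
def commands_per_second(datapoints: list, period_seconds: int) -> list:
    """Convert per-period ProcessedCommands sums into commands per second.

    Assumes each datapoint looks like {"Timestamp": ..., "Sum": ...} and that
    "Sum" counts commands processed during one period of `period_seconds`.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # Sort by time, since CloudWatch may return datapoints out of order.
    ordered = sorted(datapoints, key=lambda dp: dp["Timestamp"])
    return [dp["Sum"] / period_seconds for dp in ordered]
```

For example, a `Sum` of 6000 over a 60-second period corresponds to 100 commands per second.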
The new metrics are available at no additional cost for node-based clusters in all commercial AWS Regions where ElastiCache is available, including the AWS China and AWS GovCloud (US) Regions. You can view them on the Monitoring tab of the ElastiCache console or under the AWS/ElastiCache namespace in the CloudWatch console. For details, see Host-Level Metrics and Metrics for Valkey and Redis OSS.
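Outside the consoles, the same metrics can be read from the AWS/ElastiCache namespace programmatically. The sketch below builds one set of keyword arguments per new metric for the boto3 CloudWatch client's `get_metric_statistics` call; it makes no AWS calls itself. The function name, the `CacheClusterId`/`CacheNodeId` dimension pairing, and the choice of statistics are assumptions for illustration.

```python
# The thirteen metrics named in this announcement.
NEW_METRICS = (
    "NetworkBaselineUsageInPercentage",
    "NetworkBaselineUsageOutPercentage",
    "NetworkBaselineMaxUsageInPercentage",
    "NetworkBaselineMaxUsageOutPercentage",
    "UsedMemoryDataset",
    "AllocatorFragmentationBytes",
    "AllocatorFragmentationRatio",
    "MajorPageFaults",
    "BlockedConnections",
    "RejectedConnections",
    "PubSubChannels",
    "PubSubShardChannels",
    "ProcessedCommands",
)

def metric_requests(cluster_id: str, node_id: str, start, end, period: int = 60) -> list:
    """Hypothetical helper: kwargs for get_metric_statistics, one per new metric.

    `start` and `end` would normally be datetime objects.
    """
    return [
        {
            "Namespace": "AWS/ElastiCache",
            "MetricName": name,
            # Assumed dimensions, matching how ElastiCache publishes node metrics.
            "Dimensions": [
                {"Name": "CacheClusterId", "Value": cluster_id},
                {"Name": "CacheNodeId", "Value": node_id},
            ],
            "StartTime": start,
            "EndTime": end,
            "Period": period,
            "Statistics": ["Average", "Maximum"],
        }
        for name in NEW_METRICS
    ]
```

Each dict would be passed as `client.get_metric_statistics(**request)` with `client = boto3.client("cloudwatch")`. For many metrics at once, CloudWatch's `get_metric_data` call batches queries and may be the better fit.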