AWS identified a degradation in the underlying hardware supporting one of our production load balancer instances. While AWS had scheduled automated maintenance to address this, the instance became unavailable earlier than the maintenance window and stopped handling traffic.
Traffic routing through this load balancer was disrupted while the instance was in an unexpected state.
The issue was caused by degradation of the AWS host hardware. Additionally, our monitoring did not immediately alert on the instance becoming unavailable due to a configuration gap in our monitoring coverage.
The affected instance was restored once AWS completed the stop/start process required to move it off the degraded hardware, and normal traffic handling resumed.
We are updating our monitoring configuration to ensure all critical production components are actively monitored and alerted on, reducing time to detection for similar issues in the future.