Incident Summary
On 24 September, a significant spike in email volume caused delivery delays of up to 30 minutes. The root cause was the current email service reaching its throughput limit under the increased load.
Root Cause
The existing architecture was not designed to scale dynamically to handle large, sudden increases in outbound email traffic. As a result, the system became a bottleneck during high-volume events.
Remediation & Next Steps
The email service is currently undergoing a major re-architecture focused on scalability and performance. The new system is expected to support roughly an order of magnitude more throughput than the current implementation.
We are targeting a full cutover to the new service in October 2025.
Status
Interim mitigation measures have been applied to reduce the likelihood of recurrence until the new architecture is deployed. Monitoring and alerting thresholds have also been adjusted so that delivery delays are detected earlier.