Pulse Software - Production Sites Experiencing Timeouts – Incident details

Production Sites Experiencing Timeouts

Resolved
Degraded performance
Started about 2 months agoLasted 15 minutes

Affected

Pulse Web Service

Operational from 12:20 AM to 12:30 AM, Degraded performance from 12:30 AM to 12:35 AM

Forms Service

Operational from 12:20 AM to 12:30 AM, Degraded performance from 12:30 AM to 12:35 AM

Workflow Service

Operational from 12:20 AM to 12:30 AM, Degraded performance from 12:30 AM to 12:35 AM

Public Job Sites

Operational from 12:20 AM to 12:30 AM, Degraded performance from 12:30 AM to 12:35 AM

File Service

Operational from 12:20 AM to 12:30 AM, Degraded performance from 12:30 AM to 12:35 AM

Updates
  • Resolved
    Resolved

    A misbehaving Redis cluster caused requests to stall, leading to degraded performance across affected site.
    The Redis cluster was flushed and restored to a healthy state. All systems have now returned to normal operation and performance has stabilised.

  • Identified
    Identified
    • Engineering team notified and investigating root cause.

    • Checking:

      • Application service health across environments

      • API gateway / load balancer logs

      • Database performance and connection saturation

      • Recent deployments or infra changes

  • Investigating
    Investigating

    Multiple production environments are currently experiencing HTTP timeouts when accessing the application. Users are unable to load key pages, and the sites fail with a generic timeout screen.