At 13:43 on 09/04/2020 one of our database servers suffered a critical fault, resulting in requests failing to be fulfilled. Automated attempts to recover and re-spawn the instance were unsuccessful.
At 13:47 the team began manual mitigation of the affected traffic by isolating the affected database.
At 13:57 the affected app and API services were restored, and messages were being accepted again. While the changes propagated, a small subset of requests may still have seen timeouts amongst the successes; these had tailed off completely by 14:12.
A small backlog built up as services restarted and processes resumed, resulting in delays for some customers.
Whilst the failed instance was being restored and re-synchronised, app users may have experienced delays in message reports appearing, although messages were being sent and delivered as normal.
By 14:57 the failed instance was back online, and all queues and delays had cleared.
We would once again like to apologise for this incident and the inconvenience caused, and to thank you for your patience. We take every incident extremely seriously: we review, evaluate and learn from issues in full in order to mitigate against them in the future. For this incident, that review includes the auto-failover mechanism and how we can minimise the time taken during a manual failover. The team were alerted and in position within seconds of the fault. They have since focused on the root cause of the critical fault identified in this database, and are confident a solution has been applied and tested.