September 18, 2018
Customers unable to access the Authentic8 service
32 minutes (14:14-14:46 UTC)
Access to Silo and Toolbox was slow or unavailable. Existing sessions may have expired. Access to webapps was unavailable.
Root cause analysis
Known issue exposed by an overly large Internal API request.
On 9/18/2018 at 14:08:00 UTC Authentic8 Operations staff began receiving alerts from monitoring that Internal API calls were taking longer than normal to return. Customers also began reporting issues connecting to the service. Additional systems that utilize the Internal API began reporting errors.
Operations began to investigate the issue and found that the Internal API systems were available at the Operating System level but the Internal API service itself was unresponsive. Database activity was also increased during this time, but the databases were responsive to other services and monitoring health checks.
After a brief discussion, the Operations team determined that the best course of action was to restart the Internal API service.
At 14:38 UTC the restart began and completed roughly ten minutes later. Members of the Customer Success team began testing and confirmed that service had been restored.
Customers began reporting that service had been restored at 14:45 UTC.
Authentic8 is aware of the action that caused this issue and has put an internal process in place to prevent it from happening again. A Customer Success Engineer unintentionally made a broad change that generated a large volume of Internal API activity.
Any maintenance event that includes broad changes will be performed after hours. Engineering is also performing root cause analysis and will prepare a fix so that large-scale changes do not negatively impact the back-end Internal API service.
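The engineering fix itself is not described in this report. As an illustration only, one common way to keep a broad change from overwhelming a back-end service is to apply it in small batches with a pause between them. The sketch below is a minimal Python example under that assumption; the names `apply_in_batches`, `apply_batch`, `batch_size`, and `pause_seconds` are hypothetical and do not refer to any Authentic8 system or API.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def apply_in_batches(
    items: Sequence[T],
    apply_batch: Callable[[List[T]], None],
    batch_size: int = 100,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply a broad change in small batches, pausing between batches.

    `apply_batch` is a hypothetical callable that submits one batch to the
    back-end service. Returns the number of batches submitted.
    """
    batches = 0
    for batch in chunked(items, batch_size):
        if batches:
            # Pause before every batch except the first so the service can catch up.
            sleep(pause_seconds)
        apply_batch(batch)
        batches += 1
    return batches
```

With 250 items and a `batch_size` of 100, the change is submitted as three requests with two pauses between them instead of one oversized request.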
Request for information from you
If you were impacted and your experience doesn’t match what was described, please share your experience with us.
Please let us know if you want to be notified when the engineering fix has been deployed.