Date

March 22, 2017

Description

A subset of customers were unable to access Silo

Duration

Resolved 2 hours after first incident report

Affected components

Silo customers who had implemented Machine Lock and were using email addresses as login IDs were unable to access the service

Affected customers

Any Silo customer with Machine Lock enabled

Root cause analysis

A release completed on 3/21/2017 contained a bug that caused user login IDs to be processed incorrectly for some customers that had Machine Lock configured.


Event Log

On 2017-03-21 10:46 PM PDT – Operations finished deploying a new web gateway integration feature. This release contained some enhancements to the Machine Lock feature.


On 2017-03-22 4:20 AM PDT – The first customer reports of users being unable to log in reached the Support queue. Customers reported issues through the support email alias and over the phone. Customer Support began to investigate the issue. Reports were triaged and acknowledged as they arrived.


On 2017-03-22 5:00 AM PDT – A8 Customer Support contacted A8 Operations to investigate further. System monitoring reported normal operation across the overall infrastructure, and a large number of customers were using the service during this time. Initial attempts to reproduce the issue failed, so Support requested additional details from impacted customers.


On 2017-03-22 5:30-6:00 AM PDT – A8 Operations and A8 Engineering discussed possible solutions and determined that a partial rollback of the database and API to the previous release was the safest resolution. The error was an edge case that was not easily reproduced.


On 2017-03-22 6:00-6:16 AM PDT – Rollback commenced at 6:03 AM and took roughly 10 minutes to complete.  The service was smoke tested and passed.  Impacted customers were notified and asked to verify that the issue had been resolved.


All impacted customers reported back throughout the morning that the issue had been resolved.


Resolution

This bug impacted a subset of customers that have very sophisticated configurations and environments. The Authentic8 QA team strives to cover as many of these scenarios as possible; however, one scenario that would have caught the error was not tested.


The specific configuration that was missed was an organization configured for Machine Lock that used email addresses as the login identifier, where the login identifier matched the user’s email address. Silo supports several other login configurations, all of which passed testing.
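
As an illustration only, the kind of configuration coverage described above could be enumerated as a test matrix. The sketch below is not Authentic8's QA tooling; the dimension names and values (machine_lock, login_id_type, login_id_matches_user_email) are hypothetical stand-ins chosen to mirror the scenario in this report.

```python
from itertools import product

# Hypothetical configuration dimensions, named for illustration only.
# They are not actual Silo settings.
MACHINE_LOCK = (True, False)
LOGIN_ID_TYPE = ("email", "username")
LOGIN_ID_MATCHES_USER_EMAIL = (True, False)


def build_test_matrix():
    """Return every meaningful combination of the dimensions above."""
    matrix = []
    for lock, id_type, matches in product(
        MACHINE_LOCK, LOGIN_ID_TYPE, LOGIN_ID_MATCHES_USER_EMAIL
    ):
        # Assumption: a non-email login ID never matches the user's email
        # address, so that combination is skipped.
        if id_type != "email" and matches:
            continue
        matrix.append(
            {
                "machine_lock": lock,
                "login_id_type": id_type,
                "login_id_matches_user_email": matches,
            }
        )
    return matrix


def untested(matrix, tested):
    """Return the combinations in the matrix that no test has covered."""
    return [combo for combo in matrix if combo not in tested]
```

Comparing such a matrix against the list of scenarios actually exercised before a release would surface a gap like the one in this incident (Machine Lock enabled, email login ID, login ID matching the user's email address) before deployment.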



Moving Forward

A8 Engineering and A8 Operations teams are always researching ways to simulate customer environments so that most, if not all, scenarios can be appropriately tested and issues resolved before new releases go to production.


As a result of this issue, there will be a formal weekly step to solicit direct feedback from the sales and customer support teams and to review all customer configuration changes.


We apologize for any inconvenience this may have caused and appreciate the customer feedback that was shared while this incident was happening.