This document serves as a Root Cause Analysis for the European Data Center intermittent service interruption experienced by Ex Libris customers on October 1st, 2018.
The goal of this document is to share our findings regarding the event, present the root cause, and outline the actions taken to resolve the downtime as well as the preventive measures Ex Libris is taking to avoid similar cases in the future.
A service interruption was experienced by Ex Libris customers served by the European Data Center during the following hours:
October 1st, 2018 from 9:55 AM until 10:08 AM Amsterdam time
October 1st, 2018 from 10:44 AM until 12:51 PM Amsterdam time
During the event, intermittent performance issues were encountered on multiple systems.
The intermittent service interruption will be counted as full system downtime in the Ex Libris availability reports.
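For reference, the two windows listed above add up to 13 + 127 = 140 minutes. The following is a minimal sketch of that arithmetic in Python. The function names are illustrative, and the availability formula with its 31-day reporting period is an assumption for the example only, not the method used in the Ex Libris availability reports.

```python
from datetime import datetime

# Interruption windows as reported above (Amsterdam local time, same day).
WINDOWS = [
    ("2018-10-01 09:55", "2018-10-01 10:08"),
    ("2018-10-01 10:44", "2018-10-01 12:51"),
]

FMT = "%Y-%m-%d %H:%M"


def window_minutes(start: str, end: str) -> int:
    """Length of one interruption window in whole minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def total_downtime_minutes(windows) -> int:
    """Sum of all windows, each counted as full downtime."""
    return sum(window_minutes(start, end) for start, end in windows)


def availability_percent(downtime_minutes: int, period_minutes: int) -> float:
    """Hypothetical availability formula: share of the period not counted as down."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


if __name__ == "__main__":
    down = total_downtime_minutes(WINDOWS)
    print(down)  # 140
    # Assumed reporting period: a 31-day month (illustrative only).
    print(availability_percent(down, 31 * 24 * 60))
```

Because the interruption is counted as full system downtime, the whole of each window is included, even though service was only intermittently affected.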
Root Cause Analysis
Ex Libris engineers investigated this event to determine the root cause, with the following results:
After a thorough investigation by Ex Libris experts and the relevant external vendor, the root cause was found to be a bug in the European Data Center load balancer device.
The bug affected the device's SSL/TLS engine, which processed SSL/TLS connections to the Ex Libris applications only intermittently.
Other connections, which did not use SSL/TLS, continued to work as usual.
To work around the bug during the event, Ex Libris performed a failover of the traffic to the redundant network and redirected part of the traffic to an additional load balancer device.
Technical Action Items and Preventive Measures
Ex Libris has taken the following actions and preventive measures to avoid such an occurrence in the future:
- Ex Libris performed a failover of the traffic to the redundant network and redirected part of the traffic to an additional load balancer device.
- Ex Libris engineers are working with the load balancer vendor to find a mitigation for this bug and will implement the fix across the board as soon as it becomes available.
- Ex Libris engineers are working with the load balancer vendor to obtain an official RCA.
- Ex Libris is committed to providing customers with prompt and ongoing updates during Cloud events. Updates on service interruptions appear in the system status portal at http://status.exlibrisgroup.com/ and are automatically sent as emails to registered customers.
- To give customers an opportunity to discuss the disruption, review the RCA, and hear more about the technical details, a WebEx session will be held on October 11th.
|October 7, 2018||Initial Publication|