    Alma EU00 Instance RCA: April 17 and May 4, 11, 13, 2016

    Confidential Information, Disclaimer and Trade Marks

    Introduction

    This document serves as a Root Cause Analysis (RCA) for the Alma service interruptions experienced by Ex Libris customers on April 17 and May 4, 11, and 13, 2016.

    The goal of this document is to share our findings regarding these events, identify the root cause, outline the corrective actions taken, and describe the preventive measures Ex Libris is taking to avoid similar cases in the future.

    Event Timeline

    Service interruptions were experienced by Ex Libris customers served by the Alma EU00 instance at the Europe Data Center during the following hours:

    April 17, 2016, from 11:32 AM until 11:58 AM Central European Time (CET).

    May 4, 2016, from 5:50 PM until 5:55 PM Central European Time (CET).

    May 11, 2016, from 8:29 AM until 8:52 AM Central European Time (CET).

    May 13, 2016, from 2:31 PM until 2:39 PM Central European Time (CET).

    During each event, the service was unavailable for the affected environment.
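
    As a quick cross-check of the timeline, the length of each outage window can be computed from the reported start and end times. The sketch below is illustrative only: the times are copied from the list above and treated as local times as reported, and the names OUTAGES and outage_minutes are hypothetical, not part of any Ex Libris tooling.

```python
from datetime import datetime

# Outage windows copied from the timeline (local times as reported).
OUTAGES = [
    ("2016-04-17 11:32", "2016-04-17 11:58"),
    ("2016-05-04 17:50", "2016-05-04 17:55"),
    ("2016-05-11 08:29", "2016-05-11 08:52"),
    ("2016-05-13 14:31", "2016-05-13 14:39"),
]

FMT = "%Y-%m-%d %H:%M"


def outage_minutes(start, end):
    """Return the length of one outage window in whole minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


durations = [outage_minutes(start, end) for start, end in OUTAGES]
total_minutes = sum(durations)

print(durations)      # [26, 5, 23, 8]
print(total_minutes)  # 62
```

    Under these assumptions, the four windows add up to 62 minutes of unavailability.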

    Root Cause Analysis

    Ex Libris Engineers investigated these events to determine the root cause, with the following results:

    • The database suffered a short network disconnect. The disconnection triggered a failover to the redundant database, which failed to start.

    Technical Action Items and Preventive Measures

    Ex Libris has taken the following actions and preventive measures to avoid such occurrences in the future:

    • Ex Libris Engineers are working closely with the database vendor's experts to identify the trigger for the disconnection. Several configuration change options are being evaluated and will be tested in the lab before being implemented in production.
    • Following the later events and in consultation with the database vendor's experts, Ex Libris Engineers upgraded the hardware of the server on which the database resides.
    • On the recommendation of the database vendor's experts, a configuration change was made to increase the system's tolerance to network disconnects.
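
    The idea of a higher tolerance to network disconnects can be illustrated with a minimal sketch. This is not the vendor's mechanism or the actual configuration change, which this document does not specify; it only assumes a generic heartbeat check in which failover is triggered once more than a configured number of consecutive heartbeats are missed. The function should_fail_over and the tolerance parameter are hypothetical.

```python
def should_fail_over(heartbeats, tolerance):
    """Return True if more than `tolerance` consecutive heartbeats were missed.

    `heartbeats` is a sequence of booleans: True means the heartbeat
    arrived, False means it was missed. Hypothetical model, not a real API.
    """
    missed = 0
    for ok in heartbeats:
        # Reset the counter on success, otherwise extend the current gap.
        missed = 0 if ok else missed + 1
        if missed > tolerance:
            return True
    return False


# A short network blip: two consecutive missed heartbeats.
blip = [True, False, False, True, True]

print(should_fail_over(blip, tolerance=1))  # True: low tolerance triggers failover
print(should_fail_over(blip, tolerance=3))  # False: higher tolerance rides out the blip
```

    In this model, raising the tolerance lets a brief disconnect pass without a failover, at the cost of reacting more slowly to a genuine outage.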

    Customer Communication

    Ex Libris is committed to providing customers with prompt and ongoing updates during Cloud events. Updates on service interruptions appear in the system status portal at this address: http://status.exlibrisgroup.com/
