Summon - RCA - October 22, 25 and 30, 2018
This document serves as a Root Cause Analysis for the Summon service interruption experienced by Ex Libris customers on October 22, 25 and 30, 2018.
The goal of this document is to share our findings regarding the event, specify its root cause, outline the actions taken to resolve the downtime, and describe the preventive measures Ex Libris is taking to avoid similar cases in the future.
Service interruption was experienced by Ex Libris customers served by the Summon instance at the North America Data Center during the following hours:
- October 22nd 2:20PM-5:20PM CT
- October 25th 9:50AM-11:48AM CT
- October 30th 11:00AM-1:00PM CT
During those periods, the Summon search application was intermittently unavailable.
Ex Libris systems engineers immediately identified the cause of the issue and temporarily disabled some Summon features so that search functionality could work properly.
All features were re-activated after ~2 hours, once system stability could be assured.
Root Cause Analysis
Ex Libris engineers investigated this event to determine its root cause, with the following results:
The issues were caused by a load of new content combined with high search traffic during the weekly Summon update. As a result, some search requests responded more slowly than usual or timed out.
Technical Action Items and Preventive Measures
Ex Libris has taken the following actions and preventive measures to avoid such an occurrence in the future:
- The software was rolled back to a previous version to alleviate the issues for our clients while we research and test.
- R&D is enhancing the Summon software so that it handles this situation in a way that protects Summon end users, and is researching ways to prevent the slowness seen in the Summon/Beacon connection during this period, with a particular focus on the new Summon data center in DC01.
- Optimized the content on Summon to reduce the impact of high search rates on performance. This shortened the periods of performance degradation but has not eliminated them entirely.
- Added hardware to Summon to further improve performance and help prevent future performance degradation.
Ex Libris is committed to providing customers with prompt and ongoing updates during Cloud events. Updates on service interruptions appear in the system status portal at this address: http://status.exlibrisgroup.com/
November 6, 2018 - Initial Publication