This document serves as a Root Cause Analysis for the service interruption experienced by Ex Libris customers on Higher-Ed Platform NA07.
The goal of this document is to share our findings regarding the event, specify the root cause, outline the actions taken to resolve the downtime, and describe the preventive measures Ex Libris is taking to avoid similar cases in the future.
Primo VE NA07
Ex Libris customers served by the Higher-Ed Platform NA07 instance at the Seattle Data Center experienced a service interruption on Jan 18, 2023, from 10:14 until 12:15 Seattle time, and on Jan 19, 2023, from 09:07 until 09:22 Seattle time.
During these time frames, Primo VE was either slow or unresponsive. Due to the nature of the problem, some customers were affected for the entire duration of each time frame, while others were affected for only part of it.
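For readers who want the length of each window, the reported times can be turned into durations with a short calculation. This is a minimal sketch, not part of the Ex Libris analysis: the names WINDOWS and minutes_between are illustrative, and the times are treated as plain Seattle local times (both windows fall on days without a clock change).

```python
from datetime import datetime

# Incident windows as reported above, in Seattle local time.
# Naive datetimes are used on the assumption that no clock change
# occurs inside either window.
WINDOWS = [
    (datetime(2023, 1, 18, 10, 14), datetime(2023, 1, 18, 12, 15)),
    (datetime(2023, 1, 19, 9, 7), datetime(2023, 1, 19, 9, 22)),
]


def minutes_between(start: datetime, end: datetime) -> int:
    """Return the whole number of minutes from start to end."""
    return int((end - start).total_seconds() // 60)


# Length of each window, and the combined length of both.
durations = [minutes_between(start, end) for start, end in WINDOWS]
total_minutes = sum(durations)

for (start, end), length in zip(WINDOWS, durations):
    print(f"{start:%b %d, %Y %H:%M} to {end:%H:%M}: {length} minutes")
print(f"Total: {total_minutes} minutes")
```

These figures are an upper bound per customer, since some customers were affected for only part of each window.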
Root Cause Analysis
Ex Libris engineers investigated this event to determine the root cause of this issue and concluded the following:
One of the search index shards was highly fragmented, which caused slowness and eventually a service disruption.
Technical Action Items and Preventive Measures
Ex Libris has taken the following actions and preventive measures to avoid such an occurrence in the future:
· During the event, Ex Libris Engineers performed a cluster restart to restore normal service.
· Ex Libris Engineers rebuilt the fragmented cluster on Saturday, Jan 21, 2023.
· Ex Libris Engineers are evaluating options to identify and prevent fragmentation ahead of time.
Ex Libris is committed to providing customers with prompt and ongoing updates during Cloud events. Updates on service interruptions appear in the system status portal at this address: http://status.exlibrisgroup.com/
These updates are automatically sent as emails to registered customers.